**Richard Finney** · 05-12-2015, 08:14 AM

Show us your alignment and sorting commands.

**fanli** · 05-12-2015, 08:28 AM

Multiple alignments...

**pkMyt1** · 05-12-2015, 08:33 AM

Both counts come from iterating over the files in Python. The FASTQ files are read in blocks of four lines, each block being one read. This example is a MiSeq run, so 30 million reads (15 million in each FASTQ) seems realistic. The BAM count is from

bamfile = pysam.AlignmentFile(o['bamfile'], "rb")
bamfile.count()
or
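# Sum the mapped and unmapped columns of every idxstats line (no eval needed).
# Newer pysam releases return one string here; iterate over .splitlines() in that case.
bamfile_reads = sum(int(n) for l in pysam.idxstats(o['bamfile']) for n in l.rstrip('\n').split('\t')[2:])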

https://www.biostars.org/p/1890/

or simply counting the reads as I iterate the BAM file to do my analysis.
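For reference, the FASTQ side of the comparison can be sketched as below. This is a minimal sketch, not the poster's actual script: the function names `count_fastq_reads` and `count_fastq_file` are hypothetical, and it assumes unwrapped FASTQ with exactly four lines per read.

```python
import gzip
from itertools import islice


def count_fastq_reads(handle):
    """Count records in an iterable of FASTQ lines, four lines per read."""
    handle = iter(handle)  # lets plain lists work as well as open files
    reads = 0
    while True:
        block = list(islice(handle, 4))
        if not block:
            break
        # A complete record has four lines and a header starting with "@".
        if len(block) < 4 or not block[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record near read {reads + 1}")
        reads += 1
    return reads


def count_fastq_file(path):
    """Open a plain or gzipped FASTQ file (chosen by extension) and count its reads."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_reads(handle)
```

Summing this count over both FASTQ files of a pair gives the number to compare against the BAM record count.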

**pkMyt1** · 05-12-2015, 08:39 AM

Originally posted by fanli

Multiple alignments...

So....
Would this imply my alignment settings are keeping things I should not?

bwa mem -a -T 25 -L '(100, 100)'

**dpryan** · 05-12-2015, 09:47 AM

I would imagine that the -a flag is to blame.
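For context, the bwa manual describes -a as outputting all found alignments for single-end or unpaired paired-end reads, flagged as secondary alignments. Each extra hit is therefore its own BAM record, which is how a record count can exceed the FASTQ read count. A rough way to check this is to tally records by SAM flag. The sketch below works on SAM text lines (for example from `samtools view -h`); the function name `tally_sam_records` is hypothetical.

```python
from collections import Counter

SECONDARY = 0x100      # "not primary alignment" bit in the SAM flag
SUPPLEMENTARY = 0x800  # supplementary (split/chimeric) alignment bit


def tally_sam_records(sam_lines):
    """Tally SAM text records as primary, secondary, or supplementary."""
    tally = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & SECONDARY:
            tally["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            tally["supplementary"] += 1
        else:
            tally["primary"] += 1  # includes unmapped reads, one record each
    return tally
```

If the "primary" tally matches the total FASTQ read count, the surplus is explained by secondary and supplementary records. With pysam, the same test is available per record as `read.is_secondary` and `read.is_supplementary`, and `samtools view -c -F 0x900` counts only the primary records.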

**Brian Bushnell** · 05-12-2015, 11:35 AM

Originally posted by pkMyt1

So....
Would this imply my alignment settings are keeping things I should not?

That depends on the goal of your experiment. What are you trying to do?

**pkMyt1** · 05-13-2015, 04:43 AM

Originally posted by Brian Bushnell

That depends on the goal of your experiment. What are you trying to do?

This is duplex exome sequencing: very deep, but only about 80 kb of capture. I did not want to lose any alignments where one read aligned and the other did not, whether due to a translocation or simply a sequencing error. That is why I used the -a option. Each read is uniquely tagged, so I have been able to filter things at the end. This is the first time I have seen this, but it is also the first time I have run a sample that I know contains many chromosomal rearrangements in the form of translocations, duplications, and deletions. I will need to pull out some of these multiple alignments and look at them so I can better understand what they are.
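One possible way to pull out the multiply aligned reads for inspection is to group records by read name and mate. This is only a sketch on SAM text lines; the function name `reads_with_multiple_alignments` is hypothetical, and it does not use the unique tags mentioned above.

```python
from collections import defaultdict


def reads_with_multiple_alignments(sam_lines):
    """Map (read name, mate) to its placements, keeping reads with more than one record."""
    hits = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:  # unmapped records carry no placement
            continue
        mate = 2 if flag & 0x80 else 1  # 0x80 marks the second read of a pair
        hits[(qname, mate)].append((rname, pos, flag))
    return {key: recs for key, recs in hits.items() if len(recs) > 1}
```

Each value lists the reference name, position, and flag of every record for that read, which makes it easier to see whether the extra hits sit in repeats or span a rearrangement.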

**Brian Bushnell** · 05-13-2015, 09:43 AM

In that case, it sounds like considering all good alignments of the reads is probably best. The reason for all the multiple alignments is presumably that you're targeting a repetitive region.


Sorted BAM read count >2x total FASTQ count
