This might be a silly question, but it is bugging me and I need the answer. I have a set of paired-end FASTQ files that contain about 30 million reads in total. After aligning them with BWA and sorting the output with samtools, the resulting BAM file has about 72 million reads. Why?
Both counts come from iterating the files in Python. The FASTQ files are read in blocks of four lines, each block being one read. This example is a MiSeq run, so 30 million (15 million in each FASTQ) seems realistic. The BAM count is from
bamfile = pysam.AlignmentFile(o['bamfile'], "rb")
bamfile.count()
or
bamfile_reads = functools.reduce(lambda x, y: x + y, [eval('+'.join(l.rstrip('\n').split('\t')[2:])) for l in pysam.idxstats(o['bamfile'])])
or simply counting the reads as I iterate the BAM file to do my analysis.
Last edited by pkMyt1; 05-12-2015, 08:36 AM.
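A minimal sketch of one way to break the BAM count down by record type. It assumes the extra records are secondary (flag bit 0x100) or supplementary (flag bit 0x800) alignments, which is an assumption and not something established here. The helper name `tally_flags` and the file name in the usage comment are hypothetical.

```python
from collections import Counter

# SAM flag bits, as defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def tally_flags(flags):
    """Count BAM records by type, given an iterable of integer SAM flags."""
    counts = Counter()
    for flag in flags:
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        elif flag & UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["primary"] += 1
    return counts


# Hypothetical usage with pysam (file name is a placeholder):
#   import pysam
#   with pysam.AlignmentFile("sample.bam", "rb") as bam:
#       print(tally_flags(read.flag for read in bam.fetch(until_eof=True)))
```

If the assumption holds, the primary and unmapped counts together should match the number of reads in the FASTQ files, and the remainder would account for the difference.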
Originally posted by Brian Bushnell: That depends on the goal of your experiment. What are you trying to do?

This is duplex exome sequencing. Very deep, but only about 80 kb of capture. I did not want to lose any alignments where one read aligned and the other did not, either due to a translocation or simply a sequencing error. This is why I used the -a option. Each read is uniquely tagged, so I have been able to filter things in the end. This is the first time I have seen this, but it is also the first time I have run a sample that I know contains many chromosomal rearrangements (translocations, duplications, deletions). I will need to pull out some of these multiple alignments and look at them so I can understand them better.
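As a rough sketch of how the multiple alignments could be pulled out for inspection, the records can be grouped by read name and mate, keeping only those flagged secondary or supplementary. The function name `reads_with_extra_alignments` and the input shape (pairs of query name and flag) are illustrative choices, not part of any library.

```python
from collections import defaultdict

# SAM flag bits, as defined in the SAM specification
READ1 = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def reads_with_extra_alignments(records):
    """Map (query_name, mate) to its number of secondary/supplementary records.

    `records` is an iterable of (query_name, flag) pairs.
    Reads that only have a primary record are not included in the result.
    """
    extra = defaultdict(int)
    for name, flag in records:
        if flag & (SECONDARY | SUPPLEMENTARY):
            mate = 1 if flag & READ1 else 2
            extra[(name, mate)] += 1
    return dict(extra)


# Hypothetical usage with pysam (file name is a placeholder):
#   import pysam
#   with pysam.AlignmentFile("sample.bam", "rb") as bam:
#       pairs = ((r.query_name, r.flag) for r in bam.fetch(until_eof=True))
#       extras = reads_with_extra_alignments(pairs)
```

The keys of the returned dictionary give the read names to look up in the BAM file when examining individual cases.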