  • pkMyt1
    Junior Member
    • Dec 2013
    • 4

    Sorted BAM read count >2x total FASTQ count

    This might be a silly question, but it is bugging me and I need the answer. I have a set of paired-end FASTQ files that contain about 30 million reads in total. After aligning them with BWA and sorting the output with samtools, the resulting BAM file has about 72 million reads. Why?
  • Richard Finney
    Senior Member
    • Feb 2009
    • 701

    #2
    Show us your alignment and sorting commands.


    • fanli
      Senior Member
      • Jul 2014
      • 197

      #3
      Multiple alignments...


      • pkMyt1
        Junior Member
        • Dec 2013
        • 4

        #4
        Both counts come from iterating over the files in Python. The FASTQ files are read in blocks of four lines, each block being one read. This example is a MiSeq run, so 30 million reads (15 million in each FASTQ) seems realistic. The BAM count is from:

        import pysam

        bamfile = pysam.AlignmentFile(o['bamfile'], "rb")
        bamfile.count()
        or
        # Sum the mapped and unmapped columns of every idxstats line (no eval needed).
        bamfile_reads = sum(int(n) for l in pysam.idxstats(o['bamfile']) for n in l.rstrip('\n').split('\t')[2:])



        or simply counting the reads as I iterate the BAM file to do my analysis.
        Last edited by pkMyt1; 05-12-2015, 08:36 AM.
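Editor's sketch of the FASTQ side of the comparison described in #4: counting reads as blocks of four lines. The function name `count_fastq_records` and the toy records are illustrative assumptions, not the poster's actual script, and the check assumes unwrapped four-line FASTQ with no blank trailing lines.

```python
def count_fastq_records(lines):
    """Count FASTQ records in an iterable of lines, assuming four lines per record."""
    n_lines = 0
    for n_lines, line in enumerate(lines, start=1):
        # The first line of every four-line block must be the '@' header.
        if n_lines % 4 == 1 and not line.startswith("@"):
            raise ValueError(f"line {n_lines} does not start a FASTQ record")
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of four")
    return n_lines // 4


# Toy input (hypothetical reads): two records of four lines each.
example = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "TTGA", "+", "IIII"]
fastq_count = count_fastq_records(example)
print(fastq_count)  # 2
```

For a real file, an open text handle can be passed in place of the list, for example `with open(path) as fh: count_fastq_records(fh)`. This number is a count of reads, whereas the BAM-side counts above are counts of alignment records, which need not be the same thing.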


        • pkMyt1
          Junior Member
          • Dec 2013
          • 4

          #5
          Originally posted by fanli
          Multiple alignments...
          So....
          Would this imply my alignment settings are keeping things I should not?

          bwa mem -a -T 25 -L '(100, 100)'


          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            I would imagine that the -a flag is to blame.
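Editor's sketch of why -a inflates the count: per the bwa mem manual, -a outputs all found alignments for single-end or unpaired paired-end reads and flags the extra ones as secondary (FLAG bit 0x100), so each extra hit becomes one more record in the BAM. The tally below separates record types using the SAM FLAG field. The SAM lines are made-up toy records and `tally_sam_records` is a hypothetical helper name.

```python
from collections import Counter

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def tally_sam_records(sam_lines):
    """Tally SAM records by category using the FLAG field (column 2)."""
    tally = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])
        if flag & FLAG_SECONDARY:
            tally["secondary"] += 1
        elif flag & FLAG_SUPPLEMENTARY:
            tally["supplementary"] += 1
        elif flag & FLAG_UNMAPPED:
            tally["unmapped"] += 1
        else:
            tally["primary"] += 1
    return tally


# Toy records (hypothetical): pair r1 maps properly; r2 mate 1 has one
# primary and one secondary hit; r2 mate 2 is unmapped.
records = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
    "r2\t73\tchr1\t300\t0\t4M\t=\t300\t0\tACGT\tIIII",
    "r2\t329\tchr2\t500\t0\t4M\t=\t300\t0\tACGT\tIIII",
    "r2\t133\tchr1\t300\t0\t*\t=\t300\t0\tACGT\tIIII",
]
tally = tally_sam_records(records)
input_reads = tally["primary"] + tally["unmapped"]
print(dict(tally), input_reads)
```

In this toy case there are five records but only four input reads. The sum of primary and unmapped records is the number that should match the FASTQ count; secondary and supplementary records account for the excess.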


            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #7
              Originally posted by pkMyt1
              So....
              Would this imply my alignment settings are keeping things I should not?
              That depends on the goal of your experiment. What are you trying to do?


              • pkMyt1
                Junior Member
                • Dec 2013
                • 4

                #8
                Originally posted by Brian Bushnell
                That depends on the goal of your experiment. What are you trying to do?
                This is duplex exome sequencing: very deep, but only about 80 kb of capture. I did not want to lose any alignments where one read aligned and the other did not, whether because of a translocation or simply a sequencing error. This is why I used the -a option. Each read is uniquely tagged, so I have been able to filter things at the end. This is the first time I have seen this, but it is also the first time I have run a sample that I know contains many chromosomal rearrangements in the form of translocations, duplications, and deletions. I will need to pull out some of these multiple alignments and have a look at them so I can better understand what they are.
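Editor's sketch of one way to pull out the multiple alignments for inspection: group mapped records by read name and mate, then keep only the mates that have more than one record. It works on plain SAM text lines so it needs no extra libraries. The helper name `multi_aligned_reads` and the toy records are assumptions for illustration, not the poster's pipeline.

```python
from collections import defaultdict


def multi_aligned_reads(sam_lines):
    """Map (read name, mate) to its alignments, keeping mates with >1 record."""
    hits = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:  # an unmapped record carries no alignment
            continue
        mate = 2 if flag & 0x80 else 1  # 0x80 marks the second mate
        # Keep contig, position, MAPQ and FLAG for later inspection.
        hits[(qname, mate)].append((fields[2], int(fields[3]), int(fields[4]), flag))
    return {key: value for key, value in hits.items() if len(value) > 1}


# Toy records (hypothetical): r2 mate 1 has a primary and a secondary hit.
records = [
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
    "r2\t73\tchr1\t300\t0\t4M\t=\t300\t0\tACGT\tIIII",
    "r2\t329\tchr2\t500\t0\t4M\t=\t300\t0\tACGT\tIIII",
]
multi = multi_aligned_reads(records)
for key, alignments in multi.items():
    print(key, alignments)
```

Looking at the contigs, positions, and MAPQ values of each group is a quick way to tell repeats (several hits with low MAPQ) from candidate rearrangements (hits on different contigs or far apart).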


                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  In that case, it sounds like considering all good alignments of the reads is probably best. The reason for all the multiple alignments is presumably that you're targeting a repetitive region.
