  • SAMtools pileup of millions of reads from a single amplicon

    Hi all,


    We would like to pile up millions of reads from a single amplicon for ultra-sensitive mutation detection.

    Since SAMtools pileup appears to be limited to several thousand reads at a given position, could you suggest an alternative approach or a workaround?


    Any feedback is highly appreciated!

  • #2
    Is that limit documented somewhere or based on personal experience?

    Heng Li has previously mentioned pileup handling 200 GB BAMs (albeit not for a single amplicon): http://seqanswers.com/forums/showthread.php?t=6680


    • #3
      I use
      samtools mpileup -BQ60 -d500000 -D -f

      for our low-frequency variant detection. The "-d" option is "-d INT At a position, read maximally INT reads per input BAM. [250]", which limits the depth of the pileup. I turn off the BAQ calculation (-B) because I find it depresses the scores of any variant. We only accept quality scores of 60 because our method greatly improves the quality scores; if you are looking at normal reads, you might skip that or set -Q to 30.
      Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
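The command in this post is cut off after "-f", so the reference and BAM arguments are not shown. As a minimal sketch, the call could be assembled from Python like this. The file names `amplicon.fa` and `amplicon.bam` and the helper `build_mpileup_command` are hypothetical placeholders, not part of the original post. The flags are the ones quoted above; `-D` is kept exactly as posted, and its availability depends on the samtools version.

```python
import shlex


def build_mpileup_command(reference_fasta, bam_path,
                          max_depth=500000, min_base_quality=60):
    """Return the argument list for the mpileup call quoted in the post.

    The file paths are placeholders supplied by the caller.
    """
    return [
        "samtools", "mpileup",
        "-B",                         # turn off the BAQ calculation
        "-Q", str(min_base_quality),  # minimum base quality to count a base
        "-d", str(max_depth),         # per-BAM depth cap (default 250 per the post)
        "-D",                         # kept as posted; version dependent
        "-f", reference_fasta,        # reference FASTA
        bam_path,
    ]


cmd = build_mpileup_command("amplicon.fa", "amplicon.bam")
# Print the command instead of running it, so the sketch works without samtools.
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where samtools is installed. For ordinary reads, `min_base_quality=30` corresponds to the "-Q 30" suggestion above.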


      • #4
        Given the error-prone nature of Illumina sequencing, there is a limit to how ultra sensitive you can be. I am skeptical that millions of reads will give you more true positives than a hundred thousand.
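The point about a sensitivity limit can be illustrated with a rough calculation. This is a simplified sketch under assumptions that are not in the thread: a uniform per-base error rate equivalent to Q30, errors that are independent between reads, and errors split evenly among the three wrong bases. Real error profiles are context dependent, and PCR errors come on top.

```python
import math


def phred_to_error_rate(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def noise_floor(per_base_error_rate):
    """Expected fraction of reads showing one specific wrong base."""
    return per_base_error_rate / 3


def noise_std(depth, per_base_error_rate):
    """Binomial standard deviation of the observed noise fraction at a depth."""
    p = noise_floor(per_base_error_rate)
    return math.sqrt(p * (1 - p) / depth)


error_rate = phred_to_error_rate(30)  # assumed uniform Q30 error rate
for depth in (100_000, 1_000_000):
    floor = noise_floor(error_rate)
    print(f"depth={depth:>9,}  expected error reads={depth * floor:8.1f}  "
          f"noise fraction={floor:.5f} +/- {noise_std(depth, error_rate):.6f}")
```

Under these assumptions the expected noise fraction is the same at both depths. Extra reads only narrow the spread around that floor, which is why variants at or below the error rate remain hard to separate from sequencing errors however deep the pileup is.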


        • #5
          Originally posted by swbarnes2
          Given the error-prone nature of Illumina sequencing, there is a limit to how ultra sensitive you can be. I am skeptical that millions of reads will give you more true positives than a hundred thousand.
          Agreed. The race to the bottom for ultra-sensitive variant detection seems to be conveniently ignoring the false positive rate right now, and it's quite disconcerting. Combined with your PCR-induced errors, you're asking for trouble.
          Last edited by Bukowski; 02-19-2014, 04:19 PM.


          • #6
            Originally posted by Bukowski
            Agreed. The race to the bottom for ultra-sensitive variant detection seems to be conveniently ignoring the false positive rate right now, and it's quite disconcerting. Combined with your PCR-induced errors, you're asking for trouble.
            Of course, you're right! We are also thinking about these problems and are trying to address them with appropriate control samples.
            But that is a separate question. I just wanted to know whether it is possible to map millions of reads to one and the same location, process them with (m)pileup, and call variants on the result.


            • #7
              Originally posted by svos
              Of course, you're right! We are also thinking about these problems and are trying to address them with appropriate control samples.
              But that is a separate question. I just wanted to know whether it is possible to map millions of reads to one and the same location, process them with (m)pileup, and call variants on the result.
              It's hard to say without knowing exactly how low you are trying to go, but I would NOT believe mpileup on anything less than a few % unless I had very solid spike-in data proving that the false positive and false negative rates were acceptable.


              • #8
                Originally posted by swbarnes2
                It's hard to say without knowing exactly how low you are trying to go, but I would NOT believe mpileup on anything less than a few % unless I had very solid spike-in data proving that the false positive and false negative rates were acceptable.
                Again, you're right, but that's another problem... Hopefully we will have controls that allow us to perform such an analysis.

                The simple question is: is this kind of variant detection technically feasible on the bioinformatics side, using e.g. (m)pileup or an alternative? Or will we already run into problems at this stage (leaving aside the biological and sequencing background)?


                • #9
                  Perhaps one solution is to compute it in sections (say 1000 reads at a time), computing a vector of ACGT- at each point along with confidences, and then combining those vectors together in a second round of mpileup.

                  It's not possible with the current code, but in principle the "reduced-reads" style notation (done formally) could yield a way to compute extreme depth pileups in a memory-tractable manner.
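The chunked idea in this post can be sketched in plain Python. This is not how mpileup works and it leaves out the confidences mentioned above. It only shows that per-position count vectors over A, C, G, T and "-" can be computed chunk by chunk and then summed, so memory does not grow with depth. The `(start, sequence)` read format and the toy reads are assumptions made up for illustration, with gaps already written as "-" and no insertions.

```python
from collections import Counter
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_chunk(reads, region_length):
    """Per-position base counts for one chunk of (start, sequence) reads."""
    counts = [Counter() for _ in range(region_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < region_length:
                counts[pos][base] += 1
    return counts


def pileup_in_chunks(reads, region_length, chunk_size=1000):
    """Sum the per-chunk count vectors; only one chunk is held at a time."""
    total = [Counter() for _ in range(region_length)]
    for chunk in chunked(reads, chunk_size):
        for pos_total, pos_partial in zip(total, count_chunk(chunk, region_length)):
            pos_total.update(pos_partial)  # Counter.update adds counts
    return total


# Toy amplicon of length 4 with five already-aligned reads (made-up data).
reads = [(0, "ACGT"), (0, "ACGT"), (1, "CGT"), (0, "AC-T"), (0, "ATGT")]
result = pileup_in_chunks(reads, region_length=4, chunk_size=2)
for pos, counts in enumerate(result):
    print(pos, dict(counts))
```

Because the counts are additive, the result does not depend on the chunk size, and `reads` can be any iterator, for example one that streams alignments from a file.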
