  • twu
    Developer of GMAP and GSNAP
    • Oct 2011
    • 17

    #46
    Heng, thanks for your comments about GSNAP. I will think more about how to get more informative mapping quality results, and would welcome any further suggestions you might have. Actually, one of the reasons I haven't done much with the mapping quality calculations is that my colleagues here have used BWA+GATK for SNP calling, and they told me that GSNAP behaved similarly to BWA in its mapping quality calculations. But perhaps they were wrong.

    I also noted your timing results where the GSNAP paired-end algorithm is more than 2 times slower than the single-end algorithm. One of the reasons is that for paired-end data, GSNAP looks deeper at suboptimal results on each of the two ends in order to get a concordant result. In some cases, GSNAP may need to do its own version of a Smith-Waterman alignment in the neighborhood of a good alignment for the other end. Instead of using Smith-Waterman, though, GSNAP uses its GMAP algorithm, which is good for finding splicing, because our main application so far has been RNA-Seq, rather than DNA-Seq.

    GSNAP is also like BWA in that it does not use base quality scores for alignment. We also do not use base quality scores for trimming, but just pass the information on to the SNP caller.
    Last edited by twu; 12-06-2011, 12:40 PM.

    Comment

    • zee
      NGS specialist
      • Apr 2008
      • 249

      #47
      Hi Heng,

      Would you mind sharing the parameters you used for Bowtie2-beta4 on 100bp Illumina reads?

      Thanks.

      Originally posted by lh3
      Updated to bowtie2-beta4. On accuracy, bowtie2-beta4 is similar to bwa-sw overall. I have also done the comparison on real data following the way I used in the bwa-sw paper. Out of 138k 454 reads with average read length 355bp, bwa-sw misses 1094+58 good alignments (~90% shorter than 100bp) and gives 31 questionable alignments, while bowtie2-beta4 misses 13+91 good alignments and gives 65 questionable alignments. The accuracy is largely indistinguishable for practical applications. On speed, Bowtie2 is about 20% faster and uses less memory.

      In conclusion, bowtie2-beta4 has similar accuracy to bwa-sw for both 100bp simulated data and 350bp real 454 data. It is one of the best (accuracy+speed) mappers for hiseq and 454 reads. I will start to recommend it to others along with smalt/novoalign/gsnap. I think a missing feature in bowtie2 is to properly report chimeric alignments, which is essential to mapping even longer sequences. This should be fairly easy to implement.
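As a rough check on the figures quoted above, the miss and questionable counts can be expressed as percentages of the read set. This is only a sketch: "138k" is taken as exactly 138,000 reads, which is an assumption, and the two addends in each miss count are simply summed.

```python
# Counts quoted in the post above. "138k" is assumed to mean 138,000 reads.
TOTAL_READS = 138_000

counts = {
    "bwa-sw": {"missed": 1094 + 58, "questionable": 31},
    "bowtie2-beta4": {"missed": 13 + 91, "questionable": 65},
}


def rates(missed, questionable, total=TOTAL_READS):
    """Return (missed %, questionable %) relative to all reads."""
    return 100.0 * missed / total, 100.0 * questionable / total


summary = {name: rates(**c) for name, c in counts.items()}

for name, (miss_pct, quest_pct) in summary.items():
    print(f"{name}: missed {miss_pct:.2f}%, questionable {quest_pct:.3f}%")
```

Under that assumption both mappers miss well under 1% of reads, which is consistent with the remark that the accuracy difference is small in practice.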

      Comment

      • cjp
        Member
        • Jun 2011
        • 58

        #48
        I think they are here (and other aligners' parameters too):



        Chris

        Comment

        • wlangdon
          Member
          • Nov 2012
          • 15

          #49
          I just ran a speed v accuracy test on Cancer Institute paired-end sequences. A GP tweak to
          Bowtie2 came out fastest (4 times the speed of BWA), took less than half the memory, and
          had almost the same accuracy (82.1% v 83.1%).
          See http://arxiv.org/abs/1301.5187
          Bill

          Comment

          • wlangdon
            Member
            • Nov 2012
            • 15

            #50
            PS: Bowtie2 has a --very-sensitive command line option which can increase its
            accuracy at the expense of increased run time. (In one case accuracy rose by 0.8% to
            83.4%, but run time increased by 53%.)
            Bill
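For readers who want to try the option mentioned above, here is a minimal sketch that builds a paired-end Bowtie2 command line with and without `--very-sensitive`. The index name and file names are hypothetical placeholders, and the command is only assembled, not executed; the flags used (`-x`, `-1`, `-2`, `-S`, `-p`) are the standard Bowtie2 ones.

```python
import shlex


def bowtie2_command(index, reads1, reads2, out_sam,
                    very_sensitive=False, threads=1):
    """Build a paired-end bowtie2 argument list (nothing is executed)."""
    cmd = ["bowtie2"]
    if very_sensitive:
        # Preset that trades run time for sensitivity.
        cmd.append("--very-sensitive")
    cmd += ["-p", str(threads), "-x", index,
            "-1", reads1, "-2", reads2, "-S", out_sam]
    return cmd


# Hypothetical index and file names, for illustration only.
default_cmd = bowtie2_command("ref_index", "reads_1.fq", "reads_2.fq",
                              "default.sam")
sensitive_cmd = bowtie2_command("ref_index", "reads_1.fq", "reads_2.fq",
                                "sensitive.sam", very_sensitive=True)

print(shlex.join(default_cmd))
print(shlex.join(sensitive_cmd))
```

Running both commands on the same reads and timing them is one way to check whether the accuracy gain is worth the extra run time on your own data.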

            Comment

            • ka90
              Junior Member
              • Mar 2013
              • 1

              #51
              Sorry for bringing this thread back to life. I was wondering how they compare now that bowtie2 has a stable release and is no longer in beta. Has anyone been able to compare the latest versions of bowtie2 with other mappers? If so, could you share your observations?

              Comment

              • shi
                Wei Shi
                • Feb 2010
                • 236

                #52
                Hi ka90,

                We have recently made comparisons for a few aligners. Please see



                Cheers
                Wei

                Comment

                • oiiio
                  Senior Member
                  • Jan 2011
                  • 105

                  #53
                  Since it appears this excellent thread has been resurrected recently already-

                  I'd like to show you all a comparison of bowtie2+GATK and other pipelines for variant calling on the NA12878 Illumina exome with 150x coverage. These variant calling reports are generated by the GCAT resource on bioplanet.com:



                  You can use the check box menu at left to choose other pipelines to compare to.

                  Comment

                  • zee
                    NGS specialist
                    • Apr 2008
                    • 249

                    #54
                    GCAT is great because it allows you to run and submit your own datasets for public scrutiny. We are going to make good use of it.

                    Comment

                    • lh3
                      Senior Member
                      • Feb 2008
                      • 686

                      #55
                      Very impressive website. I do have a question, though: when evaluating alignments, how do you tell whether an alignment is correct? If an alignment is clipped, how do you deal with that?

                      I ask this because bwa-sw is surprisingly bad. While bwa-sw is less accurate than bwa-mem, it should be similar to bowtie2. A possible cause is that among the mappers evaluated on your website, bwa-sw reports the most soft clipping. If you do not correct clipping, bwa-sw will be easily the worst.
                      Last edited by lh3; 04-14-2013, 06:22 AM.

                      Comment

                      • oiiio
                        Senior Member
                        • Jan 2011
                        • 105

                        #56
                        Thanks Heng -

                        It does account for soft clipping, and also allows +/- 5bp in deciding if the alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, this report is for a 100bp paired-end illumina sample: http://www.bioplanet.com/gcat/report...l-indel/bwa_sw. I see here that BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph.

                        Also, it would be awesome for building the GCAT community if you could ask these types of questions on the surrounding forum. The lead developers are watching there and continuously add improvements as users make suggestions. The user name "lh3" is also available.
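The rule described above (correct for soft clipping, then allow +/- 5bp) could be sketched as follows. This is not GCAT's actual code, which is not shown in this thread; it assumes the true position is the start of the full, unclipped read, and the function names are hypothetical.

```python
import re

# Optional leading hard clip, then a leading soft clip, e.g. "10S90M".
_LEADING_SOFT_CLIP = re.compile(r"^(?:\d+H)?(\d+)S")


def unclipped_start(pos, cigar):
    """Shift the SAM POS left by the number of leading soft-clipped bases."""
    match = _LEADING_SOFT_CLIP.match(cigar)
    return pos - int(match.group(1)) if match else pos


def is_correct(chrom, pos, cigar, true_chrom, true_start, tolerance=5):
    """True if the clip-corrected start is within `tolerance` bp of the truth."""
    if chrom != true_chrom:
        return False
    return abs(unclipped_start(pos, cigar) - true_start) <= tolerance


# A read simulated from chr1:1003, reported at 1010 with 10 bases clipped.
print(is_correct("chr1", 1010, "10S90M", "chr1", 1003))
# The same alignment judged against a true start of 1010.
print(is_correct("chr1", 1010, "10S90M", "chr1", 1010))
```

Without the clipping correction, a mapper that soft-clips heavily would be penalized for reporting a POS that is shifted by the clip length, which is the concern raised earlier in the thread.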

                        Comment

                        • adaptivegenome
                          Super Moderator
                          • Nov 2009
                          • 436

                          #57
                          Originally posted by oiiio
                          Thanks Heng -

                          It does account for soft clipping, and also allows +/- 5bp in deciding if the alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, this report is for a 100bp paired-end illumina sample: http://www.bioplanet.com/gcat/report...l-indel/bwa_sw. I see here that BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph.

                          Also, it would be awesome for building the GCAT community if you could ask these types of questions on the surrounding forum. The lead developers are watching there and continuously add improvements as users make suggestions. The user name "lh3" is also available.
                          I would be interested in hearing what suggestions Heng might have for better metrics. GCAT is awesome but we need to think of more ways to evaluate pipelines...

                          Comment

                          • adaptivegenome
                            Super Moderator
                            • Nov 2009
                            • 436

                            #58
                            I started a new thread here, since we have sort of hijacked this one...

                            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                            Comment

                            • lh3
                              Senior Member
                              • Feb 2008
                              • 686

                              #59
                              Well, fewer people will see it if I comment there. Perhaps the developers might consider opening a thread here.

                              I am looking at the ROC-like curves. For all data sets, BWA-SW quickly picks up high mapQ wrong alignments. But as you have considered clipping, maybe that is really the fault of bwa-sw. I don't know for sure. Anyway, for typical illumina/454/iontorrent reads, bwa-sw is now deprecated by bwa-mem.
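A ROC-like curve of this kind is commonly built by sorting alignments by mapQ from high to low and accumulating mapped and wrong counts at each threshold. The sketch below assumes that construction; how GCAT actually draws its curves is not stated here, and the toy data is invented for illustration.

```python
from itertools import groupby


def roc_like_curve(alignments):
    """Cumulative (mapq_threshold, n_mapped, n_wrong), highest mapQ first.

    `alignments` is an iterable of (mapq, is_correct) pairs. Each output row
    counts all alignments with mapQ >= mapq_threshold.
    """
    ordered = sorted(alignments, key=lambda a: a[0], reverse=True)
    curve = []
    n_mapped = n_wrong = 0
    for mapq, group in groupby(ordered, key=lambda a: a[0]):
        for _, correct in group:
            n_mapped += 1
            if not correct:
                n_wrong += 1
        curve.append((mapq, n_mapped, n_wrong))
    return curve


# Invented toy data: (mapQ, alignment judged correct?)
toy = [(60, True), (60, False), (30, True), (30, True), (0, False)]
curve = roc_like_curve(toy)
for mapq, n_mapped, n_wrong in curve:
    print(f"mapQ>={mapq}: mapped={n_mapped} wrong={n_wrong}")
```

A mapper that "quickly picks up high mapQ wrong alignments" shows a non-zero wrong count already in the first rows of such a table, which is what makes the left end of the curve look bad.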

                              For exome variant calling, it would be better to give statistics in the target regions only.
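Restricting statistics to the target regions amounts to filtering calls against the capture intervals before counting. A minimal sketch, assuming BED-style 0-based half-open intervals and 0-based call positions (a 1-based VCF position would need 1 subtracted first); the interval and call data are invented.

```python
from bisect import bisect_right


def build_index(targets):
    """Merge overlapping (chrom, start, end) intervals into sorted lists."""
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_targets(index, chrom, pos0):
    """True if the 0-based position falls inside any merged target interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


# Invented target intervals and variant calls, for illustration only.
index = build_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)])
calls = [("chr1", 120), ("chr1", 400), ("chr2", 10)]
on_target = [c for c in calls if in_targets(index, *c)]
print(on_target)
```

Sensitivity and false-positive counts would then be computed on `on_target` only, so that off-target calls with low coverage do not distort the comparison.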

                              Comment

                              • oiiio
                                Senior Member
                                • Jan 2011
                                • 105

                                #60
                                I started to reply in adaptivegenome's proposed thread here: http://seqanswers.com/forums/showthr...688#post101688

                                Comment
