
  • #46
    Heng, thanks for your comments about GSNAP. I will think more about how to get more informative mapping quality results, and would welcome any further suggestions you might have. Actually, one of the reasons I haven't done much with the mapping quality calculations is that my colleagues here have used BWA+GATK for SNP calling, and they told me that GSNAP behaved similarly to BWA in its mapping quality calculations. But perhaps they were wrong.

    I also noted your timing results where the GSNAP paired-end algorithm is more than 2 times slower than the single-end algorithm. One of the reasons is that for paired-end data, GSNAP looks deeper at suboptimal results on each of the two ends in order to get a concordant result. In some cases, GSNAP may need to do its own version of a Smith-Waterman alignment in the neighborhood of a good alignment for the other end. Instead of using Smith-Waterman, though, GSNAP uses its GMAP algorithm, which is good for finding splicing, because our main application so far has been RNA-Seq, rather than DNA-Seq.

    GSNAP is also like BWA in that it does not use base quality scores for alignment. We also do not use base quality scores for trimming, but just pass the information on to the SNP caller.
    Last edited by twu; 12-06-2011, 12:40 PM.

    Comment


    • #47
      Hi Heng,

      Would you mind sharing the parameters you used for Bowtie2-beta4 on 100bp Illumina reads?

      Thanks.

      Originally posted by lh3
      Updated to bowtie2-beta4. On accuracy, bowtie2-beta4 is similar to bwa-sw overall. I have also done the comparison on real data following the way I used in the bwa-sw paper. Out of 138k 454 reads with average read length 355bp, bwa-sw misses 1094+58 good alignments (~90% shorter than 100bp) and gives 31 questionable alignments, while bowtie2-beta4 misses 13+91 good alignments and gives 65 questionable alignments. The accuracy is largely indistinguishable for practical applications. On speed, Bowtie2 is about 20% faster and uses less memory.

      In conclusion, bowtie2-beta4 has similar accuracy to bwa-sw for both 100bp simulated data and 350bp real 454 data. It is one of the best (accuracy+speed) mappers for hiseq and 454 reads. I will start to recommend it to others along with smalt/novoalign/gsnap. I think a missing feature in bowtie2 is to properly report chimeric alignments, which is essential to mapping even longer sequences. This should be fairly easy to implement.

      Comment


      • #48
        I think they are here (and other aligners' parameters too):



        Chris

        Comment


        • #49
          I just did a speed v accuracy test on Cancer Institute paired-end sequences. A GP tweak to
          Bowtie2 came out fastest (4 times the speed of BWA), took less than half the memory, and
          had almost the same accuracy (82.1% v 83.1%).
          See http://arxiv.org/abs/1301.5187
          Bill

          Comment


          • #50
            ps: Bowtie2 has a --very-sensitive command line option which can increase its
            accuracy at the expense of increased run time. (In one case by 0.8% to 83.4% but
            run time increased by 53%).
            Bill
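
A minimal sketch of how the two runs mentioned above could be set up for a side-by-side timing, assuming paired-end FASTQ input. The `--very-sensitive`, `-x`, `-1`, `-2`, `-S` and `-p` options are standard Bowtie2 options; the helper name `bowtie2_command` and all file names are hypothetical placeholders, not taken from the post.

```python
def bowtie2_command(index, reads1, reads2, out_sam,
                    very_sensitive=False, threads=1):
    """Build a Bowtie2 paired-end command line as an argv list.

    index, reads1, reads2 and out_sam are placeholder paths supplied
    by the caller; nothing is executed here.
    """
    cmd = ["bowtie2", "-p", str(threads)]
    if very_sensitive:
        # Preset that trades longer run time for higher sensitivity.
        cmd.append("--very-sensitive")
    cmd += ["-x", index, "-1", reads1, "-2", reads2, "-S", out_sam]
    return cmd


# Hypothetical file names, for illustration only.
default_run = bowtie2_command("ref_index", "r1.fq", "r2.fq", "default.sam")
sensitive_run = bowtie2_command("ref_index", "r1.fq", "r2.fq",
                                "sensitive.sam", very_sensitive=True)
```

Either list can be passed to `subprocess.run(cmd, check=True)` and timed to measure the run-time cost on your own data.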

            Comment


            • #51
              Sorry for bringing this thread back to life. I was wondering how they compare now that bowtie2 has a stable release and is no longer in beta. Has anyone been able to compare the latest versions of bowtie2 with other mappers? If so, could you share your observations?

              Comment


              • #52
                Hi ka90,

                We have recently made comparisons for a few aligners. Please see



                Cheers
                Wei

                Comment


                • #53
                  Since it appears this excellent thread has been resurrected recently already-

                  I'd like to show you all a comparison of bowtie2+GATK and other pipelines for variant calling on the NA12878 illumina exome with 150x coverage. These variant calling reports are generated by the GCAT resource on bioplanet.com:

                  http://www.bioplanet.com/gcat/report...x/bowtie-atlas

                  You can use the check box menu at left to choose other pipelines to compare to.

                  Comment


                  • #54
                    GCAT is great because it allows you to run and submit your own datasets for public scrutiny. We are going to make good use of it.

                    Comment


                    • #55
                      Very impressive website. I do have a question, though: when evaluating alignments, how do you tell if an alignment is correct? If there is clipping in an alignment, how do you deal with that?

                      I ask this because bwa-sw is surprisingly bad. While bwa-sw is less accurate than bwa-mem, it should be similar to bowtie2. A possible cause is that among the mappers evaluated on your website, bwa-sw reports the most soft clipping. If you do not correct clipping, bwa-sw will be easily the worst.
                      Last edited by lh3; 04-14-2013, 06:22 AM.

                      Comment


                      • #56
                        Thanks Heng -

                        It does account for soft clipping, and also allows +/- 5bp in deciding if the alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, this report is for a 100bp paired-end illumina sample: http://www.bioplanet.com/gcat/reports/21/alignment/100bp-pe-small-indel/bwa_sw . I see here that BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph.

                        Also, it would be awesome for building the GCAT community if you could ask these types of questions on the surrounding forum. The lead developers are watching there, and continuously add improvements as users make suggestions. The user name "lh3" is also available.
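
The correctness rule described in this post (clipping accounted for, plus a +/- 5bp tolerance) could look roughly like the sketch below. This is not GCAT's code; it assumes SAM conventions (1-based `POS` pointing at the first aligned base, clipped bases listed first in the CIGAR) and a known true start for the whole read. The function names are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def leading_clip(cigar):
    """Total number of clipped bases (S or H) at the start of a CIGAR."""
    clipped = 0
    for length, op in _CIGAR_OP.findall(cigar):
        if op not in "SH":
            break
        clipped += int(length)
    return clipped


def unclipped_start(pos, cigar):
    """Where the read would start if its clipped prefix had been aligned."""
    return pos - leading_clip(cigar)


def is_correct(chrom, pos, cigar, true_chrom, true_pos, tolerance=5):
    """True if the clipping-corrected start is within `tolerance` bp of the truth."""
    if chrom != true_chrom:
        return False
    return abs(unclipped_start(pos, cigar) - true_pos) <= tolerance
```

Without the clipping correction, a read reported as `10S90M` at position 1010 would look 10bp away from a true start of 1000 and be counted as wrong; with it, the read is counted as correct.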

                        Comment


                        • #57
                          Originally posted by oiiio
                          Thanks Heng -

                          It does account for soft clipping, and also allows +/- 5bp in deciding if the alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, this report is for a 100bp paired-end illumina sample: http://www.bioplanet.com/gcat/reports/21/alignment/100bp-pe-small-indel/bwa_sw . I see here that BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph.

                          Also, it would be awesome for building the GCAT community if you could ask these types of questions on the surrounding forum. The lead developers are watching there, and continuously add improvements as users make suggestions. The user name "lh3" is also available.
                          I would be interested in hearing what suggestions Heng might have for better metrics. GCAT is awesome but we need to think of more ways to evaluate pipelines...

                          Comment


                          • #58
                            I started a new thread here, since we have sort of hijacked this one...

                            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                            Comment


                            • #59
                              Well, fewer people will see it if I comment there. Perhaps the developers might consider opening a thread here.

                              I am looking at the ROC-like curves. For all data sets, BWA-SW quickly picks up high mapQ wrong alignments. But as you have considered clipping, maybe that is really the fault of bwa-sw. I don't know for sure. Anyway, for typical illumina/454/iontorrent reads, bwa-sw is now deprecated by bwa-mem.
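
For readers unfamiliar with these ROC-like curves, here is a minimal sketch of how one can be built from per-read results: sort by mapping quality from high to low and accumulate the number of mapped and wrong alignments at each threshold. The input format (pairs of mapQ and a correctness flag) and the function name are assumptions for illustration, not how GCAT stores its data.

```python
from collections import Counter


def roc_like_curve(alignments):
    """Cumulative (mapq_threshold, n_mapped, n_wrong) points, high mapQ first.

    `alignments` is an iterable of (mapq, is_correct) pairs.
    Each point counts all alignments with mapQ >= the threshold.
    """
    total = Counter()
    wrong = Counter()
    for mapq, correct in alignments:
        total[mapq] += 1
        if not correct:
            wrong[mapq] += 1

    curve = []
    n_mapped = 0
    n_wrong = 0
    for q in sorted(total, reverse=True):
        n_mapped += total[q]
        n_wrong += wrong[q]
        curve.append((q, n_mapped, n_wrong))
    return curve


# Toy data: one wrong alignment already at the highest mapQ.
points = roc_like_curve([(60, True), (60, False), (30, True), (0, False)])
```

A mapper that picks up wrong alignments at high mapQ shows a non-zero `n_wrong` in the very first points, which is the behaviour seen at the start of the graph.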

                              For exome variant calling, it would be better to give statistics in the target regions only.
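
Restricting statistics to target regions could be done along these lines: a sketch that assumes targets come as BED-style intervals (0-based, half-open) and variants as VCF-style 1-based positions. The function names and the toy coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_target_index(targets):
    """Merge overlapping (chrom, start, end) intervals and index them by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def in_targets(index, chrom, pos):
    """True if a 1-based position falls inside any target interval."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    zero_based = pos - 1
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < merged[i][1]


# Toy targets, for illustration only.
idx = build_target_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)])
on_target = [v for v in [("chr1", 101), ("chr1", 301), ("chr3", 10)]
             if in_targets(idx, *v)]
```

Sensitivity and specificity would then be computed on `on_target` calls only, so off-target calls with low coverage do not distort the comparison.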

                              Comment


                              • #60
                                I started to reply in adaptivegenome's proposed thread here: http://seqanswers.com/forums/showthr...688#post101688

                                Comment
