  • #46
    Heng, thanks for your comments about GSNAP. I will think more about how to get more informative mapping quality results, and would welcome any further suggestions you might have. Actually, one of the reasons I haven't done much with the mapping quality calculations is that my colleagues here have used BWA+GATK for SNP calling, and they told me that GSNAP behaved similarly to BWA in its mapping quality calculations. But perhaps they were wrong.

    I also noted your timing results where the GSNAP paired-end algorithm is more than 2 times slower than the single-end algorithm. One of the reasons is that for paired-end data, GSNAP looks deeper at suboptimal results on each of the two ends in order to get a concordant result. In some cases, GSNAP may need to do its own version of a Smith-Waterman alignment in the neighborhood of a good alignment for the other end. Instead of using Smith-Waterman, though, GSNAP uses its GMAP algorithm, which is good for finding splicing, because our main application so far has been RNA-Seq, rather than DNA-Seq.

    GSNAP is also like BWA in that it does not use base quality scores for alignment. We also do not use base quality scores for trimming, but just pass the information on to the SNP caller.
    Last edited by twu; 12-06-2011, 12:40 PM.

    Comment


    • #47
      Hi Heng,

      Would you mind sharing the parameters you used for Bowtie2-beta4 on 100bp Illumina reads?

      Thanks.

      Originally posted by lh3
      Updated to bowtie2-beta4. On accuracy, bowtie2-beta4 is similar to bwa-sw overall. I have also done the comparison on real data following the way I used in the bwa-sw paper. Out of 138k 454 reads with average read length 355bp, bwa-sw misses 1094+58 good alignments (~90% shorter than 100bp) and gives 31 questionable alignments, while bowtie2-beta4 misses 13+91 good alignments and gives 65 questionable alignments. The accuracy is largely indistinguishable for practical applications. On speed, Bowtie2 is about 20% faster and uses less memory.

      In conclusion, bowtie2-beta4 has similar accuracy to bwa-sw for both 100bp simulated data and 350bp real 454 data. It is one of the best (accuracy+speed) mappers for hiseq and 454 reads. I will start to recommend it to others along with smalt/novoalign/gsnap. I think a missing feature in bowtie2 is to properly report chimeric alignments, which is essential to mapping even longer sequences. This should be fairly easy to implement.

      Comment


      • #48
        I think they are here (and other aligners' parameters too):



        Chris

        Comment


        • #49
          I just ran a speed vs. accuracy test on Cancer Institute paired-end sequences. A GP tweak to Bowtie2 came out fastest (4 times the speed of BWA), took less than half the memory, and had almost the same accuracy (82.1% v 83.1%).
          See http://arxiv.org/abs/1301.5187
          Bill

          Comment


          • #50
            ps: Bowtie2 has a --very-sensitive command line option which can increase its accuracy at the expense of increased run time. (In one case accuracy rose by 0.8% to 83.4%, but run time increased by 53%.)
            Bill
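For readers who want to try the comparison Bill describes, here is a minimal sketch that builds the two Bowtie2 command lines (default preset vs. --very-sensitive). The index name, read files, output names, and the helper function are hypothetical placeholders, not taken from the post; only the bowtie2 flags themselves (-x, -1, -2, -S, -p, --very-sensitive) are real.

```python
def bowtie2_command(index, reads1, reads2, out_sam,
                    very_sensitive=False, threads=1):
    """Build a paired-end bowtie2 command as an argument list.

    All file names passed in are placeholders chosen by the caller.
    """
    cmd = ["bowtie2"]
    if very_sensitive:
        # Preset that trades run time for sensitivity.
        cmd.append("--very-sensitive")
    cmd += ["-p", str(threads),
            "-x", index,      # basename of the bowtie2 index
            "-1", reads1,     # first mates
            "-2", reads2,     # second mates
            "-S", out_sam]    # SAM output file
    return cmd


# Hypothetical inputs, for illustration only.
default_cmd = bowtie2_command("ref_index", "reads_1.fq", "reads_2.fq",
                              "default.sam")
sensitive_cmd = bowtie2_command("ref_index", "reads_1.fq", "reads_2.fq",
                                "very_sensitive.sam", very_sensitive=True)

print(" ".join(default_cmd))
print(" ".join(sensitive_cmd))
```

Either list can be passed to subprocess.run and timed to reproduce the kind of run time vs. accuracy trade-off mentioned above.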

            Comment


            • #51
              Sorry for bringing this thread back to life. I was wondering how they compare now that bowtie2 has a stable release and is no longer in beta. Has anyone been able to compare the latest versions of bowtie2 with other mappers? If so, can you share your observations?

              Comment


              • #52
                Hi ka90,

                We have recently made comparisons for a few aligners. Please see



                Cheers
                Wei

                Comment


                • #53
                  Since it appears this excellent thread has been resurrected recently already-

                  I'd like to show you all a comparison of bowtie2+GATK and other pipelines for variant calling on the NA12878 illumina exome with 150x coverage. These variant calling reports are generated by the GCAT resource on bioplanet.com:

                  http://www.bioplanet.com/gcat/report...x/bowtie-atlas

                  You can use the check box menu at left to choose other pipelines to compare to.

                  Comment


                  • #54
                    GCAT is great because it allows you to run and submit your own datasets for public scrutiny. We are going to make good use of it.

                    Comment


                    • #55
                      Very impressive website. I do have a question, though: when evaluating alignments, how do you tell if an alignment is correct? If an alignment is clipped, how do you deal with that?

                      I ask this because bwa-sw is surprisingly bad. While bwa-sw is less accurate than bwa-mem, it should be similar to bowtie2. A possible cause is that among the mappers evaluated on your website, bwa-sw reports the most soft clipping. If you do not correct clipping, bwa-sw will be easily the worst.
                      Last edited by lh3; 04-14-2013, 06:22 AM.

                      Comment


                      • #56
                        Thanks Heng -

                        It does account for soft clipping, and also allows +/- 5bp in deciding if the alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, this report is for a 100bp paired-end illumina sample: http://www.bioplanet.com/gcat/reports/21/alignment/100bp-pe-small-indel/bwa_sw . I see here that BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph.

                        Also, it would be awesome for building the GCAT community if you could ask these types of questions on the surrounding forum. The lead developers are watching there, and continuously add improvements as users make suggestions. The user name "lh3" is also available.
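To make the criterion above concrete, here is a minimal sketch of a clipping-aware correctness check with a +/- 5bp tolerance. This is not GCAT's actual code: the function names are hypothetical, and it assumes the true position is the reference coordinate of the read's first base and that the read maps to the forward strand.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def leading_soft_clip(cigar):
    """Return the number of soft-clipped bases at the start of a CIGAR."""
    ops = _CIGAR_OP.findall(cigar)
    i = 0
    # A hard clip may precede the soft clip; skip it.
    if ops and ops[0][1] == "H":
        i = 1
    if i < len(ops) and ops[i][1] == "S":
        return int(ops[i][0])
    return 0


def is_correct(true_chrom, true_start, chrom, pos, cigar, tolerance=5):
    """Compare the clipping-corrected start with the true start.

    SAM POS points at the first aligned (unclipped) base, so the
    soft-clipped prefix is subtracted before comparing.
    """
    if chrom != true_chrom:
        return False
    unclipped_start = pos - leading_soft_clip(cigar)
    return abs(unclipped_start - true_start) <= tolerance


# A read truly starting at chr1:1000 whose first 20 bases were soft-clipped.
clipped_ok = is_correct("chr1", 1000, "chr1", 1020, "20S80M")
# Unclipped alignments 4bp and 6bp away from the truth.
near_ok = is_correct("chr1", 1000, "chr1", 1004, "100M")
far_ok = is_correct("chr1", 1000, "chr1", 1006, "100M")
print(clipped_ok, near_ok, far_ok)
```

Without the soft-clip correction the first read would look 20bp off and be counted as wrong, which is the effect Heng was asking about.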

                        Comment


                        • #57
                          Originally posted by oiiio
                          Thanks Heng -

                          It does account for soft clipping, and also allows +/- 5bp in deciding if the alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, this report is for a 100bp paired-end illumina sample: http://www.bioplanet.com/gcat/reports/21/alignment/100bp-pe-small-indel/bwa_sw . I see here that BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph.

                          Also, it would be awesome for building the GCAT community if you could ask these types of questions on the surrounding forum. The lead developers are watching there, and continuously add improvements as users make suggestions. The user name "lh3" is also available.

                          I would be interested in hearing what suggestions Heng might have for better metrics. GCAT is awesome but we need to think of more ways to evaluate pipelines...

                          Comment


                          • #58
                            I started a new thread here, since we have sort of hijacked this one...

                            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                            Comment


                            • #59
                              Well, fewer people will see it if I comment there. Perhaps the developers might consider opening a thread here.

                              I am looking at the ROC-like curves. For all data sets, BWA-SW quickly picks up high mapQ wrong alignments. But as you have considered clipping, maybe that is really the fault of bwa-sw. I don't know for sure. Anyway, for typical illumina/454/iontorrent reads, bwa-sw is now deprecated by bwa-mem.
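For readers unfamiliar with these ROC-like curves, here is a minimal sketch of how one can be computed: sort alignments by mapQ from high to low and accumulate the number of mapped and wrong alignments at each threshold. The function name and the toy data are made up for illustration; this is not the code behind the GCAT plots.

```python
from itertools import groupby


def roc_like_curve(alignments):
    """Return (mapq_threshold, n_mapped, n_wrong) points.

    `alignments` is an iterable of (mapq, is_correct) pairs. Each point
    counts all alignments with mapQ >= the threshold.
    """
    ordered = sorted(alignments, key=lambda a: a[0], reverse=True)
    curve = []
    n_mapped = 0
    n_wrong = 0
    for mapq, group in groupby(ordered, key=lambda a: a[0]):
        for _, correct in group:
            n_mapped += 1
            if not correct:
                n_wrong += 1
        curve.append((mapq, n_mapped, n_wrong))
    return curve


# Toy data: one wrong alignment already appears at the highest mapQ.
toy = [(60, True), (60, True), (60, False),
       (30, True), (10, False), (0, False)]
curve = roc_like_curve(toy)
for threshold, mapped, wrong in curve:
    print(threshold, mapped, wrong)
```

A mapper that "quickly picks up high mapQ wrong alignments" shows a nonzero wrong count at the very first points of such a curve, as in the toy data.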

                              For exome variant calling, it would be better to give statistics in the target regions only.
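Restricting statistics to target regions can be sketched as below, assuming the targets come as BED-style intervals (0-based, half-open) and the variant calls as 1-based positions, as in VCF. The helper names and the example coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_target_index(bed_intervals):
    """Merge overlapping (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in bed_intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_target(index, chrom, pos):
    """True if the 1-based position `pos` falls inside a target interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1
    # Last interval whose start is <= the position.
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


# Made-up targets and calls, for illustration only.
targets = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 0, 50)]
index = build_target_index(targets)
calls = [("chr1", 101), ("chr1", 300), ("chr2", 50), ("chr3", 10)]
on_target = [c for c in calls if in_target(index, *c)]
print(on_target)
```

Sensitivity and false positive counts would then be computed over on_target only, so that off-target calls with low coverage do not dominate the comparison.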

                              Comment


                              • #60
                                I started to reply in adaptivegenome's proposed thread here: http://seqanswers.com/forums/showthr...688#post101688

                                Comment
