  • BFAST to Sourceforge.net

    We have decided to move BFAST to the sourceforge.net website. We invite people to submit questions, discussions, or other input through the BFAST sourceforge mailing lists, the BFAST Bug Tracker, or here on seqanswers.com.

    BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:

    * Speed: enables billions of short reads to be mapped quickly.
    * Accuracy: A priori probabilities for mapping reads with a defined set of variants.
    * An easy way to measurably tune accuracy at the expense of speed.

    Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance.

    BFAST supports both Illumina and ABI SOLiD data, as well as any other Next-Generation Sequencing Technology (454, Helicos), with particular emphasis on sensitivity towards errors, SNPs and especially indels. Other algorithms take shortcuts by ignoring errors or certain types of variants (indels), and some even require further alignment, all to be the "fastest" (but still not complete). BFAST can be tuned to find variants regardless of the error rate, polymorphism rate, or other factors.

    Nils Homer

  • #2
    How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?



    • #3
      Originally posted by maasha
      How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?
      I have compared BFAST to all of the above (including Zoom, which is a commercial product) as well as others (BLAT, SHRiMP, SOAP), and it is much more sensitive/robust to errors and variants, especially indels (>10bp), while having comparable or better accuracy (paper in review). If you don't search for variants you will never find them. The high sensitivity has benefits with ABI SOLiD data, where the color error rate can be greater than 10%, so to properly identify the errors as well as find variants, sensitivity is of the utmost importance. Although BFAST can be flexibly tuned, trading off speed for sensitivity, it is slower than, say, Bowtie (no ABI support) or BWA when the sensitivity is at the recommended settings, but it does find variants (based on empirical and simulated data). As for speed, if we ask which aligner is the fastest when searching for SNPs and indels in the presence of errors, then in my (biased) opinion, it is BFAST.

      My point really is, if you want to find only perfect matches to the genome, then you can design a fast algorithm for that. If you want to find only SNPs where the data has <2% error, it is clear what shortcuts can be taken. If you want to align any type of data searching for SNPs and indels and make the aligner tunable, then you arrive at BFAST.

      I would be happy to share my results with you in private (as the paper is in review), so PM me if desired.


      Nils



      • #4
        Is BFAST good enough for calling >10bp indels, or is local assembly still preferred? Also, how does it compare with BWA for <4bp indels?



        • #5
          Originally posted by ech
          Is BFAST good enough for calling >10bp indels, or is local assembly still preferred? Also, how does it compare with BWA for <4bp indels?
          I also put up a BFAST Server version, where you can have a local web-server running BFAST and an interactive web page (inspired by the UCSC BLAT). It handles both Illumina and ABI SOLiD data natively. I put up a BFAST Server for you to see (click here), since our normal BFAST Server website is down (click here).

          For >10bp indels, it can be tuned to have any power depending on the error and polymorphism rate, with the power obviously increasing for longer reads (more room for the indel, especially insertions). Compared to BWA, which states it should be used on data with <2% error, it performs similarly (>95% power) with <4bp indels, but excels in scenarios where there is a non-trivial error rate (>2%) and/or when there is an indel and a SNP. In our own human reseq experiments, we found a 10bp deletion and a SNP 4bp downstream, which was validated with Sanger sequencing etc. The biggest increase in robustness/sensitivity is with ABI SOLiD data due to the complete gapped local alignment (see Paper).

          I think there is still room for micro-reassembly. For example, although the reads may be mapped to the correct location, their local alignment may be wrong given an insertion or a deletion breakpoint near either end of the read. I will let you ponder over why this is the case.
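
One plausible reason, sketched here under an assumed scoring scheme (the match, gap-open, and gap-extend values below are illustrative and are not BFAST's actual parameters): when only a few bases lie beyond the breakpoint, a local aligner can score higher by clipping that short tail than by paying for the gap, so the read lands in the right place with the wrong alignment.

```python
# Illustrative scores only; these are assumptions, not BFAST defaults.
MATCH = 1
GAP_OPEN = -5
GAP_EXTEND = -2


def gapped_score(read_len, del_len):
    """Score of the true alignment: every read base matches, plus one deletion.

    Assumed gap convention: a gap of length k costs GAP_OPEN + k * GAP_EXTEND.
    """
    return read_len * MATCH + GAP_OPEN + del_len * GAP_EXTEND


def clipped_score(read_len, tail):
    """Score when the `tail` bases beyond the breakpoint are left unaligned."""
    return (read_len - tail) * MATCH


def preferred(read_len, del_len, tail):
    """Which of the two candidate alignments a score-maximizing aligner keeps."""
    if gapped_score(read_len, del_len) > clipped_score(read_len, tail):
        return "gapped"
    return "clipped"


if __name__ == "__main__":
    # 50bp read, 4bp deletion, breakpoint 3 bases vs 20 bases from the read end.
    for tail in (3, 20):
        print(tail, preferred(50, 4, tail))
```

With these assumed numbers the gap costs 13, so the deletion is only reported when more than 13 matching bases lie beyond it. Reassembling all reads that cover the site can recover the breakpoint that individual alignments miss.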



          • #6
            I have a question about how BFAST/BFAST-BWA handles SNPs vs. read errors for AB-SOLiD (CS) reads.

            On viewing the resulting aligned mappings (in base space), do single base differences to the reference represent SNPs? That is, are they a result of detecting an appropriate 2-color mismatch, with single (or more) color mismatches identified as read errors and appropriately "corrected"?



            • #7
              Yes, see the accompanying papers for information.
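
As a rough illustration of the distinction being asked about (a sketch of the standard SOLiD two-base encoding, not of BFAST's implementation; the example sequences are made up): an isolated SNP changes two adjacent colors, whereas a single color error, if decoded naively, changes every downstream base.

```python
# Two-bit base codes; the color of a dinucleotide is the XOR of its two codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq):
    """Return the colors (0-3) for each pair of adjacent bases in seq."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Decode colors back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


def diff_positions(a, b):
    """0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


ref = "ACGTACGTAC"
snp = "ACGTTCGTAC"  # hypothetical SNP: A -> T at 0-based position 4

ref_colors = encode(ref)
snp_colors = encode(snp)
# The SNP touches the two colors that overlap position 4.
snp_color_diffs = diff_positions(ref_colors, snp_colors)

# A single color error at color index 4, decoded without correction.
err_colors = list(ref_colors)
err_colors[4] ^= 1
naive = decode(ref[0], err_colors)
err_base_diffs = diff_positions(ref, naive)

if __name__ == "__main__":
    print("SNP changes colors at:", snp_color_diffs)
    print("Color error changes bases at:", err_base_diffs)
```

A pair of adjacent color changes is only consistent with a SNP if the XOR of the two colors is unchanged, which is what lets an aligner separate likely SNPs from likely read errors.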



              • #8
                Originally posted by nilshomer
                Yes, see the accompanying papers for information.
                Thanks. I assume you mean the paper linked to a few posts back? I'm looking at this now.

                I read through the original paper (SHRiMP: Accurate Mapping of Short Color-space Reads), which has been the only one I've looked at so far that is specifically concerned with the notion that the read color space is degenerate (i.e. reads could in theory map to 4 alternative sequences in the reference). However, although the theory/method is presented, I was confused about how the actual reads in base space are finally output. For example, are corrected read errors marked in some way? Are base inserts ever discarded?
                Last edited by Guidobot; 03-31-2011, 09:56 AM. Reason: Never mind



                • #9
                  Read some more. I have published two papers with descriptions and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.
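
The "four to one" point can be sketched as follows, assuming a csfasta-style read string whose first character is the known adapter base and whose remaining characters are color digits (the read `T32013` is a made-up example, and `decode_csfasta` is a hypothetical helper, not a BFAST function):

```python
# Two-bit base codes; the color of a dinucleotide is the XOR of its two codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def decode(first_base, colors):
    """Decode colors to bases, starting from a given first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


def candidates(colors):
    """Without a known starting base, four base sequences fit the same colors."""
    return {start: decode(start, colors) for start in BASE}


def decode_csfasta(read):
    """Decode a read such as 'T32013': adapter base followed by color digits.

    The adapter base fixes the starting point, so a single sequence remains;
    the adapter base itself is dropped from the returned read.
    """
    adapter, colors = read[0], [int(c) for c in read[1:]]
    return decode(adapter, colors)[1:]


if __name__ == "__main__":
    for start, seq in candidates([3, 2, 0, 1, 3]).items():
        print(start, seq)
    print(decode_csfasta("T32013"))
```

The four candidates differ at every position, so knowing the adapter base is enough to pick exactly one of them.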



                  • #10
                    Originally posted by nilshomer
                    Read some more. I have published two papers with descriptions and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.
                    Cheers. I understand how BFAST could use the adapter (base) to define a specific (nt) read sequence, although BWA and MAQ appear to ignore this in translation to base space (and reduce the effective read length by 2 in the process). As a programmer I get curious about some of the implementation details but will continue reading.

                    Edit: I originally read your paper (BFAST: An Alignment Tool for Large Scale Genome Resequencing) and misinterpreted the statement "...each genomic read offset is artificially started with an A base to mimic the process of decoding...", thinking this meant the adapter base (e.g. in the csfasta file) was ignored.

                    Btw, in an experiment I did with the Streptococcus suis genome and SOLiD SE reads, I found that BFAST mapped 2.34% more reads than BWA, which includes a correction for the reads BWA mapped to repeated regions. (I used the recommended 10 seeds, but because my PC had only 2 GB RAM I used an index word size of 12.)
                    Last edited by Guidobot; 04-01-2011, 07:11 AM. Reason: Added edit note
