BFAST to Sourceforge.net


  • BFAST to Sourceforge.net

    We have decided to move BFAST to the sourceforge.net website. We invite people to submit questions, discussions, or other input to the BFAST sourceforge mailing lists, to the BFAST Bug Tracker, or here on seqanswers.com.

    BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:

    * Speed: enables billions of short reads to be mapped quickly.
    * Accuracy: A priori probabilities for mapping reads with a defined set of variants.
    * An easy way to measurably tune accuracy at the expense of speed.

    Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance.

    BFAST supports both Illumina and ABI SOLiD data, as well as any other Next-Generation Sequencing Technology (454, Helicos), with particular emphasis on sensitivity towards errors, SNPs and especially indels. Other algorithms take shortcuts by ignoring errors or certain types of variants (indels), and some even require further alignment, all to be the "fastest" (but still not complete). BFAST can be tuned to find variants regardless of the error rate, polymorphism rate, or other factors.

    Nils Homer

  • #2
    How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?



    • #3
      Originally posted by maasha
      How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?
      I have compared BFAST to all of the above (including Zoom, which is a commercial product) as well as others (BLAT, SHRiMP, SOAP). It is much more sensitive and robust to errors and variants, especially indels (>10bp), while having comparable or better accuracy (paper in review). If you don't search for variants, you will never find them. The high sensitivity pays off with ABI SOLiD data, where the color error rate can exceed 10%: to properly identify the errors as well as find variants, sensitivity is of the utmost importance. BFAST can be flexibly tuned, trading speed for sensitivity. At the recommended sensitivity settings it is slower than, say, Bowtie (no ABI support) or BWA, but it does find variants (based on empirical and simulated data). As for speed, if we ask which aligner is the fastest when searching for SNPs and indels in the presence of errors, then in my (biased) opinion it is BFAST.

      My point really is, if you want to find only perfect matches to the genome, then you can design a fast algorithm for that. If you want to find only SNPs where the data has <2% error, it is clear what shortcuts can be taken. If you want to align any type of data searching for SNPs and indels and make the aligner tunable, then you arrive at BFAST.

      I would be happy to share my results with you in private (as the paper is in review), so PM me if desired.


      Nils



      • #4
        Is BFAST good enough for calling >10bp indels, or is local assembly still preferred? Also, how does it compare with BWA for <4bp indels?



        • #5
          Originally posted by ech
          Is BFAST good enough for calling >10bp indels or local assembly is still preferred? Also, how does it compare with bwa for <4bp indels?
          I also put up a BFAST Server version, where you can have a local web-server running BFAST and an interactive web page (inspired by the UCSC BLAT). It handles both Illumina and ABI SOLiD data natively. I put up a BFAST Server for you to see (click here), since our normal BFAST Server website is down (click here).

          For >10bp indels, it can be tuned to have any power depending on the error and polymorphism rate, with power increasing for longer reads (more room for the indel, especially insertions). Compared to BWA, which states it should be used on data with <2% error, it performs similarly (>95% power) on <4bp indels, but excels when there is a non-trivial error rate (>2%) and/or when an indel and a SNP occur together. In our own human resequencing experiments, we found a 10bp deletion with a SNP 4bp downstream, which was validated with Sanger sequencing, etc. The biggest increase in robustness/sensitivity is with ABI SOLiD data, due to the complete gapped local alignment (see Paper).

          I think there is still room for micro-reassembly. For example, although the reads may be mapped to the correct location, their local alignment may be wrong given an insertion or a deletion breakpoint near either end of the read. I will let you ponder over why this is the case.
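
One plausible reason, offered here as an illustration and not as the poster's stated answer: under affine-gap scoring, a few mismatches at the end of a read can cost less than opening a gap. The sketch below scores the same 3bp deletion at two positions in a read. The scoring values, the reference string, and the function names are all hypothetical and are not BFAST's defaults.

```python
# Hypothetical affine-gap scores chosen for illustration only.
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 1, -3, -5, -2

def ungapped_score(read, ref_window):
    """Score a read against a reference window with no gaps allowed."""
    return sum(MATCH if r == g else MISMATCH for r, g in zip(read, ref_window))

def deletion_score(read, ref, del_pos, del_len):
    """Score a read with one deletion of del_len reference bases,
    placed after the first del_pos read bases (read starts at ref[0])."""
    left = ungapped_score(read[:del_pos], ref[:del_pos])
    right = ungapped_score(read[del_pos:], ref[del_pos + del_len:])
    return left + right + GAP_OPEN + GAP_EXTEND * del_len

ref = "GATTACAGGCTTAACCGTAGCATGCA"

# Same 3bp deletion, two bases from the read's end vs. in the middle.
end_read = ref[:18] + ref[21:23]
mid_read = ref[:10] + ref[13:23]

end_ungapped = ungapped_score(end_read, ref)
end_gapped = deletion_score(end_read, ref, 18, 3)
mid_ungapped = ungapped_score(mid_read, ref)
mid_gapped = deletion_score(mid_read, ref, 10, 3)

print("deletion near end:", end_ungapped, "ungapped vs", end_gapped, "gapped")
print("deletion in middle:", mid_ungapped, "ungapped vs", mid_gapped, "gapped")
```

With these scores, the read whose deletion sits near its end scores higher without the gap, so a score-maximizing aligner would report a mismatch there. Reads that span the same deletion in their middle support the gap, which is the kind of disagreement local reassembly can reconcile.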



          • #6
            I have a question about how BFAST/BFAST-BWA handles SNPs vs. read errors for AB-SOLiD (CS) reads.

            On viewing the resulting aligned mappings (in base space), do single base differences to the reference represent SNPs? That is, are they a result of detecting an appropriate 2-color mismatch, with single (or more) color mismatches identified as read errors and appropriately "corrected"?
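
The distinction asked about here can be sketched with the standard SOLiD two-base encoding, where each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). This is a minimal illustration with made-up sequences and hypothetical function names, not BFAST's implementation: an isolated SNP changes two adjacent colors, while a single color error, if decoded naively, changes every downstream base.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq):
    """Color-encode bases: each color is the XOR of adjacent base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colors):
    """Naively decode colors to bases, starting from a known first base."""
    seq = [first_base]
    for color in colors:
        seq.append(BASE[CODE[seq[-1]] ^ color])
    return "".join(seq)

def diff_positions(a, b):
    """Indices at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

ref = "ACGTTGCA"
snp = "ACGATGCA"  # made-up SNP: T -> A at base index 3

ref_colors = encode(ref)
snp_color_diffs = diff_positions(ref_colors, encode(snp))

# A single color error at color index 3, decoded without correction.
err_colors = list(ref_colors)
err_colors[3] ^= 2
err_decoded = decode(ref[0], err_colors)
err_base_diffs = diff_positions(ref, err_decoded)

print("SNP changes colors at:", snp_color_diffs)
print("one color error changes bases at:", err_base_diffs)
```

An aligner that works in color space can therefore treat a lone color mismatch as a read error and a consistent pair of adjacent mismatches as a candidate SNP.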



            • #7
              Yes, see the accompanying papers for information.



              • #8
                Originally posted by nilshomer
                Yes, see the accompanying papers for information.
                Thanks. I assume you mean the paper linked to a few posts back? I'm looking at this now.

                I read through the original paper (SHRiMP: Accurate Mapping of Short Color-space Reads), which is the only one I've looked at so far that specifically addresses the degeneracy of read color space (i.e. a read could in theory map to 4 alternative sequences in the reference). However, although the theory/method is presented, I was confused about how the actual reads in base space are finally output. For example, are corrected read errors marked in some way? Are base inserts ever discarded?
                Last edited by Guidobot; 03-31-2011, 09:56 AM. Reason: Never mind



                • #9
                  Read some more. I have published two papers with descriptions, and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.



                  • #10
                    Originally posted by nilshomer
                    Read some more. I have published two papers with descriptions, and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.
                    Cheers. I understand how BFAST could use the adapter (base) to define a specific (nt) read sequence, although BWA and MAQ appear to ignore this in translation to base space (and reduce the effective read length by 2 in the process). As a programmer I get curious about some of the implementation details but will continue reading.
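
The "four to one" reduction can be sketched as follows, assuming the usual csfasta layout in which a read is one leading base (the last base of the adapter/primer) followed by color digits. The read string and function names are made up for illustration and this is not taken from BFAST, BWA, or MAQ: without a known first base, a color string decodes to four different base sequences, and the leading base selects one of them.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def decode(first_base, colors):
    """Decode colors to bases by XOR-ing each color onto the previous base."""
    seq = [first_base]
    for color in colors:
        seq.append(BASE[CODE[seq[-1]] ^ color])
    return "".join(seq)

def all_candidates(colors):
    """The four base sequences consistent with a bare color string."""
    return {b: decode(b, colors) for b in BASE}

def decode_csfasta(read):
    """Decode a csfasta-style read such as 'T32113' and drop the leading
    adapter base, which is not part of the sequenced fragment."""
    adapter_base = read[0]
    colors = [int(c) for c in read[1:]]
    return decode(adapter_base, colors)[1:]

read = "T32113"  # hypothetical read: adapter base T, then five colors

# Ignoring the adapter base and its first color leaves four candidates.
candidates = all_candidates([int(c) for c in read[2:]])
resolved = decode_csfasta(read)

print("candidates without the adapter base:", sorted(candidates.values()))
print("resolved with the adapter base:", resolved)
```

Note that the first color only links the adapter base to the first read base, so a tool that discards the adapter base also loses the information needed to anchor the decoding.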

                    Edit: I originally read your paper (BFAST: An Alignment Tool for Large Scale Genome Resequencing) and misinterpreted the statement "...each genomic read offset is artificially started with an A base to mimic the process of decoding...", thinking this meant the adapter base (e.g. in the csfasta file) was ignored.

                    Btw, in an experiment I did with the Streptococcus suis genome and SOLiD SE reads, I found that BFAST mapped 2.34% more reads than BWA, which includes a correction for the reads BWA mapped to repeated regions. (I used the recommended 10 seeds, but because my PC had only 2 GB of RAM I used an index word size of 12.)
                    Last edited by Guidobot; 04-01-2011, 07:11 AM. Reason: Added edit note
