**maasha** · 08-21-2009, 12:13 AM

How does Bfast compare to other mapping tools like Bowtie, BWA, Maq, Zoom, etc?

**nilshomer** · 08-21-2009, 08:06 AM

I have compared BFAST to all of the above (including Zoom, which is a commercial product) as well as others (BLAT, SHRiMP, SOAP). It is much more sensitive and robust to errors and variants, especially indels (>10bp), while having comparable or better accuracy (paper in review). If you don't search for variants, you will never find them. The high sensitivity pays off with ABI SOLiD data, where the color error rate can be greater than 10%, so sensitivity is of the utmost importance for properly identifying errors as well as finding variants. BFAST can be flexibly tuned, trading off speed for sensitivity. At the recommended sensitivity settings it is slower than, say, Bowtie (no ABI support) or BWA, but it does find variants (based on empirical and simulated data). As for speed, if we ask which aligner is fastest when searching for SNPs and indels, all in the presence of errors, then in my (biased) opinion it is BFAST.

My point really is, if you want to find only perfect matches to the genome, then you can design a fast algorithm for that. If you want to find only SNPs where the data has <2% error, it is clear what shortcuts can be taken. If you want to align any type of data searching for SNPs and indels and make the aligner tunable, then you arrive at BFAST.

I would be happy to share my results with you in private (as the paper is in review), so PM me if desired.

Nils

**ech** · 08-21-2009, 08:59 AM

Is BFAST good enough for calling >10bp indels, or is local assembly still preferred? Also, how does it compare with BWA for <4bp indels?

**nilshomer** · 08-21-2009, 11:24 AM

I also put up a BFAST Server version, where you can have a local web-server running BFAST and an interactive web page (inspired by the UCSC BLAT). It handles both Illumina and ABI SOLiD data natively. I put up a BFAST Server for you to see (click here), since our normal BFAST Server website is down (click here).

For >10bp indels, it can be tuned to have any power depending on the error and polymorphism rate, with the power obviously increasing for longer reads (more room for the indel, especially insertions). Compared to BWA, which states it should be used on data with <2% error, it performs similarly (>95% power) with <4bp indels, but excels in scenarios where there is a non-trivial error rate (>2%) and/or when there is an indel and a SNP. In our own human reseq experiments, we found a 10bp deletion and a SNP 4bp downstream, which was validated with Sanger sequencing etc. The biggest increase in robustness/sensitivity is with ABI SOLiD data, due to the complete gapped local alignment (see Paper).

I think there is still room for micro-reassembly. For example, although the reads may be mapped to the correct location, their local alignment may be wrong given an insertion or a deletion breakpoint near either end of the read. I will let you ponder over why this is the case.

**Guidobot** · 03-30-2011, 04:16 PM

I have a question about how BFAST/BFAST-BWA handles SNPs vs. read errors for AB-SOLiD (CS) reads.

On viewing the resulting aligned mappings (in base space), do single base differences to the reference represent SNPs? That is, are they a result of detecting an appropriate 2-color mismatch, with single (or more) color mismatches identified as read errors and appropriately "corrected"?
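A minimal sketch of the two-color idea in this question, assuming the standard SOLiD encoding in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function names and toy sequences are made up for illustration; this is not BFAST's implementation.

```python
BASES = "ACGT"
BASE_TO_BITS = {base: code for code, base in enumerate(BASES)}


def encode_colors(bases):
    """Encode bases as colors: one color per pair of adjacent bases."""
    codes = [BASE_TO_BITS[b] for b in bases]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Naively decode colors to bases, starting from a known first base."""
    code = BASE_TO_BITS[first_base]
    seq = [first_base]
    for color in colors:
        code ^= color
        seq.append(BASES[code])
    return "".join(seq)


def color_mismatches(ref_colors, read_colors):
    """Positions where the read colors differ from the reference colors."""
    return [i for i, (r, c) in enumerate(zip(ref_colors, read_colors)) if r != c]


def snp_consistent(ref_pair, read_pair):
    """Two adjacent color changes fit a single-base change only if the XOR
    of the pair is preserved (the flanking bases are unchanged)."""
    return (ref_pair[0] ^ ref_pair[1]) == (read_pair[0] ^ read_pair[1])


reference = "ACGTACGT"
ref_colors = encode_colors(reference)          # [1, 3, 1, 3, 1, 3, 1]

# A real SNP (A -> T at base index 4) changes two adjacent colors.
snp_colors = encode_colors("ACGTTCGT")
snp_positions = color_mismatches(ref_colors, snp_colors)

# A single miscalled color changes one color only.
error_colors = list(ref_colors)
error_colors[3] = 0
error_positions = color_mismatches(ref_colors, error_colors)

print(snp_positions)    # [3, 4]
print(error_positions)  # [3]
print(snp_consistent(ref_colors[3:5], snp_colors[3:5]))  # True
# Decoding the uncorrected color error corrupts every base downstream of it.
print(decode_colors("A", error_colors))  # ACGTTGCA
```

The last line shows why an isolated color mismatch has to be treated as a read error before conversion to base space: left uncorrected, it would alter every later base.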

**nilshomer** · 03-30-2011, 04:35 PM

Yes, see the accompanying papers for information.

**Guidobot** · 03-30-2011, 05:51 PM

Thanks. I assume you mean the paper linked to a few posts back? I'm looking at this now.

I read through the original paper (SHRiMP: Accurate Mapping of Short Color-space Reads), which is the only one I've looked at so far that is specifically concerned with the notion that the read color space is degenerate (i.e. reads could in theory map to 4 alternative sequences in the reference). However, although the theory/method is presented, I was confused about how the actual reads in base space are finally output. For example, are corrected read errors marked in some way? Are base inserts ever discarded?

**nilshomer** · 03-30-2011, 07:17 PM

Read some more. I have published two papers with descriptions, and you can also take a look at the BWA (short) paper. Note that the adapter will reduce the four to one.
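The "four to one" point can be sketched as follows, assuming the usual XOR color encoding (A=0, C=1, G=2, T=3) and a csfasta-style read whose first character is the known adapter base. The helper names and the toy read are hypothetical and not taken from BFAST or BWA.

```python
BASES = "ACGT"
BASE_TO_BITS = {base: code for code, base in enumerate(BASES)}


def decode_colors(first_base, colors):
    """Decode a color list to bases, given the base the first color starts from."""
    code = BASE_TO_BITS[first_base]
    seq = [first_base]
    for color in colors:
        code ^= color
        seq.append(BASES[code])
    return "".join(seq)


def decode_csfasta_read(read):
    """Decode a read written as an adapter base followed by color digits.
    The adapter base itself is dropped from the returned sequence."""
    adapter_base, colors = read[0], [int(ch) for ch in read[1:]]
    return decode_colors(adapter_base, colors)[1:]


colors = [1, 3, 1, 3, 1, 3, 1]

# Without a known starting base, the same colors fit four base sequences.
candidates = {base: decode_colors(base, colors) for base in BASES}
for base, seq in candidates.items():
    print(base, seq)
# A ACGTACGT
# C CATGCATG
# G GTACGTAC
# T TGCATGCA

# With the adapter base recorded in the read, only one decoding remains.
print(decode_csfasta_read("A1313131"))  # CGTACGT
```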

**Guidobot** · 03-31-2011, 09:28 AM

Cheers. I understand how BFAST could use the adapter (base) to define a specific (nt) read sequence, although BWA and MAQ appear to ignore this in translation to base space (and reduce the effective read length by 2 in the process). As a programmer I get curious about some of the implementation details but will continue reading.

Edit: I originally read your paper (BFAST: An Alignment Tool for Large Scale Genome Resequencing) and misinterpreted the statement "...each genomic read offset is artificially started with an A base to mimic the process of decoding...", thinking this meant the adapter base (e.g. in the csfasta file) was ignored.

Btw, in an experiment I did with the Streptococcus suis genome and SOLiD SE reads I found that BFAST mapped 2.34% more reads than BWA, which includes a correction for the reads BWA mapped to repeated regions.

(I used the recommended 10 seeds, but because my PC had only 2 GB of RAM I used an index word size of 12.)

BFAST to Sourceforge.net
