Unconfigured Ad

**ECO** · 11-11-2009, 12:04 AM

Congrats Nils.

**dawe** · 11-11-2009, 08:05 AM

Congratulations!
I've downloaded it and if it runs as stated in the paper I'm ready to use it especially for resequencing and SNP projects. I'm a bit worried about speed (and memory usage) but, as said, if the error rate is so low...
SAM output is a great feature, I'm running almost aligner-independent now :-)

**lh3** · 11-11-2009, 08:55 AM

I did not read into details, but something is a little bit odd in Figure 3. How can bowtie and bwa produce wrong alignments even if the read can be perfectly mapped? For perfectly aligned reads, both of them should always give correct results. Also note that 6% error is really a lot. If an aligner was that bad, it would not survive these days. Nonetheless, I agree that the performance of bwa degrades with higher error/mutation rate.

In SNP calling given short reads, alignment global to reads has some advantage. Suppose the 3rd base in a read is a true mutation. Local alignment will clip the read and as a result you see more reads having the reference allele than having the non-reference allele, which leads to strong reference bias. To alleviate this bias, we need to postprocess local alignments.

**nilshomer** · 11-11-2009, 12:26 PM

Originally posted by lh3 View Post

I did not read into details, but something is a little bit odd in Figure 3. How can bowtie and bwa produce wrong alignments even if the read can be perfectly mapped? For perfectly aligned reads, both of them should always give correct results. Also note that 6% error is really a lot. If an aligner was that bad, it would not survive these days. Nonetheless, I agree that the performance of bwa degrades with higher error/mutation rate.

What panel are you looking at (sensitivity or false mapping)? Obviously the sensitivity cannot be 100% due to the repetitiveness of the human genome. For false-mapping, with no errors I don't see a >0% false mapping rate in Figure 3.

In SNP calling given short reads, alignment global to reads has some advantage. Suppose the 3rd base in a read is a true mutation. Local alignment will clip the read and as a result you see more reads having the reference allele than having the non-reference allele, which leads to strong reference bias. To alleviate this bias, we need to postprocess local alignments.

We don't clip the reads. The "local alignment" we use is a global alignment to a window the candidate location. In other words, we enforce that the entire read must align to some subset of bases in the reference.

**nilshomer** · 11-11-2009, 12:30 PM

Originally posted by dawe View Post

Congratulations!
I've downloaded it and if it runs as stated in the paper I'm ready to use it especially for resequencing and SNP projects. I'm a bit worried about speed (and memory usage) but, as said, if the error rate is so low...
SAM output is a great feature, I'm running almost aligner-independent now :-)

@dawe
The current version allows the user to get rid of the memory problem at the cost of some speed. You use the "-d" option to split the index during index creation. This allows you to run BFAST in 4GB environments etc.

@dawe and @lh3
The assumption that the error rate is below 6% is not valid for ABI SOLiD data. With BFAST's ability to be very sensitive, we regularly observe runs with >10% (color/raw) error rates towards the ends of reads, even though after alignment the base error rate is well below 1% (kudos to 2-base encoding).

**lh3** · 11-11-2009, 12:33 PM

Originally posted by nilshomer View Post

What panel are you looking at (sensitivity or false mapping)? Obviously the sensitivity cannot be 100% due to the repetitiveness of the human genome. For false-mapping, with no errors I don't see a >0% false mapping rate in Figure 3.

Unless sensitivity is too bad, I do not care too much about false negatives. I am looking at false positives (2nd column). At 0-mismatch/indel, both bwa and bowtie have ~6% error rate while others have 0%.

We don't clip the reads. The "local alignment" we use is a global alignment to a window the candidate location. In other words, we enforce that the entire read must align to some subset of bases in the reference.

This is reassuring. Thanks for the explanation.

**NGSfan** · 11-20-2009, 02:09 AM

Congratulations Nils Homer!

I really enjoyed your paper - I also must thank you for the in-depth supplement that really explained the history behind BFAST and other aligners. It shows you put a lot of thought into this.

My only question in regards to BFAST performance, is if you have recently compared it to SSAHA2 ? I know you did not intend to be comprehensive in this paper when comparing algorithms (it is impossible to be comprehensive now with 30+ aligners available), but I feel that SSAHA2 might be a real benchmark to compare against for speed and accuracy. Would you say that BFAST would run faster than SSAHA2 given similar accuracy settings?

Thanks for releasing your software. I will be trying it now.

**Michael.James.Clark** · 11-20-2009, 03:43 AM

Congratulations, Nils!

Originally posted by lh3 View Post

I did not read into details, but something is a little bit odd in Figure 3. How can bowtie and bwa produce wrong alignments even if the read can be perfectly mapped? For perfectly aligned reads, both of them should always give correct results. Also note that 6% error is really a lot. If an aligner was that bad, it would not survive these days. Nonetheless, I agree that the performance of bwa degrades with higher error/mutation rate.

It makes sense to me that they would have higher false positive rates at zero errors because they identify significantly more true positives as well (as 3A is basically gauging true positives and 3B is gauging false positives). My guess is that they are aligning questionable reads that the other aligners are not aligning at all in order to have a higher yield of true positives. I would pose the question of whether one can get BWA and Bowtie to generate fewer false positives, and how that would thereby affect their true positive yield at zero errors. (I have not used them personally, so it's an open question.) Then again, I think it's rather a moot point. The figure is basically demonstrating to me that all of these aligners do quite well at zero-two nucleotide errors.

What I find more striking in that particular figure is how they perform with high error reads, but even more strikingly, reads containing deletions (even deletions of a relatively large size). This is important to me because a lot of studies, particularly whole genome, are searching for novel variants and deletions with relatively low coverage.

Anyway, we have had great success using BFAST on whole genome data from the SOLiD platform, so I am really happy to see it finally published and encourage others to try it.

**lh3** · 11-20-2009, 04:59 AM

Originally posted by Michael.James.Clark View Post

It makes sense to me that they would have higher false positive rates at zero errors because they identify significantly more true positives as well (as 3A is basically gauging true positives and 3B is gauging false positives). My guess is that they are aligning questionable reads that the other aligners are not aligning at all in order to have a higher yield of true positives. I would pose the question of whether one can get BWA and Bowtie to generate fewer false positives, and how that would thereby affect their true positive yield at zero errors. (I have not used them personally, so it's an open question.) Then again, I think it's rather a moot point. The figure is basically demonstrating to me that all of these aligners do quite well at zero-two nucleotide errors.

Both bowtie and bwa are able to find all the occurrences of perfect matches and they always know if a perfect alignment is exact repeat or not. They should not produce false positives in theory. In practice, I evaluated them and they do not, at least not as high as 6%. In addition, I am curious why in 3A aligners do not do equally well for perfect hits. Again, maq and soap guarantee to find all perfect hits.

What I find more striking in that particular figure is how they perform with high error reads, but even more strikingly, reads containing deletions (even deletions of a relatively large size). This is important to me because a lot of studies, particularly whole genome, are searching for novel variants and deletions with relatively low coverage.

I completely agree that Bfast is a great tool for hard reads. The performance of other aligners look reasonable, which implies bfast is very capable. I just think bowtie and bwa should be posed in the right way to avoid confusion.

**nilshomer** · 11-20-2009, 11:40 AM

Originally posted by NGSfan View Post

Congratulations Nils Homer!

I really enjoyed your paper - I also must thank you for the in-depth supplement that really explained the history behind BFAST and other aligners. It shows you put a lot of thought into this.

My only question in regards to BFAST performance, is if you have recently compared it to SSAHA2 ? I know you did not intend to be comprehensive in this paper when comparing algorithms (it is impossible to be comprehensive now with 30+ aligners available), but I feel that SSAHA2 might be a real benchmark to compare against for speed and accuracy. Would you say that BFAST would run faster than SSAHA2 given similar accuracy settings?

Thanks for releasing your software. I will be trying it now.

I have not tested SSAHA2 lately, although I think SSAHA2 is built for longer reads if I am correct. Also note that since the release of the paper, BFAST's speed has improved by an order of magnitude (submitted as another paper).

**nilshomer** · 11-20-2009, 11:42 AM

Originally posted by Michael.James.Clark View Post

Congratulations, Nils!

It makes sense to me that they would have higher false positive rates at zero errors because they identify significantly more true positives as well (as 3A is basically gauging true positives and 3B is gauging false positives). My guess is that they are aligning questionable reads that the other aligners are not aligning at all in order to have a higher yield of true positives. I would pose the question of whether one can get BWA and Bowtie to generate fewer false positives, and how that would thereby affect their true positive yield at zero errors. (I have not used them personally, so it's an open question.) Then again, I think it's rather a moot point. The figure is basically demonstrating to me that all of these aligners do quite well at zero-two nucleotide errors.

What I find more striking in that particular figure is how they perform with high error reads, but even more strikingly, reads containing deletions (even deletions of a relatively large size). This is important to me because a lot of studies, particularly whole genome, are searching for novel variants and deletions with relatively low coverage.

Anyway, we have had great success using BFAST on whole genome data from the SOLiD platform, so I am really happy to see it finally published and encourage others to try it.

At zero errors and no variants, Heng Li is saying that there should be zero false positives, although there could be false-negatives.

**HTS** · 11-20-2009, 02:28 PM

Performing true local alignment with BFAST?

Hi Nils,

It is nice to read your paper, and I hope that my question is not completely out of the context (admittedly I haven't got time to dig into the details of your implementation yet). Basically I am thinking about applications other than resequencing, where I just want to find all local alignments passing a certain criterion (sure, you can put an upper-limit on the total number of matches for practical reasons, but I can use any of the tools out there to filter out heavy repeats first). An example is that my reference contains 10 cDNA sequences (different splice isoforms) belonging to the same gene with their own header, and for a given read, I want to find out how many transcripts this read is mapped to. I can do this straightforwardly with BLAT (by parsing the PSL file since I have gene/transcript information in the header) but the reference itself becomes ill-defined for all the global alignment tools out there. I am wondering if your tool can be easily adapted to handle this situation (while still making use of the base quality information as opposed to BLAT). If yes, probably I can finally stop using the old favorite BLAT as well

. Please feel free to give me any suggestions/comments (also include all people who read this). Thanks a lot!

-- Leo

**nilshomer** · 11-20-2009, 02:55 PM

Originally posted by HTS View Post

Hi Nils,

It is nice to read your paper, and I hope that my question is not completely out of the context (admittedly I haven't got time to dig into the details of your implementation yet). Basically I am thinking about applications other than resequencing, where I just want to find all local alignments passing a certain criterion (sure, you can put an upper-limit on the total number of matches for practical reasons, but I can use any of the tools out there to filter out heavy repeats first). An example is that my reference contains 10 cDNA sequences (different splice isoforms) belonging to the same gene with their own header, and for a given read, I want to find out how many transcripts this read is mapped to. I can do this straightforwardly with BLAT (by parsing the PSL file since I have gene/transcript information in the header) but the reference itself becomes ill-defined for all the global alignment tools out there. I am wondering if your tool can be easily adapted to handle this situation (while still making use of the base quality information as opposed to BLAT). If yes, probably I can finally stop using the old favorite BLAT as well

. Please feel free to give me any suggestions/comments (also include all people who read this). Thanks a lot!

-- Leo

Absolutely! A similar example is suppose you have prior evidence of RNA splicing between two novel exons in which you only know the approximate boundaries (say within 100bp). You can create all possible joins of the two 100bp windows and then align your reads to the artificial reference to see which join is the most likely.

**HTS** · 11-20-2009, 03:12 PM

Originally posted by nilshomer View Post

Absolutely! A similar example is suppose you have prior evidence of RNA splicing between two novel exons in which you only know the approximate boundaries (say within 100bp). You can create all possible joins of the two 100bp windows and then align your reads to the artificial reference to see which join is the most likely.

Thanks a lot for your quick reply, Nils! Could you give me a bit more details about how to do this? I remember in your paper you said when the best alignment of a read map to more than one locations on the reference equally well (in my example would be more than one cDNA sequences, which usually correspond to the same locations on the underlying genome) then you would say the read is not mapped, but I would like BFAST to output all the mappings regardless. I haven't downloaded your tool yet but will do so as soon as I finish the work at hand!

Topics	Statistics	Last Post
High-Resolution Sequencing Exposes Hidden Toxoplasma Diversity by SEQadmin2 Started by SEQadmin2, Today, 11:08 AM	0 responses 6 views 0 reactions	Last Post by SEQadmin2 Today, 11:08 AM
New AI Model Captures Long-Range Genomic Signals to Improve RNA Splice Site Prediction by SEQadmin2 Started by SEQadmin2, 06-30-2026, 05:37 AM	0 responses 11 views 0 reactions	Last Post by SEQadmin2 06-30-2026, 05:37 AM
Large-Scale Protein Screen Uncovers Hidden Regulators of Alternative Polyadenylation by SEQadmin2 Started by SEQadmin2, 06-26-2026, 11:10 AM	0 responses 18 views 0 reactions	Last Post by SEQadmin2 06-26-2026, 11:10 AM
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 53 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM

Unconfigured Ad

BFAST published

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News