
**Cole Trapnell** · 06-02-2010, 07:14 PM

Originally posted by anecsulea:

Hi,

I've recently discovered a strange behaviour in TopHat: it can sometimes give incomplete (or even incorrect) results, due to an error while running Bowtie on the junction sequence database.

I'm using TopHat 1.0.13 with Bowtie 0.12.5 (or 0.12.3) on a Linux x86_64 computation cluster with a Lustre filesystem. The data are single-end, 76 bp reads. I'm running TopHat with the following parameters:

-p 1 -a 8 -i 40 -m 1 -I 1000000 -F 0 --coverage-search --microexon-search

For one of the runs where I got incomplete results, I noticed something odd in the output:

[Thu May 27 17:25:49 2010] Mapping reads against segment_juncs with Bowtie
[Thu May 27 17:25:50 2010] Mapping reads against segment_juncs with Bowtie
[Thu May 27 17:25:51 2010] Mapping reads against segment_juncs with Bowtie

This is odd because mapping the reads against segment_juncs should take much longer than a second per step, since I have about 20 million reads. So I thought that there might be an error in building the Bowtie index for the splice junctions, but bowtie_build.log shows no error. However, I find the following type of error in some other log files from the run:

############################################

filebd4xji.log

Error reading ebwt array: returned 41750080, length was 168445184
Your index files may be corrupt; please try re-building or re-downloading.
A complete index consists of 6 files: XYZ.1.ebwt, XYZ.2.ebwt, XYZ.3.ebwt,
XYZ.4.ebwt, XYZ.rev.1.ebwt, and XYZ.rev.2.ebwt. The XYZ.1.ebwt and
XYZ.rev.1.ebwt files should have the same size, as should the XYZ.2.ebwt and
XYZ.rev.2.ebwt files.

############################################

So it seems that even though the Bowtie index for the junction sequences was built correctly, the alignment of reads against the junction index fails. I've run several series of tests, and I found that this Bowtie error does not occur every time (it seems to be more or less random), but it does seem to be quite frequent for large datasets. It is not yet clear why this happens (it might be OS-specific or filesystem-specific), so I am currently testing several solutions to fix this problem (see also the parallel thread "Bowtie can't read index files").
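The Bowtie message above lists the six files a complete index should contain and which of them should match in size. A minimal sketch of that check in Python follows; the function name `check_index` and the idea of passing the index prefix as a string are assumptions made for illustration, not part of Bowtie or TopHat.

```python
import os

# File suffixes and size rules taken from the Bowtie error message above.
SUFFIXES = [".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"]
SAME_SIZE_PAIRS = [(".1.ebwt", ".rev.1.ebwt"), (".2.ebwt", ".rev.2.ebwt")]


def check_index(prefix):
    """Return a list of problems found for the index at `prefix` (hypothetical helper)."""
    problems = []
    sizes = {}
    for suffix in SUFFIXES:
        path = prefix + suffix
        if os.path.isfile(path):
            sizes[suffix] = os.path.getsize(path)
        else:
            problems.append(f"missing file: {path}")
    for fwd, rev in SAME_SIZE_PAIRS:
        # Only compare sizes when both files of the pair are present.
        if fwd in sizes and rev in sizes and sizes[fwd] != sizes[rev]:
            problems.append(
                f"size mismatch: {fwd} is {sizes[fwd]} bytes, {rev} is {sizes[rev]} bytes"
            )
    return problems
```

An empty list only means the files look consistent on disk. As described above, the aligner can still fail to read an index that passes this kind of check.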

However, the bigger issue here is that TopHat does not catch the error thrown by Bowtie, and finishes with apparent success, while giving only an incomplete set of exon-exon junctions. This is quite dangerous, since most users will not search for "Error" messages in the log files if TopHat has finished successfully. So I would advise TopHat users to check the log files for Bowtie errors before proceeding with their analyses.
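Checking the logs can be automated. The sketch below scans every `*.log` file under a directory for suspicious lines; the search pattern is an assumption based only on the messages quoted above, the function name `find_log_errors` is hypothetical, and the location of the log directory depends on how TopHat was run.

```python
import re
from pathlib import Path

# Assumed pattern: the word "error" or the "corrupt index" hint from the
# Bowtie message quoted above. Extend it for other messages as needed.
ERROR_PATTERN = re.compile(r"\berror\b|index files may be corrupt", re.IGNORECASE)


def find_log_errors(log_dir):
    """Return (file name, line number, line text) for each suspicious log line."""
    hits = []
    for log_path in sorted(Path(log_dir).rglob("*.log")):
        with open(log_path, errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if ERROR_PATTERN.search(line):
                    hits.append((log_path.name, line_no, line.rstrip("\n")))
    return hits
```

A run with no hits is not proof of success, since a failure could be reported with wording the pattern does not cover.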

Any comments or suggestions on how to solve this problem would be much appreciated.

Best wishes,

Anamaria

This is an interesting bug - thanks for reporting it. There is code to check that the call to bowtie-build succeeded and that the index is good (or at least passes bowtie-build's internal checks), but for some reason that code is not catching the exception. I'll look into it further.

Can you re-run this with --keep-tmp enabled, and then try to run the bowtie-build step listed in run.log manually? If that step is failing (some or all of the time), you might want to check the size of the juncs_db.fa file that TopHat generates and feeds to bowtie-build. I'm curious as to how big it is and/or whether it's corrupt in some way.
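For the size and corruption check on juncs_db.fa, a rough sketch in Python could look like the following. It assumes the file is plain FASTA containing only A, C, G, T and N (an assumption; adjust `VALID_BASES` if other codes are expected), and the helper name `check_fasta` is made up for this example.

```python
import os

# Assumed alphabet for junction sequences; widen if IUPAC codes are expected.
VALID_BASES = set("ACGTNacgtn")


def check_fasta(path):
    """Return file size, record count, and a list of problems (hypothetical helper)."""
    problems = []
    records = 0
    seq_len = None  # sequence length of the current record; None before any header
    with open(path, errors="replace") as handle:
        for line_no, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                if seq_len == 0:
                    problems.append(f"empty record before line {line_no}")
                records += 1
                seq_len = 0
            elif seq_len is None:
                problems.append(f"line {line_no}: sequence data before any header")
            else:
                if not set(line) <= VALID_BASES:
                    problems.append(f"line {line_no}: unexpected characters")
                seq_len += len(line)
    if seq_len == 0:
        problems.append("last record is empty")
    return {
        "size_bytes": os.path.getsize(path),
        "records": records,
        "problems": problems,
    }
```

Comparing `size_bytes` and `records` between a run that worked and one that failed would show whether the input to bowtie-build differs between them.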

**anecsulea** · 06-06-2010, 09:07 AM

As far as I can see, there is no reason why the code that checks that bowtie-build succeeded should catch this exception, since the error does not come from bowtie-build, but from the bowtie aligner. Indeed, as I explained above, the index is built correctly and is definitely not corrupt, yet bowtie fails to read it into memory when trying to align the reads. This issue is discussed in more detail in a parallel thread in this forum ("Bowtie fails to read index files"), and I have managed to find a solution that works on the computation cluster that I'm using. However, I still believe that the fact that TopHat does not catch this error is a serious problem that needs to be corrected in future versions of the software.

Best wishes,

Anamaria

**Cole Trapnell** · 06-06-2010, 10:54 AM

OK - I see where things are going awry. From the parallel thread, it sounds like your filesystem/OS is interacting with Bowtie in a way that produces the failure. A recent version of TopHat streamlined the way Bowtie is called, and it looks like I failed to put back some of the exception-handling code. It's there now and will be present in the next release.
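To illustrate the kind of handling involved, here is a generic Python sketch of running an external aligner and refusing to continue when it fails. This is not TopHat's actual code: `run_checked` and `AlignerError` are hypothetical names, and the sketch assumes the tool reports failure through a non-zero exit status or through an "error" message on stderr, which has not been verified here for this particular Bowtie failure.

```python
import subprocess


class AlignerError(RuntimeError):
    """Raised when the external command appears to have failed (hypothetical)."""


def run_checked(cmd, log_path):
    """Run `cmd`, keep its stderr in `log_path`, and raise on any sign of failure."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stderr=log)
    if result.returncode != 0:
        raise AlignerError(
            f"{cmd[0]} exited with status {result.returncode}; see {log_path}"
        )
    # Fallback (assumption): treat any "error" text on stderr as a failure,
    # in case the tool exits with status 0 despite reporting a problem.
    with open(log_path, errors="replace") as log:
        if "error" in log.read().lower():
            raise AlignerError(f"{cmd[0]} reported an error; see {log_path}")
    return result.returncode
```

Raising an exception instead of logging and continuing means the pipeline stops at the failed step, so it cannot finish with apparent success on incomplete alignments. The stderr fallback is deliberately strict and may need loosening for tools that print the word "error" in harmless messages.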


TopHat fails to catch error thrown by Bowtie, gives incomplete results
