**nilshomer** · 04-13-2010, 01:03 PM

Originally posted by jmartin

I'm investigating bfast for the purpose of identifying human contaminant reads out of metagenomic bacterial samples. But I'm having some problems getting bfast working.

I'm trying to run the bfast match component of the alignment, and I keep getting segmentation faults. The fasta2brg & index steps were done successfully, but then I try:

```
bfast match -f <my indexed reference fasta> -r <pool of 100mer reads> -A 0 -n 1
```

Originally I tried this with about a full lane's worth of data (44 million 100mer reads), and it segfaulted every time. Then I built an artificial set of just 5000 reads, and that also segfaulted. Here is the error message I got with the small subset:

```
************************************************************
Checking input parameters supplied by the user ...
Validating fastaFileName HUMAN_SCREENING_DB.current_plus_novelBGI.fna.
Validating readsFileName flowcell61BKElane1end1.5000_random.100mer.fna.
Validating tmpDir path ./.
**** Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: [ExecuteProgram]
fastaFileName: HUMAN_SCREENING_DB.current_plus_novelBGI.fna
mainIndexes [Auto-recognizing]
secondaryIndexes [Not Using]
readsFileName: flowcell61BKElane1end1.5000_random.100mer.fna
offsets: [Using All]
loadAllIndexes: [Not Using]
compression: [Not Using]
space: [NT Space]
startReadNum: 1
endReadNum: 2147483647
keySize: [Not Using]
maxKeyMatches: 8
maxNumMatches: 384
whichStrand: [Both Strands]
numThreads: 1
queueLength: 10000
tmpDir: ./
timing: [Not Using]
************************************************************
Searching for main indexes...
Found 10 index (40 total files).
Not using secondary indexes.
************************************************************
Reading in reference genome from HUMAN_SCREENING_DB.current_plus_novelBGI.fna.nt.brg.
In total read 14657 contigs for a total of 3103051358 bases
************************************************************
Reading flowcell61BKElane1end1.5000_random.100mer.fna into temp files.
Segmentation fault (core dumped)
15.930u 5.440s 0:45.24 47.2% 0+0k 0+3033976io 0pf+0w
```

I'm running this on a blade with 8Gb of memory reserved for the process. I was not watching the memory usage, but I didn't see any memory related errors. My db size is 3.0Gb. I used the seed masks from the bfast manual (the set listed for >40bp reads...my reads are 100bp).

Is there anything obvious that I could be doing wrong? Would the fact that I am trying to run all this on a single thread be causing problems? I noticed in another thread that somebody had problems with -n 8, but at -n 4 it was working. I wonder if a similar problem happens when using only a single thread?

Are your reads in the FASTQ format (post a sample here to make sure)? For Illumina data, BFAST comes with a handy script, "ill2fastq.pl", that converts the raw sequencer output to the proper FASTQ format. Once the inputs are validated, and if the problem persists, I can suggest some ways to debug.

To be clear, the thread problem another user had is an isolated case from my perspective. In fact, BFAST has hundreds of users and genome centers using it with no problems (just like BWA, MAQ, etc.). I say this because such isolated reports tend to get hyped, so that a tool gets deemed ineffective when in fact it has worked well almost everywhere else.

Nils

**blu78** · 04-13-2010, 01:08 PM

Originally posted by nilshomer

Are your reads in the FASTQ format (post a sample here to make sure)? For Illumina data, BFAST comes with a handy script, "ill2fastq.pl", that converts the raw sequencer output to the proper FASTQ format. Once the inputs are validated, and if the problem persists, I can suggest some ways to debug.

To be clear, the thread problem another user had is an isolated case from my perspective. In fact, BFAST has hundreds of users and genome centers using it with no problems (just like BWA, MAQ, etc.). I say this because such isolated reports tend to get hyped, so that a tool gets deemed ineffective when in fact it has worked well almost everywhere else.

Nils

Hi, to be fair to BFAST, I have to say that just a few moments ago I reran the same dataset I tried a few days ago, but on a different machine, and it worked with 8 cores. So I completely agree that my case might have been an isolated one. Perhaps I have different versions of some libraries on the two machines... I will investigate this further as soon as I have a bit more free time.

Thanks again to Nils for his help with this

**nilshomer** · 04-13-2010, 01:23 PM

Originally posted by blu78

Hi, to be fair to BFAST, I have to say that just a few moments ago I reran the same dataset I tried a few days ago, but on a different machine, and it worked with 8 cores. So I completely agree that my case might have been an isolated one. Perhaps I have different versions of some libraries on the two machines... I will investigate this further as soon as I have a bit more free time.

Thanks again to Nils for his help with this

If you figure it out let me know. I am always grateful for feedback as it will better inform me on how to help other users (a skill in constant training).

**blu78** · 04-13-2010, 01:28 PM

On the machine that gives the problem, I get some warnings at compile time which I think might be related to zlib...

If I get some more info I will let you know.

**jmartin** · 04-13-2010, 02:06 PM

I just realized that I was mistakenly feeding it a fasta file instead of a fastq file. So this is just a case of user error, my apologies for this post & thanks for the (very) fast reply.
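For anyone hitting the same crash, the FASTA-versus-FASTQ mix-up can be caught before running `bfast match`. Below is a minimal sketch, not part of BFAST: the function name `sniff_read_format` is made up for illustration, and it only looks at the first non-blank line (`@` suggests FASTQ, `>` suggests FASTA), so it is a quick hint rather than a full validation.

```python
import gzip


def sniff_read_format(path):
    """Guess the read file format from its first non-blank line.

    Returns "fastq" if the line starts with '@', "fasta" if it starts
    with '>', and "unknown" otherwise (including for an empty file).
    Hypothetical helper for illustration; not part of BFAST.
    """
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith("@"):
                return "fastq"
            if line.startswith(">"):
                return "fasta"
            return "unknown"
    return "unknown"
```

Calling it on the reads file from the log above (a `.fna` file) would be expected to return `"fasta"`, which is the signal to convert the reads before matching.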

**Chipper** · 04-14-2010, 12:21 AM

I had a related problem recently when I accidentally tried to align a BWA-style FASTQ file (CS): the match step worked, but localalign did not. BFAST will also crash if there is a corrupt read in the FASTQ file. A sanity check on the reads would make life much easier for both the user and the developer...
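The kind of sanity check suggested here can be approximated outside BFAST. The following is only a sketch under stated assumptions: records are exactly four lines each (no wrapped sequences), reads are in nucleotide space (so color-space files would be flagged), and the name `find_bad_fastq_records` and its messages are invented for illustration.

```python
import itertools

# Assumption: nucleotide-space reads; '.' and 'N' are accepted as no-calls.
VALID_BASES = set("ACGTNacgtn.")


def find_bad_fastq_records(lines, max_problems=10):
    """Return a list of (record_number, problem) for malformed records.

    `lines` is any iterable of text lines, for example an open file.
    Hypothetical helper for illustration; not part of BFAST.
    """
    problems = []
    line_iter = iter(lines)
    record_number = 0
    while len(problems) < max_problems:
        # Pull the next four lines, which should form one record.
        chunk = [ln.rstrip("\r\n") for ln in itertools.islice(line_iter, 4)]
        if not any(chunk):
            break  # end of input, or only trailing blank lines remain
        record_number += 1
        if len(chunk) < 4:
            problems.append((record_number, "truncated record"))
            break
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            problems.append((record_number, "header does not start with '@'"))
        elif not seq or not set(seq) <= VALID_BASES:
            problems.append((record_number, "empty or non-nucleotide sequence"))
        elif not plus.startswith("+"):
            problems.append((record_number, "separator does not start with '+'"))
        elif len(qual) != len(seq):
            problems.append(
                (record_number, "quality length differs from sequence length")
            )
    return problems
```

A typical use would be `with open("reads.fastq") as fh: print(find_bad_fastq_records(fh))`, where the file name is a placeholder; an empty list means no problems were found under the assumptions above.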

bfast match segmentation fault
