I'm investigating bfast for identifying human contaminant reads in metagenomic bacterial samples, but I'm having trouble getting it working.
I'm trying to run the bfast match component of the alignment, and I keep getting segmentation faults. The fasta2brg and index steps completed successfully, but then I run:
bfast match -f <my indexed reference fasta> -r <pool of 100mer reads> -A 0 -n 1
Originally I tried this with about a full lane's worth of data (44 million 100-mer reads), and it segfaulted every time. Then I built an artificial set of just 5000 reads, and that segfaulted too. Here is the error output from the small subset:
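In case it matters, the subset was built roughly along these lines (a sketch with a toy file so the pipeline is easy to check; my real reads have one sequence line per record, and `shuf` is GNU coreutils):

```shell
# Toy FASTA standing in for the real reads file.
cat > reads.fna <<'EOF'
>read1
ACGTACGT
>read2
TTTTAAAA
>read3
GGGGCCCC
EOF
# Pair each header with its sequence on one line, pick N records at
# random, then restore the two-line FASTA layout.
paste - - < reads.fna | shuf -n 2 | tr '\t' '\n' > subset.fna
grep -c '^>' subset.fna   # 2 records in the subset
```

(For the real data I used 5000 in place of 2, obviously.)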
************************************************************
Checking input parameters supplied by the user ...
Validating fastaFileName HUMAN_SCREENING_DB.current_plus_novelBGI.fna.
Validating readsFileName flowcell61BKElane1end1.5000_random.100mer.fna.
Validating tmpDir path ./.
**** Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: [ExecuteProgram]
fastaFileName: HUMAN_SCREENING_DB.current_plus_novelBGI.fna
mainIndexes [Auto-recognizing]
secondaryIndexes [Not Using]
readsFileName: flowcell61BKElane1end1.5000_random.100mer.fna
offsets: [Using All]
loadAllIndexes: [Not Using]
compression: [Not Using]
space: [NT Space]
startReadNum: 1
endReadNum: 2147483647
keySize: [Not Using]
maxKeyMatches: 8
maxNumMatches: 384
whichStrand: [Both Strands]
numThreads: 1
queueLength: 10000
tmpDir: ./
timing: [Not Using]
************************************************************
Searching for main indexes...
Found 10 index (40 total files).
Not using secondary indexes.
************************************************************
Reading in reference genome from HUMAN_SCREENING_DB.current_plus_novelBGI.fna.nt.brg.
In total read 14657 contigs for a total of 3103051358 bases
************************************************************
Reading flowcell61BKElane1end1.5000_random.100mer.fna into temp files.
Segmentation fault (core dumped)
15.930u 5.440s 0:45.24 47.2% 0+0k 0+3033976io 0pf+0w
I'm running this on a blade with 8 GB of memory reserved for the process. I wasn't watching the memory usage, but I didn't see any memory-related errors. My database is 3.0 GB. I used the seed masks from the bfast manual (the set listed for >40 bp reads; my reads are 100 bp).
Is there anything obvious that I could be doing wrong? Could running all of this on a single thread be the problem? I noticed in another thread that somebody had crashes with -n 8 but not with -n 4, and I wonder whether something similar can happen with only one thread.
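One diagnostic I'm considering is splitting the reads into chunks to bisect which record (if any) triggers the crash. A sketch with a toy file (the chunk size here is illustrative; for two-line FASTA records, any even -l value keeps records intact):

```shell
# Toy FASTA standing in for the real reads file.
cat > reads.fna <<'EOF'
>r1
ACGT
>r2
TTTT
>r3
GGGG
EOF
# split -l 4 = 2 records per chunk; -d gives numeric suffixes
# (chunk.00, chunk.01, ...), one chunk per bfast match run.
split -l 4 -d reads.fna chunk.
ls chunk.*
```

Running bfast match on each chunk in turn should narrow down whether a specific read is at fault or the crash happens regardless of input.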