  • jmartin
    Member
    • Dec 2009
    • 78

    bfast match segmentation fault

    I'm investigating bfast for the purpose of identifying human contaminant reads out of metagenomic bacterial samples. But I'm having some problems getting bfast working.

    I'm trying to run the bfast match component of the alignment, and I keep getting segmentation faults. The fasta2brg & index steps were done successfully, but then I try:

    bfast match -f <my indexed reference fasta> -r <pool of 100mer reads> -A 0 -n 1

    Originally I tried this with about a full lane's worth of data (44 million 100mer reads), and that segfaulted each time I tried. Then I built an artificial set of just 5000 reads and that also segfaulted. Here is the output I got with the small subset:

    ************************************************************
    Checking input parameters supplied by the user ...
    Validating fastaFileName HUMAN_SCREENING_DB.current_plus_novelBGI.fna.
    Validating readsFileName flowcell61BKElane1end1.5000_random.100mer.fna.
    Validating tmpDir path ./.
    **** Input arguments look good!
    ************************************************************
    ************************************************************
    Printing Program Parameters:
    programMode: [ExecuteProgram]
    fastaFileName: HUMAN_SCREENING_DB.current_plus_novelBGI.fna
    mainIndexes [Auto-recognizing]
    secondaryIndexes [Not Using]
    readsFileName: flowcell61BKElane1end1.5000_random.100mer.fna
    offsets: [Using All]
    loadAllIndexes: [Not Using]
    compression: [Not Using]
    space: [NT Space]
    startReadNum: 1
    endReadNum: 2147483647
    keySize: [Not Using]
    maxKeyMatches: 8
    maxNumMatches: 384
    whichStrand: [Both Strands]
    numThreads: 1
    queueLength: 10000
    tmpDir: ./
    timing: [Not Using]
    ************************************************************
    Searching for main indexes...
    Found 10 index (40 total files).
    Not using secondary indexes.
    ************************************************************
    Reading in reference genome from HUMAN_SCREENING_DB.current_plus_novelBGI.fna.nt.brg.
    In total read 14657 contigs for a total of 3103051358 bases
    ************************************************************
    Reading flowcell61BKElane1end1.5000_random.100mer.fna into temp files.
    Segmentation fault (core dumped)
    15.930u 5.440s 0:45.24 47.2% 0+0k 0+3033976io 0pf+0w



    I'm running this on a blade with 8 GB of memory reserved for the process. I was not watching the memory usage, but I didn't see any memory-related errors. My db size is 3.0 GB. I used the seed masks from the bfast manual (the set listed for >40bp reads; my reads are 100bp).

    Is there anything obvious that I could be doing wrong? Would the fact that I am trying to run all this on a single thread be causing problems? I noticed in another thread that somebody had problems with -n 8, but at -n 4 it was working. I wonder if a similar problem happens when using only a single thread?
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    Are your reads in the FASTQ format (post a sample here to make sure)? For Illumina data, BFAST comes with a handy script, "ill2fastq.pl", that converts the raw output of a sequencer to the proper FASTQ format. Once the inputs are validated, and if the problem persists, I can suggest some ways to debug.

    To be clear, the thread problem another user had is an isolated case from my perspective. In fact, BFAST has hundreds of users and genome centers using it with no problems (just like BWA, MAQ, etc.). I say this because such isolated reports get hyped (and the tool is then deemed ineffective), when in fact it has been successful almost everywhere else.

    Nils


    • blu78
      Member
      • Apr 2010
      • 20

      #3
      Hi, to be fair to BFAST, I have to say that a few moments ago I ran the same dataset I tried a few days ago, but on a different machine, and it worked with 8 cores. So I completely agree that my case might have been an isolated one. Perhaps the two machines have different versions of some libraries... I will investigate this further as soon as I have a bit more free time.

      Thanks again to Nils for his help with this


      • nilshomer
        Nils Homer
        • Nov 2008
        • 1283

        #4
        If you figure it out let me know. I am always grateful for feedback as it will better inform me on how to help other users (a skill in constant training).


        • blu78
          Member
          • Apr 2010
          • 20

          #5
          On the machine that gives the problem, I get some warnings at compilation time which I think might be related to zlib...

          If I get some more info I will let you know.


          • jmartin
            Member
            • Dec 2009
            • 78

            #6
            I just realized that I was mistakenly feeding it a fasta file instead of a fastq file. So this is just a case of user error, my apologies for this post & thanks for the (very) fast reply.
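
For anyone who lands here with the same symptom, a quick pre-flight check on the reads file can catch a FASTA/FASTQ mix-up before running the match step. The following is only a minimal sketch, not part of BFAST: the function name `sniff_read_format` is made up for illustration, and it assumes an uncompressed text file where FASTQ records start with `@` and FASTA records start with `>`.

```python
def sniff_read_format(path):
    """Guess the read file format from its first non-blank line.

    Hypothetical helper (not part of BFAST). Returns 'fastq', 'fasta',
    or 'unknown'. Assumes a plain-text, uncompressed file.
    """
    with open(path, "r") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue  # skip leading blank lines
            if stripped.startswith("@"):
                return "fastq"
            if stripped.startswith(">"):
                return "fasta"
            return "unknown"
    return "unknown"  # empty file or only blank lines
```

Calling `sniff_read_format` on the reads file and refusing to continue unless it returns `'fastq'` would have flagged the mistake described above. It looks only at the first record, so it does not prove the rest of the file is well formed.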


            • Chipper
              Senior Member
              • Mar 2008
              • 323

              #7
              I had a related problem recently when I accidentally tried to align a bwa-fastq file (CS): the match step worked, but localalign did not. BFAST will also crash if there is a corrupt read in the fastq file; a sanity check on the reads would make life much easier for both the user and the developer...
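
Until such a check exists in the tool itself, the kind of sanity check suggested here can be approximated outside BFAST. The sketch below is an illustration under stated assumptions, not BFAST's own validation: `check_fastq` is a hypothetical name, and it assumes plain 4-line FASTQ records (no wrapped sequence lines) in nucleotide space. It does not try to detect color-space reads.

```python
def check_fastq(path, max_errors=10):
    """Scan a 4-line-per-record FASTQ file and report malformed records.

    Hypothetical helper (not part of BFAST). Returns a list of
    (record_number, message) tuples and stops after max_errors problems.
    """
    errors = []
    record = 0
    with open(path, "r") as handle:
        while len(errors) < max_errors:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            # Read the remaining three lines of the record.
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline().rstrip("\r\n")
            qual = handle.readline().rstrip("\r\n")
            header = header.rstrip("\r\n")
            record += 1
            if not header.startswith("@"):
                errors.append((record, "header does not start with '@'"))
            elif not seq:
                errors.append((record, "empty sequence line"))
            elif not plus.startswith("+"):
                errors.append((record, "separator line does not start with '+'"))
            elif len(seq) != len(qual):
                errors.append((record, "sequence and quality lengths differ"))
    return errors
```

An empty list means every record passed these basic checks. A FASTA file passed in by mistake fails on its first record, since its header starts with `>` rather than `@`.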
