  • bfast match segmentation fault

    I'm investigating bfast for the purpose of identifying human contaminant reads out of metagenomic bacterial samples. But I'm having some problems getting bfast working.

    I'm trying to run the bfast match component of the alignment, and I keep getting segmentation faults. The fasta2brg & index steps were done successfully, but then I try:

    bfast match -f <my indexed reference fasta> -r <pool of 100mer reads> -A 0 -n 1

    Originally I tried this with about a full lane's worth of data (44 million 100mer reads), and that segfaulted each time I tried. Then I built an artificial set of just 5000 reads and that also segfaulted. Here is the error message I got with the small subset:

    ************************************************************
    Checking input parameters supplied by the user ...
    Validating fastaFileName HUMAN_SCREENING_DB.current_plus_novelBGI.fna.
    Validating readsFileName flowcell61BKElane1end1.5000_random.100mer.fna.
    Validating tmpDir path ./.
    **** Input arguments look good!
    ************************************************************
    ************************************************************
    Printing Program Parameters:
    programMode: [ExecuteProgram]
    fastaFileName: HUMAN_SCREENING_DB.current_plus_novelBGI.fna
    mainIndexes [Auto-recognizing]
    secondaryIndexes [Not Using]
    readsFileName: flowcell61BKElane1end1.5000_random.100mer.fna
    offsets: [Using All]
    loadAllIndexes: [Not Using]
    compression: [Not Using]
    space: [NT Space]
    startReadNum: 1
    endReadNum: 2147483647
    keySize: [Not Using]
    maxKeyMatches: 8
    maxNumMatches: 384
    whichStrand: [Both Strands]
    numThreads: 1
    queueLength: 10000
    tmpDir: ./
    timing: [Not Using]
    ************************************************************
    Searching for main indexes...
    Found 10 index (40 total files).
    Not using secondary indexes.
    ************************************************************
    Reading in reference genome from HUMAN_SCREENING_DB.current_plus_novelBGI.fna.nt.brg.
    In total read 14657 contigs for a total of 3103051358 bases
    ************************************************************
    Reading flowcell61BKElane1end1.5000_random.100mer.fna into temp files.
    Segmentation fault (core dumped)
    15.930u 5.440s 0:45.24 47.2% 0+0k 0+3033976io 0pf+0w



    I'm running this on a blade with 8Gb of memory reserved for the process. I was not watching the memory usage, but I didn't see any memory related errors. My db size is 3.0Gb. I used the seed masks from the bfast manual (the set listed for >40bp reads...my reads are 100bp).

    Is there anything obvious that I could be doing wrong? Would the fact that I am trying to run all this on a single thread be causing problems? I noticed in another thread that somebody had problems with -n 8, but at -n 4 it was working. I wonder if a similar problem happens when using only a single thread?

  • #2
    Are your reads in the FASTQ format (post a sample here to make sure)? For Illumina data, BFAST comes with a handy script "ill2fastq.pl" that will convert the raw data of a sequencer to the proper FASTQ format. Once the inputs are validated, and if the problem persists, I can suggest some ways to debug.

    To be clear, the thread problem another user had is an isolated case from my perspective. In fact, BFAST has hundreds of users and genome centers using it with no problems (just like BWA/MAQ etc). I say this since such isolated reported problems get hyped (and thus make a software tool seem ineffective), when in fact it has been successful almost everywhere else.

    Nils

    • #3
      Hi, to be fair to BFAST I have to say that just a few moments ago I ran the same dataset I tried a few days ago, but on a different machine, and it worked with 8 cores. Therefore I completely agree that my case might have been an isolated one. Perhaps I have different versions of some libraries on the two machines... I will investigate this further as soon as I have a bit more free time.

      Thanks again to Nils for his help with this

      • #4
        If you figure it out let me know. I am always grateful for feedback as it will better inform me on how to help other users (a skill in constant training).

        • #5
          In the machine that gives the problem I get some warnings at compilation time which I think might be related to zlib...

          If I get some more info I will let you know.

          • #6
            I just realized that I was mistakenly feeding it a fasta file instead of a fastq file. So this is just a case of user error, my apologies for this post & thanks for the (very) fast reply.
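A quick pre-flight check can catch this kind of mix-up before running bfast match. The following is a minimal sketch, not part of BFAST: the function name `sniff_read_format` is hypothetical, and it assumes an uncompressed text file whose first non-blank line starts with `>` (FASTA) or `@` (FASTQ).

```python
def sniff_read_format(path):
    """Guess the read file format from its first non-blank line.

    Returns "fastq", "fasta", or "unknown". This only looks at the
    leading character, so it is a cheap hint, not a full validation.
    """
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith("@"):
                return "fastq"
            if line.startswith(">"):
                return "fasta"
            return "unknown"
    return "unknown"  # empty file
```

Calling `sniff_read_format` on the reads file and refusing to continue unless it returns `"fastq"` would have flagged the `.fna` input used in the first post.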

            • #7
              I had a related problem recently when I accidentally tried to align a bwa-fastq file (CS): the match step worked, but localalign did not. BFAST will also crash if there is a corrupt read in the fastq file; a sanity check on the reads would make life much easier for both the user and the developer...
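Until such a check exists in the tool, the reads can be screened beforehand. Below is a rough sketch of what that sanity check might look like. It is not BFAST code: the name `check_fastq` and the rules it applies are assumptions (four-line records, nucleotide-space bases, uncompressed input), so color-space (CS) reads would need a different alphabet.

```python
VALID_BASES = set("ACGTNacgtn.")  # assumed nucleotide-space alphabet


def check_fastq(path, max_problems=10):
    """Return a list of (record_number, reason) for malformed records.

    Assumes plain four-line FASTQ records. Stops after max_problems
    findings, or at the first truncated record.
    """
    problems = []
    record_number = 0
    with open(path, "r") as handle:
        while len(problems) < max_problems:
            header = handle.readline()
            if not header:
                break  # clean end of file
            if not header.strip():
                continue  # tolerate stray blank lines between records
            record_number += 1
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            if not qual:
                problems.append((record_number, "truncated record"))
                break
            header, seq, plus, qual = (
                s.rstrip("\r\n") for s in (header, seq, plus, qual)
            )
            if not header.startswith("@"):
                problems.append((record_number, "header does not start with '@'"))
            elif not plus.startswith("+"):
                problems.append((record_number, "separator does not start with '+'"))
            elif not seq or set(seq) - VALID_BASES:
                problems.append((record_number, "empty or invalid sequence"))
            elif len(seq) != len(qual):
                problems.append((record_number, "sequence and quality lengths differ"))
    return problems
```

An empty list means no problems were found under these assumptions. It streams the file record by record, so it should be usable on a full lane of reads without loading everything into memory.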
