  • bwa aln Segmentation fault

    Hi

    I'd like to use the bwa alignment tool to align SOLiD color-space reads (SREK) onto reference sequences (human miRNAs).
    After converting the csfasta and qual files to fastq with solid2fastq, which is provided with the bwa software, I ran bwa aln but got a segmentation fault. The fastq file contains 15'643'846 reads. When I ran it under gdb, I got the following output:

    [SREK] gdb /apps/bi/bwa-0.5.7/bwa
    GNU gdb Red Hat Linux (6.3.0.0-1.153.el4_6.2rh)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

    (gdb) run aln -n 2 -c /data/hum_miRNA2DNA.fa 6A1_T.single.fastq.gz > test.sai
    Starting program: /apps/bi/bwa-0.5.7/bwa aln -n 2 -c /data/hum_miRNA2DNA.fa 6A1_T.single.fastq.gz > test.sai
    [Thread debugging using libthread_db enabled]
    [New Thread 182894247456 (LWP 30893)]
    [bwa_aln_core] calculate SA coordinate... 5.42 sec
    [bwa_aln_core] write to the disk... 0.02 sec
    [bwa_aln_core] 262144 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.55 sec
    [bwa_aln_core] write to the disk... 0.02 sec
    [bwa_aln_core] 524288 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.83 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 786432 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 4.93 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1048576 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.62 sec
    [bwa_aln_core] write to the disk... 0.04 sec
    [bwa_aln_core] 1310720 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 6.11 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1572864 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.84 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1835008 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate...
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 182894247456 (LWP 30893)]
    0x00000000004038f6 in bwt_cal_width (rbwt=0x532a20, len=0, str=0x532ec0 "", width=0x0) at bwtaln.c:76
    76 bwtaln.c: No such file or directory.
    in bwtaln.c
    (gdb)


    Source code from bwtaln.c:
    ####################
    // width must be filled as zero
    static int bwt_cal_width(const bwt_t *rbwt, int len, const ubyte_t *str, bwt_width_t *width)
    {
        bwtint_t k, l, ok, ol;
        int i, bid;
        bid = 0;
        k = 0; l = rbwt->seq_len;
        for (i = 0; i < len; ++i) {
            ubyte_t c = str[i];
            if (c < 4) {
                bwt_2occ(rbwt, k - 1, l, c, &ok, &ol);
                k = rbwt->L2[c] + ok + 1;
                l = rbwt->L2[c] + ol;
            }
            if (k > l || c > 3) { // then restart
                k = 0;
                l = rbwt->seq_len;
                ++bid;
            }
            width[i].w = l - k + 1;
            width[i].bid = bid;
        }
        width[len].w = 0;
        width[len].bid = ++bid; // ###### line 76 #####
        return bid;
    }

    I have no clue how to solve the problem.
    Any help is greatly appreciated!

    Many thanks!

  • #2
    Please check out the latest SVN. Someone has spotted that 0.5.7 may use excessive memory due to a bug/typo.


    • #3
      Thanks for the advice!
      I tried it, but nothing changed, even when using only 1.5 million reads.

      Memory usage always stayed below 70 MB.


      • #4
        Lately I have been encountering inexplicable segmentation faults during the 'aln' command for SOLiD reads, and I came across this thread.

        This is identical to the problems I've been experiencing, and I found a solution.

        The problem occurs when the first read of a 262144-read block has a length of zero, which is why it is so rare and so hard to reproduce. The w[0] and w[1] buffers in the bwa_cal_sa_reg_gap function are only allocated when the current sequence length strictly exceeds the current maximum, which is initialized to 0. If the first read encountered has length zero, no memory is allocated, and the segfault occurs in the bwt_cal_width function as described by the original poster of this thread.

        I was able to fix this by initializing the max_l variable at the beginning of the bwa_cal_sa_reg_gap function to -1 instead of 0.
        Last edited by dp05yk; 02-26-2011, 08:14 AM.
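
The allocation pattern described in post #4 can be sketched as a small stand-alone model. This is not the bwa source: `width_t`, `first_unallocated_read`, and the read-length arrays are hypothetical names invented for illustration, and the sketch reports the unallocated buffer instead of dereferencing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for bwa's bwt_width_t. */
typedef struct { unsigned w; int bid; } width_t;

/*
 * Model of a lazily grown buffer: it is (re)allocated only when a read is
 * strictly longer than the longest read seen so far.
 * Returns the index of the first read that would be processed with a NULL
 * buffer, -1 if every read has a valid buffer, or -2 if allocation fails.
 */
int first_unallocated_read(const int *lens, int n, int initial_max_len)
{
    width_t *w = NULL;
    int max_len = initial_max_len;
    int bad = -1;

    assert(lens != NULL && n >= 0);
    for (int i = 0; i < n; ++i) {
        if (lens[i] > max_len) {            /* strict comparison, as in the post */
            width_t *tmp;
            max_len = lens[i];
            tmp = realloc(w, (size_t)(max_len + 1) * sizeof *w);
            if (tmp == NULL) { free(w); return -2; }
            w = tmp;
        }
        if (w == NULL) {                    /* the real code would crash here */
            bad = i;
            break;
        }
        w[lens[i]].w = 0;                   /* mirrors width[len].w = 0 */
        w[lens[i]].bid = 0;                 /* mirrors the write at line 76 */
    }
    free(w);
    return bad;
}
```

With read lengths {0, 35, 35}, an initial maximum of 0 leaves the buffer unallocated for the first read (the function returns 0), while an initial maximum of -1 allocates a one-element buffer and the function returns -1. With {35, 0, 50} both settings are fine, which matches the observation that the crash only appears when the zero-length read comes first.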


        • #5
          I experienced a segmentation fault too. On inspecting the input files around the line where the segfault occurred, I found stray command-line output (e.g., "0.11 sec") left in the input files by a previous script. Removing that text solved the problem for me.
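
Both failure modes in this thread (zero-length reads and stray text in the input) could be caught before alignment with a structural check of the FASTQ file. The following is a minimal sketch, assuming plain four-line FASTQ records and lines shorter than 4096 characters; `fq_status`, `check_fastq_record`, and `first_bad_record_line` are hypothetical names, not part of bwa.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum fq_status { FQ_OK = 0, FQ_BAD_HEADER, FQ_EMPTY_SEQ,
                 FQ_BAD_SEPARATOR, FQ_LEN_MISMATCH, FQ_TRUNCATED };

/* Strip trailing newline and carriage-return characters in place. */
size_t chomp(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r'))
        s[--n] = '\0';
    return n;
}

/* Check the structure of one four-line FASTQ record. */
enum fq_status check_fastq_record(const char *hdr, const char *seq,
                                  const char *sep, const char *qual)
{
    if (hdr[0] != '@') return FQ_BAD_HEADER;       /* e.g. stray "0.11 sec" */
    if (seq[0] == '\0') return FQ_EMPTY_SEQ;       /* zero-length read */
    if (sep[0] != '+') return FQ_BAD_SEPARATOR;
    if (strlen(seq) != strlen(qual)) return FQ_LEN_MISMATCH;
    return FQ_OK;
}

/*
 * Scan a FASTQ stream. Returns 0 if every record passes, otherwise the
 * 1-based line number at which the first bad record starts; *why receives
 * the reason.
 */
long first_bad_record_line(FILE *fp, enum fq_status *why)
{
    char rec[4][4096];   /* assumption: no line reaches 4096 characters */
    long line = 1;

    assert(fp != NULL && why != NULL);
    for (;;) {
        int got = 0;
        while (got < 4 && fgets(rec[got], sizeof rec[got], fp) != NULL) {
            chomp(rec[got]);
            ++got;
        }
        if (got == 0) { *why = FQ_OK; return 0; }        /* clean end of file */
        if (got < 4) { *why = FQ_TRUNCATED; return line; }
        *why = check_fastq_record(rec[0], rec[1], rec[2], rec[3]);
        if (*why != FQ_OK) return line;
        line += 4;
    }
}
```

The scanner reads uncompressed text, so a gzip-compressed input such as the one in the original post would need to be decompressed before being passed in. Note that after a stray line the record framing is shifted, so only the first reported position is reliable.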
