  • bwa aln Segmentation fault

    Hi

    I'd like to use the bwa alignment tool to align SOLiD color-space reads (SREK) to reference sequences (human miRNAs).
    After converting the csfasta and qual files to FASTQ with solid2fastq, which is provided with the bwa software, I ran bwa aln but got a segmentation fault. The FASTQ file contains 15,643,846 reads. Running it under gdb gives the following output:

    [SREK] gdb /apps/bi/bwa-0.5.7/bwa
    GNU gdb Red Hat Linux (6.3.0.0-1.153.el4_6.2rh)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

    (gdb) run aln -n 2 -c /data/hum_miRNA2DNA.fa 6A1_T.single.fastq.gz > test.sai
    Starting program: /apps/bi/bwa-0.5.7/bwa aln -n 2 -c /data/hum_miRNA2DNA.fa 6A1_T.single.fastq.gz > test.sai
    [Thread debugging using libthread_db enabled]
    [New Thread 182894247456 (LWP 30893)]
    [bwa_aln_core] calculate SA coordinate... 5.42 sec
    [bwa_aln_core] write to the disk... 0.02 sec
    [bwa_aln_core] 262144 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.55 sec
    [bwa_aln_core] write to the disk... 0.02 sec
    [bwa_aln_core] 524288 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.83 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 786432 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 4.93 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1048576 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.62 sec
    [bwa_aln_core] write to the disk... 0.04 sec
    [bwa_aln_core] 1310720 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 6.11 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1572864 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.84 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1835008 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate...
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 182894247456 (LWP 30893)]
    0x00000000004038f6 in bwt_cal_width (rbwt=0x532a20, len=0, str=0x532ec0 "", width=0x0) at bwtaln.c:76
    76 bwtaln.c: No such file or directory.
    in bwtaln.c
    (gdb)


    Source code from bwtaln.c :
    ####################
    // width must be filled as zero
    static int bwt_cal_width(const bwt_t *rbwt, int len, const ubyte_t *str, bwt_width_t *width)
    {
        bwtint_t k, l, ok, ol;
        int i, bid;
        bid = 0;
        k = 0; l = rbwt->seq_len;
        for (i = 0; i < len; ++i) {
            ubyte_t c = str[i];
            if (c < 4) {
                bwt_2occ(rbwt, k - 1, l, c, &ok, &ol);
                k = rbwt->L2[c] + ok + 1;
                l = rbwt->L2[c] + ol;
            }
            if (k > l || c > 3) { // then restart
                k = 0;
                l = rbwt->seq_len;
                ++bid;
            }
            width[i].w = l - k + 1;
            width[i].bid = bid;
        }
        width[len].w = 0;
        width[len].bid = ++bid; // ###### line 76 #####
        return bid;
    }

    I have no clue how to solve the problem.
    Any help is greatly appreciated!

    Many thanks!

  • #2
    Please check out the latest SVN. Someone has spotted that 0.5.7 may use excessive memory due to a bug/typo.


    • #3
      Thanks for the advice!
      I tried the latest SVN version, but the result didn't change, even when using only 1.5 million reads.

      Memory usage always stayed below 70 MB.


      • #4
        Lately I had been encountering inexplicable segmentation faults during the 'aln' command for SOLiD reads, and I came across this thread.

        This is identical to the problems I've been experiencing, and I found a solution.

        The problem occurs when the first read of a 262144-read block has a length of zero, which is why it is so rare and so hard to reproduce. The w[0] and w[1] buffers in the bwa_cal_sa_reg_gap function are only allocated when the current sequence length strictly exceeds the current maximum, which is initialized to 0. If the first read encountered has length zero, no memory is allocated, and the segfault occurs in the bwt_cal_width function as described by the original poster of this thread (note len=0 and width=0x0 in the gdb output).

        I was able to fix this by initializing the max_l variable at the beginning of the bwa_cal_sa_reg_gap function to -1 instead of 0.
        Last edited by dp05yk; 02-26-2011, 08:14 AM.
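
For anyone following the fix in #4, here is a minimal C sketch of the failure mode. It is a simplified model under stated assumptions, not bwa's actual code: `width_t`, `ensure_width`, and `first_read_is_safe` are hypothetical names, and the only thing taken from the thread is the strict "length > current maximum" allocation test.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for bwa's bwt_width_t (fields as used in bwt_cal_width). */
typedef struct {
    unsigned int w;
    int bid;
} width_t;

/*
 * Reallocate the scratch buffer only when read_len strictly exceeds *max_len,
 * mirroring the condition described in post #4. Returns the buffer to use,
 * which is still NULL if no allocation has happened yet.
 */
width_t *ensure_width(width_t *buf, int *max_len, int read_len)
{
    assert(max_len != NULL && read_len >= 0);
    if (read_len > *max_len) {
        free(buf);
        buf = calloc((size_t)read_len + 1, sizeof *buf); /* +1 for the width[len] slot */
        *max_len = read_len;
    }
    return buf;
}

/* Returns 1 if a zero-length first read ends up with a usable buffer, else 0. */
int first_read_is_safe(int initial_max_len)
{
    int max_len = initial_max_len;
    width_t *w = ensure_width(NULL, &max_len, 0);
    int ok = (w != NULL);
    if (ok) {
        /* The same writes bwt_cal_width performs when len == 0. */
        w[0].w = 0;
        w[0].bid = 1;
    }
    free(w);
    return ok;
}
```

In this model, `first_read_is_safe(0)` returns 0 because the buffer stays NULL, so the write to `width[len]` would crash. `first_read_is_safe(-1)` returns 1 (barring an out-of-memory failure) because 0 > -1 triggers the allocation.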


        • #5
          I experienced a segmentation fault too. When I inspected the input files around the line number where the segfault occurs, I found stray command-line output (e.g., "0.11 sec") that a previous script had written into them. Removing that text solved the problem for me.
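
Both causes reported in this thread (zero-length reads in #4, stray text in #5) can be caught before running bwa aln. Below is a rough C sketch of such a pre-check. It assumes an uncompressed FASTQ file with exactly four lines per record and lines shorter than `MAX_LINE`. The names `fastq_record_ok`, `first_bad_record`, and `MAX_LINE` are made up for illustration and are not part of bwa.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096 /* assumed upper bound on line length */

/* Returns 1 if the four lines form a plausible, non-empty FASTQ record. */
int fastq_record_ok(const char *header, const char *seq,
                    const char *plus, const char *qual)
{
    assert(header && seq && plus && qual);
    if (header[0] != '@' || plus[0] != '+') return 0; /* broken framing, e.g. stray text */
    if (seq[0] == '\0') return 0;                     /* zero-length read */
    return strlen(seq) == strlen(qual);
}

/* Strip trailing newline and carriage-return characters in place. */
static void chomp(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r')) s[--n] = '\0';
}

/*
 * Scan fp four lines at a time. Returns the 1-based index of the first
 * record that is truncated or fails fastq_record_ok, or 0 if none does.
 */
long first_bad_record(FILE *fp)
{
    char line[4][MAX_LINE];
    long record = 0;
    assert(fp != NULL);
    for (;;) {
        int got = 0;
        while (got < 4 && fgets(line[got], MAX_LINE, fp) != NULL) {
            chomp(line[got]);
            ++got;
        }
        if (got == 0) return 0;     /* clean end of file */
        ++record;
        if (got < 4) return record; /* truncated record */
        if (!fastq_record_ok(line[0], line[1], line[2], line[3])) return record;
    }
}
```

A gzipped input such as the one in the first post would need to be decompressed first. A stray line like "0.11 sec" shifts the four-line framing, so the scan stops at the record where the header no longer starts with '@'.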
