  • rfriedman22
    Junior Member
    • Feb 2016
    • 1

    Samtools rmdup segmentation fault

    I'm running into a problem removing duplicate reads with samtools, and I think it might be a bug in the latest version.

    I have a pipeline that processes my aligned data before passing it to Freebayes for variant calling:
    Code:
    samtools view -S -b foo.sam > foo.bamnorg          # convert SAM to BAM
    bamaddrg -b foo.bamnorg > foo.bam                  # ensure read groups are attached
    samtools sort -o foo.sorted.bam foo.bam            # sort by coordinate
    samtools index foo.sorted.bam                      # index the sorted BAM
    samtools rmdup foo.sorted.bam foo.sorted.1.bam     # remove duplicate reads
    freebayes ...
    The pipeline works fine up to rmdup, which immediately fails with a segmentation fault.

    Interestingly, the segfault occurs only with some files. I first suspected file size or memory allocation, since rmdup worked fine on RNA-seq data but not on this genome sequencing data. However, the segfault persisted even after I allocated more memory.

    I also suspect a bug for another reason: my lab recently switched to a new cluster. The segfault occurs on the new cluster, which runs samtools 1.3, but not on the old cluster, which runs version 0.1.18.

    I've narrowed the error down to a specific line in bam.c; the debugger output is below:
    Code:
    $ gdb samtools core
    Reading symbols from samtools...done.
    
    [New LWP 11878]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Core was generated by `samtools rmdup NAT_B_9.sorted.bam NAT_B_9.sorted.1.bam'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  bam_get_library (h=h@entry=0x8bb1c0, b=b@entry=0x8db9c0) at bam.c:112
    112	        for (cp = LB; *cp && *cp != '\t' && *cp != '\n'; cp++)
    (gdb) list
    107	        if (strncmp(rg, ID, strlen(rg)) != 0 || ID[strlen(rg)] != '\t')
    108	            continue;
    109	
    110	        // Valid until next query
    111	        static char LB_text[1024];
    112	        for (cp = LB; *cp && *cp != '\t' && *cp != '\n'; cp++)
    113	            ;
    114	        strncpy(LB_text, LB, MIN(cp-LB, 1023));
    115	        LB_text[MIN(cp-LB, 1023)] = 0;
    116	
    (gdb)
    Does anyone have suggestions or potential fixes?
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    File a bug report with the developers: http://www.htslib.org/support/


    • Richard Finney
      Senior Member
      • Feb 2009
      • 701

      #3
      The variable LB is declared on line 83 of bam.c, in the function bam_get_library(), and is initialized to NULL.

      It may be set on line 99 to point just past the string "LB:" (a "string" here being an array of characters, i.e. text you could actually read in the SAM version of the BAM header).

      If the code does not find "LB:", LB remains NULL.

      Line 112 then uses the pointer variable "cp" to read the characters at "LB", which is NULL if "LB:" was not found.
      Dereferencing a NULL pointer results in a segfault on most operating systems.
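      To make the failure mode concrete, here is a minimal C sketch of the same kind of lookup with the missing NULL check added. It is not the samtools code and not the change in PR #539; the function name rg_library and its signature are hypothetical, and it assumes the SAM header is available as one NUL-terminated string of newline-separated, tab-delimited lines. It scans each @RG line's fields, and if the matching read group has no LB: field it returns NULL instead of dereferencing a null pointer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (NOT the samtools API): find the LB (library) value
 * for read group `rg` in SAM header text. Copies at most out_size-1 bytes
 * of the LB value into `out` and returns `out`. Returns NULL when no @RG
 * line has ID equal to `rg`, or when that @RG line has no LB: field.
 * Checking for a missing LB is the step whose absence crashes bam.c:112. */
static const char *rg_library(const char *header, const char *rg,
                              char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    size_t rg_len = strlen(rg);

    for (const char *line = header; line && *line; ) {
        /* End of the current header line (or of the whole text). */
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        if (line_len >= 4 && strncmp(line, "@RG\t", 4) == 0) {
            const char *id = NULL, *lb = NULL;   /* stay NULL if absent */
            size_t id_len = 0, lb_len = 0;
            const char *end = line + line_len;

            /* Walk the tab-separated fields of this @RG line only. */
            for (const char *f = line; f < end; ) {
                const char *tab = memchr(f, '\t', (size_t)(end - f));
                const char *fend = tab ? tab : end;
                size_t flen = (size_t)(fend - f);
                if (flen >= 3 && strncmp(f, "ID:", 3) == 0) {
                    id = f + 3;
                    id_len = flen - 3;
                } else if (flen >= 3 && strncmp(f, "LB:", 3) == 0) {
                    lb = f + 3;
                    lb_len = flen - 3;
                }
                f = tab ? tab + 1 : end;
            }

            /* Exact ID match, not just a matching prefix. */
            if (id && id_len == rg_len && strncmp(id, rg, rg_len) == 0) {
                if (lb == NULL)
                    return NULL;  /* matching @RG but no LB: report it, don't crash */
                size_t n = lb_len < out_size - 1 ? lb_len : out_size - 1;
                memcpy(out, lb, n);
                out[n] = '\0';
                return out;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return NULL;  /* no @RG line with this ID */
}
```

      For a header whose only matching line is "@RG\tID:grp1\tSM:s1" (no LB field), rg_library(hdr, "grp1", buf, sizeof buf) returns NULL rather than segfaulting, and the caller has to decide explicitly how to treat reads with no library. Like the original, the copy is truncated to fit the output buffer.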

      I'm not sure what the best solution is. Trick it by hacking the BAM header? Comment the code out? Use the old samtools? Is there a flag to disable this check?

      If you're comfortable with gdb, note that you can run it directly on the samtools binary rather than on a core file. The default "make" appears to include the really helpful debug flag "-g", so there's no need to rebuild.
      Set the arguments, then "run". You'll segfault at line 112, and you can print the value of LB to confirm that it is NULL.
      Last edited by Richard Finney; 02-16-2016, 04:48 PM.


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        This should hopefully be fixed in PR #539 from jkbonfield.
