I'm running into an issue trying to remove duplicate reads with Samtools and think it might be a bug in the latest version.
I have a pipeline to process my aligned data before putting it into Freebayes to call variants that works as such:
The entire pipeline works fine up until rmdup where it immediately results in a segmentation fault.
What's particularly interesting is that this seg fault only occurs in some files. I originally thought this had something to do with size or memory allocation, since rmdup worked fine with RNA seq data but not with this genome seq data. However, the seg fault still occurred even when I allocated more memory.
Furthermore, I think this might be a bug. The lab group I work with recently switched to a new cluster system. I run into the seg fault on the new cluster, which is running samtools 1.3, but the seg fault does not occur on the old cluster, which is running version 0.1.18.
The error has been narrowed down to a line number in the bam.c file, with debug output below:
Does anyone have any suggestions/potential fixes?
I have a pipeline to process my aligned data before putting it into Freebayes to call variants that works as such:
Code:
samtools view -S -b foo.sam > foo.bamnorg # convert from sam to bam bamaddrg -b foo.bamnorg > foo.bam # ensure read groups are attached samtools sort foo.bam foo.sorted .bam # sorts samtools index foo.sorted.bam # index the sorted bam samtools rmdup foo.sorted.bam foo.sorted.1.bam # remove duplicate reads freebayes ...
What's particularly interesting is that this seg fault only occurs in some files. I originally thought this had something to do with size or memory allocation, since rmdup worked fine with RNA seq data but not with this genome seq data. However, the seg fault still occurred even when I allocated more memory.
Furthermore, I think this might be a bug. The lab group I work with recently switched to a new cluster system. I run into the seg fault on the new cluster, which is running samtools 1.3, but the seg fault does not occur on the old cluster, which is running version 0.1.18.
The error has been narrowed down to a line number in the bam.c file, with debug output below:
Code:
$ gdb samtools core GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9 Copyright (C) 2015 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from samtools...done. [New LWP 11878] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `samtools rmdup NAT_B_9.sorted.bam NAT_B_9.sorted.1.bam'. Program terminated with signal SIGSEGV, Segmentation fault. #0 bam_get_library (h=h@entry=0x8bb1c0, b=b@entry=0x8db9c0) at bam.c:112 112 for (cp = LB; *cp && *cp != '\t' && *cp != '\n'; cp++) (gdb) list 107 if (strncmp(rg, ID, strlen(rg)) != 0 || ID[strlen(rg)] != '\t') 108 continue; 109 110 // Valid until next query 111 static char LB_text[1024]; 112 for (cp = LB; *cp && *cp != '\t' && *cp != '\n'; cp++) 113 ; 114 strncpy(LB_text, LB, MIN(cp-LB, 1023)); 115 LB_text[MIN(cp-LB, 1023)] = 0; 116 (gdb)
Comment