Hi everyone,
In trying what might be a bit of premature optimization, I'm hitting an unfortunate segmentation fault in samtools.
This is with an Illumina GAIIx. The original pipeline gunzipped the Illumina FASTQ files, converted them to Sanger format, and fed them to bwa, and all was well.
Since bwa (0.5.9) can read gzipped Illumina FASTQ files directly, I updated the pipeline to use that, and then hit a segfault further down the line in samtools (0.1.13) when trying to index one of the BAM files (samtools index file.bam).
Looking at previous posts, I tried to see if I could work around it:
0) samtools index file.bam
This is what I'm trying to do, leading to a segfault.
1) samtools sort file.bam newfile
This works, but the resulting file still has the same issue if I try to index it. I don't know if this is relevant, but the sorted file is 58 bytes shorter (out of 8.6 GB).
2) samtools view -h file.bam > file.sam
This also segfaults, after creating a file.sam that is truncated at 3.5 MB (8452 lines).
The last few lines are:
Code:
HWI-EAS412_0004:1:53:1742:17429#0 99 1 132888 0 60M = 132919 91 GTCCCTCCCAACACTAAGGCTTTCCTAGGCAGGAGCTGGGCTGAGCCACCCGGGGGGCAG A=CEEEDFEBADBDD?@CD?DDDDFE@CEFBDF@D>EGFEFEG@AFG=EF=@<E5A??;< X0:i:13 X1:i:14 MD:Z:60 RG:Z:sample_tumor1 XG:i:0AM:i:0 NM:i:0 SM:i:0 XM:i:0 XO:i:0 OQ:Z:GGGGGGGGGGFGGGGGDGGBFGGGGGGGFGGGGGGBGGGFGGGGDFGBFFBGBG5DAFEF XT:A:R
GA2_0011:8:71:6879:4088#0 69 1 132891 0 * = 132891 0 ACCCCGAAAGCCCAGTCAGTTTCTCTTCAGGCTCTGCCCCCCGGGTGGCTCAGCCCAGCTCCTGCCTAGGAAAGCCTTAGTGTTGGGAGGAGATCGGAAGA ACDEE??BBCEDDAC=CAC=CBDCDCCDACDECDCEEEEEEE?AA?FDEDAB@FEECAFEBEEGFGF@A:?AB=GFDC4@AIAEH?@6<;9<76639855# RG:Z:sample_tumor2 OQ:Z:GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEEEGFEGFEEEFFEEEEFFEEEEEEE:EEE@ECAB5CBCBDC@A0A?9C<AAAC4=A#
GA2_0011:8:71:6879:4088#0 137 1 132891 0 92M2I7M = 132891 0 CCTCCCAACACTAAGGCTTTCCTAGGCAGGAGCTGGGCTGAGCCACCCGGGGGGCAGAGCCTGAAGAGAAACTGACTGGGCTTTCGGGGTAGATCGGAAGA :>>@BCAACADC?BCDECCCDDC?CDEBCE?DEDFDEEDF?DEEBDEE@EEEBEECB?BGGDF>AD?A?BBFEH<FEG@CGFEEB<BB<58<866495675 X0:i:7 X1:i:0 MD:Z:90C1G2C3 RG:Z:sample_tumor2 XG:i:2 AM:i:0 NM:i:5 SM:i:0 XM:i:3 XO:i:1 OQ:Z:GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGFGGGGGGGGGGGGEG
This is the BAM file created by "samtools merge", as we ran 2 lanes with this sample. The file is then modified by "samtools reheader" to add the read group information for the 2nd lane. As I inherited this pipeline and it was using older versions of various tools, this step might not be ideal, but it does work if I start the pipeline from Sanger FASTQ files. The segfault happens right after I call "samtools index".
This is not a big deal as I can work around it, but I'm generally of the opinion that segfaults are bad things that should be fixed. I'll be happy to provide more information if it helps. Please also tell me if I should be posting on the mailing lists for those tools instead.
P.S. On the off chance that this might be an issue:
The FASTQ conversion was done in a simple C program, the important bit being:
Code:
q33[i] = (char)((int) q33[i] - 31);
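For context, here is a minimal sketch of how that one-line shift might sit inside a complete routine. This is not the actual pipeline code: the function name `illumina_to_sanger`, the two offset macros, and the range check are assumptions added for illustration. It assumes the input uses the Illumina 1.3+ encoding (Phred+64) and the output should be Sanger (Phred+33), which gives the shift of 64 - 33 = 31 used above.

```c
#include <stddef.h>

/* Assumed encodings: Illumina 1.3+ is Phred+64, Sanger is Phred+33. */
#define ILLUMINA_OFFSET 64
#define SANGER_OFFSET   33

/*
 * Hypothetical helper: convert a quality string from Phred+64 to Phred+33
 * in place. Returns 0 on success. Returns -1, leaving the buffer untouched,
 * if any character is outside the printable Phred+64 range ('@' to '~'),
 * which would suggest the input is not Illumina 1.3+ encoded.
 */
int illumina_to_sanger(char *qual, size_t len)
{
    size_t i;

    /* First pass: validate, so a bad record is never half-converted. */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)qual[i];
        if (c < ILLUMINA_OFFSET || c > 126)
            return -1;
    }

    /* Second pass: apply the same shift as the one-liner above (64 - 33 = 31). */
    for (i = 0; i < len; i++)
        qual[i] = (char)(qual[i] - (ILLUMINA_OFFSET - SANGER_OFFSET));

    return 0;
}
```

The range check is only there to catch input that is already in Sanger format, which would otherwise be shifted into non-printable characters.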