SEQanswers

  • fpepin
    replied
    No, I haven't been able to figure it out, and I got sidetracked by other problems. I'll see if I can spend a bit more time getting to the bottom of it.

  • stefano
    replied
    Hi,

    I was wondering if you have figured out where the problem is, as I think I have
    run into a similar problem using a combination of samtools 'merge' and 'reheader'.

    I think there is a bug in 'samtools reheader': if I run
    samtools reheader <in.header.sam> <in.bam>
    the header in <in.bam> is replaced with the header in <in.header.sam> as
    expected, but somehow the alignment lines are also modified (which, as I
    understand it, should not happen). And if I then try to index the resulting .bam
    file, I get a segmentation fault as well.
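
    One way to test the claim that reheader changes the alignment lines is to dump the records before and after (for example with "samtools view in.bam > before.sam" and the same for the reheadered file) and compare only the non-header lines. The C sketch below does that; the function names are hypothetical, it is not part of samtools, and it assumes every line fits in a 64 KB buffer. If "samtools view" itself crashes on the reheadered file, the comparison only covers the records written before the crash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads the next line that is not a SAM header line (header lines start
 * with '@'). Returns 1 on success, 0 at end of file. Assumes every line
 * fits in the buffer; longer lines would be split by fgets. */
static int next_alignment(FILE *fp, char *buf, int size)
{
    while (fgets(buf, size, fp) != NULL)
        if (buf[0] != '@')
            return 1;
    return 0;
}

/* Compares the alignment sections of two SAM streams, ignoring headers.
 * Returns 0 if they match, otherwise the 1-based index of the first
 * alignment record that differs or is present in only one stream. */
long first_alignment_difference(FILE *a, FILE *b)
{
    static char la[1 << 16], lb[1 << 16];   /* static: keeps large buffers off the stack */
    long rec = 0;

    assert(a != NULL && b != NULL);
    for (;;) {
        int ha = next_alignment(a, la, (int)sizeof la);
        int hb = next_alignment(b, lb, (int)sizeof lb);
        if (!ha && !hb)
            return 0;                       /* both streams ended together */
        rec++;
        if (ha != hb || strcmp(la, lb) != 0)
            return rec;                     /* first differing record */
    }
}
```

    A plain diff of the two dumps gives much the same answer; the sketch only adds the index of the first differing record, which can then be pulled out of both files for a closer look.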

  • fpepin
    replied
    Any suggestions on how to find the reads that are causing the problem?

    samtools view does output the beginning of the SAM file, so not every read is problematic, but I'm not quite sure how to locate the bad one(s). After all, the individual BAM files from before the merge can each be indexed without problems.

  • n00c
    replied
    I just created a small BAM file from the three lines you posted, and there was no segfault when running "samtools view -h file.bam > out.sam"; out.sam was identical to the input SAM. This suggests that the problem is not with the FASTQ quality conversion.

    I did notice that two optional tags are concatenated in the first line: "XG:i:0AM:i:0". They should be separated by a tab: "XG:i:0<TAB>AM:i:0". This could cause a problem, though in my case the AM field was simply ignored in the SAM->BAM step.
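
    To look for other records with glued-together tags like "XG:i:0AM:i:0", a small checker can test each optional field (column 12 onwards) against the SAM TAG:TYPE:VALUE layout. The C sketch below is hypothetical (the names sam_tag_looks_valid and sam_count_bad_tags are made up, and it is not part of samtools). It checks the integer type strictly, so a glued field whose first part is type i is caught; a glued field starting with a Z-type string would not be.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if tok looks like a well-formed SAM text optional field,
 * "TAG:TYPE:VALUE". Only the integer type ('i') value is checked strictly;
 * other types just need a valid tag and type letter. */
bool sam_tag_looks_valid(const char *tok)
{
    assert(tok != NULL);
    if (strlen(tok) < 5)                      /* shortest prefix is "XX:T:" */
        return false;
    /* TAG must match [A-Za-z][A-Za-z0-9] */
    if (!isalpha((unsigned char)tok[0]) || !isalnum((unsigned char)tok[1]))
        return false;
    if (tok[2] != ':' || tok[4] != ':')
        return false;
    if (strchr("AifZHB", tok[3]) == NULL)     /* type letters used in SAM text */
        return false;
    if (tok[3] == 'i') {                      /* VALUE must match [-+]?[0-9]+ */
        const char *v = tok + 5;
        if (*v == '-' || *v == '+')
            v++;
        if (*v == '\0')
            return false;
        for (; *v != '\0'; v++)
            if (!isdigit((unsigned char)*v))
                return false;
    }
    return true;
}

/* Counts malformed optional fields (column 12 onwards) in one SAM line.
 * Returns -1 if the line is too long for this sketch. The line is copied
 * because strtok modifies its argument. */
int sam_count_bad_tags(const char *line)
{
    char buf[1 << 16];
    size_t len = strlen(line);
    if (len >= sizeof buf)
        return -1;
    memcpy(buf, line, len + 1);

    int col = 0, bad = 0;
    for (char *f = strtok(buf, "\t\r\n"); f != NULL; f = strtok(NULL, "\t\r\n"))
        if (++col > 11 && !sam_tag_looks_valid(f))
            bad++;
    return bad;
}
```

    To scan a whole file, a small driver could read the "samtools view" output of each pre-merge BAM with fgets, call sam_count_bad_tags on every line, and print the line number whenever the count is non-zero.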

    Perhaps you could generate a small BAM file (suitable for an email attachment) that reproduces the segfault, and then either post it here or attach it to a samtools bug report.

  • fpepin
    started a topic bwa bug (?) leading to samtools segfault

    Hi everyone,

    While trying what might be a bit of premature optimization, I hit an unfortunate segmentation fault in samtools.

    This is with an Illumina GAIIx. The original pipeline gunzipped the Illumina FASTQ files, converted them to Sanger format, and then fed them to bwa, and all was well.

    Since bwa (0.5.9) can read gzipped Illumina FASTQ files directly, I updated the pipeline to use them, and then hit a segfault further down the line with samtools (0.1.13) when trying to index one of the BAM files (samtools index file.bam).

    Based on previous posts, I tried to work around it:
    0) samtools index file.bam
    This is what I'm trying to do, leading to a segfault.

    1) samtools sort file.bam newfile
    This works, but the resulting file still has the same issue if I try to index it. I don't know if this is relevant, but the sorted file is 58 bytes shorter (out of 8.6 GB).

    2) samtools view -h file.bam > file.sam
    This also segfaults, leaving a file.sam that is truncated at 3.5 MB (8452 lines).

    The last few lines are:
    Code:
    HWI-EAS412_0004:1:53:1742:17429#0       99      1       132888  0       60M     =       132919  91      GTCCCTCCCAACACTAAGGCTTTCCTAGGCAGGAGCTGGGCTGAGCCACCCGGGGGGCAG   A=CEEEDFEBADBDD?@CD?DDDDFE@CEFBDF@D>EGFEFEG@AFG=EF=@<E5A??;<    X0:i:13 X1:i:14 MD:Z:60 RG:Z:sample_tumor1 XG:i:0AM:i:0   NM:i:0  SM:i:0  XM:i:0  XO:i:0  OQ:Z:GGGGGGGGGGFGGGGGDGGBFGGGGGGGFGGGGGGBGGGFGGGGDFGBFFBGBG5DAFEF       XT:A:R
    GA2_0011:8:71:6879:4088#0       69      1       132891  0       *       =       132891  0       ACCCCGAAAGCCCAGTCAGTTTCTCTTCAGGCTCTGCCCCCCGGGTGGCTCAGCCCAGCTCCTGCCTAGGAAAGCCTTAGTGTTGGGAGGAGATCGGAAGA  ACDEE??BBCEDDAC=CAC=CBDCDCCDACDECDCEEEEEEE?AA?FDEDAB@FEECAFEBEEGFGF@A:?AB=GFDC4@AIAEH?@6<;9<76639855#  RG:Z:sample_tumor2 OQ:Z:GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEEEGFEGFEEEFFEEEEFFEEEEEEE:EEE@ECAB5CBCBDC@A0A?9C<AAAC4=A#
    GA2_0011:8:71:6879:4088#0       137     1       132891  0       92M2I7M =       132891  0       CCTCCCAACACTAAGGCTTTCCTAGGCAGGAGCTGGGCTGAGCCACCCGGGGGGCAGAGCCTGAAGAGAAACTGACTGGGCTTTCGGGGTAGATCGGAAGA  :>>@BCAACADC?BCDECCCDDC?CDEBCE?DEDFDEEDF?DEEBDEE@EEEBEECB?BGGDF>AD?A?BBFEH<FEG@CGFEEB<BB<58<866495675  X0:i:7  X1:i:0  MD:Z:90C1G2C3   RG:Z:sample_tumor2 XG:i:2  AM:i:0  NM:i:5  SM:i:0  XM:i:3  XO:i:1  OQ:Z:GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGFGGGGGGGGGGGGEG
    This is the BAM file created by "samtools merge", since we ran two lanes with this sample. The file is then modified by "samtools reheader" to add the read group information for the second lane. As I inherited this pipeline and it was using older versions of various tools, this step might not be ideal, but it does work if I start the pipeline from Sanger FASTQ files. The segfault happens right after I call "samtools index".
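
    Because the reheader step exists to add read group information, one cheap sanity check is that every RG:Z value in the records (sample_tumor1 and sample_tumor2 in the lines above) has a matching "@RG ... ID:" line in the header, as the SAM specification requires when @RG lines are present. The C sketch below is hypothetical (made-up function names, not part of samtools), and nothing in this thread establishes that a mismatch would cause the segfault; it only verifies consistency.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the value of the RG:Z: optional field of one SAM alignment line
 * into rg (truncated to size-1 characters). Returns 1 if present. */
int record_rg(const char *line, char *rg, size_t size)
{
    assert(line != NULL && rg != NULL && size > 0);
    const char *p = strstr(line, "\tRG:Z:");
    if (p == NULL)
        return 0;
    p += 6;                                  /* skip "\tRG:Z:" */
    size_t n = strcspn(p, "\t\r\n");
    if (n >= size)
        n = size - 1;
    memcpy(rg, p, n);
    rg[n] = '\0';
    return 1;
}

/* Returns 1 if the header text (lines separated by '\n') has an "@RG"
 * line whose ID field equals rg exactly, 0 otherwise. */
int header_declares_rg(const char *header, const char *rg)
{
    size_t rlen = strlen(rg);
    const char *line = header;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t llen = eol != NULL ? (size_t)(eol - line) : strlen(line);

        if (llen >= 3 && strncmp(line, "@RG", 3) == 0) {
            /* search for "\tID:" inside this line only */
            for (const char *p = line; p + 4 <= line + llen; p++) {
                if (strncmp(p, "\tID:", 4) == 0) {
                    const char *v = p + 4;
                    if (strcspn(v, "\t\r\n") == rlen && strncmp(v, rg, rlen) == 0)
                        return 1;
                    break;
                }
            }
        }
        line = eol != NULL ? eol + 1 : NULL;
    }
    return 0;
}
```

    The header text could come from "samtools view -H file.bam"; each record line is then passed to record_rg and the extracted value checked with header_declares_rg.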

    This is not a big deal, as I can work around it, but I'm generally of the opinion that segfaults are bad things that should be fixed. I'll be happy to provide more information if it helps. Please also tell me if I should be posting on the mailing list for those tools instead.

    P.S. On the off chance that this might be an issue:
    The FASTQ conversion was done in a simple C program, the important bit being:
    Code:
        q33[i] = (char)((int) q33[i] - 31); /* Phred+64 -> Phred+33: 64 - 33 = 31 */
