
  • Parsing error in BAM header

    Hi, I am pretty new to this area and am having a hard time with these huge files.

    I wanted to use IndelGenotyperV2 (from GATK) with my newly built BAM files. When I ran the command, I got the error below:

    java.lang.RuntimeException: net.sf.samtools.SAMFormatException: Error parsing SAM header. Problem parsing @PG key:value pair.

    The @PG line looks like this:

    @PG ID:illumina_export2sam.pl VN:2.0.0 CL:/opt/GOAT/CASAVA_1.7.0a6/bin/illumina_export2sam.pl --read1=s_7_1_export.txt --read2=s_7_2_export.txt

    I can't figure out what the problem is here. All three tags (ID, VN, and CL) are present.

    One hint is that samtools prints a warning when I check the header (samtools view myfile.bam -H):

    The tag '--' present (at least) twice on line [@PG ID:illumina_export2sam.pl VN:2.0.0 CL:/opt/GOAT/CASAVA_1.7.0a6/bin/illumina_export2sam.pl --read1=s_7_1_export.txt --read2=s_7_2_export.txt]

    Is this the cause of the error, or is there some other problem in my file?

    Thanks,

  • #2
    I would recommend that you "reheader" your sam file with "samtools reheader" if you would like to try and isolate the cause of this problem.
    Basically use "samtools view -H file.bam > header.txt". Edit header.txt and perhaps remove the "--" from all your header lines, then use "samtools reheader" with your new header file and the bam file.


    • #3
      The hint from samtools view is telling you that '--' is being interpreted as a tag, which means that it is immediately following a tab character each time it appears.

      SAM header fields are delimited by tabs, so header field values of course cannot themselves contain tabs. Your CL: value has the words of the command line separated by tabs rather than spaces, leading to parsing confusion.

      If illumina_export2sam.pl is generating this CL: value with tabs inside it, then that is a bug in illumina_export2sam.pl -- it should be replacing tabs with spaces (or doing something similar) to ensure that it is outputting a valid SAM header.

      You may be able to reheader your BAM file so as to replace these spurious tabs with spaces yourself. Or when you produce the SAM file it would be easy to replace them with sed or a text editor. Or it should be easy to fix illumina_export2sam.pl yourself if it is a Perl script -- just search for /@PG/ and/or /CL:/ which most likely appear exactly once in the script.
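
The tab repair described above can be sketched in shell. This is only a minimal sketch, assuming the stray words in the @PG line really are separated by tabs and that no command-line word happens to look like a two-character TAG: prefix. The function name fix_pg_tabs and the file names in the comments are made up for illustration.

```shell
#!/bin/sh
# Rejoin stray tab-separated words in the @PG lines of a SAM header.
# Reads header text on stdin and writes the repaired header to stdout.
# fix_pg_tabs is a hypothetical helper name, not part of samtools.
fix_pg_tabs() {
    awk '
    BEGIN { FS = "\t" }
    # Header lines other than @PG pass through unchanged.
    $1 != "@PG" { print; next }
    {
        line = $1
        tagged = 0
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[A-Za-z][A-Za-z0-9]:/) {
                # Looks like a real TAG:VALUE field, so keep the tab.
                line = line "\t" $i
                tagged = 1
            } else if (tagged) {
                # Stray word after a tagged field: glue it back with a space.
                line = line " " $i
            } else {
                # Nothing tagged seen yet: leave the field as it is.
                line = line "\t" $i
            }
        }
        print line
    }'
}

# Possible usage (file names are placeholders):
#   samtools view -H myfile.bam | fix_pg_tabs > header.fixed.sam
#   samtools reheader header.fixed.sam myfile.bam > myfile.fixed.bam
```

Joining with a space keeps the original command line readable in CL:, which seems preferable to deleting the "--" options outright.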


      • #4
        Thank you zee and jmarshall.
        Following your posts, I extracted the original headers from my BAM files using "samtools view -H myfile.bam > output.file", edited them, and then used Picard "ReplaceSamHeader" to swap in the modified headers.

        So now IndelGenotyperV2 no longer raises an error about the BAM headers.
        Now it is complaining about memory.. haha (although I gave it 2g)

        Thank you both anyway


        • #5
          Originally posted by zee
          I would recommend that you "reheader" your sam file with "samtools reheader" if you would like to try and isolate the cause of this problem.
          Basically use "samtools view -H file.bam > header.txt". Edit header.txt and perhaps remove the "--" from all your header lines, then use "samtools reheader" with your new header file and the bam file.
          I also have a problem with the BAM file header:
          "Error parsing SAM header. Problem parsing @PG key:value pair. Line:
          @PG TopHat VN:1.0.13"
          When I use "samtools view -H file.bam > header.txt", edit header.txt, and then use "samtools reheader", the output comes out as gibberish. How do I use "samtools reheader"?
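
For what it is worth, the quoted @PG line has a field, TopHat, with no TAG: prefix, which would fit the "key:value pair" complaint. Below is a minimal sketch of a check that lists such fields, assuming the header fields are tab-separated. The function name check_header_fields and the file names in the comments are made up for illustration.

```shell
#!/bin/sh
# List header fields that do not look like TAG:VALUE.
# Reads header text on stdin; exits non-zero if any such field is found.
# check_header_fields is a hypothetical helper name, not part of samtools.
check_header_fields() {
    awk '
    BEGIN { FS = "\t"; bad = 0 }
    # @CO comment lines are free text, so they are skipped.
    $1 == "@CO" { next }
    {
        for (i = 2; i <= NF; i++) {
            if ($i !~ /^[A-Za-z][A-Za-z0-9]:/) {
                printf "line %d: field %d is not TAG:VALUE: %s\n", NR, i, $i
                bad = 1
            }
        }
    }
    END { exit bad }'
}

# Possible usage (file names are placeholders):
#   samtools view -H file.bam | check_header_fields
# After editing header.txt (for example, changing TopHat to ID:TopHat):
#   samtools reheader header.txt file.bam > file.fixed.bam
```

If the gibberish appeared on the terminal, one possible explanation is that "samtools reheader" writes the new BAM to standard output, so it needs to be redirected into a file as in the last comment line.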
