
  • SAM file without a header. Can't convert to BAM

    Hi, guys! I'm a complete newbie in computational stuff and just trying to learn things. I want to convert a SAM file to BAM, and my SAM file does not have a header.
    So I googled the problem and found one possible solution using the -t option of samtools. I created the reference.fa.fai with
    Code:
    samtools faidx reference.fa
    command and then tried to run

    Code:
    samtools view -Sb sam.file -t reference.fa.fai  >  bam.file
    However, I'm getting the error message

    Code:
    [sam_header_read2] 35 sequences loaded.
    Parse error at line 2: missing colon in auxiliary data
    /var/spool/sge//node091/job_scripts/1441097: line 10: 19514 Aborted                 (core dumped) samtools view -Sb $in_file.sam -t /$
    I've tried the advice from http://seqanswers.com/forums/showthread.php?t=9650, but I keep getting the same error message.

    Please, help!
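
    For readers following along: the .fai index written by samtools faidx is a tab-separated table whose first two columns are the sequence name and its length, which is the information a SAM @SQ header line carries. Below is a minimal Python sketch of that mapping. The function name sq_lines_from_fai is hypothetical and not part of samtools, and the sketch only illustrates what the -t option supplies; it does not address the parse error reported above.

```python
def sq_lines_from_fai(fai_lines):
    """Build SAM @SQ header lines from the lines of a .fai index.

    Each .fai line is tab-separated: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    Only the first two columns are needed for an @SQ line.
    """
    header = []
    for line in fai_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, length = line.split("\t")[:2]
        header.append(f"@SQ\tSN:{name}\tLN:{int(length)}")
    return header


# Example with two made-up reference sequences (illustrative values only).
example_fai = ["chr1\t1000\t6\t60\t61\n", "chr2\t500\t1030\t60\t61\n"]
for sq_line in sq_lines_from_fai(example_fai):
    print(sq_line)
```

    Writing these lines in front of the alignment records would give a SAM file with an explicit header, assuming the sequence names match the RNAME column of the records.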

  • #2
    Maybe try ReplaceSamHeader in Picard http://picard.sourceforge.net/comman...placeSamHeader


    • #3
      Try googling/duckducking your error message ...
      "missing colon in auxiliary data"

      See previous discussions:
      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

      http://seqanswers.com/forums/showthread.php?t=14902


      Sounds like "aux" means the stuff after QUAL (the quality field, column 11) in a SAM line (i.e. a text version of the BAM record).
      (See line 367 of samtools-0.1.18/bam_import.c.) There's probably invalid OPT (aux) data. That is, the additional annotation information is not properly formatted. Some extra data got a colonectomy or something. (Sorry.) Just cut out all of the OPT (aux) data if you don't need it ... something like this ...

      cut -f1-11 original.sam > new.tmp.sam

      or somesuch.


      • #4
        As Richard mentioned, this means that one of the "tags" in the auxiliary section (the fields following the quality scores) is malformed. This error is thrown if a tag, including the colons, is fewer than 6 characters long, or lacks a colon as the third or fifth character. You might get lucky and be able to fix the second line of your SAM file in a text editor. Otherwise, you'll need to either cut out the auxiliary info or do a search and replace on the whole file, replacing the affected tag(s).
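
        To find the offending tags before editing, the rule described above can be sketched in Python as follows. This only mirrors the check as stated in this post (at least 6 characters, colons as the third and fifth characters); it is not samtools's actual parser, and the names malformed_aux_fields and report are hypothetical.

```python
def malformed_aux_fields(sam_line):
    """Return the optional (aux) fields of one SAM record that fail the
    basic TAG:TYPE:VALUE shape check described in the post."""
    fields = sam_line.rstrip("\n").split("\t")
    bad = []
    for field in fields[11:]:  # columns 12 and up are the optional fields
        # Too short, or missing a colon as the 3rd or 5th character.
        if len(field) < 6 or field[2] != ":" or field[4] != ":":
            bad.append(field)
    return bad


def report(path):
    """Print every suspicious aux field in a SAM file with its line number."""
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("@"):
                continue  # header lines have no aux fields
            for field in malformed_aux_fields(line):
                print(f"line {line_number}: {field!r}")
```

        Calling report("original.sam") (the file name is a placeholder) would list each field that fails the check, which shows whether only line 2 is affected or the same tag is broken throughout the file.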


        • #5
          Originally posted by FractalExpression
          Maybe try ReplaceSamHeader in Picard http://picard.sourceforge.net/comman...placeSamHeader
          Thank you very much! Looks like it worked. The resulting file is slightly smaller, which is strange, but the end of my old headerless file and the end of this file are identical, so I'll try to proceed with it and see what happens.

          Thanks for your help.


          • #6
            Originally posted by Richard Finney
            See previous discussions:
            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

            http://seqanswers.com/forums/showthread.php?t=14902
            I tried to follow this advice, but it did not help and I was still getting the same error message. However, it looks like the Picard tool suggested by FractalExpression worked, so I'll try to proceed with the file.


            • #7
              Try

              Code:
              samtools view -Sbh sam.file   >  bam.file
              instead
