  • amango
    Member
    • Dec 2009
    • 17

    SAM (from bowtie2) to BAM problem using samtools

    I generated a de novo assembly of Illumina HiSeq reads using ABySS and Trans-ABySS, aligned the reads back to the assembly using bowtie2, and am now trying to view the resulting SAM alignment in IGV. IGV recommends BAM over SAM, but I'm getting errors when converting to BAM with samtools.

    The command I'm using is:
    Code:
    samtools view -bhS Ua_1.0_pn.sam > Ua_1.0_pn.bam
    Although I get a resulting BAM file, I also get this output in my terminal:
    [samopen] SAM header is present: 200079 sequences.
    [sam_read1] reference '323' is recognized as '*'.
    Parse error at line 60897629: sequence and quality are inconsistent
    Abort trap
    When I ignored the error and tried to sort the BAM file with the command:
    Code:
    samtools sort Ua_1.0_pn.bam Ua_1.0_pn_sort.bam
    I got the error:
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_sort_core] truncated file. Continue anyway.
    Bus error
    Here are the first few lines of my SAM file:
    @HD VN:1.0 SO:unsorted
    @SQ SN:k48:100000u LN:120
    @SQ SN:k48:100008 LN:199
    @SQ SN:k48:100019u LN:95
    @SQ SN:k48:100020 LN:195

    Can anyone help? I haven't been able to find relevant help on seqanswers or anywhere else.
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Originally posted by amango
    Though I get a resulting bam file, I also get this output in my terminal:
    Code:
    [samopen] SAM header is present: 200079 sequences.
    [sam_read1] reference '323' is recognized as '*'.
    Parse error at line 60897629: sequence and quality are inconsistent
    Abort trap
    I think samtools is happy with the start of your SAM file, but something goes wonky at the quoted line. My guess is data corruption of some sort at that point, throwing the tab-separated SAM fields out of sync.

    You should check that region of the SAM file to confirm this hunch. Can you show us that chunk of the data (say 5 lines either side)?

    Or just regenerate the SAM file and hope it works second time round
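
To pull out the lines maubp asks for without opening a 60-million-line file in an editor, a small Python sketch can help. It assumes plain, uncompressed SAM and 1-based line numbers counted from the very first line of the file (header lines included); context_lines and show_context are hypothetical helper names, not part of samtools.

```python
import itertools
from typing import Iterable, List, Tuple


def context_lines(lines: Iterable[str], target: int, radius: int = 5) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines target-radius .. target+radius.

    Line numbers are 1-based and count every line, header lines included.
    The input is consumed lazily, so only the requested window is kept in memory.
    """
    start = max(1, target - radius)
    stop = target + radius  # last line number to keep (inclusive)
    # islice uses 0-based positions: skip start-1 lines, stop after line `stop`.
    window = itertools.islice(lines, start - 1, stop)
    return [(start + offset, text.rstrip("\n")) for offset, text in enumerate(window)]


def show_context(path: str, target: int, radius: int = 5) -> None:
    """Print the window around target, marking it; repr() shows tabs as escape sequences."""
    with open(path) as handle:
        for number, text in context_lines(handle, target, radius):
            marker = ">>" if number == target else "  "
            # repr() makes tabs and empty columns visible.
            print(f"{marker} {number}: {text!r}")
```

For example, show_context("Ua_1.0_pn.sam", 60897629) would print lines 60897624 to 60897634 with the suspect line marked. It still has to read through every earlier line, but it never holds more than the window in memory.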


    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      The first error mentions a problem on line 60897629. Have you looked at that line to see if something is amiss in the SAM file? I expect that because samtools aborts early, it never finishes writing the BAM file, which would cause the second error you saw.
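
This explanation can be checked without samtools: a complete BAM file ends with a fixed 28-byte empty BGZF block, the end-of-file marker defined in the SAM/BAM specification, which is what the "EOF marker is absent" message refers to. A minimal Python sketch that looks for that block; has_bgzf_eof and bam_looks_complete are hypothetical names.

```python
import io

# End-of-file marker from the SAM/BAM specification: an empty BGZF block
# that every complete BAM file should end with (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(handle: io.BufferedIOBase) -> bool:
    """Return True if a binary file object ends with the BGZF EOF block."""
    size = handle.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    if size < len(BGZF_EOF):
        return False
    handle.seek(size - len(BGZF_EOF))
    return handle.read(len(BGZF_EOF)) == BGZF_EOF


def bam_looks_complete(path: str) -> bool:
    """Check a BAM file on disk; a missing marker suggests the writer was cut off."""
    with open(path, "rb") as handle:
        return has_bgzf_eof(handle)
```

A present marker only shows that the writer finished; it does not prove that every record inside is valid.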


      • amango
        Member
        • Dec 2009
        • 17

        #4
        Line 60897629 is indeed different. It's the middle one below, starting with "165...". I looked at the hundred lines before this section, and it seems to be the first line that looks like this, though some other similar lines appear further down the file.

        I read about the SAM format specification (for the first time--I'm 1 month into the world of bioinformatics) and from what I can tell, this line is missing a QNAME (query name; col 1), has a map quality of 0 (col 5), has no sequence (col 10) or quality (col 11), and has two extra columns, with values: YT:Z:UP & YF:Z:LN.
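
The column checks described above can be automated. This is a rough sketch, not samtools' own parser: it splits one alignment line on tabs, names the 11 mandatory columns as in the SAM specification, and flags empty QNAME, SEQ or QUAL fields (the specification uses '*' for an absent value) as well as a QUAL string whose length does not match SEQ, which appears to be the condition behind samtools' "sequence and quality are inconsistent" message. record_problems is a hypothetical name.

```python
from typing import List

# The 11 mandatory SAM columns, in order.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def record_problems(line: str) -> List[str]:
    """Return a list of problems found in one SAM alignment line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(SAM_COLUMNS):
        return [f"only {len(fields)} tab-separated fields; SAM needs 11"]
    record = dict(zip(SAM_COLUMNS, fields))
    problems = []
    for name in ("QNAME", "SEQ", "QUAL"):
        if record[name] == "":
            problems.append(f"{name} is empty (SAM uses '*' for a missing value)")
    seq, qual = record["SEQ"], record["QUAL"]
    # QUAL may be '*' (no qualities); otherwise SEQ must be stored and the same length.
    if qual != "*" and (seq == "*" or len(seq) != len(qual)):
        problems.append(f"SEQ length {len(seq)} and QUAL length {len(qual)} are inconsistent")
    return problems
```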

        Does this make sense to anyone? Any advice on how I can get past this?


        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          Easy answer: just delete the offending lines. Maybe your BAM got garbled somewhere along the line, or maybe the fastq was garbled at that point. It's just one read out of millions.
          Last edited by swbarnes2; 03-02-2012, 11:26 AM.


          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            You might want to look back at the original fastq file and try to determine which read that's supposed to be. It's possible that the fastq file is screwy (maybe it's missing that read?). The YT and YF values are optional fields.
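
Optional fields such as these follow the SAM TAG:TYPE:VALUE layout and start at column 12; tags beginning with X, Y or Z are reserved for local use, which is how bowtie2 can define YT and YF. According to the bowtie2 manual (worth checking against the installed version), YT:Z:UP marks a read from a pair that aligned neither concordantly nor discordantly, and YF:Z:LN marks a read filtered out because its length was no greater than the -N seed-mismatch setting, which would fit a read with no sequence at all. A small sketch for pulling the tags out of a line; optional_fields is a hypothetical name.

```python
from typing import Dict, Tuple


def optional_fields(line: str) -> Dict[str, Tuple[str, str]]:
    """Map each optional TAG (columns 12 onward) to its (TYPE, VALUE) pair.

    Raises ValueError if an optional field does not contain two colons.
    """
    tags: Dict[str, Tuple[str, str]] = {}
    for field in line.rstrip("\n").split("\t")[11:]:
        # Layout is TAG:TYPE:VALUE; split at most twice since VALUE may contain ':'.
        tag, value_type, value = field.split(":", 2)
        tags[tag] = (value_type, value)
    return tags
```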

            A fix purely for the BAM-creation issue is to delete the screwed-up line. Just eyeballing things, it looks like you have paired-end data, so make sure you remove both mates.
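
A sketch of that clean-up in Python, assuming plain SAM on disk: a first pass collects the QNAMEs of malformed records, and a second pass keeps everything except records with those names, so both mates of a pair are dropped together. It cannot pair up a record whose QNAME is itself empty; bowtie2 normally writes mates on adjacent lines, so the neighbouring line is worth checking by eye. The function names and the output path are hypothetical.

```python
from typing import Iterable, Iterator, List, Set


def is_well_formed(fields: List[str]) -> bool:
    """Minimal check: 11+ columns, non-empty QNAME/SEQ/QUAL, matching SEQ/QUAL lengths."""
    if len(fields) < 11 or "" in (fields[0], fields[9], fields[10]):
        return False
    seq, qual = fields[9], fields[10]
    return qual == "*" or (seq != "*" and len(seq) == len(qual))


def bad_read_names(lines: Iterable[str]) -> Set[str]:
    """First pass: collect the QNAMEs of malformed alignment records."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith("@"):
            continue  # header line; QNAMEs cannot start with '@'
        fields = line.rstrip("\n").split("\t")
        if not is_well_formed(fields):
            names.add(fields[0])
    return names


def filter_records(lines: Iterable[str], bad: Set[str]) -> Iterator[str]:
    """Second pass: keep headers and well-formed records whose QNAME is not in bad."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if is_well_formed(fields) and fields[0] not in bad:
            yield line


def write_filtered_copy(src_path: str, dst_path: str) -> Set[str]:
    """Read src_path twice, write a cleaned copy to dst_path, return the dropped names."""
    with open(src_path) as src:
        bad = bad_read_names(src)
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(filter_records(src, bad))
    return bad
```

Something like write_filtered_copy("Ua_1.0_pn.sam", "Ua_1.0_pn.fixed.sam") would then give a file to retry with samtools view -bhS.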

            If the fastq file seems ok, then you might try to track down the bug and submit a bug report to the bowtie2 developers.
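
For the FASTQ check, a rough sketch that walks the file four lines at a time and reports zero-length reads (which would fit an empty SEQ/QUAL record in the SAM output), sequence/quality length mismatches, and a truncated final record. It assumes the common one-line-per-field layout with no wrapped sequences; fastq_problems is a hypothetical name.

```python
import itertools
from typing import Iterable, Iterator, Tuple


def fastq_problems(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (record_number, message) for suspicious records in a 4-line FASTQ.

    Assumes exactly four lines per record (no wrapped sequence or quality
    lines). Record numbers are 1-based.
    """
    it = iter(lines)
    record = 0
    while True:
        chunk = [line.rstrip("\n") for line in itertools.islice(it, 4)]
        if not chunk:
            return  # clean end of file
        record += 1
        if len(chunk) < 4:
            yield record, f"file ends mid-record ({len(chunk)} of 4 lines)"
            return
        header, seq, plus, qual = chunk
        if not header.startswith("@") or not plus.startswith("+"):
            yield record, "header or '+' separator line out of place"
        if not seq:
            yield record, f"zero-length read {header!r}"
        if len(seq) != len(qual):
            yield record, f"sequence length {len(seq)} != quality length {len(qual)}"
```

A gzipped FASTQ can be fed in the same way by passing gzip.open(path, "rt").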


            • aaronrjex
              Junior Member
              • Aug 2012
              • 7

              #7
              I had this happen to me recently with bowtie2. I don't know if this is still an issue for you, but I found that if I regenerated the bowtie2 library, reran bowtie2, and then converted to BAM and sorted etc., I no longer had a problem. I don't know if there is some weird little occasional glitch in the program or if it was some other factor (I used the same commands each time). We had a substantial lightning storm here the night of the original runs that produced the error you mentioned. Whether that affected the run I have no idea. Then again, other differences between this run and the previous one, which might be equally logical explanations, are that I am wearing shorts today rather than jeans and this morning I had toast with jam.


              • Lilith-Elina
                Junior Member
                • Oct 2011
                • 4

                #8
                Hey,
                did any of you get IGV to open/sort/index a BAM file generated with samtools from a SAM file generated with bowtie2? igvtools refuses to take my file...


                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  Yes, if you're getting an error message on the console, then go ahead and post it. I've only ever used IGV to open BAM files. I use samtools for sorting and indexing.


                  • Lilith-Elina
                    Junior Member
                    • Oct 2011
                    • 4

                    #10
                    Originally posted by dpryan
                    I use samtools for sorting and indexing.
                    I tried that after I posted my question and it worked. I guess I won't use igvtools from now on, samtools is faster anyway.


                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #11
                      Originally posted by Lilith-Elina
                      I tried that after I posted my question and it worked. I guess I won't use igvtools from now on, samtools is faster anyway.
                      Yeah, igvtools is quite useful for other formats, but samtools and Picard are the de facto standards for all things SAM/BAM related.
