  • SAM (from bowtie2) to BAM problem using samtools

    I generated a de novo assembly of Illumina HiSeq reads using ABySS and Trans-ABySS, aligned the reads back to the assembly using bowtie2, and am now trying to view the resulting SAM alignment in IGV. IGV recommends BAM over SAM, but I'm getting errors when converting to BAM with samtools.

    The command I'm using is:
    Code:
    samtools view -bhS Ua_1.0_pn.sam > Ua_1.0_pn.bam
    Though I get a resulting bam file, I also get this output in my terminal:
    [samopen] SAM header is present: 200079 sequences.
    [sam_read1] reference '323' is recognized as '*'.
    Parse error at line 60897629: sequence and quality are inconsistent
    Abort trap
    When I ignored the error and tried to sort the bam file with the command:
    Code:
    samtools sort Ua_1.0_pn.bam Ua_1.0_pn_sort.bam
    I got the error:
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_sort_core] truncated file. Continue anyway.
    Bus error
    Here are the first few lines of my SAM file:
    @HD VN:1.0 SO:unsorted
    @SQ SN:k48:100000u LN:120
    @SQ SN:k48:100008 LN:199
    @SQ SN:k48:100019u LN:95
    @SQ SN:k48:100020 LN:195

    Can anyone help? I haven't been able to find relevant help on seqanswers or anywhere else.

  • #2
    Originally posted by amango
    Though I get a resulting bam file, I also get this output in my terminal:
    Code:
    [samopen] SAM header is present: 200079 sequences.
    [sam_read1] reference '323' is recognized as '*'.
    Parse error at line 60897629: sequence and quality are inconsistent
    Abort trap
    I think samtools only likes the start of your SAM file, and something goes wonky at that line quoted. My guess is a data corruption of some sort at that point, throwing the tab separated SAM fields out of sync.

    You should check that region of the SAM file to confirm this hunch. Can you show us that chunk of the data (say 5 lines either side)?

    Or just regenerate the SAM file and hope it works the second time round.
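
One way to pull out the chunk asked for above (a few lines either side of the reported line) is a small Python helper. This is only a sketch: `lines_around` is a hypothetical name, and it assumes the line number in the samtools message counts every line of the file, header included.

```python
from itertools import islice


def lines_around(path, line_no, context=5):
    """Return (line_number, text) pairs for line_no and `context` lines either side."""
    first = max(1, line_no - context)
    last = line_no + context
    # errors="replace" keeps the scan going even if the file holds corrupt bytes.
    with open(path, "r", errors="replace") as handle:
        window = islice(handle, first - 1, last)
        return [(first + i, text.rstrip("\n")) for i, text in enumerate(window)]


# Example call, using the file name and line number from this thread:
#   for number, text in lines_around("Ua_1.0_pn.sam", 60897629):
#       print(number, text, sep="\t")
```

The file is streamed rather than loaded, so memory use stays small even for a SAM file with tens of millions of lines.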


    • #3
      The first error mentioned a problem on line 60897629. Have you looked at that line to see if something is amiss in the SAM file? I expect that because samtools aborts early, it never finishes writing the BAM file, which would cause the second error you saw.
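
To confirm that the BAM really was cut short, one option is to check for the end-of-file marker that the "EOF marker is absent" message refers to. The SAM/BAM specification defines it as a fixed 28-byte empty BGZF block at the very end of the file. A minimal sketch (`has_bgzf_eof` is a hypothetical helper name, not part of samtools):

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF


# Example call with the file name from this thread:
#   has_bgzf_eof("Ua_1.0_pn.bam")
```

A `False` result only says the marker is missing; it does not say where or why writing stopped.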


      • #4
        Line 60897629 is indeed different. It's the middle one below, starting with "165...". I looked a hundred lines before this section, and it seems to be the first that appears this way, though some other similar lines appear further down the file.

        I read about the SAM format specification (for the first time--I'm 1 month into the world of bioinformatics) and from what I can tell, this line is missing a QNAME (query name; col 1), has a map quality of 0 (col 5), has no sequence (col 10) or quality (col 11), and has two extra columns, with values: YT:Z:UP & YF:Z:LN.

        Does this make sense to anyone? Any advice on how I can get past this?
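
The column-by-column check described above can be automated. Going by the SAM specification, every alignment line has at least 11 tab-separated fields, and QUAL (column 11) is either `*` or the same length as SEQ (column 10). Here is a rough Python sketch of that check; the function names are made up for illustration, and it only covers a few of the rules, not the whole spec.

```python
def sam_line_problems(line):
    """Return a list of problems found in one SAM alignment line (empty if none)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        return [f"{len(fields)} tab-separated fields, expected at least 11"]
    problems = []
    if not fields[0]:
        problems.append("empty QNAME")
    if not fields[1].isdigit():
        problems.append("FLAG is not a non-negative integer")
    if not fields[4].isdigit():
        problems.append("MAPQ is not a non-negative integer")
    seq, qual = fields[9], fields[10]
    # QUAL may be "*"; otherwise SEQ must be present and of equal length.
    if qual != "*" and (seq == "*" or len(seq) != len(qual)):
        problems.append("SEQ and QUAL are inconsistent")
    return problems


def scan_sam(lines):
    """Yield (line_number, problems) for every suspicious alignment line."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("@"):  # header lines are skipped
            continue
        problems = sam_line_problems(line)
        if problems:
            yield number, problems


# Example call with the file name from this thread:
#   with open("Ua_1.0_pn.sam", errors="replace") as handle:
#       for number, problems in scan_sam(handle):
#           print(number, "; ".join(problems))
```

Running this over the whole file would list every line like the one at 60897629, which shows how many reads are affected before deciding what to do about them.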


        • #5
          Easy answer: just delete the offending lines. Maybe your SAM file got garbled somewhere along the line, or maybe the FASTQ was garbled at that point. It's just one read out of millions.
          Last edited by swbarnes2; 03-02-2012, 11:26 AM.


          • #6
            You might want to look back at the original fastq file and try to determine which read that's supposed to be. It's possible that the fastq file is screwy (maybe it's missing that read?). The YT and YF values are optional fields.

            A fix purely for the BAM-creation issue is to delete the screwed-up line. Just eyeballing things, it looks like you have paired-end data, so make sure you remove both mates of the pair.

            If the fastq file seems ok, then you might try to track down the bug and submit a bug report to the bowtie2 developers.
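
For the "delete the line, and get both mates" route, something along these lines might do. It is a sketch under loud assumptions: the helper names are hypothetical, a record counts as malformed only by the few checks shown, and everything is held in memory, so a file with tens of millions of lines would want a two-pass version that reads from disk instead.

```python
from collections import Counter

PAIRED = 0x1            # SAM FLAG bit: read is paired
SECONDARY = 0x100       # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM FLAG bit: supplementary alignment


def is_well_formed(line):
    """Basic sanity check: 11+ fields, a QNAME, a numeric FLAG, consistent SEQ/QUAL."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11 or not fields[0] or not fields[1].isdigit():
        return False
    seq, qual = fields[9], fields[10]
    return qual == "*" or (seq != "*" and len(seq) == len(qual))


def drop_bad_records(lines):
    """Return headers plus well-formed records, minus paired reads that lost a mate."""
    kept = [ln for ln in lines if ln.startswith("@") or is_well_formed(ln)]
    primaries = Counter()
    for ln in kept:
        if ln.startswith("@"):
            continue
        qname, flag = ln.split("\t", 2)[:2]
        flag = int(flag)
        if flag & PAIRED and not flag & (SECONDARY | SUPPLEMENTARY):
            primaries[qname] += 1

    def keep(ln):
        if ln.startswith("@"):
            return True
        qname, flag = ln.split("\t", 2)[:2]
        # Unpaired reads always stay; paired reads need both primary records.
        return not (int(flag) & PAIRED) or primaries[qname] == 2

    return [ln for ln in kept if keep(ln)]
```

Because the broken line has no usable query name, its mate is found indirectly: it ends up as the only primary record for its name and is dropped as an orphan.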


            • #7
              I had this happen to me recently with bowtie2. I don't know if this is still an issue for you, but I found that if I regenerated the bowtie2 library, reran bowtie2 and then converted to bam and sorted etc, I no longer had a problem. I don't know if there is some weird little occasional glitch with the program or if it was some other factor (I used the same commands each time). We had a substantial lightning storm here the night of the original runs that had the error you mentioned. Whether that affected the run I have no idea. Although other differences from this run versus the previous run that might be just as logical explanations are that I am wearing shorts today rather than jeans and this morning I had toast with jam.


              • #8
                Hey,
                did any of you get IGV to open/sort/index a BAM file generated with samtools from a SAM file generated with bowtie2? igvtools refuses to take my file...


                • #9
                  Yes, if you're getting an error message on the console, then go ahead and post it. I've only ever used IGV to open BAM files. I use samtools for sorting and indexing.
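
For reference, the convert, sort and index steps can be chained from Python. This sketch assumes samtools 1.x is on the PATH and uses its `-o` output option; the older releases current when this thread started took an output prefix for `sort` instead. The function names and the `.sorted.bam` naming are illustrative choices, not anything samtools requires.

```python
import subprocess


def samtools_commands(sam_path, prefix):
    """Build the view/sort/index command lines (samtools 1.x syntax assumed)."""
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        ["samtools", "view", "-b", "-o", bam, sam_path],   # SAM -> BAM
        ["samtools", "sort", "-o", sorted_bam, bam],       # coordinate sort
        ["samtools", "index", sorted_bam],                 # write the index for IGV
    ]


def run_all(commands):
    """Run each command in order, stopping at the first non-zero exit status."""
    for command in commands:
        subprocess.run(command, check=True)


# Example call with the file name from this thread:
#   run_all(samtools_commands("Ua_1.0_pn.sam", "Ua_1.0_pn"))
```

With `check=True`, a failed conversion raises an exception instead of letting the sort step run on a truncated BAM, which is the situation described at the top of the thread.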


                  • #10
                    Originally posted by dpryan
                    I use samtools for sorting and indexing.
                    I tried that after I posted my question and it worked. I guess I won't use igvtools from now on, samtools is faster anyway.


                    • #11
                      Originally posted by Lilith-Elina
                      I tried that after I posted my question and it worked. I guess I won't use igvtools from now on, samtools is faster anyway.
                      Yeah, igvtools is quite useful for other formats, but samtools and Picard are the de facto standards for all things SAM/BAM related.
