  • unmapped reads in Bowtie causing problems in SAMtools?

    Hi, I am having some trouble running some SOLiD data through bowtie. I've converted from csfasta/qual files to fastq using solid2fastq (v0.6.3c) and get this:

    @424_1953_1910
    T23311000033011331110003320301300033203123032220022
    +
    <99>=9=::=7<8;:5,,4/<<77@37-52=5.):.7$91450)=4:%&:


    I know the first base is the adaptor, so I assume the difference in length between the sequence and qual data is to be expected.

    Then I run through bowtie (v0.12.2) using just the -S -C options. But when I run 'samtools import' on the resulting SAM file, I get:

    Parse error at line 96: sequence and quality are inconsistent

    It's only the unmapped reads that cause this problem, the mapped ones are ok. Here's an example:

    424_1953_1812 4 * 0 0 * * 0 0 TAGGACAAGAGCATACTCTGCTAGCAAAATCTAGATGCCAGATCTGGAG 948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+^@ XM:i:0

    The '^@' is present in the unmapped reads but not the mapped ones.

    So, (a) is this a bug in bowtie or samtools, and (b) is there a way to suppress the unmapped reads in the bowtie SAM output, which would work around this problem?

    Thanks!

    Will
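
The failure described in this post can be reproduced without samtools. Below is a minimal Python sketch, not part of bowtie or samtools, that scans SAM text for records whose SEQ and QUAL fields differ in length or whose QUAL contains a NUL byte (which a terminal displays as '^@'). The function name `find_inconsistent_records` is hypothetical, and the sample record is rebuilt from the example above with the tab separators restored.

```python
def find_inconsistent_records(sam_lines):
    """Yield (line_number, read_name, seq_len, qual_len, has_nul) for suspect records."""
    for line_number, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            continue  # header line, nothing to check
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, seq, qual = fields[0], fields[9], fields[10]
        if qual == "*":
            continue  # the SAM format allows QUAL to be omitted as "*"
        has_nul = "\x00" in qual
        qual_len = len(qual.replace("\x00", ""))  # count real quality characters only
        if has_nul or len(seq) != qual_len:
            yield (line_number, name, len(seq), qual_len, has_nul)


# Unmapped record from the post; "\x00" stands for the '^@' shown in the output.
sample = [
    "@HD\tVN:1.0\n",
    "424_1953_1812\t4\t*\t0\t0\t*\t*\t0\t0\t"
    "TAGGACAAGAGCATACTCTGCTAGCAAAATCTAGATGCCAGATCTGGAG\t"
    "948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+\x00\tXM:i:0\n",
]
bad = list(find_inconsistent_records(sample))
for record in bad:
    print(record)
```

Running this over the full SAM file lists every affected line at once instead of stopping at the first parse error.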

  • #2
    I would use the conversion script available in bowtie (is there one?) rather than BFAST. The BFAST conversion script was designed for BFAST and I have not tested it with BWA/bowtie etc.

    It looks like bowtie keeps the adaptor sequence in the base space representation. This is incorrect since it is not part of the DNA fragment being sequenced. You should send a bug report to the bowtie authors.
    Last edited by nilshomer; 02-11-2010, 10:25 PM. Reason: spelling



    • #3
      FYI, the unmapped read in question looks like this in the fastq file:

      @424_1953_1812
      T03022010020210301313213021000031302032110203132202
      +
      =;948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+

      and looks like this in the csfasta/qual file:

      >424_1953_1812_F3
      T03022010020210301313213021000031302032110203132202

      >424_1953_1812_F3
      28 26 24 19 23 26 27 27 25 19 25 29 25 27 27 26 29 23 25 26 25 28 20 25 29 27 16 26 26 27 24 20 15 23 24 25 17 17 14 23 25 18 21 26 17 16 24 23 26 10

      So I agree, the adaptor sequence looks like it is retained in the SAM file. There's no fastq conversion tool with Bowtie as far as I know. I know there were some bugs about trimming in the 0.12.0 and 0.12.1 versions so maybe some remain. Will report it, thanks.

      Will
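
The numeric values in the qual file and the characters in the fastq quality line are related by the Phred+33 encoding: each quality value plus 33 gives the ASCII code of the corresponding character. The short Python sketch below checks this against the first ten values posted above; the helper name `phred_to_fastq` is hypothetical and is not taken from any conversion tool.

```python
def phred_to_fastq(quals, offset=33):
    """Encode numeric quality values as a FASTQ quality string (Phred+33 by default)."""
    return "".join(chr(q + offset) for q in quals)


# First ten quality values of read 424_1953_1812_F3 from the qual file.
qual_line = "28 26 24 19 23 26 27 27 25 19"
encoded = phred_to_fastq(int(q) for q in qual_line.split())
print(encoded)  # prints "=;948;<<:4", the start of the fastq quality line
```

Note that the qual file holds 50 values for 50 colors, while the sequence line has 51 characters because of the leading primer base T, which has no quality value.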



      • #4
        Hi Will,

        Nils is right that the problem is coming from mixing BFAST's tools with Bowtie's. The " 2" at the end of the color sequence seems to be BFAST-specific, and Bowtie doesn't know what to do with it. Please use e.g. Galaxy to convert your reads, as is recommended in the manual.

        Re: "looks like bowtie keeps the adaptor sequence in the base space representation" - Bowtie trims the primer base automatically along with the first color. See manual for details. What are you seeing that makes you think otherwise?

        Thanks,
        Ben



        • #5
          For some odd reason the SeqAnswers formatting is screwing up the stuff I've been posting. There's no space in this fastq sequence between the last 0 and 2 (despite what it looks like below...).

          @424_1953_1812
          T03022010020210301313213021000031302032110203132202
          +
          =;948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+


          But I will try Galaxy, too, thanks.

          Will
          Last edited by wimufi; 02-11-2010, 10:08 AM.



          • #6
            " [ code ] [ /code ] " tags are your friend when formatting is to be preserved.

            Code:
            @424_1953_1812
            T03022010020210301313213021000031302032110203132202
            +
            =;948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+



            • #7
              This is a bug in bowtie: it seems to trim the adaptor T and the first color, and likewise the first two quals, which leaves the read and quality strings with different lengths (the adaptor T has no quality value since it is not sequenced).

              You can get around this by removing unaligned reads:
              awk '$2 != 4 {print $0}' reads.sam > aligned_reads_only.sam

              It is nice that you can get unaligned reads in a new fastq (to align with BFAST...) but it would be good to have an option to report only aligned reads as well to save space.
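
The awk one-liner above matches a FLAG of exactly 4, which fits the single-end output shown in this thread. For paired-end data an unmapped read usually carries additional flag bits (for example 77), so testing the 0x4 bit is more general. Here is a minimal Python sketch of that variant; the function name `keep_aligned` is hypothetical, and the script works on SAM text only, so it does not depend on samtools being able to parse the file.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def keep_aligned(sam_lines):
    """Yield header lines and records whose FLAG does not have the 0x4 bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header intact
            continue
        fields = line.split("\t")
        if len(fields) > 1 and fields[1].isdigit() and int(fields[1]) & UNMAPPED:
            continue  # unmapped read, skip it
        yield line


# Tiny made-up example: flags 0 and 16 are mapped, 4 and 77 are unmapped.
sample = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t1\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r4\t16\tchr1\t9\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(keep_aligned(sample))
print("".join(kept), end="")
```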



              • #8
                Originally posted by Chipper:
                It is nice that you can get unaligned reads in a new fastq (to align with BFAST...) but it would be good to have an option to report only aligned reads as well to save space.
                You might as well just align with BFAST from the start.



                • #9
                  I am having the original issue of converting a SAM to BAM file, as produced by BWA:
                  Code:
                  SAM header is present: 1 sequences.
                  Parse error at line 9829: sequence and quality are inconsistent
                  Aborted
                  (Note, the result of color2fasta uses ACGT to encode colors.)

                  The problem appears to be a bug in BWA that occurs after an atypical CIGAR string is output, e.g. "2S3M2D10M2I26M". For such lines the quality string was partially or completely missing.

                  To proceed, I simply removed the offending read line, i.e. deleted line 9829 (using head/tail/cat).
                  However, I wouldn't recommend this solution, as you have to repeat the cycle for each re-attempt of 'samtools view ...'.
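
Rather than deleting one line per failed attempt, all inconsistent records can be dropped in a single streaming pass. The following is a rough Python sketch under stated assumptions: the names `open_sam`, `seq_qual_consistent` and `drop_inconsistent` are hypothetical, a record counts as consistent when QUAL is "*" or has the same length as SEQ, and the file names in the usage comment are placeholders. It discards reads, so it is a workaround rather than a fix for whatever produced the bad lines.

```python
import gzip
import io


def open_sam(path):
    """Open plain or gzipped SAM text for line-by-line reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def seq_qual_consistent(line):
    """Return True for header lines and for records with matching SEQ/QUAL lengths."""
    if line.startswith("@"):
        return True
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # truncated record, e.g. missing quality column
    seq, qual = fields[9], fields[10]
    return qual == "*" or len(seq) == len(qual)


def drop_inconsistent(lines, out):
    """Copy consistent lines to `out`; return the number of dropped records."""
    dropped = 0
    for line in lines:
        if seq_qual_consistent(line):
            out.write(line)
        else:
            dropped += 1
    return dropped


# Made-up demonstration: the second record has a shortened quality string.
demo = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t1\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1\t5\t37\t4M\t*\t0\t0\tACGT\tII\n",
]
cleaned = io.StringIO()
dropped = drop_inconsistent(demo, cleaned)
print(dropped, "record(s) dropped")

# Usage on a real file (placeholder names):
# with open_sam("reads.sam.gz") as src, open("reads.clean.sam", "w") as dst:
#     drop_inconsistent(src, dst)
```

Because the input is read one line at a time, the same approach should also work for large gzipped SAM files without loading them into memory.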



                  • #10
                    Parse error at line x, sequence and quality are inconsistent

                    Hi,

                    Error as follows:

                    [samopen] SAM header is present: 84 sequences.
                    Parse error at line 86: sequence and quality are inconsistent

                    There have been a few people coming across this error when trying to convert a SAM file to a BAM file, but from searching there doesn't seem to be a good solution yet.

                    I originally ran bwa aln on my SOLiD paired-end reads with -q 0 and did not get this error.
                    But from the alignment we realised that we needed to trim the reads, so I then ran the bwa aln command using -q 20.
                    Unfortunately, this means I now get the "sequence and quality are inconsistent" error and cannot progress.
                    My SAM file is very large, 12.5 Gb gzipped, so it's not feasible to just remove the offending line, and I have a feeling that there will be many more lines with this error.

                    Can someone help, please?

                    Thanks, alig
