  • jameslz
    Member
    • Nov 2009
    • 20

    tophat 1.3.0 sam output quality string problem

    When I mapped RNA-seq data to a reference genome using tophat 1.3.0, I ran into a problem in the SAM output that did not occur with version 1.2.0.
    Here is the result:
    1. The tophat 1.3.0 result:
    HWI-ST_0101:7:8:14864:67306#0 99 scaffold_8 3915632 255 95M = 3915887 420 GGACCGGTAGAAATTTTCCAATGAGAGATCATGTGAAGATTGAAAAGAAGAGTCCATGACAAATTTACATTGGCTGCTGCAATAGCTGAGGAGCG HHHHHHHFHHHHBHHHHHHFHHHHHFHFFHFGGEGFEGEGGGCFECHHHHHHEHHHFHEEBBEFG@AFFFDFFFFF54;.9;4*.:>>8@B#### NM:i:1 NH:i:1
    HWI-ST_0101:7:8:14864:67306#0 147 scaffold_8 3915887 255 68M70N27M = 3915632 420 TTTCCAAGTCATCCTCGTTGCCAATCGGTGCTTGACCGTCTTGCTGGGCCTCATGGATGCGACGATGTTGTGCCAGGTTGTCTGATCGAGAAAAG * NM:i:0 XS:A:- NH:i:1
    2. The tophat 1.2.0 result:
    HWI-ST_0101:7:8:14864:67306#0 99 scaffold_8 3915632 255 95M = 3915887 0 GGACCGGTAGAAATTTTCCAATGAGAGATCATGTGAAGATTGAAAAGAAGAGTCCATGACAAATTTACATTGGCTGCTGCAATAGCTGAGGAGCG HHHHHHHFHHHHBHHHHHHFHHHHHFHFFHFGGEGFEGEGGGCFECHHHHHHEHHHFHEEBBEFG@AFFFDFFFFF54;.9;4*.:>>8@B#### NM:i:1 NH:i:1
    HWI-ST_0101:7:8:14864:67306#0 147 scaffold_8 3915887 255 68M70N27M = 3915632 0 TTTCCAAGTCATCCTCGTTGCCAATCGGTGCTTGACCGTCTTGCTGGGCCTCATGGATGCGACGATGTTGTGCCAGGTTGTCTGATCGAGAAAAG *?EFEGGGEGD>GGEGGDFHHHEHEHHHHFHHHHFGGFFHDFHHGGHHHHHHHHHHHHGHHFHHHHHHFHHHHHHHHHHHHHHHHHHHHHHHHHF NM:i:0 XS:A:- NH:i:1

    When I run htseq-count on the tophat 1.3.0 output, it reports an error:
    python -m HTSeq.scripts.count accepted_hits.unique.sam ../../../pde.release.v3.gff
    39609 GFF lines processed.
    Error occured in line 876 of file accepted_hits.unique.sam.
    Error: ("'seq' and 'qualstr' do not have the same length.", 'line 876 of file accepted_hits.unique.sam')
    [Exception type: ValueError, raised in _HTSeq.pyx:765]

    Is this a bug?
  • jameslz
    Member
    • Nov 2009
    • 20

    #2
    One more question:
    Can TopHat not handle reads of different lengths?


    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      The SAM files look OK to me. Notice that in tophat 1.3.0 one of the quality strings is a single * which means data not available, while with tophat 1.2.0 you do get a full quality string which happens to start with a * character (which is valid).

      So, that could be a bug in tophat 1.3.0 (using * for missing qualities when it probably does know them), and a separate bug in htseq-count failing to accept * for missing qualities.

      What version of htseq-count are you using? I had a quick look at HTSeq-0.5.1p2.tar.gz file src/HTSeq/_HTSeq.pyx and there is no obvious sign that they cope with this situation (but I didn't fully explore their code).
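To see which records carry the single `*` placeholder rather than a full quality string, a small check along these lines may help. It is only a sketch: the function names `qual_status` and `scan_sam` are made up for illustration, it assumes a tab-delimited SAM file with SEQ and QUAL in columns 10 and 11, and it is not how HTSeq itself validates records.

```python
def qual_status(seq, qual):
    """Classify the QUAL field of one SAM record relative to its SEQ field."""
    if qual == "*":
        return "missing"      # SAM placeholder: qualities not stored
    if len(qual) == len(seq):
        return "ok"           # a full quality string may itself begin with '*'
    return "mismatch"


def scan_sam(lines):
    """Yield (line_number, read_name, status) for records whose QUAL is not 'ok'."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue          # header line, no alignment fields
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue          # not a complete alignment record
        status = qual_status(fields[9], fields[10])
        if status != "ok":
            yield lineno, fields[0], status
```

Passing an open file handle for accepted_hits.unique.sam to `scan_sam` and printing what it yields should show whether the line named in the htseq-count error is a `missing` record (a lone `*`) or a genuine length `mismatch`.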

      P.S. I've emailed Simon Anders about this possible HTSeq issue.
      Last edited by maubp; 06-16-2011, 02:05 AM.


      • jameslz
        Member
        • Nov 2009
        • 20

        #4
        Thanks for your answer.
        I am using the latest version, HTSeq-0.5.1p2.


        • jameslz
          Member
          • Nov 2009
          • 20

          #5
          Originally posted by jameslz
          One more question:
          Can TopHat not handle reads of different lengths?
          If I trim the low-quality bases from the 3' end, I can map 80% of the paired reads to the reference; if I do not, only 70% of the paired reads map to the genome.

          Can anyone help me?
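For reference, the kind of 3' trimming described above can be sketched as follows. The thread does not say which tool or cutoff was used, so everything here is an assumption: the function name `trim_3prime` is hypothetical, the qualities are taken to be Phred+33 encoded, and the threshold of 20 is arbitrary. Note that it produces reads of different lengths, which is what prompts the question.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q.

    Assumes Phred+33 quality encoding (offset=33); min_q=20 is an
    arbitrary illustrative cutoff, not a recommended value.
    """
    end = len(seq)
    # Walk back from the 3' end while the last kept base is low quality.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

Under these assumptions, a read whose quality string ends in `####` would lose its last four bases, while a low-quality base in the middle of the read is left alone.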
