TopHat 1.1 failing on colorspace SE reads


  • TopHat 1.1 failing on colorspace SE reads

    I'm trying to analyze a single-end dataset from SRA with the brand new version of TopHat (1.1). TopHat crashes with the error message below. Looking at the message and the code, it appears that even for single-end reads it tries to run a validation check on the second set of reads:


    Code:
      File "/usr/local/bin/tophat", line 2093, in main
        params.read_params = check_reads(params.read_params, left_reads_list + "," + right_reads_list)
    TypeError: cannot concatenate 'str' and 'NoneType' objects

    If I replace line 2093 with

    Code:
            if not params.skip_check_reads:
                if right_reads_list is not None:
                    params.read_params = check_reads(params.read_params, left_reads_list + "," + right_reads_list)
                else:
                    # single-end run: there is no right-hand read list to append
                    params.read_params = check_reads(params.read_params, left_reads_list)
    Then I get farther but hit a new error:

    Code:
    Length mismatch between sequence and quality strings for SRR040290.1 VAB_ugc_85__100_137__138_121__123_bc_Frag50_solid0032_20090715_ugc_121__1231_49_36 length=50 (51 vs 51).
    I'm too worn out to puzzle out how to get past that one. My best guess is that it is related to the "extra" colorspace value that the bowtie option "--col-keepends" deals with.
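
For anyone patching a local copy, the same guard can be written as a small helper that only joins the read lists that are actually present. This is just a sketch: `join_read_lists` is a hypothetical name and not a function in TopHat, and it assumes `check_reads` takes a comma-separated string, as the traceback above suggests.

```python
def join_read_lists(left_reads_list, right_reads_list=None):
    """Join comma-separated read-file lists, skipping any that are None or empty.

    Hypothetical helper for illustration; not part of TopHat.
    """
    parts = [p for p in (left_reads_list, right_reads_list) if p]
    return ",".join(parts)
```

With this, the call would become `check_reads(params.read_params, join_read_lists(left_reads_list, right_reads_list))`, which avoids concatenating a string with `None` on single-end runs.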

  • #2
    Yep, I just got the same error message (the first one; haven't tried to modify the code). I'm also using single-end color space reads (.csfasta + .qual)


    • #3
      Yes, just to confirm that you are not the only one: I'm getting this error too, but on standard single-end reads (not colorspace).


      • #4
        A Google search for the same error message led me here. Same problem with single-end colorspace reads.


        • #5
          Same here.


          • #6
            Hi guys,

            I'm Daehwan, the one who introduced this bug. It is now fixed, and you can grab a fixed version at http://tophat.cbcb.umd.edu/index.html

            Thanks


            • #7
              Thanks for the rapid fix!!


              • #8
                Great, I will try it right now! Thanks for the great support!


                • #9
                  Great. Thanks for the speedy fix. It looks like it's working fine so far (fingers crossed).


                  • #10
                    Hello. I can now start a run successfully and get past the first error encountered here. However, I still run into the second error mentioned above:

                    Code:
                    [Tue Oct  5 16:10:32 2010] Beginning TopHat run (v1.1.0)
                    -----------------------------------------------
                    [Tue Oct  5 16:10:32 2010] Preparing output location /home/schaefer/tophat/RBM20/Sample14//
                    [Tue Oct  5 16:10:32 2010] Checking for Bowtie index files
                    [Tue Oct  5 16:10:32 2010] Checking for reference FASTA file
                    [Tue Oct  5 16:10:32 2010] Checking for Bowtie
                    	Bowtie version:			 0.12.3.0
                    [Tue Oct  5 16:10:32 2010] Checking for Samtools
                    	Samtools version:		 0.1.8.0
                    [Tue Oct  5 16:10:39 2010] Checking reads
                    
                    Error encountered parsing file ...fastq:
                     Length mismatch between sequence and quality strings for 853_8_25/1 (49 vs 49).
                    When I check the fastq file, everything seems fine:

                    Code:
                    @853_8_25/1
                    GNNGTGNTNCANNCGTNNGAGNNCACNNACANCCGANNACGNAAAGNAN
                    +
                    *""%%%"%"%%""%%%""%)&""%%%""%%+"&'%&""'(%"'))'"&"
                    @853_8_35/1
                    CNNACGNANACNNACCNNCCGNNTAANNNNGNGAACNNCNANCNCNNTN
                    +
                    :""=54"@"=+""A98""745"";98""""2"[email protected]>8""<"4"<"6"";"
                    @853_8_75/1
                    GNNACCNCNTCNNAACNNTACNNCGANNGTGNGGACNNGTCNCGAGNCN
                    +
                    ="";<7"0";4""=;:"">;=""94<"".,5".;26""9%)"(%(("("
                    @853_8_96/1
                    ...
                    Is anyone else still seeing this error?
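
One way to double-check a file independently of TopHat is to compare the sequence and quality lengths record by record. The following is a minimal sketch, assuming plain 4-line FASTQ records with no wrapped lines; `find_length_mismatches` is a made-up name for illustration.

```python
def find_length_mismatches(fastq_lines):
    """Yield (read_name, seq_len, qual_len) for records whose lengths differ.

    Assumes 4-line FASTQ records: header, sequence, '+', quality.
    """
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    # Step through complete records only; a trailing partial record is ignored.
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if len(seq) != len(qual):
            yield header[1:], len(seq), len(qual)
```

It can be run over an open file handle, for example `list(find_length_mismatches(open("reads.fastq")))`, where the path is a placeholder. If it reports nothing, the file is at least self-consistent as FASTQ, and the error more likely comes from how the reads are being interpreted than from the file itself.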


                    • #11
                      DerSeb, what's your command?


                      • #12
                        This is my command:

                        Code:
                        tophat -G /data/genetics/datasets/genome-annotation/ensembl-56/Rattus_norvegicus.RGSC3.4.56.gtf -o /home/schaefer/tophat/Sample14 -C rn4_c /home/schaefer/tophat/fastq/Sample_14_Qual.fastq


                        • #13
                          Since you are using -C, which is for colorspace reads, you need to supply colorspace reads rather than nucleotide reads.


                          • #14
                            I see... I converted my CS reads to fastq using scripts supplied with MAQ (solid2fastq.pl or fq_all2std.pl csfa2std). They convert the colorspace values to letters, mimicking a "pseudo" genetic sequence.

                            I will look into this and see how I can change this.

                            Thx!
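
For readers unfamiliar with the "pseudo" sequence: the usual convention, often called double encoding, writes each color call as a letter. The sketch below assumes the common mapping 0→A, 1→C, 2→G, 3→T and "."→N; it only illustrates the idea and is not a reimplementation of the MAQ scripts, which may also handle the primer base and qualities in ways not shown here.

```python
# Assumed double-encoding table; check your converter's documentation
# for the exact behaviour of the script you actually used.
COLOR_TO_LETTER = {"0": "A", "1": "C", "2": "G", "3": "T", ".": "N"}


def double_encode(colors):
    """Map a string of SOLiD color calls (0-3 or '.') to pseudo-bases."""
    return "".join(COLOR_TO_LETTER[c] for c in colors)
```

The letters produced this way still stand for colors, not nucleotides, so such a file is neither a true base-space FASTQ nor the original colorspace data.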


                            • #15
                              I have now started a thread dedicated to reformatting SOLiD reads for TopHat:

                              http://seqanswers.com/forums/showthread.php?p=26692
