  • Tophat - accepted_hits.sam file is empty?

    Hi all.

    I recently installed TopHat and the test files ran without problems, so I assume the installation went OK.
    With my own data, however, things are not going so smoothly. I ran a subset (1,000,000 sequences) of my paired-end Illumina GA2 reads as a test. I don't get any junctions (which I wouldn't expect with only 1,000,000 reads on a mammalian genome), but I was surprised that the accepted_hits.sam file is empty. If I understand correctly, this file should contain the positions and sequences of the reads aligned to the genome? Since I thought the problem could be caused by a wrong FASTQ format, I also aligned my subset with Bowtie against my reference genome, and that seems to go OK. The reason for my suspicion is that TopHat reports a seed length of 52bp, but my sequences are 51bp.
    So, does anyone have any idea what is going wrong? And is it possible to control the seed length in TopHat (as in Bowtie with the -l option)?

    Regards, Ole

    Some information:

    Example of my FASTQ format:
    @HWI-EA332:5:13596#0/2
    GCTGATCCGGGACTGCCGGCCTGTGAGGCTGCCCACCTGCGCGGCGGGGGC
    +HWI-EA332:5:13596#0/2
    `aa__]ZHZ_]\]V[]NXX_[FJFSJTY]R\\]VWHZFQ][JOWMZ\[_BB
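    Before blaming the seed length, it can help to confirm that every read really is 51bp and that each quality string matches its sequence length. The Python below is a minimal sketch, assuming the plain four-line FASTQ layout shown above (header, sequence, '+' line, quality) with no line wrapping; the function name fastq_length_summary and the script name are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable


def fastq_length_summary(lines: Iterable[str]) -> Counter:
    """Count read lengths in plain four-line FASTQ records.

    Raises ValueError if a record is malformed or if a quality string
    is not the same length as its sequence.
    """
    lengths: Counter = Counter()
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        try:
            # Read the rest of the record positionally, so quality strings
            # that happen to start with '@' are not mistaken for headers.
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated record {header!r}") from None
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' line in record {header!r}")
        if len(seq) != len(qual):
            raise ValueError(
                f"{header}: sequence length {len(seq)} != quality length {len(qual)}"
            )
        lengths[len(seq)] += 1
    return lengths


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fastq_lengths.py part10_1.ma.fq
    with open(sys.argv[1]) as fh:
        for length, count in sorted(fastq_length_summary(fh).items()):
            print(f"{length}bp\t{count} reads")
```

    If this reports only 51bp reads with matching quality lengths, the FASTQ itself is probably not the cause of the empty accepted_hits.sam.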

    The TopHat screen output:
    [Wed Sep 30 09:29:55 2009] Preparing output location ./tophat_out/
    [Wed Sep 30 09:29:55 2009] Checking for Bowtie index files
    [Wed Sep 30 09:29:55 2009] Checking for reference FASTA file
    [Wed Sep 30 09:29:55 2009] Checking for Bowtie
    Bowtie version: 0.10.1.0
    [Wed Sep 30 09:29:55 2009] Checking reads
    seed length: 52bp
    format: fastq
    quality scale: --solexa1.3-quals
    [Wed Sep 30 09:30:20 2009] Mapping reads against RefGenome with Bowtie
    [Wed Sep 30 09:34:15 2009] Joining segment hits
    Splitting reads into 2 segments
    [Wed Sep 30 09:34:23 2009] Mapping reads against RefGenome with Bowtie
    [Wed Sep 30 09:39:36 2009] Mapping reads against RefGenome with Bowtie
    [Wed Sep 30 09:44:53 2009] Mapping reads against RefGenome with Bowtie
    [Wed Sep 30 09:48:42 2009] Joining segment hits
    Splitting reads into 2 segments
    [Wed Sep 30 09:48:49 2009] Mapping reads against RefGenome with Bowtie
    [Wed Sep 30 09:54:02 2009] Mapping reads against RefGenome with Bowtie
    [Wed Sep 30 09:59:22 2009] Searching for junctions via segment mapping
    Warning: junction database is empty!
    [Wed Sep 30 10:01:08 2009] Joining segment hits
    [Wed Sep 30 10:01:08 2009] Joining segment hits
    [Wed Sep 30 10:01:08 2009] Reporting output tracks
    -----------------------------------------------
    Run complete [00:31:12 elapsed]

    My command:
    ./tophat --solexa1.3-quals RefGenome part10_1.ma.fq part10_2.ma.fq

  • #2
    My FASTQ example seems to have been altered during upload (the "GGGG C" at the end),
    so here it is again:

    @HWI-EA332:5:13596#0/2
    GCTGATCCGGGACTGCCGGCCTGTGAGGCTGCCCACCTGCGCGGCGGGGGC
    +HWI-EA332:5:13596#0/2
    `aa__]ZHZ_]\]V[]NXX_[FJFSJTY]R\\]VWHZFQ][JOWMZ\[_BB

    • #3
      Hmm, that didn't help. When I open my data in any text editor, I don't see the "GGGG C", so I presume this is not the problem?

      • #4
        Hi, can you verify that your Bowtie index's record names contain no spaces? You can list them by typing: bowtie-inspect --names <your_index>

        There is a known interoperability bug between TopHat and Bowtie (which is fixed in the upcoming Bowtie 0.10.2) which results in behavior like this when the index has spaces in the names.

        If your index has simple names, and you are still seeing this, can you email me your logs from the run?
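        To act on the bowtie-inspect check quickly, its output can be saved to a file and scanned for names containing whitespace, the condition described above as triggering the bug. The Python below is a minimal sketch; the function name names_with_whitespace and the file names in the usage comment are hypothetical.

```python
import sys
from typing import Iterable, List


def names_with_whitespace(names: Iterable[str]) -> List[str]:
    """Return the index record names that contain any whitespace."""
    flagged = []
    for raw in names:
        name = raw.rstrip("\r\n")
        if name and any(ch.isspace() for ch in name):
            flagged.append(name)
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   bowtie-inspect --names <your_index> > names.txt
    #   python check_names.py names.txt
    with open(sys.argv[1]) as fh:
        for name in names_with_whitespace(fh):
            print(f"contains whitespace: {name!r}")
```

        An empty report means the names are simple; any printed line indicates the index is affected.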

        • #5
          Dear Cole

          It seems the index names have spaces. I'll send the log files.

          Ole

          1 dna:chromosome chromosome:Sscrofa9:1:1:295534705:1
          2 dna:chromosome chromosome:Sscrofa9:2:1:140138492:1
          3 dna:chromosome chromosome:Sscrofa9:3:1:123604780:1
          4 dna:chromosome chromosome:Sscrofa9:4:1:136259946:1
          5 dna:chromosome chromosome:Sscrofa9:5:1:100521970:1
          6 dna:chromosome chromosome:Sscrofa9:6:1:123310171:1
          7 dna:chromosome chromosome:Sscrofa9:7:1:136414062:1
          8 dna:chromosome chromosome:Sscrofa9:8:1:119990671:1
          9 dna:chromosome chromosome:Sscrofa9:9:1:132473591:1
          10 dna:chromosome chromosome:Sscrofa9:10:1:66741929:1
          11 dna:chromosome chromosome:Sscrofa9:11:1:79819395:1
          12 dna:chromosome chromosome:Sscrofa9:12:1:57436344:1
          13 dna:chromosome chromosome:Sscrofa9:13:1:145240301:1
          14 dna:chromosome chromosome:Sscrofa9:14:1:148515138:1
          15 dna:chromosome chromosome:Sscrofa9:15:1:134546103:1
          16 dna:chromosome chromosome:Sscrofa9:16:1:77440658:1
          17 dna:chromosome chromosome:Sscrofa9:17:1:64400339:1
          18 dna:chromosome chromosome:Sscrofa9:18:1:54314914:1
          X dna:chromosome chromosome:Sscrofa9:X:1:125876292:1

          • #6
            I ran TopHat on a set of 454 runs of mouse transcripts. Oddly, it produced no junctions. Someone else here installed TopHat a couple of weeks ago and got exactly the same result with a completely different data set (Solexa data from a different lab), also against mouse. I can't find any support for this problem. Can anybody please help? We really need this to work! Thank you, Greg (Univ. of Pennsylvania)

            [Mon Oct 12 15:15:08 2009] Preparing output location ./tophat_out/
            [Mon Oct 12 15:15:08 2009] Checking for Bowtie index files
            [Mon Oct 12 15:15:08 2009] Checking for reference FASTA file
            Warning: Could not find FASTA file /Applications/bowtie-0.10.0/indexes/m_musculus.fa
            [Mon Oct 12 15:15:08 2009] Reconstituting reference FASTA file from Bowtie index
            [Mon Oct 12 15:32:45 2009] Checking for Bowtie
            Bowtie version: 0.10.0.0
            [Mon Oct 12 15:32:45 2009] Checking reads
            Warning: found a read < 20bp in 4.TCA.454Reads.fna
            Warning: found a read < 20bp in 4.TCA.454Reads.fna
            seed length: 20bp
            format: fasta
            [Mon Oct 12 15:32:46 2009] Mapping reads against m_musculus with Bowtie
            [Mon Oct 12 15:33:25 2009] Joining segment hits
            [Mon Oct 12 15:33:25 2009] Searching for junctions via segment mapping
            Warning: junction database is empty!
            [Mon Oct 12 15:36:12 2009] Joining segment hits

            • #7
              Two things that may be causing problems:

              1) Did you check that the Bowtie index records have no spaces in the names? If your index has spaces in the names, you should upgrade to Bowtie version 0.11.x, as we recently resolved an interoperability bug that can trigger this.

              2) Are the sequences for your 454 reads each on a single line, or does a read span more than one line? The current version of TopHat has a bug in handling FASTA or FASTQ files where the sequence record for a given read spans more than one line.

              If neither of these is the case for you, please email me the logs from the run. I'll need more information to see what's wrong.
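              For the second point, one workaround is to rewrite the reads so that every sequence sits on a single line before running TopHat. The Python below is a minimal sketch, assuming plain FASTA input (lines starting with '>' are headers, everything else is sequence); the function name unwrap_fasta and the output file name are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA records with each sequence joined onto a single line.

    Output alternates header line and sequence line, each newline-terminated.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record before starting a new one.
                yield header + "\n"
                yield "".join(chunks) + "\n"
            header, chunks = line, []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header + "\n"
        yield "".join(chunks) + "\n"


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python unwrap_fasta.py 4.TCA.454Reads.fna reads.oneline.fna
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(unwrap_fasta(src))
```

              The generator streams records one at a time, so large read files do not need to fit in memory.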

              • #8
                Originally posted by Cole Trapnell (#7)
                Thank you very much for your help! I downloaded the index from the TopHat site, so I assume it is correct, and I installed TopHat and Bowtie just today, so I assume my versions are up to date. There are no spaces. But indeed my FASTA file has multi-line records. I'm going to fix that and try again, and I'll let you know. Thanks again!!!

                • #9
                  Hi All/Cole
                  Just an update on my question: it was indeed the spaces in the reference genome names that caused the problem. Now everything runs without any problems. Thanks to Cole for his help.

                  Ole

                  • #10
                    I fixed the multi-line issue, but unfortunately it did the same thing again. Here are my files: my input file, the command I used (in note.txt), and the entire tophat_out directory. I installed everything today with the latest 64-bit versions on a Power Mac G6 desktop. Thank you for any help you can provide!

                    • #11
                      The indexes linked from the TopHat site unfortunately ARE affected by the interoperability bug I mentioned above; I never had a chance to rebuild them with simpler names. I checked the logs in these files, and you appear to have Bowtie 0.10.0 installed, which will trigger the bug. Please upgrade to Bowtie 0.11.2 and give this another shot. Sorry for the inconvenience.
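                      If upgrading Bowtie is not an option, rebuilding the index with simpler names is the other route mentioned here. One way to do that (a sketch, not the procedure used for the TopHat-site indexes) is to truncate each reference FASTA header at its first whitespace and rebuild with bowtie-build; for example, ">1 dna:chromosome chromosome:Sscrofa9:1:1:295534705:1" would become ">1". The Python below assumes an ordinary FASTA reference; the function name simplify_headers and the file names are hypothetical. Naming the simplified FASTA after the index base is an assumption based on the logs in #6, where TopHat looks for <index>.fa next to the index.

```python
import sys
from typing import Iterable, Iterator


def simplify_headers(lines: Iterable[str]) -> Iterator[str]:
    """Truncate each FASTA header at its first whitespace.

    Sequence lines pass through unchanged. Raises ValueError if two
    records would end up with the same simplified name.
    """
    seen = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            name = fields[0]
            if name in seen:
                raise ValueError(f"duplicate simplified name: {name!r}")
            seen.add(name)
            yield f">{name}\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage:
    #   python simplify_headers.py genome.fa RefGenome.fa
    #   bowtie-build RefGenome.fa RefGenome
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(simplify_headers(src))
```

                      The duplicate check matters because truncating headers can make two distinct records collide, which would silently make alignments ambiguous.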

                      • #12
                        Originally posted by Cole Trapnell (#11)
                        Thanks again for your help! I downloaded this file:

                        bowtie-0.11.2-bin-macos-10.5-x86_64.zip

                        But now when I run this version of bowtie it throws this error:

                        > bowtie
                        dyld: unknown required load command 0x80000022
                        Trace/BPT trap

                        Sorry I'm having so much trouble, but I hope I've almost got it. Thanks again for your help!

                        • #13
                          Originally posted by Cole Trapnell (#11)
                          I tried 0.11.3 and got the same error; only 0.10.0 seems to run. What am I doing wrong?

                          dyld: unknown required load command 0x80000022
                          Trace/BPT trap

                          • #14
                            Hmm - that's a new one. What version of OS X are you running this on?

                            • #15
                              I was getting the same message with bowtie-0.11.2-bin-macos-10.5-x86_64.zip. Building from source with BITS=64 seems to work fine. Mac OS X 10.5.8.
