Announcement

Collapse
No announcement yet.

tophat, not producing rpkm values with -G option

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • tophat, not producing rpkm values with -G option

    Hi there,

    I'm running a medaka 75bp paired end dataset through tophat. I am currently using the ensembl genome and the ensembl genome annotation. However I get the following output when I run it.

    [Sun Oct 11 19:29:55 2009] Beginning TopHat run (v1.0.9)
    -----------------------------------------------
    [Sun Oct 11 19:29:55 2009] Preparing output location s6_genome/
    [Sun Oct 11 19:29:55 2009] Checking for Bowtie index files
    [Sun Oct 11 19:29:55 2009] Checking for reference FASTA file
    [Sun Oct 11 19:29:55 2009] Checking for Bowtie
    Bowtie version: 0.10.0.0
    [Sun Oct 11 19:29:55 2009] Checking reads
    seed length: 76bp
    format: fastq
    quality scale: --solexa-quals
    [Sun Oct 11 19:33:18 2009] Reading known junctions from GFF file
    Warning: TopHat did not find any junctions in GFF file
    Splitting reads into 3 segments
    [Sun Oct 11 19:38:24 2009] Mapping reads against medaka_ens_v2 with Bowtie
    [Sun Oct 11 19:57:30 2009] Mapping reads against medaka_ens_v2 with Bowtie
    [Sun Oct 11 20:16:46 2009] Mapping reads against medaka_ens_v2 with Bowtie
    Splitting reads into 3 segments
    [Sun Oct 11 20:38:29 2009] Mapping reads against medaka_ens_v2 with Bowtie
    [Sun Oct 11 20:57:22 2009] Mapping reads against medaka_ens_v2 with Bowtie
    [Sun Oct 11 21:16:32 2009] Mapping reads against medaka_ens_v2 with Bowtie
    [Sun Oct 11 21:35:27 2009] Searching for junctions via segment mapping
    Warning: junction database is empty!
    [Sun Oct 11 21:36:14 2009] Joining segment hits
    [Sun Oct 11 21:38:35 2009] Joining segment hits
    [Sun Oct 11 21:40:52 2009] Reporting output tracks
    -----------------------------------------------
    The output showed no slice junctions and no rpkm values for the genes. Initially I thought it was a previously reported error about the reference names having spaces so I removed spaces from the fasta header but to no avail. Any suggestions as to what is wrong ?

    Here is the command I used:
    tophat -o s6_genome -m 2 --solexa-quals -p 4 -G /data/tmp/warren/medaka_ensembl.gff3 -r 250 /data/tmp/warren/med_ens_gen_v2/medaka_ens_v2 /data/tmp/warren/new_data/s_6_1_sequence.txt /data/tmp/warren/new_data/s_6_2_sequence.txt

    Thanks!
    Last edited by warrenemmett; 10-11-2009, 11:59 PM. Reason: more info on the problem

  • #2
    Your command looks good to me, this is the log I got:

    [Thu Sep 17 15:47:31 2009] Beginning TopHat run (v1.0.10)
    -----------------------------------------------
    [Thu Sep 17 15:47:31 2009] Preparing output location ./tophat_out/
    [Thu Sep 17 15:47:31 2009] Checking for Bowtie index files
    [Thu Sep 17 15:47:31 2009] Checking for reference FASTA file
    Warning: Could not find FASTA file /home/jyli/Bowtie/indexes/h_sapiens_asm.fa
    [Thu Sep 17 15:47:31 2009] Reconstituting reference FASTA file from Bowtie index
    [Thu Sep 17 16:14:45 2009] Checking for Bowtie
    Bowtie version: 0.10.0.2
    [Thu Sep 17 16:14:45 2009] Checking reads
    seed length: 35bp
    format: fastq
    quality scale: --phred33-quals
    Splitting reads into 1 segments
    [Thu Sep 17 16:29:36 2009] Mapping reads against h_sapiens_asm with Bowtie
    Splitting reads into 1 segments
    [Thu Sep 17 18:05:05 2009] Mapping reads against h_sapiens_asm with Bowtie
    [Thu Sep 17 19:31:54 2009] Searching for junctions via segment mapping
    [Thu Sep 17 22:32:23 2009] Retrieving sequences for splices
    [Thu Sep 17 22:36:14 2009] Indexing splices
    [Thu Sep 17 22:56:52 2009] Mapping reads against segment_juncs with Bowtie
    [Thu Sep 17 23:43:51 2009] Joining segment hits
    [Fri Sep 18 00:15:03 2009] Mapping reads against segment_juncs with Bowtie
    [Fri Sep 18 00:59:43 2009] Joining segment hits
    [Fri Sep 18 01:09:10 2009] Reporting output tracks
    -----------------------------------------------
    Run complete [11:28:54 elapsed]

    And here are the export:

    total 9940384
    drwxr-xr-x 4 jyli lbg 4096 2009-10-02 07:59 .
    drwxr-xr-x 8 jyli lbg 4096 2009-10-13 10:09 ..
    -rw-r--r-- 1 jyli lbg 2769911379 2009-09-24 23:54 accepted_hits.sam
    -rw-r--r-- 1 jyli lbg 429956740 2009-09-24 22:55 coverage.wig
    -rw-r--r-- 1 jyli lbg 665887 2009-09-24 22:55 custom_exonDB.04152009.hg18.exon.gff.expr
    -rw-r--r-- 1 jyli lbg 34341903 2009-09-24 15:25 custom_exonDB.juncs
    -rw-r--r-- 1 jyli lbg 7631 2009-10-01 22:57 first_100_junctions.bed
    -rw-r--r-- 1 jyli lbg 347 2009-10-02 07:59 first_5_junctions.bed
    -rw-r--r-- 1 jyli lbg 8820910 2009-09-24 22:55 junctions.bed
    -rw-r--r-- 1 jyli lbg 1191009384 2009-09-24 15:26 left_kept_reads.fq
    -rw-r--r-- 1 jyli lbg 2300806300 2009-09-24 22:00 left_kept_reads.fq.candidate_hits.sam
    drwxr-xr-x 2 jyli lbg 4096 2009-09-24 22:44 logs
    -rw-r--r-- 1 jyli lbg 1185323667 2009-09-24 15:27 right_kept_reads.fq
    -rw-r--r-- 1 jyli lbg 2238071143 2009-09-24 22:44 right_kept_reads.fq.candidate_hits.sam
    drwxr-xr-x 2 jyli lbg 4096 2009-09-24 23:54 tmp

    And, I think the rpkm file is: custom_exonDB.04152009.hg18.exon.gff.expr with the first few lines as:

    A1BG|1 1.175336
    A1CF|29974 0.000000
    A2BP1|54715 0.000000
    A2LD1|87769 2.244221
    A2ML1|144568 4.897304
    A2M|2 130.948346
    A3GALT2|127550 0.000000
    A4GALT|53947 6.922208
    A4GNT|51146 0.341636
    AAA1|404744 0.043603
    ....

    Good luck

    Comment


    • #3
      The warning indicates that the GFF file you are using has different chromosome names than that of the bowtie index. Can you verify that the GFF file's chromosome fields match what you see when you do bowtie-inspect --names?

      Comment


      • #4
        I added -G but still get no RPKM. Any suggestions please? this is my command:

        tophat -r 50 --mate-std-dev 100 -a 5 -i 50 -I 1000000 -p 4 --solexa1.3-qual --coverage-search --fill-gaps --microexon-search -G pc_gene.gff3 hg19 s_1_raw_1 s_1_raw_2

        I got the sam result right though

        Comment


        • #5
          when i was initially getting Tophat to run a few weeks ago i had a hard time getting the GFF file to work. to make my GFF file i used the knownGene table from the UCSC site and had it produce a GTF file. I found a conversion script that changed it to a GFF3 format file. on top of that I had to do a text-replacement for any occurrence of "transcript" and replaced it with "mRNA". at first this didn't work. the bowtie index i was using turned out to be the real issue. it worked fine without the gff3 file but when i included it i'd get that same "junctions database is empty" error. I was using a bowtie index that was pre-compiled and linked from the bowtie site. to resolve the issue i built a new bowtie index myself using FASTA files sorted by chromosome downloaded from the UCSC site. since my gff3 file came from there i figured maybe my bowtie index should come from there as well. sure enough that fixed it.

          hope that helps.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */

          Comment


          • #6
            I just noticed in trying to resolve a similar issue that bowtie-index is using the entire defline of the genome FASTA files as the sequence name, rather than only the portion of the defline up to the first whitespace. I suspect this will trip a lot of people up. One solution is to strip out the trailing comments from your deflines before indexing, but it seems this might be better addressed by modifying bowtie-index to use only the first whitespace delimited field for sequence names.

            Comment


            • #7
              Originally posted by vaughn View Post
              I just noticed in trying to resolve a similar issue that bowtie-index is using the entire defline of the genome FASTA files as the sequence name, rather than only the portion of the defline up to the first whitespace. I suspect this will trip a lot of people up. One solution is to strip out the trailing comments from your deflines before indexing, but it seems this might be better addressed by modifying bowtie-index to use only the first whitespace delimited field for sequence names.
              In the most recent version of Bowtie, text in the defline after the first whitespace is not printed in the "reference" column of the output.

              Comment

              Working...
              X