  • tophat complains

    I have spent the whole day trying to get past this error. I have tried a UCSC GTF file and an NCBI GTF file (converted to GFF), but I keep getting the same error. Can someone please help me? I have made sure the chr names are the same as in my index.

    [Fri Apr 23 17:10:16 2010] Retrieving sequences for splices
    [Fri Apr 23 17:10:16 2010] Indexing splices
    Warning: Empty input file
    Error: No unambiguous stretches of characters in the input. Aborting...
    Command: bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs
    [FAILED]
    Error: Splice sequence indexing failed with err = 1

    example GFF I used:
    chr1 mm9_knownGene gene 4481009 4486494 0.000000 - . ID=uc007afc.1;Name=
    chr1 mm9_knownGene mRNA 4481009 4486494 0.000000 - . ID=uc007afc.1;Name=;Parent=uc007afc.1
    chr1 mm9_knownGene exon 4481009 4482749 0.000000 - . ID=uc007afc.1.;Name=;Parent=uc007afc.1
    ch
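The check mentioned in the post above (that the chromosome names in the annotation match those in the index) can be sketched in Python. This is only a minimal illustration, not part of TopHat: the function names are hypothetical, and it assumes the reference names are available as FASTA header lines (for example from the .fa file that sits next to the index).

```python
def annotation_seqnames(gff_lines):
    """Collect the distinct values of column 1 (seqname) from GFF/GTF lines."""
    names = set()
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        names.add(line.split(None, 1)[0])
    return names


def fasta_seqnames(fasta_lines):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def missing_from_index(gff_lines, fasta_lines):
    """Return annotation seqnames that have no matching FASTA header (hypothetical helper)."""
    return sorted(annotation_seqnames(gff_lines) - fasta_seqnames(fasta_lines))
```

Both functions accept any iterable of lines, so open file handles can be passed directly. An empty result only means the names agree; it says nothing about whether the rest of the annotation is well formed.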

  • #2
    I don't know if this is the problem, but all of your ID and parent identifiers are the same. IDs must be unique and an entry can't be its own parent.



    • #3
      mouse GFF file for tophat

      Thanks for taking the time to reply. What do you mean by "an entry can't be its own parent"?

      I have tried different GFF but get the same errors. For example:

      chr18 protein_coding mRNA 3383239 3435368 . + . ID=ENSMUST00000115872;Name=Cul2;Parent=ENSMUSG00000024231
      chr18 protein_coding exon 3383239 3383395 . + . ID=ENSMUST00000115872.1;Name=Cul2;Parent=ENSMUST00000115872
      chr18 protein_coding exon 3399844 3399984 . + . ID=ENSMUST00000115872.2;Name=Cul2;Parent=ENSMUST00000115872
      chr18 protein_coding exon 3405683 3405785 . + . ID=ENSMUST00000115872.3;Name=Cul2;Parent=ENSMUST00000115872
      chr18 protein_coding exon 3414128 3414222 . + . ID=ENSMUST00000115872.4;Name=Cul2;Parent=ENSMUST00000115872
      chr18 protein_coding exon 3417533 3417638 . + . ID=ENSMUST00000115872.5;Name=Cul2;Parent=ENSMUST00000115872
      chr18 protein_coding exon 3418562 3418644 . + . ID=ENSMUST00000115872.6;Name=Cul2;Parent=ENSMUST00000115872
      c

      Do you or anyone else have a mouse GFF that works with tophat? I would really appreciate your help. I noticed that the command below probably fails because segment_juncs.fa is empty. I don't know where/how tophat gets these sequences.
      bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs [FAILED]
      Last edited by thinkRNA; 04-24-2010, 09:10 AM.



      • #4
        Originally posted by thinkRNA:
        Thanks for taking the time to reply. What do you mean by "an entry can't be its own parent"?
        In your first example you had, on a single line, "ID=uc007afc.1" and "Parent=uc007afc.1". You were saying that the parent of this feature has the same ID as the feature itself; in other words, it is its own parent. This is not allowed. You also have all three features in your example (the gene, mRNA and exon) identified with the same ID. This is an improperly formed GFF file, which I thought might be the cause of your problem.
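The two problems described above (a feature naming itself as Parent, and several features sharing one ID) can be checked mechanically. Below is a minimal Python sketch; the function names and the "self-parent" / "duplicate-id" labels are made up for illustration, and it assumes a tab-separated GFF3 file with nine columns. It is stricter than the GFF3 specification, which allows a discontinuous feature to reuse one ID across several lines, so a reported duplicate is a hint rather than proof of an error.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=a;Name=b;Parent=c' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3_ids(lines):
    """Return (line_number, problem) tuples for self-parent and duplicate-ID lines."""
    problems = []
    first_seen = {}  # feature ID -> line number where it first appeared
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append((lineno, "bad-column-count"))
            continue
        attrs = parse_attributes(columns[8])
        feature_id = attrs.get("ID")
        parent_value = attrs.get("Parent", "")
        parents = parent_value.split(",") if parent_value else []
        if feature_id is None:
            continue
        if feature_id in parents:
            problems.append((lineno, "self-parent"))
        if feature_id in first_seen:
            problems.append((lineno, "duplicate-id"))
        else:
            first_seen[feature_id] = lineno
    return problems
```

Run on the first example from this thread (with tabs between columns), the mRNA line is flagged for both problems, because it reuses the gene's ID and lists that same ID as its Parent.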

        Originally posted by thinkRNA:
        I have tried different GFF but get the same errors. For example:

        chr18 protein_coding mRNA 3383239 3435368 . + . ID=ENSMUST00000115872;Name=Cul2;Parent=ENSMUSG00000024231
        chr18 protein_coding exon 3383239 3383395 . + . ID=ENSMUST00000115872.1;Name=Cul2;Parent=ENSMUST00000115872
        chr18 protein_coding exon 3399844 3399984 . + . ID=ENSMUST00000115872.2;Name=Cul2;Parent=ENSMUST00000115872
        chr18 protein_coding exon 3405683 3405785 . + . ID=ENSMUST00000115872.3;Name=Cul2;Parent=ENSMUST00000115872
        chr18 protein_coding exon 3414128 3414222 . + . ID=ENSMUST00000115872.4;Name=Cul2;Parent=ENSMUST00000115872
        chr18 protein_coding exon 3417533 3417638 . + . ID=ENSMUST00000115872.5;Name=Cul2;Parent=ENSMUST00000115872
        chr18 protein_coding exon 3418562 3418644 . + . ID=ENSMUST00000115872.6;Name=Cul2;Parent=ENSMUST00000115872
        c

        Do you or anyone else have a mouse GFF that works with tophat? I would really appreciate your help. I noticed that the command below probably fails because segment_juncs.fa is empty. I don't know where/how tophat gets these sequences.
        bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs [FAILED]
        This GFF is properly formed. Each feature has a unique ID, and the exons properly identify the mRNA as their parent. Alas, I now have no explanation as to what is causing your problems with Tophat.



        • #5
          I found the problem. tophat expects a .fa file next to the indexes. I had this file, mm9.fa, in the directory, but for some reason it was empty. If the file is not present, it is rebuilt from the index with bowtie-inspect. I started from scratch in a new directory with new links to the indexes, and it worked!
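An empty reference file like the mm9.fa described above is easy to miss, because the file exists and only its content is missing. A quick Python sketch for checking this before a run follows; the function names are hypothetical and the check is deliberately crude (it only counts header lines and sequence characters).

```python
import os


def fasta_summary(path):
    """Return (size_in_bytes, number_of_records, number_of_sequence_characters)."""
    size = os.path.getsize(path)
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                records += 1
            else:
                bases += len(line.strip())
    return size, records, bases


def looks_usable(path):
    """True if the file exists and holds at least one header and some sequence."""
    if not os.path.isfile(path):
        return False
    _size, records, bases = fasta_summary(path)
    return records > 0 and bases > 0
```

For example, `looks_usable("mm9.fa")` returns False for a zero-byte file. The same check can be pointed at tophat_out/tmp/segment_juncs.fa after a failed run to confirm that the file handed to bowtie-build really was empty.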



          • #6
            Hi, how did you solve this problem?

            I get the same error:

            tophat -G GFF3/data/gff3/combined.gff --no-novel-juncs indexes/genomic reads/11.3.10/R43s_4_sequence.fastq

            [Sun Jul 4 16:24:40 2010] Beginning TopHat run (v1.0.11)
            -----------------------------------------------
            [Sun Jul 4 16:24:40 2010] Preparing output location ./tophat_out/
            [Sun Jul 4 16:24:40 2010] Checking for Bowtie index files
            [Sun Jul 4 16:24:40 2010] Checking for reference FASTA file
            [Sun Jul 4 16:24:40 2010] Checking for Bowtie
            Bowtie version: 0.12.3.0
            [Sun Jul 4 16:24:40 2010] Checking reads
            seed length: 36bp
            format: fastq
            quality scale: --phred33-quals
            [Sun Jul 4 16:26:29 2010] Reading known junctions from GFF file
            [Sun Jul 4 16:27:31 2010] Mapping reads against DictyAx4_genomic with Bowtie
            [Sun Jul 4 16:46:16 2010] Joining segment hits
            [Sun Jul 4 16:48:28 2010] Retrieving sequences for splices
            [Sun Jul 4 16:48:32 2010] Indexing splices
            Warning: Empty input file
            Error: No unambiguous stretches of characters in the input. Aborting...
            Command: bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs
            [FAILED]
            Error: Splice sequence indexing failed with err = 1


            It works if I use only the -G option, but it fails once I also add the --no-novel-juncs option.

            Any ideas?

            (P.S. This isn't on the mouse genome, and the GFF is one I obtained from the database for my species, not a GTF converted to GFF3.)
            Last edited by James; 07-05-2010, 08:06 AM. Reason: Add more info



            • #7
              Dear ThinkRNA (or anyone else),

              When creating your Ensembl-based GFF/GTF file, how did you get the gene trivial name inserted (Name=Cul2 in the line below)? I have tried to do this using the UCSC Table Browser to generate an Ensembl-based GTF file, but I just end up with the Ensembl ID, even though the trivial name is in the underlying data table as the "name2" field.

              chr18 protein_coding mRNA 3383239 3435368 . + . ID=ENSMUST00000115872;Name=Cul2;Parent=ENSMUSG00000024231

              Thanks.



              • #8
                Hi ThinkRNA,

                I have the same problem as you did. I tried running it from scratch in different directories, and I get the same error.


                [Tue Sep 20 15:23:30 2011] Beginning TopHat run (v1.2.0)
                -----------------------------------------------
                [Tue Sep 20 15:23:30 2011] Preparing output location tophat_BC4_gtf_mix1/
                [Tue Sep 20 15:23:31 2011] Checking for Bowtie index files
                [Tue Sep 20 15:23:31 2011] Checking for reference FASTA file
                [Tue Sep 20 15:23:31 2011] Checking for Bowtie
                Bowtie version: 0.12.7.0
                [Tue Sep 20 15:23:31 2011] Checking for Samtools
                Samtools Version: 0.1.8
                [Tue Sep 20 15:23:53 2011] Checking reads
                min read length: 95bp, max read length: 101bp
                format: fastq
                quality scale: solexa33 (reads generated with GA pipeline version < 1.3)
                [Tue Sep 20 15:26:25 2011] Reading known junctions from GTF file
                [Tue Sep 20 15:27:39 2011] Mapping reads against zv9 with Bowtie
                [Tue Sep 20 16:47:01 2011] Joining segment hits
                [Tue Sep 20 16:53:45 2011] Mapping reads against zv9 with Bowtie(1/4)
                [Tue Sep 20 17:40:17 2011] Mapping reads against zv9 with Bowtie(2/4)
                [Tue Sep 20 18:25:21 2011] Mapping reads against zv9 with Bowtie(3/4)
                [Tue Sep 20 19:12:59 2011] Mapping reads against zv9 with Bowtie(4/4)
                [Tue Sep 20 20:55:40 2011] Searching for junctions via segment mapping
                [Tue Sep 20 20:57:48 2011] Retrieving sequences for splices
                [Tue Sep 20 21:01:18 2011] Indexing splices
                Warning: Empty input file
                Error: No unambiguous stretches of characters in the input. Aborting...
                Command: bowtie-build tophat_BC4_gtf_mix1/tmp/segment_juncs.fa tophat_BC4_gtf_mix1/tmp/segment_juncs
                [FAILED]
                Error: Splice sequence indexing failed with err = 1


                Here is a sample of my gtf file

                chr1 danRer7_ensGene stop_codon 25135780 25135782 0.000000 - . gene_id "ENSDART00000112899"; transcript_id "ENSDART00000112899";
                chr1 danRer7_ensGene CDS 25135783 25135824 0.000000 - 0 gene_id "ENSDART00000112899"; transcript_id "ENSDART00000112899";
                chr1 danRer7_ensGene exon 25135780 25135824 0.000000 - . gene_id "ENSDART00000112899"; transcript_id "ENSDART00000112899";


                And here is my Tophat command:
                python /share/bin/tophat-1.2.0.Linux_x86_64/tophat -p 100 -g 5 -a 10 --solexa-quals -o tophat_BC1_gtf_mix1 -G /home/lakshmaa/scratch/Task54/ensembl_zv9_2.gtf /share/apps/Genomes/Zv9_Bowtie/zv9 /home/lakshmaa/scratch/Task54/barcode/READ_corrected_raw/BC1_mix1_all.fq


                Can anyone please help me solve this problem?

                Thanks,
                Abi

