
  • Tophat - Error: gtf_to_fasta returned an error.

    Hi all,
    I am trying to run TopHat with an annotation file.
    I am working with the Zv9 annotations from UCSC.
    I edited the original GTF file so that its first column matches the reference sequence names in the Bowtie index.
    For example:
    GTF (first 2 lines):
    #chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
    chr1 + 50321633 50410568 50322024 50393582 11 50321633,50323684,50327722,50376641,50384688,50384995,50387281,50388021,50392530,50393547,50409289, 50322231,50323751,50327850,50376774,50384782,50385109,50387444,50388129,50392579,50393588,50410568, 0 lef1 cmplcmpl 0,0,1,0,1,2,2,0,0,1,-1,

    REFERENCE (first 2 lines):
    >chr1
    TTCTTCTGGGGAAAGTCTGATTTGATTTATTTCCCTTTTAAGATCAATATTATTAGCCCC

    When I run TopHat without the GTF, everything runs fine.
    With the GTF, I get this error:
    Error: gtf_to_fasta returned an error.

    My command:
    nohup ./tophat -r 430 -p 10 -z 0 -G ../annotation /mnt/FILE/index/zvgenome ../ex1/R1_001.fastq ../ex1/R2_001.fastq &

    Is anyone familiar with this?

    Best,
    Pap
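
The name-matching step described in this post can be checked mechanically. The following is a minimal sketch, assuming the annotation is a tab-separated file whose first column is the sequence name and the reference is a plain FASTA file; the function names are hypothetical and are not part of TopHat.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines (text after '>' up to the first space)."""
    names = set()
    for line in lines:
        if line.startswith(">") and len(line.strip()) > 1:
            names.add(line[1:].split()[0])
    return names


def gtf_names(lines):
    """Collect first-column values from a tab-separated annotation, skipping comments and blanks."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def names_missing_from_reference(gtf_lines, fasta_lines):
    """Return annotation sequence names that do not appear in the reference FASTA."""
    return gtf_names(gtf_lines) - fasta_names(fasta_lines)


# Tiny in-memory example; with real files, pass open file handles instead of lists.
example_fasta = [">chr1\n", "TTCTTCTGGGG\n", ">chr2 some description\n", "ACGT\n"]
example_gtf = [
    "#comment line\n",
    'chr1\tUCSC\texon\t1\t10\t.\t+\t.\tgene_id "a"; transcript_id "a";\n',
    '1\tUCSC\texon\t1\t10\t.\t+\t.\tgene_id "b"; transcript_id "b";\n',
]
missing = names_missing_from_reference(example_gtf, example_fasta)
```

An empty result only shows that the names agree. It says nothing about whether the annotation is valid GTF.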

  • #2
    This is the full output:

    [Thu Feb 9 06:59:47 2012] Beginning TopHat run (v1.4.0)
    -----------------------------------------------
    [Thu Feb 9 06:59:47 2012] Preparing output location ./tophat_out/
    [Thu Feb 9 06:59:47 2012] Checking for Bowtie index files
    [Thu Feb 9 06:59:47 2012] Checking for reference FASTA file
    [Thu Feb 9 06:59:47 2012] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Thu Feb 9 06:59:47 2012] Checking for Samtools
    Samtools Version: 0.1.18
    [Thu Feb 9 06:59:47 2012] Generating SAM header for /mnt/FILE/index/zvgenome
    format: fastq
    quality scale: phred33 (default)
    [Thu Feb 9 06:59:51 2012] Reading known junctions from GTF file
    Warning: TopHat did not find any junctions in GTF file
    [Thu Feb 9 06:59:51 2012] Preparing reads
    left reads: min. length=101, count=21379580
    right reads: min. length=101, count=21310206
    [Thu Feb 9 07:08:54 2012] Creating transcriptome data files..
    [FAILED]
    Error: gtf_to_fasta returned an error.



    • #3
      I get the same error, and I am also working with Zv9 (the zebrafish genome). Can anyone please help me with this?



      • #4
        I don't think your GTF file is in the right format.

        According to UCSC, a GTF file contains 9 columns:
        <seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes]
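
A quick way to test a file against the 9-column layout above is sketched below. It is a rough sanity check under stated assumptions, not a full GTF validator: it assumes tab-separated fields and treats a missing transcript_id attribute as a problem. The function name and both sample lines are hypothetical; the second sample is a shortened, tab-joined version of the table-style line from the first post.

```python
def check_gtf_line(line):
    """Return a list of problems found in one GTF data line (empty list if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, found {len(fields)}"]

    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    problems = []
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be integers")
    elif int(start) > int(end):
        problems.append("start is greater than end")
    if strand not in {"+", "-", "."}:
        problems.append(f"unexpected strand {strand!r}")
    if frame not in {"0", "1", "2", "."}:
        problems.append(f"unexpected frame {frame!r}")
    if "transcript_id" not in attributes:
        problems.append("attributes lack transcript_id")
    return problems


# Hypothetical well-formed exon line.
GOOD_LINE = 'chr1\tUCSC\texon\t50321634\t50322231\t.\t+\t.\tgene_id "lef1"; transcript_id "lef1";'

# Shortened table-style line: chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount.
TABLE_LINE = "\t".join(["chr1", "+", "50321633", "50410568", "50322024", "50393582", "11"])

good_problems = check_gtf_line(GOOD_LINE)
table_problems = check_gtf_line(TABLE_LINE)
```

Running this over the first few non-comment lines of an annotation is usually enough to tell a real GTF from a table export.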





        • #5
          Genome file of Entamoeba in GTF format

          Hi,
          I am working with Entamoeba histolytica data and need the Entamoeba histolytica reference genome data in GTF format. I found the file in GenBank format but have been unable to find it in GTF format. If anyone can provide an appropriate link, I would be very grateful.



          • #6
            Hi,

            I had the same problem, but I think I have solved it now. I believe the error occurs because the FASTA file name differs from the base name of the index files and/or the GTF file. So if your index and GTF base name is Danio_rerio, then your FASTA file should be Danio_rerio.fa.
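
The naming convention suggested in this post can be checked before starting a long run. This is a minimal sketch, assuming the FASTA file is expected to sit next to the index files as `<index base>.fa`; the helper names are hypothetical and the example path is the one from the original command.

```python
from pathlib import Path


def expected_fasta(index_base):
    """Path of the FASTA file that would share its base name with the Bowtie index."""
    return Path(str(index_base) + ".fa")


def fasta_matches_index(index_base):
    """True if a FASTA file named after the index base actually exists on disk."""
    return expected_fasta(index_base).is_file()


# Index base taken from the command in the first post.
index_base = "/mnt/FILE/index/zvgenome"
print(expected_fasta(index_base), "exists:", fasta_matches_index(index_base))
```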



            • #7
              Hi, I also have this problem. In my case the chromosome names are the same in the index and the GTF, and the index, .fa, and GTF files all share the base name hg18_ref. Can anyone help me?



              • #8
                Tophat problem gtf to fasta

                Many people have faced the same problem, and I have just overcome it. Follow these steps and see if they work for you too.
                1. Go to the following link and select the genome you want to download. In my case I downloaded the UCSC mm10 mouse genome. (http://cufflinks.cbcb.umd.edu/igenomes.html)
                2. Unzip the file. You will see mm10/Annotation and mm10/Sequence. These folders contain all the files required for the TopHat run. Just make sure the paths in the tophat command point to them.
                3. Here is the command I used:
                  tophat -p 8 --keep-fasta-order --no-coverage-search --library-type fr-firststrand -G Mus_musculus/UCSC/mm10/Annotation/Archives/archive-2014-05-23-16-05-10/Genes/genes.gtf --transcriptome-index Mus_musculus/UCSC/mm10/Annotation/Genes/transcriptome_index_bt2/genes -g 10 --output-dir shP1_4hr_n1 Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome *.fastq.gz


                In the above case, the archive contains the UCSC genes.gtf file, which already has the chr prefix on the chromosome names, along with the gene names. Make sure you don't rename those files. The --transcriptome-index argument then has to be something like Mus_musculus/UCSC/mm10/Annotation/Genes/transcriptome_index_bt2/genes; I don't know exactly why, but that worked. The index files are in the Sequence/Bowtie2Index folder (you can also use the Bowtie 1 index files). The last argument is the input reads.

                Hope this helps. If it doesn't let me know and I can help you further.

                Tulip.



