  • gtf_to_fasta returned an error

    Hi all,
    Could you please help me solve the problem shown below?

    Here is the error:
    $ tophat -p 8 -G PLncDB.gff -o SP2_131004_2thout --no-novel-juncs genome SP2_R1_PE100_clipped.fastq SP2_R2_PE100_clipped.fastq

    [2013-10-04 18:27:48] Beginning TopHat run (v2.0.9)
    -----------------------------------------------
    [2013-10-04 18:27:48] Checking for Bowtie
    Bowtie version: 2.1.0.0
    [2013-10-04 18:27:48] Checking for Samtools
    Samtools version: 0.1.18.0
    [2013-10-04 18:27:48] Checking for Bowtie index files (genome)..
    [2013-10-04 18:27:48] Checking for reference FASTA file
    [2013-10-04 18:27:48] Generating SAM header for genome
    format: fastq
    quality scale: phred33 (default)
    [2013-10-04 18:27:49] Reading known junctions from GTF file
    Warning: TopHat did not find any junctions in GTF file
    [2013-10-04 18:27:49] Preparing reads
    left reads: min. length=12, max. length=100, 36930071 kept reads (3968 discarded)
    right reads: min. length=12, max. length=100, 36922334 kept reads (11705 discarded)
    Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
    [2013-10-04 18:45:24] Building transcriptome data files..
    [FAILED]
    Error: gtf_to_fasta returned an error.

    I used a self-made GFF file; here is part of it:
    1 PlncDB LincRNA 2497 2816 . - . Parent=At1NC000020
    1 PlncDB LincRNA 11100 11372 . + . Parent=At1NC000060
    1 PlncDB LincRNA 43086 43295 . - . Parent=At1NC000160
    1 PlncDB LincRNA 51391 51733 . + . Parent=At1NC000200
    1 PlncDB LincRNA 90168 90401 . - . Parent=At1NC000340
    1 PlncDB LincRNA 91355 91685 . - . Parent=At1NC000350
    1 PlncDB LincRNA 107634 107877 . - . Parent=At1NC000410
    1 PlncDB LincRNA 135708 136008 . - . Parent=At1NC000530
    1 PlncDB LincRNA 140482 140691 . + . Parent=At1NC000560
    1 PlncDB LincRNA 207751 208011 . - . Parent=At1NC000780
    1 PlncDB LincRNA 217908 218112 . - . Parent=At1NC000810

    Here is the Arabidopsis genome:
    $ head genome.fa
    >1
    CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAATCTTTAAATCC
    TACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTTCTCTGGTTGAAAATCATTGT
    GTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTATTGTTGTGTGTAGATTTTTTAAAAATATCATTT
    GAGGTCAATACAAATCCTATTTCTTGTGGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTG
    TTATATTGGATACAAGCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTG
    GTTTATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTTGGACATTTATTGTCATTCTTACTCCTTTG
    TGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTGTAGGGATGAAGTCTTTCTTCG
    TTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG

    How can I solve this problem?

    Thanks in advance!
    Last edited by SpreeFu; 10-04-2013, 09:42 AM.

  • #2
    BTW, TopHat ran smoothly when I used TAIR10_GFF3_genes.gff.



    • #3
      If you look in the run log, you'll see the exact gtf_to_fasta command that's run. Simply run that yourself to find out what the actual (usually more informative) error message is.
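To make that concrete, here is a minimal sketch for pulling the logged command out of the run log. It assumes the log sits at logs/run.log inside the TopHat output directory (the directory name below is taken from the -o option in post #1; the exact log path is an assumption), and `find_logged_command` is a hypothetical helper name, not something TopHat provides.

```shell
# Hypothetical helper: print every line of a log file that mentions a given tool.
# $1 = path to the TopHat run log (assumed location: <output-dir>/logs/run.log)
# $2 = tool name to search for, e.g. gtf_to_fasta
find_logged_command() {
    grep -F -- "$2" "$1"
}

# Example use (the log path is an assumption, adjust it to your run):
# find_logged_command SP2_131004_2thout/logs/run.log gtf_to_fasta
```

Running the printed line by hand in a terminal should then show whatever message the tool itself emits, rather than TopHat's generic "returned an error".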



      • #4
        Hi Ryan, thanks for your reply!

        This is what I found in the run log:
        #>map_start:
        /home/jana/Software/tophat-2.0.9.Linux_x86_64/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SP2_131004_2thout/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations PLncDB.gff --gtf-juncs 0SP2_131004_2thout/tmp/PLncDB.juncs --no-closure-search --no-coverage-search --no-microexon-search PLncDB.gff genome.fa SP2_131004_2thout/tmp/PLncDB.fa > SP2_131004_2thout/logs/g2f.out

        Do you mean I have to change the parameters because my GFF file is somehow not standard?



        • #5
          I can't tell you what, if anything, you need to change yet. You need to simply run
          Code:
          /home/jana/Software/tophat-2.0.9.Linux_x86_64/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SP2_131004_2thout/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations PLncDB.gff --gtf-juncs 0SP2_131004_2thout/tmp/PLncDB.juncs --no-closure-search --no-coverage-search --no-microexon-search PLncDB.gff genome.fa SP2_131004_2thout/tmp/PLncDB.fa > SP2_131004_2thout/logs/g2f.out
          to see what the underlying error message is. Then you can figure out what's actually wrong.



          • #6
            It showed: Error: no ID found for GFF record start
            So do I have to change 'Parent=' to 'ID='?
            Do I need to add the line '1 TAIR10 Chromosome 1 30427671 . . . ID=chr1;Name=Chr1', as in TAIR10_genes.gff?
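Before renaming attributes, it may help to see which feature types and attribute keys the file actually contains. Below is a minimal sketch, assuming the GFF is tab-delimited (the excerpt in post #1 is shown with spaces, which may just be forum formatting). `summarize_gff` is a hypothetical helper name, not part of TopHat.

```shell
# Hypothetical helper: count feature types (column 3) and attribute keys
# (column 9) in a tab-delimited GFF file given as $1.
# Comment lines and rows with fewer than 9 columns are skipped.
summarize_gff() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }
        {
            types[$3]++
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                sub(/^ +/, "", attrs[i])      # drop leading spaces after ";"
                split(attrs[i], kv, "=")      # kv[1] is the attribute key
                if (kv[1] != "") keys[kv[1]]++
            }
        }
        END {
            for (t in types) print "type\t" t "\t" types[t]
            for (k in keys)  print "key\t" k "\t" keys[k]
        }
    ' "$1" | sort
}

# Example use:
# summarize_gff PLncDB.gff
```

On rows like the excerpt in post #1, this should list a single feature type (LincRNA) and a single attribute key (Parent), which is consistent with the "no ID found" message.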



            • #7
              After replacing 'Parent' with 'ID', a new error appeared:

              [2013-10-04 20:31:29] Building Bowtie index from PLncDB.fa
              [FAILED]
              Error: Couldn't build bowtie index with err = 1

              Then I checked the log, which shows:

              Settings:
              Output files: "SP2_131004_2thout/tmp/PLncDB.*.bt2"
              Line rate: 6 (line is 64 bytes)
              Lines per side: 1 (side is 64 bytes)
              Offset rate: 4 (one in 16)
              FTable chars: 10
              Strings: unpacked
              Max bucket size: default
              Max bucket size, sqrt multiplier: default
              Max bucket size, len divisor: 4
              Difference-cover sample period: 1024
              Endianness: little
              Actual local endianness: little
              Sanity checking: disabled
              Assertions: disabled
              Random seed: 0
              Sizeofs: void*:8, int:4, long:8, size_t:8
              Input files DNA, FASTA:
              SP2_131004_2thout/tmp/PLncDB.fa
              Warning: Empty fasta file: 'SP2_131004_2thout/tmp/PLncDB.fa'
              Warning: All fasta inputs were empty
              Total time for call to driver() for forward index: 00:00:00
              Error: Encountered internal Bowtie 2 exception (#1)

              Does this mean that I have to generate a new index with exactly the same name as PLncDB, rather than using the previously built genome index?
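The bowtie2-build log above reports that SP2_131004_2thout/tmp/PLncDB.fa is empty, which suggests that gtf_to_fasta extracted no transcript sequences from the annotation, not that an index with a different name is missing. One possible cause, which this thread does not confirm, is that the file has no exon records. The GTF format that TopHat and Cufflinks document uses exon features carrying gene_id and transcript_id attributes. Below is a minimal sketch that rewrites each row as a single-exon GTF record. It assumes a tab-delimited input whose column 9 starts with Parent=<name> or ID=<name>, and `gff_rows_to_single_exon_gtf` is a hypothetical helper name.

```shell
# Hypothetical converter: turn each row of a tab-delimited GFF whose column 9
# begins with "Parent=<name>" or "ID=<name>" into one single-exon GTF record.
# Assumption: every row describes one unspliced transcript.
gff_rows_to_single_exon_gtf() {
    awk -F'\t' -v OFS='\t' '
        /^#/ || NF < 9 { next }
        {
            id = $9
            sub(/^(Parent|ID)=/, "", id)   # strip the attribute key
            sub(/;.*$/, "", id)            # keep only the first attribute value
            $3 = "exon"
            $9 = "gene_id \"" id "\"; transcript_id \"" id "\";"
            print
        }
    ' "$1"
}

# Example use (the output file name is only an illustration):
# gff_rows_to_single_exon_gtf PLncDB.gff > PLncDB.gtf
```

Single-exon records contain no splice junctions, so the "did not find any junctions" warning from post #1 would presumably remain even if the transcriptome FASTA is built successfully.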
