  • SpreeFu
    replied
    Originally posted by dpryan View Post
    I can't tell you what, if anything, you need to change yet. You need to simply run
    Code:
    /home/jana/Software/tophat-2.0.9.Linux_x86_64/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SP2_131004_2thout/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations PLncDB.gff --gtf-juncs 0SP2_131004_2thout/tmp/PLncDB.juncs --no-closure-search --no-coverage-search --no-microexon-search PLncDB.gff genome.fa SP2_131004_2thout/tmp/PLncDB.fa > SP2_131004_2thout/logs/g2f.out
    to see what the underlying error message is. Then you can figure out what's actually wrong.
    After replacing 'Parent' with 'ID', a new error appeared:

    [2013-10-04 20:31:29] Building Bowtie index from PLncDB.fa
    [FAILED]
    Error: Couldn't build bowtie index with err = 1

    Then I looked at the log, which shows:

    Settings:
    Output files: "SP2_131004_2thout/tmp/PLncDB.*.bt2"
    Line rate: 6 (line is 64 bytes)
    Lines per side: 1 (side is 64 bytes)
    Offset rate: 4 (one in 16)
    FTable chars: 10
    Strings: unpacked
    Max bucket size: default
    Max bucket size, sqrt multiplier: default
    Max bucket size, len divisor: 4
    Difference-cover sample period: 1024
    Endianness: little
    Actual local endianness: little
    Sanity checking: disabled
    Assertions: disabled
    Random seed: 0
    Sizeofs: void*:8, int:4, long:8, size_t:8
    Input files DNA, FASTA:
    SP2_131004_2thout/tmp/PLncDB.fa
    Warning: Empty fasta file: 'SP2_131004_2thout/tmp/PLncDB.fa'
    Warning: All fasta inputs were empty
    Total time for call to driver() for forward index: 00:00:00
    Error: Encountered internal Bowtie 2 exception (#1)

    Does this mean that I have to generate a new index with exactly the same name as PLncDB, rather than using the previously built genome index?
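
The "Empty fasta file" warning in the log above suggests that gtf_to_fasta wrote no transcript sequences into SP2_131004_2thout/tmp/PLncDB.fa, so bowtie2-build had nothing to index. That is a reading of the log, not something confirmed in this thread. A minimal Python sketch for checking whether a FASTA file actually contains records follows; the function name `summarize_fasta` is a hypothetical helper, not part of TopHat or Bowtie.

```python
from typing import Iterable, Tuple


def summarize_fasta(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of records, total sequence length) for FASTA text."""
    records = 0
    bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Each header line starts one record.
            records += 1
        else:
            bases += len(line)
    return records, bases
```

Called as `summarize_fasta(open("SP2_131004_2thout/tmp/PLncDB.fa"))`, a result of `(0, 0)` would mean the transcript FASTA is empty, which points at the annotation file rather than at the index name.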



  • SpreeFu
    replied
    Originally posted by dpryan View Post
    I can't tell you what, if anything, you need to change yet. You need to simply run
    Code:
    /home/jana/Software/tophat-2.0.9.Linux_x86_64/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SP2_131004_2thout/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations PLncDB.gff --gtf-juncs 0SP2_131004_2thout/tmp/PLncDB.juncs --no-closure-search --no-coverage-search --no-microexon-search PLncDB.gff genome.fa SP2_131004_2thout/tmp/PLncDB.fa > SP2_131004_2thout/logs/g2f.out
    to see what the underlying error message is. Then you can figure out what's actually wrong.
    It showed: Error: no ID found for GFF record start
    So do I have to change 'Parent=' to 'ID='?
    Do I need to add the line '1 TAIR10 Chromosome 1 30427671 . . . ID=chr1;Name=Chr1', as in TAIR10_genes.gff?
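
One way to give each record an ID while keeping a parent/child structure is to write every lincRNA as a transcript line plus a single exon line that points back to it. The Python sketch below does that for one record. It is only an illustration: the thread does not confirm that TopHat 2.0.9 accepts this layout, the feature types `transcript` and `exon` and the `.exon1` suffix are assumptions, and `to_transcript_and_exon` is a hypothetical helper name.

```python
from typing import List


def to_transcript_and_exon(line: str) -> List[str]:
    """Turn one 9-column record into a transcript line and a child exon line."""
    fields = line.split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    seqid, source, _feature, start, end, score, strand, _phase, attrs = fields

    # Parse "key=value;key=value" attributes into a dictionary.
    attr_map = dict(
        item.strip().split("=", 1) for item in attrs.split(";") if "=" in item
    )
    name = attr_map.get("ID") or attr_map.get("Parent")
    if name is None:
        raise ValueError("record has neither ID= nor Parent=")

    # GFF columns are tab-separated; the exon reuses the transcript coordinates.
    transcript = "\t".join(
        [seqid, source, "transcript", start, end, score, strand, ".", "ID=" + name]
    )
    exon = "\t".join(
        [seqid, source, "exon", start, end, score, strand, ".",
         "ID=%s.exon1;Parent=%s" % (name, name)]
    )
    return [transcript, exon]
```

Applied to every line of PLncDB.gff, this would double the number of records; the original file should be kept so the conversion can be compared against it.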



  • dpryan
    replied
    I can't tell you what, if anything, you need to change yet. You need to simply run
    Code:
    /home/jana/Software/tophat-2.0.9.Linux_x86_64/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SP2_131004_2thout/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations PLncDB.gff --gtf-juncs 0SP2_131004_2thout/tmp/PLncDB.juncs --no-closure-search --no-coverage-search --no-microexon-search PLncDB.gff genome.fa SP2_131004_2thout/tmp/PLncDB.fa > SP2_131004_2thout/logs/g2f.out
    to see what the underlying error message is. Then you can figure out what's actually wrong.
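
For anyone who prefers to script this step, here is a minimal Python sketch that runs a command and returns whatever it printed to standard error, which is where the more informative message usually ends up. The function name `run_and_report` is a hypothetical helper, and discarding standard output is an assumption that stands in for the `> SP2_131004_2thout/logs/g2f.out` redirect in the command above.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_and_report(args: Sequence[str]) -> Tuple[int, str]:
    """Run a command without a shell and return (exit_code, stderr_text)."""
    result = subprocess.run(
        list(args),
        stdout=subprocess.DEVNULL,  # stands in for the "> g2f.out" redirect
        stderr=subprocess.PIPE,
        text=True,
    )
    message = result.stderr.strip()
    if result.returncode != 0:
        # Show the tool's own error text, which the wrapper summary hides.
        print("exit code %d: %s" % (result.returncode, message), file=sys.stderr)
    return result.returncode, message
```

Because no shell is involved, the command has to be passed as a list of arguments (for example via `shlex.split`), and the trailing `>` redirect and its file name must be left out of that list.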



  • SpreeFu
    replied
    Originally posted by dpryan View Post
    If you look in the run log, you'll see the exact gtf_to_fasta command that's run. Simply run that yourself to find out what the actual (usually more informative) error message is.
    Hi Ryan, thanks for your reply!

    I got the run log
    #>map_start:
    /home/jana/Software/tophat-2.0.9.Linux_x86_64/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir SP2_131004_2thout/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations PLncDB.gff --gtf-juncs 0SP2_131004_2thout/tmp/PLncDB.juncs --no-closure-search --no-coverage-search --no-microexon-search PLncDB.gff genome.fa SP2_131004_2thout/tmp/PLncDB.fa > SP2_131004_2thout/logs/g2f.out

    Do you mean I have to change the parameters because my GFF file is somehow non-standard?



  • dpryan
    replied
    If you look in the run log, you'll see the exact gtf_to_fasta command that's run. Simply run that yourself to find out what the actual (usually more informative) error message is.
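
A minimal Python sketch for pulling that command out of the run log is shown here. It assumes the log is plain text with one command per line and marker lines that start with `#`, as in the run-log excerpt posted elsewhere in this thread; `find_commands` is a hypothetical helper name.

```python
import os
from typing import List


def find_commands(log_text: str, tool: str = "gtf_to_fasta") -> List[str]:
    """Return every log line whose first token is an executable named `tool`."""
    hits = []
    for line in log_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and marker lines such as "#>map_start:".
        if not stripped or stripped.startswith("#"):
            continue
        executable = stripped.split()[0]
        if os.path.basename(executable) == tool:
            hits.append(stripped)
    return hits
```

It could be called as `find_commands(open("SP2_131004_2thout/logs/run.log").read())`; the log path is an assumption based on the output directory used in this thread.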



  • SpreeFu
    replied
    BTW, tophat ran smoothly when I used TAIR10_GFF3_genes.gff.



  • SpreeFu
    started a topic gtf_to_fasta returned an error

    gtf_to_fasta returned an error

    Hi all,
    Could you please help me solve the following problem?

    Here is the error:
    $ tophat -p 8 -G PLncDB.gff -o SP2_131004_2thout --no-novel-juncs genome SP2_R1_PE100_clipped.fastq SP2_R2_PE100_clipped.fastq

    [2013-10-04 18:27:48] Beginning TopHat run (v2.0.9)
    -----------------------------------------------
    [2013-10-04 18:27:48] Checking for Bowtie
    Bowtie version: 2.1.0.0
    [2013-10-04 18:27:48] Checking for Samtools
    Samtools version: 0.1.18.0
    [2013-10-04 18:27:48] Checking for Bowtie index files (genome)..
    [2013-10-04 18:27:48] Checking for reference FASTA file
    [2013-10-04 18:27:48] Generating SAM header for genome
    format: fastq
    quality scale: phred33 (default)
    [2013-10-04 18:27:49] Reading known junctions from GTF file
    Warning: TopHat did not find any junctions in GTF file
    [2013-10-04 18:27:49] Preparing reads
    left reads: min. length=12, max. length=100, 36930071 kept reads (3968 discarded)
    right reads: min. length=12, max. length=100, 36922334 kept reads (11705 discarded)
    Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
    [2013-10-04 18:45:24] Building transcriptome data files..
    [FAILED]
    Error: gtf_to_fasta returned an error.

    I used a self-made GFF file; here is part of it:
    1 PlncDB LincRNA 2497 2816 . - . Parent=At1NC000020
    1 PlncDB LincRNA 11100 11372 . + . Parent=At1NC000060
    1 PlncDB LincRNA 43086 43295 . - . Parent=At1NC000160
    1 PlncDB LincRNA 51391 51733 . + . Parent=At1NC000200
    1 PlncDB LincRNA 90168 90401 . - . Parent=At1NC000340
    1 PlncDB LincRNA 91355 91685 . - . Parent=At1NC000350
    1 PlncDB LincRNA 107634 107877 . - . Parent=At1NC000410
    1 PlncDB LincRNA 135708 136008 . - . Parent=At1NC000530
    1 PlncDB LincRNA 140482 140691 . + . Parent=At1NC000560
    1 PlncDB LincRNA 207751 208011 . - . Parent=At1NC000780
    1 PlncDB LincRNA 217908 218112 . - . Parent=At1NC000810

    Here is the Arabidopsis genome:
    $ head genome.fa
    >1
    CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAATCTTTAAATCC
    TACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTTCTCTGGTTGAAAATCATTGT
    GTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTATTGTTGTGTGTAGATTTTTTAAAAATATCATTT
    GAGGTCAATACAAATCCTATTTCTTGTGGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTG
    TTATATTGGATACAAGCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTG
    GTTTATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTTGGACATTTATTGTCATTCTTACTCCTTTG
    TGGAAATGTTTGTTCTATCAATTTATCTTTTGTGGGAAAATTATTTAGTTGTAGGGATGAAGTCTTTCTTCG
    TTGTTGTTACGCTTGTCATCTCATCTCTCAATGATATGG

    How can I solve this problem?

    Thanks in advance!
    Last edited by SpreeFu; 10-04-2013, 09:42 AM.
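
A quick sanity check for a setup like this is whether every sequence name in the first column of the GFF also appears as a header in the genome FASTA. In the excerpts above both use `1`, so the names look consistent for those lines, but the sketch below checks whole files. It is a hedged Python illustration and says nothing about what gtf_to_fasta does internally; all three function names are hypothetical.

```python
from typing import Iterable, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect the first word of every FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.add(header[0])
    return names


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Collect column 1 of every non-comment, non-blank GFF line."""
    ids = set()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        ids.add(stripped.split()[0])
    return ids


def unmatched_seqids(gff_lines: Iterable[str], fasta_lines: Iterable[str]) -> Set[str]:
    """Return GFF sequence names that have no matching FASTA header."""
    return gff_seqids(gff_lines) - fasta_names(fasta_lines)
```

An empty set from `unmatched_seqids(open("PLncDB.gff"), open("genome.fa"))` would rule out a naming mismatch, leaving the attribute column as the more likely place to look.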
