**mikep** · 09-29-2013, 07:28 PM

You haven't really indicated how you added the cufflinks output to the cuffdiff input. From the looks of your cufflinks command line, you are using -G (--GTF), which means no novel transcripts are assembled. In that case running cufflinks is entirely pointless, and you didn't need to do it.

Moving on to cuffdiff. From the looks of it, there is a problem with the Oryza.gtf that you are using. cuffdiff expects a number of fields, specifically transcript_id, gene_id, tss_id, and p_id, to perform the various comparisons.

So my first guess is that your GTF is malformed or missing fields.
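One way to test that guess is to count which of those four attributes are absent from the exon records. The sketch below is a hedged illustration, not part of any Cufflinks tool: it assumes tab-separated columns (as the GTF format requires) and quoted `key "value";` attributes like the rice GTF lines posted further down, which carry gene_id and transcript_id but no tss_id or p_id. The function names are hypothetical.

```python
import re
from collections import Counter

# The four attributes this thread says cuffdiff looks for.
WANTED = ("gene_id", "transcript_id", "tss_id", "p_id")

# Matches quoted key/value pairs in GTF column 9, e.g. gene_id "13113.t00002";
ATTR_RE = re.compile(r'([A-Za-z_]\w*)\s+"([^"]*)"')


def parse_attributes(field):
    """Return a dict of the quoted key/value pairs in a GTF attribute column."""
    return dict(ATTR_RE.findall(field))


def missing_attribute_counts(lines):
    """Count how many exon records lack each attribute in WANTED.

    Assumes tab-separated columns, as the GTF format requires.
    """
    missing = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon records are checked here
        attrs = parse_attributes(cols[8])
        for key in WANTED:
            if key not in attrs:
                missing[key] += 1
    return missing
```

Called on an open file handle, e.g. `missing_attribute_counts(open("rice_data/Oryza.gtf"))`, it returns a Counter; any non-zero entry points at an attribute the annotation lacks.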

**deepika123** · 09-30-2013, 05:16 AM

Thanks for your reply.
Actually, I have paired-end Illumina sequencing data (one control sample and one experimental sample), which I mapped to the rice genome using TopHat with these commands:

tophat -G rice_data/Oryza.gtf -p 5 -o K1_tophat_new rice_data/Oryza rice_data/K1_R1_filter.fastq rice_data/K1_R2_filter.fastq

tophat -G rice_data/Oryza.gtf -p 5 -o K2_tophat_new rice_data/Oryza rice_data/K2_R1_filter.fastq rice_data/K2_R2_filter.fastq

After running this, I got the following error:

[2013-09-30 18:54:34] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2013-09-30 18:54:34] Checking for Bowtie
Bowtie version: 2.1.0.0
[2013-09-30 18:54:34] Checking for Samtools
Samtools version: 0.1.19.0
[2013-09-30 18:54:34] Checking for Bowtie index files (genome)..
[2013-09-30 18:54:34] Checking for reference FASTA file
[2013-09-30 18:54:34] Generating SAM header for rice_data/Oryza
format: fastq
quality scale: phred33 (default)
[2013-09-30 18:54:35] Reading known junctions from GTF file
[2013-09-30 18:54:44] Preparing reads
left reads: min. length=57, max. length=83, 33510885 kept reads (8 discarded)
right reads: min. length=57, max. length=83, 33509751 kept reads (1142 discarded)
[2013-09-30 19:30:03] Building transcriptome data files..
[2013-09-30 19:30:23] Building Bowtie index from Oryza.fa
[FAILED]
Error: Couldn't build bowtie index with err = 1

But when I ran it without the GTF file, it ran successfully. Can anyone tell me where I am going wrong?

And this is the format of the rice GTF file:

Un protein_coding exon 1489 1644 . - . gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "1"; transcript_name "13113.m00129"; seqedit "false";
Un protein_coding CDS 1489 1644 . - 0 gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "1"; transcript_name "13113.m00129"; protein_id "13113.m00129";
Un protein_coding start_codon 1642 1644 . - 0 gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "1"; transcript_name "13113.m00129";
Un protein_coding exon 1193 1357 . - . gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "2"; transcript_name "13113.m00129"; seqedit "false";
Un protein_coding CDS 1196 1357 . - 0 gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "2"; transcript_name "13113.m00129"; protein_id "13113.m00129";
Un protein_coding stop_codon 1193 1195 . - 0 gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "2"; transcript_name "13113.m00129";
Un protein_coding exon 4230 4368 . - . gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "1"; transcript_name "13113.m00131"; seqedit "false";
Un protein_coding CDS 4230 4368 . - 0 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "1"; transcript_name "13113.m00131"; protein_id "13113.m00131";
Un protein_coding start_codon 4366 4368 . - 0 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "1"; transcript_name "13113.m00131";
Un protein_coding exon 3151 3521 . - . gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "2"; transcript_name "13113.m00131"; seqedit "false";
Un protein_coding CDS 3151 3521 . - 2 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "2"; transcript_name "13113.m00131"; protein_id "13113.m00131";
Un protein_coding exon 2986 3001 . - . gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "3"; transcript_name "13113.m00131"; seqedit "false";
Un protein_coding CDS 2986 3001 . - 0 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "3"; transcript_name "13113.m00131"; protein_id "13113.m00131";
Un protein_coding exon 2314 2900 . - . gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "4"; transcript_name "13113.m00131"; seqedit "false";
Un protein_coding CDS 2317 2900 . - 2 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "4"; transcript_name "13113.m00131"; protein_id "13113.m00131";
Un protein_coding stop_codon 2314 2316 . - 0 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "4"; transcript_name "13113.m00131";
Un protein_coding exon 5772 5785 . + . gene_id "13113.t00006"; transcript_id "13113.m00133"; exon_number "1"; transcript_name "13113.m00133"; seqedit "false";
Un protein_coding CDS 5772 5785 . + 0 gene_id "13113.t00006"; transcript_id "13113.m00133"; exon_number "1"; transcript_name "13113.m00133"; protein_id "13113.m00133";
Un protein_coding start_codon 5772 5774 . + 0 gene_id "13113.t00006"; transcript_id "13113.m00133"; exon_number "1"; transcript_name "13113.m00133";
Un protein_coding exon 6126 6308 . + . gene_id "13113.t00006"; transcript_id "13113.m00133"; exon_number "2"; transcript_name "13113.m00133"; seqedit "false";

Is there something wrong with the GTF file? If so, please tell me where I can get a rice GTF file.

**mikep** · 10-01-2013, 12:09 AM

OK, so you've listed a few problems; let's start with the one posted originally (cuffdiff).

I am not familiar with the rice genome, what format it is in, or what its chromosomes are called. However, the GTF file posted above lists the chromosome name as "Un", which I doubt is an actual chromosome. I'm not sure where you got your GTF from, but the first column in the GTF must exactly match the names of the chromosomes in the FASTA (.fa) file.
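A quick way to check this is to compare the set of names in GTF column 1 with the FASTA header names. This is a minimal sketch under one stated assumption: that a sequence's name is the first whitespace-separated word after `>` in its header, which is how aligners and samtools commonly treat FASTA headers. The helper names are hypothetical.

```python
def fasta_names(lines):
    """Sequence names from FASTA headers: the first word after '>'."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def gtf_seqnames(lines):
    """Distinct values of GTF column 1 (the sequence/chromosome name)."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def unmatched_seqnames(gtf_lines, fasta_lines):
    """GTF sequence names that never appear as a FASTA header name."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_names(fasta_lines))
```

Passing the open GTF and FASTA files (e.g. rice_data/Oryza.gtf and rice_data/Oryza.fa) returns every GTF chromosome name with no matching FASTA sequence; an empty list means the names agree.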

Secondly, TopHat2, unlike TopHat1, builds a Bowtie index for the transcriptome; you should read the manual about this and use --transcriptome-index. It looks as though the index couldn't be built, which might be due to a read permission error or a path error.

Thirdly, your TopHat command lines don't look like paired-end inputs, since there's only one fq file being mentioned.

**deepika123** · 10-01-2013, 11:37 PM

Hello,

I ran TopHat with the rice GTF file but got an error, and I don't know how to solve it. Can anyone suggest how to fix this problem?

/usr/local/bin/tophat -G rice_data/Oryza.gtf -p 5 -o KC_tophat_new --transcriptome-index=known rice_data/Oryza rice_data/KC-24Hr_R1_filter.fastq rice_data/KC-24Hr_R2_filter.fastq

And the error is:

[2013-10-02 12:36:15] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2013-10-02 12:36:15] Checking for Bowtie
Bowtie version: 2.1.0.0
[2013-10-02 12:36:15] Checking for Samtools
Samtools version: 0.1.19.0
[2013-10-02 12:36:19] Checking for Bowtie index files (genome)..
[2013-10-02 12:36:19] Checking for reference FASTA file
[2013-10-02 12:36:19] Generating SAM header for rice_data/Oryza
format: fastq
quality scale: phred33 (default)
[2013-10-02 12:36:19] Reading known junctions from GTF file
[2013-10-02 12:36:27] Preparing reads
left reads: min. length=56, max. length=83, 37951504 kept reads (113 discarded)
right reads: min. length=57, max. length=83, 37949924 kept reads (1693 discarded)
[2013-10-02 13:09:09] Building transcriptome data files..
[FAILED]
Error: gtf_to_fasta returned an error.

Thanks in advance.

**dpryan** · 10-02-2013, 12:06 AM

If you look in the run log, you'll find the actual gtf_to_fasta command that was run. Just run that yourself to get a more informative error message.
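If the run log is long, the relevant command can be pulled out programmatically. A minimal sketch, assuming the log is the plain-text file TopHat writes under the output directory (something like KC_tophat_new/logs/run.log; treat that path as an assumption) and that each command sits on its own line starting with the program path. `find_commands` is a hypothetical helper.

```python
from pathlib import Path


def find_commands(log_text, program="gtf_to_fasta"):
    """Return the lines of a run log that invoke `program`.

    A line matches when its first whitespace-separated token, stripped of
    any directory prefix, equals the program name.
    """
    matches = []
    for line in log_text.splitlines():
        tokens = line.split()
        if tokens and Path(tokens[0]).name == program:
            matches.append(line.strip())
    return matches
```

For example, `find_commands(Path("KC_tophat_new/logs/run.log").read_text())` would return the full gtf_to_fasta invocation, ready to paste into a shell.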

**deepika123** · 10-02-2013, 12:18 AM

Yes, I looked at the run log and ran the command:

/usr/local/bin/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir KC_tophat_new/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p5 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations known/Oryza.gff --gtf-juncs KC_tophat_new/tmp/Oryza.juncs --no-closure-search --no-coverage-search --no-microexon-search known/Oryza.gff rice_data/Oryza.fa known/Oryza.fa > KC_tophat_new/logs/g2f.out

And the error is:
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr
Aborted

How can I solve this?

**dpryan** · 10-02-2013, 12:29 AM

gtf_to_fasta sometimes fails on non-standard chromosome/contig names (see issue #38). Alternatively, this might also be caused by the annotation file describing a feature that goes beyond the bounds of one of the chromosomes/contigs. Short of going through the code, I can't really say which. You might just try to make a truncated annotation file and see if that works. If so, just expand it until you hit the error again. Then you'll know where the problem is.
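The second possibility, a feature extending past the end of its chromosome/contig, can be checked directly instead of by reading the code. This is a hedged sketch: it computes sequence lengths from the FASTA (taking the first word of each header as the name, an assumption) and reports every GTF record that sits on an unknown sequence or whose coordinates fall outside it. Both helper names are hypothetical.

```python
def fasta_lengths(lines):
    """Map each FASTA sequence name (first word of its header) to its length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_bounds(gtf_lines, lengths):
    """List GTF records on unknown sequences or past a sequence's end.

    Returns (line_number, seqname, start, end, seq_length) tuples; seq_length
    is None when the sequence name is not in the FASTA at all.
    """
    bad = []
    for lineno, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # not a feature line
        seq, start, end = cols[0], int(cols[3]), int(cols[4])
        length = lengths.get(seq)
        if length is None or start < 1 or end > length:
            bad.append((lineno, seq, start, end, length))
    return bad
```

The returned line numbers point straight at the suspect records in the annotation file.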

**deepika123** · 10-02-2013, 12:46 AM

Thanks, dpryan.
OK, I am trying. Can you send me a link where I can download GTF files for plants? I have been trying to download them from Ensembl, but I am unable to because of DAS, and I am not very good at programming.
It is possible that the error is due to the annotation file. If you send me the link, I would be thankful.

**dpryan** · 10-02-2013, 12:53 AM

Well, you already have the gff file, so just truncate that.
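The truncate-and-expand approach can also be automated. The sketch below swaps the "expand until it breaks" loop for a binary search over prefix lengths, which assumes that once a prefix fails, every longer prefix also fails. `gtf_to_fasta_fails` is hypothetical, and its argument order (annotation, genome FASTA, output FASTA) is an assumption taken from the three trailing positional arguments in the gtf_to_fasta command posted above.

```python
import subprocess
from itertools import islice


def write_prefix(src, dst, n):
    """Copy the first n lines of annotation file src into dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(islice(fin, n))


def smallest_failing_prefix(total, fails):
    """Binary-search the smallest n in [1, total] for which fails(n) is True.

    Assumes fails(total) is True and that once a prefix fails, every longer
    prefix also fails (the bad record stays in every longer cut).
    """
    lo, hi = 1, total
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid      # failure already present in the first mid lines
        else:
            lo = mid + 1  # first mid lines are fine; the culprit is later
    return lo


def gtf_to_fasta_fails(n, gff="known/Oryza.gff", genome="rice_data/Oryza.fa"):
    """Hypothetical predicate: does gtf_to_fasta crash on the first n lines?"""
    write_prefix(gff, "prefix.gff", n)
    # Argument order assumed from the trailing arguments of the logged command.
    result = subprocess.run(["gtf_to_fasta", "prefix.gff", genome, "prefix.fa"])
    return result.returncode != 0
```

With `total` set to the line count of known/Oryza.gff, `smallest_failing_prefix(total, gtf_to_fasta_fails)` names the line whose addition first triggers the crash; inspect that record (and its transcript) first.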

**deepika123** · 10-03-2013, 02:01 AM

Thanks, dpryan. TopHat now runs successfully.

cufflinks output

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News