**kmcarr** · 04-23-2010, 05:15 PM

I don't know if this is the problem, but all of your ID and parent identifiers are the same. IDs must be unique and an entry can't be its own parent.

**thinkRNA** · 04-24-2010, 09:07 AM

mouse GFF file for tophat

Thanks for taking the time to reply

What do you mean by "an entry can't be its own parent"?

I have tried a different GFF but get the same errors. For example:

chr18 protein_coding mRNA 3383239 3435368 . + . ID=ENSMUST00000115872;Name=Cul2;Parent=ENSMUSG00000024231
chr18 protein_coding exon 3383239 3383395 . + . ID=ENSMUST00000115872.1;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3399844 3399984 . + . ID=ENSMUST00000115872.2;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3405683 3405785 . + . ID=ENSMUST00000115872.3;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3414128 3414222 . + . ID=ENSMUST00000115872.4;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3417533 3417638 . + . ID=ENSMUST00000115872.5;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3418562 3418644 . + . ID=ENSMUST00000115872.6;Name=Cul2;Parent=ENSMUST00000115872
c

Do you or anyone else have a mouse GFF that works with tophat? I would really appreciate your help. I noticed that the command below probably fails because segment_juncs.fa is empty. I don't know where/how tophat gets these sequences.
bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs [FAILED]

**kmcarr** · 04-26-2010, 08:49 AM

Originally posted by thinkRNA:

Thanks for taking the time to reply

What do you mean by "an entry can't be its own parent"?

In your first example you had on a single line "ID=uc007afc.1" and "Parent=uc007afc.1". You were saying that the parent of this feature has the same ID as this feature; in other words, it is its own parent. This is not allowed. You also have all three features in your example (the gene, mRNA and exon) identified with the same ID. This is an improperly formed GFF file, which I thought might be the cause of your problem.
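The two rules above (unique IDs, no feature listed as its own parent) can be checked mechanically. The following is a minimal sketch, not a full GFF3 validator; the function names are hypothetical, and it assumes attribute values contain no spaces, so columns can be split on whitespace.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3_ids(lines):
    """Return (duplicate_ids, self_parent_ids) for an iterable of GFF3 lines."""
    id_counts = Counter()
    self_parents = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        # Assumption: nine whitespace-separated columns, attributes last.
        columns = line.split(None, 8)
        if len(columns) < 9:
            continue
        attrs = parse_attributes(columns[8])
        feature_id = attrs.get("ID")
        if feature_id is None:
            continue
        id_counts[feature_id] += 1
        # Parent may hold several comma-separated IDs.
        if feature_id in attrs.get("Parent", "").split(","):
            self_parents.append(feature_id)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    return duplicates, self_parents
```

Called on the lines of a GFF3 file, it returns two lists that should both be empty for a file like the Ensembl-based example in this thread. Note that GFF3 does allow one ID to appear on several lines when they describe a single discontinuous feature (a CDS split across exons, for example), so a reported duplicate is a hint to inspect, not proof of an error.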

Originally posted by thinkRNA:

I have tried a different GFF but get the same errors. For example:

chr18 protein_coding mRNA 3383239 3435368 . + . ID=ENSMUST00000115872;Name=Cul2;Parent=ENSMUSG00000024231
chr18 protein_coding exon 3383239 3383395 . + . ID=ENSMUST00000115872.1;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3399844 3399984 . + . ID=ENSMUST00000115872.2;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3405683 3405785 . + . ID=ENSMUST00000115872.3;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3414128 3414222 . + . ID=ENSMUST00000115872.4;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3417533 3417638 . + . ID=ENSMUST00000115872.5;Name=Cul2;Parent=ENSMUST00000115872
chr18 protein_coding exon 3418562 3418644 . + . ID=ENSMUST00000115872.6;Name=Cul2;Parent=ENSMUST00000115872
c

Do you or anyone else have a mouse GFF that works with tophat? I would really appreciate your help. I noticed that the command below probably fails because segment_juncs.fa is empty. I don't know where/how tophat gets these sequences.
bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs [FAILED]

This GFF is properly formed. Each feature has a unique ID, and the exons properly identify the mRNA as their parent. Alas, I now have no explanation as to what is causing your problems with Tophat.

**thinkRNA** · 04-26-2010, 09:01 AM

I found the problem. tophat looks for a .fa file alongside the indexes and, if the file is not present, regenerates it from the index. I did have this file, mm9.fa, in the directory, but for some reason it was empty. I started from scratch in a new directory with new links to the indexes, and it worked!
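A quick way to catch this before a run is to look at the FASTA file sitting next to the index. This is a small sketch under the assumption that the file is named after the index prefix plus ".fa" (as with mm9.fa above); the function name is hypothetical and not part of tophat or bowtie.

```python
import os


def reference_fasta_status(index_prefix):
    """Describe the state of <index_prefix>.fa: 'missing', 'empty', or 'ok'."""
    fasta_path = index_prefix + ".fa"
    # os.path.exists follows symlinks, so a dangling link counts as missing.
    if not os.path.exists(fasta_path):
        return "missing"
    if os.path.getsize(fasta_path) == 0:
        return "empty"
    return "ok"
```

For example, `reference_fasta_status("indexes/mm9")` inspects `indexes/mm9.fa`. A result of `empty` matches the situation described in this post; removing the empty file or working in a clean directory would be the corresponding fix.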

**James** · 07-05-2010, 08:04 AM

Hi, how did you solve this problem?

I get the same error:

tophat -G GFF3/data/gff3/combined.gff --no-novel-juncs indexes/genomic reads/11.3.10/R43s_4_sequence.fastq

[Sun Jul 4 16:24:40 2010] Beginning TopHat run (v1.0.11)
-----------------------------------------------
[Sun Jul 4 16:24:40 2010] Preparing output location ./tophat_out/
[Sun Jul 4 16:24:40 2010] Checking for Bowtie index files
[Sun Jul 4 16:24:40 2010] Checking for reference FASTA file
[Sun Jul 4 16:24:40 2010] Checking for Bowtie
Bowtie version: 0.12.3.0
[Sun Jul 4 16:24:40 2010] Checking reads
seed length: 36bp
format: fastq
quality scale: --phred33-quals
[Sun Jul 4 16:26:29 2010] Reading known junctions from GFF file
[Sun Jul 4 16:27:31 2010] Mapping reads against DictyAx4_genomic with Bowtie
[Sun Jul 4 16:46:16 2010] Joining segment hits
[Sun Jul 4 16:48:28 2010] Retrieving sequences for splices
[Sun Jul 4 16:48:32 2010] Indexing splices
Warning: Empty input file
Error: No unambiguous stretches of characters in the input. Aborting...
Command: bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs
[FAILED]
Error: Splice sequence indexing failed with err = 1

It works if I use only the -G option, but fails once I also add the --no-novel-juncs option.

Any ideas?

(P.S. This isn't on the mouse genome. And the GFF is one I obtained from the database for my species, not a GTF converted to GFF3.)

**ChrisL** · 07-06-2010, 02:08 AM

Dear ThinkRNA (or anyone else),

When creating your Ensembl-based GFF/GTF file, how did you get the gene trivial name inserted (Name=Cul2 in the line below, marked in red in the original post)? I have tried to do this using the UCSC table browser to generate an Ensembl-based GTF file, but I just end up with the Ensembl ID even though the trivial name is in the underlying data table as the "name2" field.

chr18 protein_coding mRNA 3383239 3435368 . + . ID=ENSMUST00000115872;Name=Cul2;Parent=ENSMUSG00000024231

Thanks.

**lakshmaa** · 09-21-2011, 04:03 AM

Hi ThinkRNA,

I have the same problem as you did. I tried running it from scratch in different directories and I get the same error.

[Tue Sep 20 15:23:30 2011] Beginning TopHat run (v1.2.0)
-----------------------------------------------
[Tue Sep 20 15:23:30 2011] Preparing output location tophat_BC4_gtf_mix1/
[Tue Sep 20 15:23:31 2011] Checking for Bowtie index files
[Tue Sep 20 15:23:31 2011] Checking for reference FASTA file
[Tue Sep 20 15:23:31 2011] Checking for Bowtie
Bowtie version: 0.12.7.0
[Tue Sep 20 15:23:31 2011] Checking for Samtools
Samtools Version: 0.1.8
[Tue Sep 20 15:23:53 2011] Checking reads
min read length: 95bp, max read length: 101bp
format: fastq
quality scale: solexa33 (reads generated with GA pipeline version < 1.3)
[Tue Sep 20 15:26:25 2011] Reading known junctions from GTF file
[Tue Sep 20 15:27:39 2011] Mapping reads against zv9 with Bowtie
[Tue Sep 20 16:47:01 2011] Joining segment hits
[Tue Sep 20 16:53:45 2011] Mapping reads against zv9 with Bowtie(1/4)
[Tue Sep 20 17:40:17 2011] Mapping reads against zv9 with Bowtie(2/4)
[Tue Sep 20 18:25:21 2011] Mapping reads against zv9 with Bowtie(3/4)
[Tue Sep 20 19:12:59 2011] Mapping reads against zv9 with Bowtie(4/4)
[Tue Sep 20 20:55:40 2011] Searching for junctions via segment mapping
[Tue Sep 20 20:57:48 2011] Retrieving sequences for splices
[Tue Sep 20 21:01:18 2011] Indexing splices
Warning: Empty input file
Error: No unambiguous stretches of characters in the input. Aborting...
Command: bowtie-build tophat_BC4_gtf_mix1/tmp/segment_juncs.fa tophat_BC4_gtf_mix1/tmp/segment_juncs
[FAILED]
Error: Splice sequence indexing failed with err = 1

Here is a sample of my gtf file

chr1 danRer7_ensGene stop_codon 25135780 25135782 0.000000 - . gene_id "ENSDART00000112899"; transcript_id "ENSDART00000112899";
chr1 danRer7_ensGene CDS 25135783 25135824 0.000000 - 0 gene_id "ENSDART00000112899"; transcript_id "ENSDART00000112899";
chr1 danRer7_ensGene exon 25135780 25135824 0.000000 - . gene_id "ENSDART00000112899"; transcript_id "ENSDART00000112899";

And here is my Tophat command:
python /share/bin/tophat-1.2.0.Linux_x86_64/tophat -p 100 -g 5 -a 10 --solexa-quals -o tophat_BC1_gtf_mix1 -G /home/lakshmaa/scratch/Task54/ensembl_zv9_2.gtf /share/apps/Genomes/Zv9_Bowtie/zv9 /home/lakshmaa/scratch/Task54/barcode/READ_corrected_raw/BC1_mix1_all.fq

Can anyone please help me solve this problem?

Thanks,
Abi
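For anyone inspecting a GTF like the sample above, the ninth column can be read with a few lines of Python. This is only a sketch with hypothetical function names, assuming the usual GTF attribute syntax of `key "value";` pairs. It makes visible that in the sample the gene_id and transcript_id carry the same ENSDART identifier; whether that is related to the failure is not established in this thread.

```python
import re

# One GTF attribute: a key, whitespace, a double-quoted value, a semicolon.
GTF_ATTRIBUTE = re.compile(r'(\w+)\s+"([^"]*)"\s*;')


def parse_gtf_attributes(column9):
    """Turn 'gene_id "X"; transcript_id "Y";' into {'gene_id': 'X', ...}."""
    return dict(GTF_ATTRIBUTE.findall(column9))


def gene_equals_transcript(line):
    """True if a GTF line uses the same value for gene_id and transcript_id."""
    # Assumption: eight whitespace-separated columns precede the attributes.
    columns = line.rstrip("\n").split(None, 8)
    attrs = parse_gtf_attributes(columns[8])
    return attrs.get("gene_id") == attrs.get("transcript_id")
```

Running `gene_equals_transcript` over each line of the file and counting the `True` results gives a quick picture of how the annotation groups transcripts into genes.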

tophat complains