Hello All,
I had this issue while running "cuffmerge" on assemblies built with cufflinks 2.1.1.
It turned out that the duplicated entries were not in the GENCODE GTF file I was using as the reference, but in the "transcripts.gtf" file created during the cufflinks step.
Updating cufflinks to the newer version 2.2.1 and re-running the cufflinks step resolved the issue.
Hope that helps.
Good luck
-
It seems that TAIR now provides the GTF file. Has anyone tried it?
ftp://ftp.arabidopsis.org/home/tair/...enes_exons.gtf
thanks,
Originally posted by kpatel:
Hi kmcarr,
Would it be possible for you to email me the TAIR9 gtf file?
thanks
-
gff.pm
Hi kmcarr,
Could you post your gff.pm hack? I need to do this conversion, and I need to worry about frame.
Thanks,
Bob
-
Hi kmcarr,
I am also interested in your TAIR9 gtf file. Would it be possible to email me this file ([email protected]) ?
Thanks !
-
Hi kmcarr,
Would it be possible for you to email me the TAIR9 gtf file?
thanks
-
Originally posted by DrD2009:
Thank you for the reply; that clears some things up for me.
I do have a few questions though:
1.) How were you able to convert the TAIR9 GFF3 files into GTF format?
2.) We are mostly interested in investigating small RNA such as miRNA, siRNA, and other non-coding RNA. We have files for them in GFF. The siRNA data started out as just sequences in supplementary data. From those I aligned them to the genome and created a GFF from that data. How could I supply files such as those to Cufflinks?
Example:
Code:
Chr1 TAIR9 Jacobsen_siRNA 10002796 10002812 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10004771 10004794 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10004925 10004941 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10007606 10007626 . . . .
Note: I was going to post the entire TAIR9 GTF, but the gzipped file is too large to attach and I don't have an accessible server. If you desperately need it, send me a PM and I can e-mail it to you.
-
GList.hh:592 error
Same situation for me. I cannot run cuffcompare because of duplicate errors. I used a Perl script to delete all duplicated exon lines (the exon numbers vary, though) while keeping the transcript lines. Compared to the original GTF file generated by cufflinks, this new "transcript only" GTF file seems to have all the information, including strand.
However, I still got the error "GList error (GList.hh:592):Invalid list index: 0".
Henko, can you share your thoughts on what is going on?
cheers
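One way to see which exon records are actually duplicated before deleting anything is sketched below in Python. This is only an illustration: the function name `duplicate_exons` is made up for this example, it assumes a tab-delimited GTF with a `transcript_id "..."` attribute in column 9, and it treats two exon lines as duplicates when transcript, chromosome, start, end, and strand all match.

```python
import re
from collections import Counter


def duplicate_exons(gtf_lines):
    """Return {(transcript_id, chrom, start, end, strand): count} for repeated exons.

    Hypothetical helper: assumes tab-delimited GTF records with a
    transcript_id attribute in the ninth column.
    """
    counts = Counter()
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments, blank lines, and anything that is not an exon record.
        if len(cols) < 9 or cols[2].lower() != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match is None:
            continue
        key = (match.group(1), cols[0], int(cols[3]), int(cols[4]), cols[6])
        counts[key] += 1
    # Keep only the exons that appear more than once.
    return {key: n for key, n in counts.items() if n > 1}
```

Calling it as `duplicate_exons(open("transcripts.gtf"))` would list the repeated exons so they can be inspected rather than removed blindly.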
-
Hi, I'm encountering a similar issue with cuffcompare. While trying to run it with the transcripts.gtf generated from cufflinks, it gave me the following error:
GList error (GList.hh:592):Invalid list index: 0
This is very strange because the file was generated by cufflinks, so it should work with cuffcompare. Could someone please help?
Thanks!
-EDIT-
I found out that it could be because of the missing strand information. Sorry about that.
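If missing strand information is the suspect, a quick scan of the GTF can confirm it. The Python sketch below is an illustration only: `records_missing_strand` is a made-up name, and it assumes a tab-delimited GTF where column 7 holds the strand. It reports every record whose strand is neither `+` nor `-`; whether such records are what triggers the GList error is the poster's guess, not something this check proves.

```python
import re


def records_missing_strand(gtf_lines):
    """Return (line_number, transcript_id) for records whose strand is not + or -.

    Hypothetical helper: assumes tab-delimited GTF with strand in column 7.
    """
    problems = []
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        if cols[6] not in ("+", "-"):
            match = re.search(r'transcript_id "([^"]+)"', cols[8])
            # transcript_id is None when the attribute is absent.
            problems.append((number, match.group(1) if match else None))
    return problems
```

For example, `records_missing_strand(open("transcripts.gtf"))` returns an empty list when every record carries a strand.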
-
Thank you for the reply; that clears some things up for me.
I do have a few questions though:
1.) How were you able to convert the TAIR9 GFF3 files into GTF format?
2.) We are mostly interested in investigating small RNA such as miRNA, siRNA, and other non-coding RNA. We have files for them in GFF. The siRNA data started out as just sequences in supplementary data. From those I aligned them to the genome and created a GFF from that data. How could I supply files such as those to Cufflinks?
Example:
Code:
Chr1 TAIR9 Jacobsen_siRNA 10002796 10002812 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10004771 10004794 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10004925 10004941 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10007606 10007626 . . . .
-
Ignore everything except the exon and CDS lines; those are all that matter to cufflinks. Every exon or CDS entry that is part of the same gene must have the same "gene_id". Every exon or CDS that is part of the same transcript must have the same "transcript_id". Here is an example of one gene (AT1G01020) which has two transcripts (AT1G01020.1 and AT1G01020.2).
The GFF3 (TAIR9 annotation):
Code:
Chr1 TAIR9 gene 5928 8737 . - . ID=AT1G01020;Note=protein_coding_gene;Name=AT1G01020
Chr1 TAIR9 mRNA 5928 8737 . - . ID=AT1G01020.1;Parent=AT1G01020;Name=AT1G01020.1;Index=1
Chr1 TAIR9 protein 6915 8666 . - . ID=AT1G01020.1-Protein;Name=AT1G01020.1;Derives_from=AT1G01020.1
Chr1 TAIR9 five_prime_UTR 8667 8737 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 8571 8666 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 8571 8737 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 8417 8464 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 8417 8464 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 8236 8325 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 8236 8325 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7942 7987 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7942 7987 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7762 7835 . - 2 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7762 7835 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7564 7649 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7564 7649 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7384 7450 . - 1 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7384 7450 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7157 7232 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7157 7232 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 6915 7069 . - 2 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 three_prime_UTR 6437 6914 . - . Parent=AT1G01020.1
Chr1 TAIR9 exon 6437 7069 . - . Parent=AT1G01020.1
Chr1 TAIR9 three_prime_UTR 5928 6263 . - . Parent=AT1G01020.1
Chr1 TAIR9 exon 5928 6263 . - . Parent=AT1G01020.1
Chr1 TAIR9 mRNA 6790 8737 . - . ID=AT1G01020.2;Parent=AT1G01020;Name=AT1G01020.2;Index=1
Chr1 TAIR9 protein 7315 8666 . - . ID=AT1G01020.2-Protein;Name=AT1G01020.2;Derives_from=AT1G01020.2
Chr1 TAIR9 five_prime_UTR 8667 8737 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 8571 8666 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 8571 8737 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 8417 8464 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 8417 8464 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 8236 8325 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 8236 8325 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 7942 7987 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 7942 7987 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 7762 7835 . - 2 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 7762 7835 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 7564 7649 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 7564 7649 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 7315 7450 . - 1 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 three_prime_UTR 7157 7314 . - . Parent=AT1G01020.2
Chr1 TAIR9 exon 7157 7450 . - . Parent=AT1G01020.2
Chr1 TAIR9 three_prime_UTR 6790 7069 . - . Parent=AT1G01020.2
Chr1 TAIR9 exon 6790 7069 . - . Parent=AT1G01020.2
The corresponding GTF:
Code:
Chr1 TAIR9 CDS 8571 8666 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 8571 8737 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 8417 8464 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 8417 8464 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 8236 8325 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 8236 8325 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7942 7987 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7942 7987 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7762 7835 . - 2 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7762 7835 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7564 7649 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7564 7649 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7384 7450 . - 1 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7384 7450 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7157 7232 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7157 7232 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 6915 7069 . - 2 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 6437 7069 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 5928 6263 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 8571 8666 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 8571 8737 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 8417 8464 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 8417 8464 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 8236 8325 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 8236 8325 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 7942 7987 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 7942 7987 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 7762 7835 . - 2 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 7762 7835 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 7564 7649 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 7564 7649 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 7315 7450 . - 1 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 7157 7450 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 6790 7069 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
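The rule described above (keep only exon and CDS lines, and give each one the gene_id and transcript_id of its parents) can be sketched in Python roughly as follows. This is not the gff.pm hack mentioned elsewhere in the thread, just a minimal illustration under stated assumptions: the function name `gff3_to_gtf` is made up, the input is tab-delimited GFF3 as in the real TAIR9 files, only `mRNA` features are treated as transcripts, and column 8 (phase/frame) is copied through unchanged, as in the example above.

```python
def gff3_to_gtf(gff3_lines):
    """Convert GFF3 exon/CDS records to GTF lines with gene_id and transcript_id.

    Hypothetical helper: only mRNA features are used to map transcripts to
    genes; other transcript types (miRNA, ncRNA, ...) would need adding.
    """
    transcript_to_gene = {}
    records = []
    # First pass: remember each transcript's gene, and collect exon/CDS records.
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(
            pair.split("=", 1)
            for pair in cols[8].strip(";").split(";")
            if "=" in pair
        )
        if cols[2] == "mRNA" and "ID" in attrs and "Parent" in attrs:
            transcript_to_gene[attrs["ID"]] = attrs["Parent"]
        elif cols[2] in ("exon", "CDS"):
            records.append((cols, attrs))
    # Second pass: emit one GTF line per (record, transcript parent) pair.
    gtf = []
    for cols, attrs in records:
        for parent in attrs.get("Parent", "").split(","):
            gene = transcript_to_gene.get(parent)
            if gene is None:
                # Not a known transcript, e.g. the "-Protein" parent on CDS lines.
                continue
            attributes = f'gene_id "{gene}"; transcript_id "{parent}";'
            gtf.append("\t".join(cols[:8] + [attributes]))
    return gtf
```

Collecting the records first and resolving parents afterwards means the sketch does not depend on mRNA lines appearing before their exons in the file.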
-
Problems creating GTF for Cufflinks annotation
I have been trying to supply a GTF for annotation with Cufflinks/Cuffcompare and I have been having no success at all.
I started by only having GFF files. The organism I work with, Arabidopsis, does not have any published GTF annotation files that I have been able to locate and I saw someone else on here was unable to locate any as well. So I attempted to convert the GFFs I had into GTFs by converting the ninth column. I used http://mblab.wustl.edu/GTF22.html as my reference.
On the first try I simply used the feature column as both the gene_id and the transcript_id. Having the names would be nice, but for our purposes just knowing what the reads represent (mRNA, miRNA, siRNA, pseudogene, etc.) is sufficient.
Code:
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene"; transcript_id "gene";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA"; transcript_id "mRNA";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein"; transcript_id "protein";
Code:
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
Error: duplicate GFF ID 'mRNA' encountered!
Code:
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene2"; transcript_id "gene-2";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA3"; transcript_id "mRNA-3";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein4"; transcript_id "protein-4";
Code:
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
GList error (GList.hh:592):Invalid list index: -1
Code:
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene2"; transcript_id "gene12";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA3"; transcript_id "mRNA13";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein4"; transcript_id "protein14";
Code:
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
GList error (GList.hh:592):Invalid list index: -1
Can anyone recommend a way to change a GFF into a GTF? TopHat accepted GFF files for annotation, but for some reason Cufflinks only accepts GTF files. That's fine for some of the more mainstream organisms, but many others (Arabidopsis in my case) only have annotations in GFF and GFF3, which creates a wall when trying to process the expression data.
Any and all help/suggestions would be greatly appreciated. I've been hung up on this problem for some time now and I have no more ideas on how to proceed.
Thanks as always.