Hello All,
I ran into this issue while running "cuffmerge" on assemblies built with cufflinks 2.1.1.
It turned out that the duplicated entries were not in the GENCODE GTF file I was using as the reference, but in the "transcripts.gtf" file created during the cufflinks step.
Updating cufflinks to the newer version 2.2.1 and re-running the cufflinks step resolved the issue.
Hope that helps.
Good luck
-
It seems that TAIR now provides the GTF file. Has anyone tried it?
ftp://ftp.arabidopsis.org/home/tair/...enes_exons.gtf
thanks,
Originally posted by kpatel:
Hi kmcarr,
Would it be possible for you to email me the TAIR9 gtf file?
thanks
-
gff.pm
Hi kmcarr,
Could you post your gff.pm hack? I need to do this conversion, and I also need to worry about frame.
Thanks,
Bob
-
Hi kmcarr,
I am also interested in your TAIR9 gtf file. Would it be possible to email me this file ([email protected]) ?
Thanks !
-
Hi kmcarr,
Would it be possible for you to email me the TAIR9 gtf file?
thanks
-
Originally posted by DrD2009:
Thank you for the reply that clears some things up for me.
I do have a few questions though:
1.) How were you able to convert the TAIR9 GFF3 files into GTF format?
2.) We are mostly interested in investigating small RNA such as miRNA, siRNA, and other non-coding RNA. We have files for them in GFF. The siRNA data started out as just sequences in supplementary data. From those I aligned them to the genome and created a GFF from that data. How could I supply files such as those to Cufflinks?
Example:
Code:
Chr1 TAIR9 Jacobsen_siRNA 10002796 10002812 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10004771 10004794 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10004925 10004941 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10007606 10007626 . . . .
Note: I was going to post the entire TAIR9 GTF, but the gzipped file is too large to attach and I don't have an accessible server. If you desperately need it, send me a PM and I can e-mail it to you.
-
GList.hh:592 error
Same situation for me. I cannot run cuffcompare because of duplicate errors. I used a Perl script to delete all duplicated exon lines (their exon numbers vary, though) while keeping the transcript lines. Compared to the original GTF file generated by cufflinks, this new "transcript only" GTF file seems to have all the information, including strand.
However, I still got the error "GList error (GList.hh:592):Invalid list index: 0".
Henko, do you have any idea what is going on?
cheers
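For anyone who wants to try the same cleanup without writing Perl, here is a minimal Python sketch of one way to drop duplicated feature lines from a GTF. It is not the script used above: it keeps the first copy of each line rather than deleting every copy. It assumes a tab-separated file, and it treats two lines as duplicates when they share sequence name, feature type, start, end, strand and transcript_id, so differing exon numbers are ignored. The function name is made up for illustration, and nothing in this thread shows that this alone fixes the GList error.

```python
import re

# Pulls the transcript_id value out of the GTF attribute column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def drop_duplicate_features(lines):
    """Yield GTF lines, skipping repeats of a feature already seen.

    Comment lines and lines without 9 tab-separated columns pass through.
    """
    seen = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            yield line
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        # Fall back to the whole attribute column if transcript_id is absent.
        owner = match.group(1) if match else cols[8]
        key = (cols[0], cols[2], cols[3], cols[4], cols[6], owner)
        if key in seen:
            continue
        seen.add(key)
        yield line
```

Because the function only iterates over lines, it can be fed an open file handle and its output written straight to a new file.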
-
Hi, I'm encountering a similar issue with cuffcompare. While trying to run it with the transcripts.gtf generated from cufflinks, it gave me the following error:
GList error (GList.hh:592):Invalid list index: 0
This is very strange: the file was generated by cufflinks, so it should work with cuffcompare. Could someone please help?
Thanks!
-EDIT-
I found out that it could be because of the missing strand information. Sorry about that.
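If missing strand information is the suspect, a quick check along these lines can list which transcripts have records without a + or - strand. This is only a sketch, assuming a tab-separated GTF whose attribute column carries transcript_id; the function name is hypothetical.

```python
import re
from collections import Counter

# Pulls the transcript_id value out of the GTF attribute column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def count_unstranded(lines):
    """Count, per transcript_id, the records whose strand is not + or -."""
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments and malformed lines
        if cols[6] not in ("+", "-"):
            match = TRANSCRIPT_ID.search(cols[8])
            counts[match.group(1) if match else "(no transcript_id)"] += 1
    return counts
```

An empty result means every record has a strand, in which case the cause of the error is probably something else.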
-
Thank you for the reply that clears some things up for me.
I do have a few questions though:
1.) How were you able to convert the TAIR9 GFF3 files into GTF format?
2.) We are mostly interested in investigating small RNA such as miRNA, siRNA, and other non-coding RNA. We have files for them in GFF. The siRNA data started out as just sequences in supplementary data. From those I aligned them to the genome and created a GFF from that data. How could I supply files such as those to Cufflinks?
Example:
Code:
Chr1 TAIR9 Jacobsen_siRNA 10002796 10002812 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10004771 10004794 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10004925 10004941 . . . .
Chr1 TAIR9 Jacobsen_siRNA 10007606 10007626 . . . .
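One possible way to turn interval-only records like these into something GTF-shaped is sketched below in Python. It follows the rule given further down the thread (exon lines carrying gene_id and transcript_id) and treats each interval as its own single-exon transcript. The naming scheme (feature type plus a running number) and the function name are assumptions for illustration only. Note that the strand column stays "." here, and another post in this thread suggests missing strand information may upset cuffcompare, so the real strand from the alignment may need to be filled in.

```python
def intervals_to_gtf(lines):
    """Yield one GTF exon line per whitespace-separated GFF-like record.

    Each record becomes its own gene and transcript, named
    <feature type>_<running number> (a made-up naming scheme).
    """
    index = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        if len(cols) < 7:
            continue  # need at least seqid .. strand
        index += 1
        seqid, source, ftype, start, end, score, strand = cols[:7]
        name = "%s_%d" % (ftype, index)
        attrs = 'gene_id "%s"; transcript_id "%s";' % (name, name)
        yield "\t".join(
            [seqid, source, "exon", start, end, score, strand, ".", attrs]
        )
```

The original feature type is kept in the IDs, so the class of small RNA can still be recovered from the gene_id after quantification.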
-
Ignore everything except the exon and CDS lines; those are all that matter to cufflinks. Every exon or CDS entry that is part of the same gene must have the same "gene_id". Every exon or CDS entry that is part of the same transcript must have the same "transcript_id". Here is an example of one gene (AT1G01020) which has two transcripts (AT1G01020.1 and AT1G01020.2).
The GFF3 (TAIR9 annotation);
Code:
Chr1 TAIR9 gene 5928 8737 . - . ID=AT1G01020;Note=protein_coding_gene;Name=AT1G01020
Chr1 TAIR9 mRNA 5928 8737 . - . ID=AT1G01020.1;Parent=AT1G01020;Name=AT1G01020.1;Index=1
Chr1 TAIR9 protein 6915 8666 . - . ID=AT1G01020.1-Protein;Name=AT1G01020.1;Derives_from=AT1G01020.1
Chr1 TAIR9 five_prime_UTR 8667 8737 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 8571 8666 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 8571 8737 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 8417 8464 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 8417 8464 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 8236 8325 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 8236 8325 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7942 7987 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7942 7987 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7762 7835 . - 2 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7762 7835 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7564 7649 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7564 7649 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7384 7450 . - 1 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7384 7450 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 7157 7232 . - 0 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 exon 7157 7232 . - . Parent=AT1G01020.1
Chr1 TAIR9 CDS 6915 7069 . - 2 Parent=AT1G01020.1,AT1G01020.1-Protein;
Chr1 TAIR9 three_prime_UTR 6437 6914 . - . Parent=AT1G01020.1
Chr1 TAIR9 exon 6437 7069 . - . Parent=AT1G01020.1
Chr1 TAIR9 three_prime_UTR 5928 6263 . - . Parent=AT1G01020.1
Chr1 TAIR9 exon 5928 6263 . - . Parent=AT1G01020.1
Chr1 TAIR9 mRNA 6790 8737 . - . ID=AT1G01020.2;Parent=AT1G01020;Name=AT1G01020.2;Index=1
Chr1 TAIR9 protein 7315 8666 . - . ID=AT1G01020.2-Protein;Name=AT1G01020.2;Derives_from=AT1G01020.2
Chr1 TAIR9 five_prime_UTR 8667 8737 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 8571 8666 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 8571 8737 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 8417 8464 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 8417 8464 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 8236 8325 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 8236 8325 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 7942 7987 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 7942 7987 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 7762 7835 . - 2 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 7762 7835 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 7564 7649 . - 0 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 exon 7564 7649 . - . Parent=AT1G01020.2
Chr1 TAIR9 CDS 7315 7450 . - 1 Parent=AT1G01020.2,AT1G01020.2-Protein;
Chr1 TAIR9 three_prime_UTR 7157 7314 . - . Parent=AT1G01020.2
Chr1 TAIR9 exon 7157 7450 . - . Parent=AT1G01020.2
Chr1 TAIR9 three_prime_UTR 6790 7069 . - . Parent=AT1G01020.2
Chr1 TAIR9 exon 6790 7069 . - . Parent=AT1G01020.2
The corresponding GTF:
Code:
Chr1 TAIR9 CDS 8571 8666 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 8571 8737 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 8417 8464 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 8417 8464 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 8236 8325 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 8236 8325 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7942 7987 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7942 7987 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7762 7835 . - 2 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7762 7835 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7564 7649 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7564 7649 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7384 7450 . - 1 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7384 7450 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 7157 7232 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 7157 7232 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 6915 7069 . - 2 gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 6437 7069 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 EXON 5928 6263 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.1";
Chr1 TAIR9 CDS 8571 8666 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 8571 8737 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 8417 8464 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 8417 8464 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 8236 8325 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 8236 8325 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 7942 7987 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 7942 7987 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 7762 7835 . - 2 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 7762 7835 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 7564 7649 . - 0 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 7564 7649 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 CDS 7315 7450 . - 1 gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 7157 7450 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
Chr1 TAIR9 EXON 6790 7069 . - . gene_id "AT1G01020"; transcript_id "AT1G01020.2";
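For readers who want to automate the mapping described above, here is a rough Python sketch. It is not the gff.pm-based conversion mentioned elsewhere in this thread, just one way to apply the same rule: keep only exon and CDS lines, take transcript_id from the Parent attribute, and take gene_id from the Parent of the matching mRNA line. It assumes a tab-separated GFF3 laid out like the TAIR9 example, and it only recognises transcripts declared as mRNA features; other transcript types would need to be added. The function names are made up for illustration.

```python
def parse_attrs(field):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def gff3_to_gtf(lines):
    """Yield GTF lines for the exon and CDS records found in GFF3 lines."""
    transcript_gene = {}  # transcript ID -> gene ID, from mRNA lines
    records = []          # (columns, parent IDs) for exon/CDS lines
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "mRNA" and "ID" in attrs and "Parent" in attrs:
            transcript_gene[attrs["ID"]] = attrs["Parent"].split(",")[0]
        elif cols[2] in ("exon", "CDS") and "Parent" in attrs:
            records.append((cols, attrs["Parent"].split(",")))
    for cols, parents in records:
        for parent in parents:
            gene = transcript_gene.get(parent)
            if gene is None:
                # Not a known transcript, e.g. "AT1G01020.1-Protein".
                continue
            attrs = 'gene_id "%s"; transcript_id "%s";' % (gene, parent)
            yield "\t".join(cols[:8] + [attrs])
```

The exon and CDS lines are buffered until the whole input has been read, so the sketch does not depend on mRNA lines appearing before their children. The phase column of CDS lines is copied through unchanged.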
-
Problems creating GTF for Cufflinks annotation
I have been trying to supply a GTF for annotation with Cufflinks/Cuffcompare and I have been having no success at all.
I started by only having GFF files. The organism I work with, Arabidopsis, does not have any published GTF annotation files that I have been able to locate and I saw someone else on here was unable to locate any as well. So I attempted to convert the GFFs I had into GTFs by converting the ninth column. I used http://mblab.wustl.edu/GTF22.html as my reference.
On the first try I simply used the feature column as both the gene_id and the transcript_id. Having the real names would be nice, but for our purposes it is enough to know what the reads represent (mRNA, miRNA, siRNA, pseudogene, etc.).
Code:
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene"; transcript_id "gene";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA"; transcript_id "mRNA";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein"; transcript_id "protein";
Code:
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
Error: duplicate GFF ID 'mRNA' encountered!
Code:
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene2"; transcript_id "gene-2";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA3"; transcript_id "mRNA-3";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein4"; transcript_id "protein-4";
Code:
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
GList error (GList.hh:592):Invalid list index: -1
Code:
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene2"; transcript_id "gene12";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA3"; transcript_id "mRNA13";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein4"; transcript_id "protein14";
Code:
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
GList error (GList.hh:592):Invalid list index: -1
Can anyone recommend a way to change a GFF into a GTF? I was able to supply GFF files to TopHat for annotation, but for some reason Cufflinks only accepts GTF files. That is fine for the more mainstream organisms, but many others (Arabidopsis in my case) only have annotations in GFF and GFF3, which makes it very hard to process the expression data.
Any and all help/suggestions would be greatly appreciated. I've been hung up on this problem for some time now and I have no more ideas on how to proceed.
Thanks as always.