  • How to prepare gene references for Cufflinks

    Hi community

    I am confused about how to prepare the gene references for TopHat-Cufflinks.

    Code:
    Tb427_01_v4	EuPathDB	supercontig	1	1064569	.	+	.	"ID=Tb427_01_v4;Name=Tb427_01_v4;description=Tb427_01_v4;size=1064569;web_id=Tb427_01_v4;molecule_type=dsDNA;organism_name=Trypanosoma brucei;translation_table=11;topology=linear;localization=nuclear;Dbxref=EuPathDB:Tb427_01_v4,taxon:1854310"
    Tb427_10_v5	EuPathDB	supercontig	1	4145152	.	+	.	"ID=Tb427_10_v5;Name=Tb427_10_v5;description=Tb427_10_v5;size=4145152;web_id=Tb427_10_v5;molecule_type=dsDNA;organism_name=Trypanosoma brucei;translation_table=11;topology=linear;localization=nuclear;Dbxref=EuPathDB:Tb427_10_v5,taxon:1854310"
    Tb427_11_01_v4	EuPathDB	supercontig	1	4977113	.	+	.	"ID=Tb427_11_01_v4;Name=Tb427_11_01_v4;description=Tb427_11_01_v4;size=4977113;web_id=Tb427_11_01_v4;molecule_type=dsDNA;organism_name=Trypanosoma brucei;translation_table=11;topology=linear;localization=nuclear;Dbxref=EuPathDB:Tb427_11_01_v4,taxon:1854310"
    Tb427_01_v4	EuPathDB	gene	57396	59181	.	-	.	ID=Tb427.01.100;Name=Tb427.01.100;description=RNA+polymerase+%28pseudogene%29%2C+putative;size=1786;web_id=Tb427.01.100;locus_tag=Tb427.01.100;size=1786
    Tb427_01_v4	EuPathDB	mRNA	57396	59181	.	-	.	"ID=rna_Tb427.01.100-1;Name=Tb427.01.100-1;description=Tb427.01.100-1;size=1786;Parent=Tb427.01.100;Ontology_term=GO:0003899,GO:0006350,GO:0032549;Dbxref=EuPathDB:Tb427.01.100,taxon:1854310"
    Tb427_01_v4	EuPathDB	CDS	59086	59181	.	-	0	ID=cds_Tb427.01.100-1;Name=cds;description=.;size=96;Parent=rna_Tb427.01.100-1
    Tb427_01_v4	EuPathDB	CDS	57396	59081	.	-	0	ID=cds_Tb427.01.100-1;Name=cds;description=.;size=1686;Parent=rna_Tb427.01.100-1
    Tb427_01_v4	EuPathDB	exon	59086	59181	.	-	.	ID=exon_Tb427.01.100-1;Name=exon;description=exon;size=96;Parent=rna_Tb427.01.100-1
    Tb427_01_v4	EuPathDB	exon	57396	59081	.	-	.	ID=exon_Tb427.01.100-2;Name=exon;description=exon;size=1686;Parent=rna_Tb427.01.100-1
    Tb427_01_v4	EuPathDB	gene	289975	291237	.	+	.	ID=Tb427.01.1000;Name=Tb427.01.1000;description=developmentally+regulated+phosphoprotein;size=1263;web_id=Tb427.01.1000;locus_tag=Tb427.01.1000;size=1263
    Tb427_01_v4	EuPathDB	mRNA	289975	291237	.	+	.	"ID=rna_Tb427.01.1000-1;Name=Tb427.01.1000-1;description=Tb427.01.1000-1;size=1263;Parent=Tb427.01.1000;Ontology_term=GO:0005524;Dbxref=EuPathDB:Tb427.01.1000,NCBI_gi:261326019,taxon:1854310"
    Tb427_01_v4	EuPathDB	CDS	289975	291237	.	+	0	ID=cds_Tb427.01.1000-1;Name=cds;description=.;size=1263;Parent=rna_Tb427.01.1000-1
    Tb427_01_v4	EuPathDB	exon	289975	291237	.	+	.	ID=exon_Tb427.01.1000-1;Name=exon;description=exon;size=1263;Parent=rna_Tb427.01.1000-1
    Tb427_01_v4	EuPathDB	gene	291667	293826	.	+	.	ID=Tb427.01.1010;Name=Tb427.01.1010;description=hypothetical+protein%2C+conserved;size=2160;web_id=Tb427.01.1010;locus_tag=Tb427.01.1010;size=2160
    Tb427_01_v4	EuPathDB	mRNA	291667	293826	.	+	.	"ID=rna_Tb427.01.1010-1;Name=Tb427.01.1010-1;description=Tb427.01.1010-1;size=2160;Parent=Tb427.01.1010;Ontology_term=GO:0008270;Dbxref=EuPathDB:Tb427.01.1010,taxon:1854310"
    Tb427_01_v4	EuPathDB	CDS	291667	293826	.	+	0	ID=cds_Tb427.01.1010-1;Name=cds;description=.;size=2160;Parent=rna_Tb427.01.1010-1
    Tb427_01_v4	EuPathDB	exon	291667	293826	.	+	.	ID=exon_Tb427.01.1010-1;Name=exon;description=exon;size=2160;Parent=rna_Tb427.01.1010-1
    Tb427_01_v4	EuPathDB	gene	294311	295591	.	+	.	ID=Tb427.01.1020;Name=Tb427.01.1020;description=hypothetical+protein%2C+conserved;size=1281;web_id=Tb427.01.1020;locus_tag=Tb427.01.1020;size=1281
    Tb427_01_v4	EuPathDB	mRNA	294311	295591	.	+	.	"ID=rna_Tb427.01.1020-1;Name=Tb427.01.1020-1;description=Tb427.01.1020-1;size=1281;Parent=Tb427.01.1020;Dbxref=EuPathDB:Tb427.01.1020,taxon:1854310"
    Tb427_01_v4	EuPathDB	CDS	294311	295591	.	+	0	ID=cds_Tb427.01.1020-1;Name=cds;description=.;size=1281;Parent=rna_Tb427.01.1020-1
    Tb427_01_v4	EuPathDB	exon	294311	295591	.	+	.	ID=exon_Tb427.01.1020-1;Name=exon;description=exon;size=1281;Parent=rna_Tb427.01.1020-1
    Tb427_01_v4	EuPathDB	gene	297053	298426	.	+	.	ID=Tb427.01.1030;Name=Tb427.01.1030;description=hypothetical+protein%2C+conserved;size=1374;web_id=Tb427.01.1030;locus_tag=Tb427.01.1030;size=1374
    
    ... ... ...
    
    Tb427_09_v4	EuPathDB	rRNA	281365	281579	.	-	.	ID=rna_Tb427.02.1520-1;Name=Tb427.02.1520-1;description=Tb427.02.1520-1;size=215;Parent=Tb427.02.1520;Dbxref=EuPathDB:Tb427.02.1520,taxon:1854310
    Tb427_02_v4	EuPathDB	snRNA	1025274	1025348	.	+	.	ID=rna_Tb427.02.5680-1;Name=Tb427.02.5680-1;description=Tb427.02.5680-1;size=75;Parent=Tb427.02.5680;Dbxref=EuPathDB:Tb427.02.5680,taxon:1854310
    Tb427_04_v4	EuPathDB	tRNA	318221	318292	.	-	.	ID=rna_Tb427.04.1195-1;Name=Tb427.04.1195-1;description=Tb427.04.1195-1;size=72;Parent=Tb427.04.1195;Dbxref=EuPathDB:Tb427.04.1195,taxon:1854310
    Tb427_04_v4	EuPathDB	snRNA	326071	326164	.	+	.	ID=rna_Tb427.04.1213-1;Name=Tb427.04.1213-1;description=Tb427.04.1213-1;size=94;Parent=Tb427.04.1213;Dbxref=EuPathDB:Tb427.04.1213,taxon:1854310
    Tb427_04_v4	EuPathDB	scRNA_encoding	1287901	1287973	.	-	.	ID=rna_Tb427.04.4663:scRNA-1;Name=Tb427.04.4663%3AscRNA-1;description=Tb427.04.4663%3AscRNA-1;size=73;Parent=Tb427.04.4663:scRNA;Dbxref=EuPathDB:Tb427.04.4663:scRNA,taxon:1854310
    Tb427_08_v4	EuPathDB	transcript	861170	861430	.	-	.	ID=rna_Tb427.08.2861-1;Name=Tb427.08.2861-1;description=Tb427.08.2861-1;size=261;Parent=Tb427.08.2861;Dbxref=EuPathDB:Tb427.08.2861,taxon:1854310

    My gene reference file looks like the excerpt above. I ran TopHat with the full gene reference file. Now I am running the Cufflinks pipeline with -M mask.gff to exclude every row except the four feature types I need: gene, mRNA, CDS and exon.
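As an illustration of selecting those four feature types, here is a minimal sketch that filters GFF3 rows by the feature type in column 3. The function name `filter_gff3` and the sample rows are hypothetical (the rows are shortened from the excerpt above); it only shows the selection step and says nothing about how TopHat or Cufflinks treat the file.

```python
# Feature types to keep, taken from the post (column 3 of the GFF3 file).
KEEP_TYPES = {"gene", "mRNA", "CDS", "exon"}


def filter_gff3(lines, keep_types=KEEP_TYPES):
    """Yield header/comment lines and rows whose feature type is in keep_types."""
    for line in lines:
        if line.startswith("#"):
            # Keep directives such as "##gff-version 3".
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GFF3 row has 9 tab-separated columns; column 3 is the type.
        if len(fields) >= 9 and fields[2] in keep_types:
            yield line


# Hypothetical sample: attribute columns shortened for readability.
sample = [
    "##gff-version 3\n",
    "Tb427_01_v4\tEuPathDB\tsupercontig\t1\t1064569\t.\t+\t.\tID=Tb427_01_v4\n",
    "Tb427_01_v4\tEuPathDB\tgene\t57396\t59181\t.\t-\t.\tID=Tb427.01.100\n",
    "Tb427_04_v4\tEuPathDB\ttRNA\t318221\t318292\t.\t-\t.\tID=rna_Tb427.04.1195-1\n",
    "Tb427_01_v4\tEuPathDB\texon\t59086\t59181\t.\t-\t.\tID=exon_Tb427.01.100-1\n",
]

kept = list(filter_gff3(sample))
kept_types = [row.split("\t")[2] for row in kept if not row.startswith("#")]
```

For a real file, the same generator can be fed an open file handle and its output written line by line to a new file.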

    My questions are:
    1) Am I using the gene references correctly? Or should I run TopHat with a reference that contains only the gene/mRNA/CDS/exon rows?

    2) I've checked a few gene reference files provided on the Cufflinks website, e.g. yeast. Those files contain only CDS and exon rows (I am ignoring the TSS and TTS rows). What is the difference between a reference with "gene/mRNA/CDS/exon" rows and one with only "CDS/exon" rows? As far as I know, a gene can include multiple mRNAs, each of which can include multiple CDSs, so the gene region should cover all of its mRNAs/CDSs. If I don't give Cufflinks the gene region, how does it determine the gene's location from a reference file that has only "CDS/exon" rows?

    3) Of the two references in question 2), "gene/mRNA/CDS/exon" and "CDS/exon", which do you prefer when you use Cufflinks?
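On question 2), one way a gene's location can be recovered from exon rows alone is to group the rows by a shared gene identifier and take the smallest start and largest end. The sketch below assumes each exon row carries a `gene_id "..."` attribute, as GTF-style files do; the function name `gene_spans` and the sample rows are hypothetical, and this is not a description of what Cufflinks does internally.

```python
import re

# GTF-style attribute, e.g.: gene_id "Tb427.01.100"; transcript_id "...";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_spans(gtf_lines):
    """Return {(seqname, gene_id): (start, end)} computed from exon rows only."""
    spans = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        key = (fields[0], match.group(1))
        if key in spans:
            old_start, old_end = spans[key]
            # Widen the span so it covers every exon seen for this gene.
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans


# Hypothetical GTF-style rows built from the two exons of Tb427.01.100 above.
attrs = 'gene_id "Tb427.01.100"; transcript_id "rna_Tb427.01.100-1";'
sample = [
    "Tb427_01_v4\tEuPathDB\texon\t59086\t59181\t.\t-\t.\t" + attrs + "\n",
    "Tb427_01_v4\tEuPathDB\texon\t57396\t59081\t.\t-\t.\t" + attrs + "\n",
]

spans = gene_spans(sample)
```

For these two exons the computed span is 57396 to 59181, the same coordinates as the explicit gene row in the GFF3 excerpt.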

    PS: By Cufflinks I mean the whole pipeline, including cufflinks, cuffdiff, etc.

    Many thanks in advance.

  • #2
    Hi all,

    It turns out that the annotation from the Cufflinks website is in GTF2 format, whereas my annotation is in GFF3 format.

    I compared the two by converting my GFF3 file to GTF2. The items in the GTF2 file should be the same as the CDS/exon rows of the GFF3 file, except for the items that produced warnings.
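To make the relationship between the two formats concrete, here is a minimal sketch of such a conversion. It assumes the layout shown in the GFF3 excerpt: mRNA rows with `ID` and `Parent` attributes, and exon/CDS rows whose `Parent` is a single mRNA ID. The names `parse_attrs` and `gff3_to_gtf` are hypothetical, and the sketch skips other transcript types and comma-separated multi-parent features.

```python
def parse_attrs(col9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    # Some rows in the excerpt wrap the whole column in double quotes.
    for part in col9.strip().strip('"').split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_to_gtf(lines):
    """Return GTF2-style exon/CDS rows with gene_id and transcript_id."""
    rows = [l.rstrip("\n").split("\t") for l in lines
            if l.strip() and not l.startswith("#")]
    rows = [r for r in rows if len(r) == 9]

    # First pass: map each mRNA ID to its parent gene ID.
    tx_to_gene = {}
    for r in rows:
        if r[2] == "mRNA":
            attrs = parse_attrs(r[8])
            if "ID" in attrs and "Parent" in attrs:
                tx_to_gene[attrs["ID"]] = attrs["Parent"]

    # Second pass: rewrite column 9 of exon/CDS rows in GTF2 style.
    out = []
    for r in rows:
        if r[2] in ("exon", "CDS"):
            tx = parse_attrs(r[8]).get("Parent")
            gene = tx_to_gene.get(tx)
            if gene is None:
                continue  # parent is not an mRNA we know about; skip it
            gtf_attrs = f'gene_id "{gene}"; transcript_id "{tx}";'
            out.append("\t".join(r[:8] + [gtf_attrs]))
    return out


# Hypothetical sample, shortened from the excerpt in the first post.
sample = [
    "Tb427_01_v4\tEuPathDB\tgene\t289975\t291237\t.\t+\t.\tID=Tb427.01.1000\n",
    'Tb427_01_v4\tEuPathDB\tmRNA\t289975\t291237\t.\t+\t.\t"ID=rna_Tb427.01.1000-1;Parent=Tb427.01.1000"\n',
    "Tb427_01_v4\tEuPathDB\tCDS\t289975\t291237\t.\t+\t0\tID=cds_Tb427.01.1000-1;Parent=rna_Tb427.01.1000-1\n",
    "Tb427_01_v4\tEuPathDB\texon\t289975\t291237\t.\t+\t.\tID=exon_Tb427.01.1000-1;Parent=rna_Tb427.01.1000-1\n",
]

gtf_rows = gff3_to_gtf(sample)
```

The gene and mRNA rows disappear in the output because their information is carried by the `gene_id` and `transcript_id` attributes on each remaining row.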

    Anyway, I think I should redo my analysis from TopHat through Cufflinks.

    Thanks everyone.
