Hi community
I am confused about how to prepare the gene references for TopHat-Cufflinks.
Code:
Tb427_01_v4 EuPathDB supercontig 1 1064569 . + . "ID=Tb427_01_v4;Name=Tb427_01_v4;description=Tb427_01_v4;size=1064569;web_id=Tb427_01_v4;molecule_type=dsDNA;organism_name=Trypanosoma brucei;translation_table=11;topology=linear;localization=nuclear;Dbxref=EuPathDB:Tb427_01_v4,taxon:1854310"
Tb427_10_v5 EuPathDB supercontig 1 4145152 . + . "ID=Tb427_10_v5;Name=Tb427_10_v5;description=Tb427_10_v5;size=4145152;web_id=Tb427_10_v5;molecule_type=dsDNA;organism_name=Trypanosoma brucei;translation_table=11;topology=linear;localization=nuclear;Dbxref=EuPathDB:Tb427_10_v5,taxon:1854310"
Tb427_11_01_v4 EuPathDB supercontig 1 4977113 . + . "ID=Tb427_11_01_v4;Name=Tb427_11_01_v4;description=Tb427_11_01_v4;size=4977113;web_id=Tb427_11_01_v4;molecule_type=dsDNA;organism_name=Trypanosoma brucei;translation_table=11;topology=linear;localization=nuclear;Dbxref=EuPathDB:Tb427_11_01_v4,taxon:1854310"
Tb427_01_v4 EuPathDB gene 57396 59181 . - . ID=Tb427.01.100;Name=Tb427.01.100;description=RNA+polymerase+%28pseudogene%29%2C+putative;size=1786;web_id=Tb427.01.100;locus_tag=Tb427.01.100;size=1786
Tb427_01_v4 EuPathDB mRNA 57396 59181 . - . "ID=rna_Tb427.01.100-1;Name=Tb427.01.100-1;description=Tb427.01.100-1;size=1786;Parent=Tb427.01.100;Ontology_term=GO:0003899,GO:0006350,GO:0032549;Dbxref=EuPathDB:Tb427.01.100,taxon:1854310"
Tb427_01_v4 EuPathDB CDS 59086 59181 . - 0 ID=cds_Tb427.01.100-1;Name=cds;description=.;size=96;Parent=rna_Tb427.01.100-1
Tb427_01_v4 EuPathDB CDS 57396 59081 . - 0 ID=cds_Tb427.01.100-1;Name=cds;description=.;size=1686;Parent=rna_Tb427.01.100-1
Tb427_01_v4 EuPathDB exon 59086 59181 . - . ID=exon_Tb427.01.100-1;Name=exon;description=exon;size=96;Parent=rna_Tb427.01.100-1
Tb427_01_v4 EuPathDB exon 57396 59081 . - . ID=exon_Tb427.01.100-2;Name=exon;description=exon;size=1686;Parent=rna_Tb427.01.100-1
Tb427_01_v4 EuPathDB gene 289975 291237 . + . ID=Tb427.01.1000;Name=Tb427.01.1000;description=developmentally+regulated+phosphoprotein;size=1263;web_id=Tb427.01.1000;locus_tag=Tb427.01.1000;size=1263
Tb427_01_v4 EuPathDB mRNA 289975 291237 . + . "ID=rna_Tb427.01.1000-1;Name=Tb427.01.1000-1;description=Tb427.01.1000-1;size=1263;Parent=Tb427.01.1000;Ontology_term=GO:0005524;Dbxref=EuPathDB:Tb427.01.1000,NCBI_gi:261326019,taxon:1854310"
Tb427_01_v4 EuPathDB CDS 289975 291237 . + 0 ID=cds_Tb427.01.1000-1;Name=cds;description=.;size=1263;Parent=rna_Tb427.01.1000-1
Tb427_01_v4 EuPathDB exon 289975 291237 . + . ID=exon_Tb427.01.1000-1;Name=exon;description=exon;size=1263;Parent=rna_Tb427.01.1000-1
Tb427_01_v4 EuPathDB gene 291667 293826 . + . ID=Tb427.01.1010;Name=Tb427.01.1010;description=hypothetical+protein%2C+conserved;size=2160;web_id=Tb427.01.1010;locus_tag=Tb427.01.1010;size=2160
Tb427_01_v4 EuPathDB mRNA 291667 293826 . + . "ID=rna_Tb427.01.1010-1;Name=Tb427.01.1010-1;description=Tb427.01.1010-1;size=2160;Parent=Tb427.01.1010;Ontology_term=GO:0008270;Dbxref=EuPathDB:Tb427.01.1010,taxon:1854310"
Tb427_01_v4 EuPathDB CDS 291667 293826 . + 0 ID=cds_Tb427.01.1010-1;Name=cds;description=.;size=2160;Parent=rna_Tb427.01.1010-1
Tb427_01_v4 EuPathDB exon 291667 293826 . + . ID=exon_Tb427.01.1010-1;Name=exon;description=exon;size=2160;Parent=rna_Tb427.01.1010-1
Tb427_01_v4 EuPathDB gene 294311 295591 . + . ID=Tb427.01.1020;Name=Tb427.01.1020;description=hypothetical+protein%2C+conserved;size=1281;web_id=Tb427.01.1020;locus_tag=Tb427.01.1020;size=1281
Tb427_01_v4 EuPathDB mRNA 294311 295591 . + . "ID=rna_Tb427.01.1020-1;Name=Tb427.01.1020-1;description=Tb427.01.1020-1;size=1281;Parent=Tb427.01.1020;Dbxref=EuPathDB:Tb427.01.1020,taxon:1854310"
Tb427_01_v4 EuPathDB CDS 294311 295591 . + 0 ID=cds_Tb427.01.1020-1;Name=cds;description=.;size=1281;Parent=rna_Tb427.01.1020-1
Tb427_01_v4 EuPathDB exon 294311 295591 . + . ID=exon_Tb427.01.1020-1;Name=exon;description=exon;size=1281;Parent=rna_Tb427.01.1020-1
Tb427_01_v4 EuPathDB gene 297053 298426 . + . ID=Tb427.01.1030;Name=Tb427.01.1030;description=hypothetical+protein%2C+conserved;size=1374;web_id=Tb427.01.1030;locus_tag=Tb427.01.1030;size=1374
... ... ...
Tb427_09_v4 EuPathDB rRNA 281365 281579 . - . ID=rna_Tb427.02.1520-1;Name=Tb427.02.1520-1;description=Tb427.02.1520-1;size=215;Parent=Tb427.02.1520;Dbxref=EuPathDB:Tb427.02.1520,taxon:1854310
Tb427_02_v4 EuPathDB snRNA 1025274 1025348 . + . ID=rna_Tb427.02.5680-1;Name=Tb427.02.5680-1;description=Tb427.02.5680-1;size=75;Parent=Tb427.02.5680;Dbxref=EuPathDB:Tb427.02.5680,taxon:1854310
Tb427_04_v4 EuPathDB tRNA 318221 318292 . - . ID=rna_Tb427.04.1195-1;Name=Tb427.04.1195-1;description=Tb427.04.1195-1;size=72;Parent=Tb427.04.1195;Dbxref=EuPathDB:Tb427.04.1195,taxon:1854310
Tb427_04_v4 EuPathDB snRNA 326071 326164 . + . ID=rna_Tb427.04.1213-1;Name=Tb427.04.1213-1;description=Tb427.04.1213-1;size=94;Parent=Tb427.04.1213;Dbxref=EuPathDB:Tb427.04.1213,taxon:1854310
Tb427_04_v4 EuPathDB scRNA_encoding 1287901 1287973 . - . ID=rna_Tb427.04.4663:scRNA-1;Name=Tb427.04.4663%3AscRNA-1;description=Tb427.04.4663%3AscRNA-1;size=73;Parent=Tb427.04.4663:scRNA;Dbxref=EuPathDB:Tb427.04.4663:scRNA,taxon:1854310
Tb427_08_v4 EuPathDB transcript 861170 861430 . - . ID=rna_Tb427.08.2861-1;Name=Tb427.08.2861-1;description=Tb427.08.2861-1;size=261;Parent=Tb427.08.2861;Dbxref=EuPathDB:Tb427.08.2861,taxon:1854310
My gene references look like the listing above. I ran TopHat with the full gene reference file. Now I am running the Cufflinks pipeline with -M mask.gff to exclude the rows I don't need, i.e. everything except four feature types: gene, mRNA, CDS and exon.
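To make the masking step concrete, here is a minimal Python sketch of how the rows could be split by feature type (column 3). The names `split_by_feature_type` and `KEEP_TYPES` are my own illustrative choices, not part of TopHat or Cufflinks, and the example rows are abbreviated from the listing above. It assumes nine whitespace- or tab-separated columns and says nothing about how Cufflinks itself treats the mask file.

```python
# Illustrative only: split annotation rows into "keep" and "mask" sets
# based on the feature type in column 3.
KEEP_TYPES = {"gene", "mRNA", "CDS", "exon"}  # the four types I want to keep


def split_by_feature_type(lines, keep_types=KEEP_TYPES):
    """Return (kept, masked) lists of rows; comments and short rows are skipped."""
    kept, masked = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # maxsplit=8 keeps column 9 intact even if it contains spaces
        fields = stripped.split(None, 8)
        if len(fields) < 9:
            continue  # e.g. the "... ... ..." placeholder row
        if fields[2] in keep_types:
            kept.append(stripped)
        else:
            masked.append(stripped)
    return kept, masked


# Rows abbreviated from my listing (attributes shortened for readability)
example = [
    "Tb427_01_v4 EuPathDB gene 57396 59181 . - . ID=Tb427.01.100",
    "Tb427_01_v4 EuPathDB exon 57396 59081 . - . ID=exon_Tb427.01.100-2;Parent=rna_Tb427.01.100-1",
    "Tb427_04_v4 EuPathDB tRNA 318221 318292 . - . ID=rna_Tb427.04.1195-1;Parent=Tb427.04.1195",
]
kept, masked = split_by_feature_type(example)
```

In this sketch the tRNA row (and likewise rRNA, snRNA, supercontig, etc.) would end up in `masked`, which is what I wrote out as mask.gff.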
My questions are:
1) Am I using the gene references correctly, or should I run TopHat with a reference that contains only the gene/mRNA/CDS/exon rows?
2) I checked a few of the gene reference files provided on the Cufflinks website, e.g. for yeast. Those files contain only CDS and exon rows (leaving aside TSS and TTS). What is the difference between a reference with "gene/mRNA/CDS/exon" rows and one with only "CDS/exon" rows? As far as I know, a gene can include multiple mRNAs, each of which can include multiple CDSs, so the gene region should cover all of its mRNAs/CDSs. If I don't give Cufflinks the gene region, how does it determine the gene's location from a reference file that contains only "CDS/exon" rows?
3) Following on from question 2, which kind of reference do you prefer when you use Cufflinks: "gene/mRNA/CDS/exon" or "CDS/exon"?
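Regarding question 2, my own guess is that a gene's extent could in principle be recovered from the exon rows alone, as long as each exon can be traced back to its gene. The Python sketch below does this for my GFF3 layout by following exon Parent -> mRNA ID -> mRNA Parent. I do not know whether Cufflinks does anything like this internally; the function names `parse_attributes` and `gene_spans_from_exons` are hypothetical, and the attributes in the example rows are abbreviated from my listing.

```python
# Illustrative only: derive each gene's span from its exon rows by
# following the Parent links (exon -> mRNA -> gene) in my GFF3 file.

def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' (optionally wrapped in quotes) into a dict."""
    attrs = {}
    for pair in column9.strip().strip('"').split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def gene_spans_from_exons(lines):
    """Return {gene_id: (min_start, max_end)} computed from exon rows only."""
    records = []
    for line in lines:
        fields = line.strip().split(None, 8)
        if len(fields) == 9:
            records.append(
                (fields[2], int(fields[3]), int(fields[4]), parse_attributes(fields[8]))
            )

    # Map each transcript ID to the gene it belongs to
    transcript_to_gene = {
        attrs["ID"]: attrs["Parent"]
        for ftype, _, _, attrs in records
        if ftype == "mRNA" and "ID" in attrs and "Parent" in attrs
    }

    spans = {}
    for ftype, start, end, attrs in records:
        if ftype != "exon":
            continue
        gene = transcript_to_gene.get(attrs.get("Parent"))
        if gene is None:
            continue  # exon whose transcript is not in the file
        lo, hi = spans.get(gene, (start, end))
        spans[gene] = (min(lo, start), max(hi, end))
    return spans


# Rows abbreviated from my listing (no gene row is needed here)
rows = [
    "Tb427_01_v4 EuPathDB mRNA 57396 59181 . - . ID=rna_Tb427.01.100-1;Parent=Tb427.01.100",
    "Tb427_01_v4 EuPathDB exon 59086 59181 . - . ID=exon_Tb427.01.100-1;Parent=rna_Tb427.01.100-1",
    "Tb427_01_v4 EuPathDB exon 57396 59081 . - . ID=exon_Tb427.01.100-2;Parent=rna_Tb427.01.100-1",
]
spans = gene_spans_from_exons(rows)
```

For these rows the derived span for Tb427.01.100 is 57396 to 59181, the same coordinates as the gene row in my file. The sketch does not handle an exon with several comma-separated parents.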
PS: By Cufflinks I mean the whole pipeline, including cufflinks, cuffdiff, and so on.
Many thanks in advance.