Hi!
I'm working with Cufflinks and I need help urgently!
According to the documentation of Cufflinks, the -g <reference_annotation.(gtf/gff)> option does the following:
"Tells Cufflinks to use the supplied reference annotation (GFF) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled."
Therefore, I would expect every gene in the reference GTF file (57,559 annotated genes) to appear in the output files, so these files should have at least 57,000 lines (one for each gene).
However, my genes FPKM file has 89,063 entries, of which 32,000 are annotated genes and the rest are potential new genes...
The only explanation I can find is that, despite what the documentation says, reference genes with no reads in the SAM file (because they are not expressed, I guess) are being ignored. But then, why do some genes appear with an FPKM of 0?
Maybe they are expressed, but after normalization the FPKM value is so low that Cufflinks rounds it to 0?
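To make the counts above easier to reproduce, here is a minimal sketch of how the entries in genes.fpkm_tracking could be tallied. It assumes the file is tab-separated with `gene_id` and `FPKM` columns, and that novel genes carry the default `CUFF.` prefix in `gene_id` (this changes if a different label was set when running Cufflinks). The function name and the sample rows are hypothetical, for illustration only.

```python
import csv
import io


def summarize_fpkm_tracking(handle, novel_prefix="CUFF."):
    """Count total, annotated, novel and zero-FPKM rows in a tracking file.

    Assumption: rows whose gene_id starts with `novel_prefix` are novel
    genes; everything else is treated as coming from the reference GTF.
    """
    summary = {"total": 0, "annotated": 0, "novel": 0, "zero_fpkm": 0}
    for row in csv.DictReader(handle, delimiter="\t"):
        summary["total"] += 1
        if row["gene_id"].startswith(novel_prefix):
            summary["novel"] += 1
        else:
            summary["annotated"] += 1
        if float(row["FPKM"]) == 0.0:
            summary["zero_fpkm"] += 1
    return summary


# Hypothetical sample rows, only to show the expected layout.
HEADER = ["tracking_id", "class_code", "nearest_ref_id", "gene_id",
          "gene_short_name", "tss_id", "locus", "length", "coverage",
          "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "FPKM_status"]
ROWS = [
    ["ENSG0001", "-", "-", "ENSG0001", "GENEA", "-", "chr1:100-200",
     "-", "-", "0", "0", "0", "OK"],
    ["ENSG0002", "-", "-", "ENSG0002", "GENEB", "-", "chr1:500-900",
     "-", "-", "12.5", "10.0", "15.0", "OK"],
    ["CUFF.1", "-", "-", "CUFF.1", "-", "-", "chr2:50-400",
     "-", "-", "3.2", "2.0", "4.4", "OK"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"

print(summarize_fpkm_tracking(io.StringIO(SAMPLE)))
```

On a real run, the same function could be called with `open("genes.fpkm_tracking")` instead of the in-memory sample, and the `annotated` count compared with the number of genes in the reference GTF.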
In addition, there is another problem with the results of Cufflinks:
Cufflinks generates, among others, two files: transcripts.gtf and genes.fpkm_tracking.
The first one contains all the assembled genes and isoforms, while the second one contains FPKM values per gene, without isoform detail.
I would expect each gene to appear only once in genes.fpkm_tracking, but some genes appear several times with different FPKM values, and I cannot find a criterion for deciding which entry is the better one...
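To show which genes are affected, here is a minimal sketch that lists every gene name occurring more than once in genes.fpkm_tracking, together with the tracking ID, locus and FPKM of each occurrence. It assumes a tab-separated file with `tracking_id`, `gene_short_name`, `locus` and `FPKM` columns; the function name and the sample rows are hypothetical. It only reports the repeats and does not decide which entry is correct.

```python
import csv
import io
from collections import defaultdict


def find_repeated_genes(handle, key="gene_short_name"):
    """Return {gene name: [(tracking_id, locus, FPKM), ...]} for repeats.

    Rows with an empty or "-" value in the `key` column are skipped,
    since they carry no name to group on.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        name = row[key]
        if name and name != "-":
            groups[name].append(
                (row["tracking_id"], row["locus"], float(row["FPKM"]))
            )
    # Keep only the names that occur in more than one row.
    return {name: rows for name, rows in groups.items() if len(rows) > 1}


# Hypothetical sample rows, only to show the expected layout.
DUP_HEADER = ["tracking_id", "gene_id", "gene_short_name", "locus", "FPKM"]
DUP_ROWS = [
    ["CUFF.10", "CUFF.10", "GENEA", "chr1:100-900", "4.0"],
    ["CUFF.11", "CUFF.11", "GENEA", "chr1:950-1800", "0.7"],
    ["CUFF.12", "CUFF.12", "GENEB", "chr2:10-500", "9.1"],
    ["CUFF.13", "CUFF.13", "-", "chr3:10-500", "1.0"],
]
DUP_SAMPLE = "\n".join(
    "\t".join(fields) for fields in [DUP_HEADER] + DUP_ROWS
) + "\n"

for gene, entries in find_repeated_genes(io.StringIO(DUP_SAMPLE)).items():
    print(gene, entries)
```

Comparing the loci of the repeated entries may at least show whether the occurrences overlap or sit in separate regions.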
Any idea?
Thanks and best regards!