So I completed a tophat-cufflinks-cuffcompare-cuffdiff workflow, using the cuffcompare combined.gtf file as the reference in cuffdiff. My assumption was that the number of genes and transcripts in the final cuffdiff files, e.g. gene_exp.diff, should be the same as in the input Cuffcompare.combined.gtf.
The input Cuffcompare.combined.gtf has 15126 distinct gene_ids while the cuffdiff output file gene_exp.diff has only 9029.
So roughly 6000 genes are "missing".
The "missing" genes are all single-exon and are either class_code "." or class_code "u"; however, not all genes of these classes are in the "missing" list.
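For anyone who wants to reproduce the comparison, here is a minimal sketch of how the "missing" list can be built. It assumes the GTF carries a gene_id "..." attribute in column 9 and that gene_exp.diff is tab-delimited with a header containing a gene_id column, as in typical Cuffdiff output. The function names are made up for illustration and are not part of any Cufflinks tool.

```python
import csv
import re

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "XLOC_000001";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_gene_ids(lines):
    """Collect the distinct gene_id values from an iterable of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def diff_gene_ids(lines, column="gene_id"):
    """Collect the distinct values of one column from gene_exp.diff lines.

    Assumes a tab-delimited file whose first line is a header; the default
    column name "gene_id" is an assumption about the Cuffdiff output.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[column] for row in reader}


def missing_genes(gtf_lines, diff_lines):
    """Return gene_ids present in the GTF but absent from gene_exp.diff."""
    return gtf_gene_ids(gtf_lines) - diff_gene_ids(diff_lines)


# Example usage (file names taken from the post above):
# with open("Cuffcompare.combined.gtf") as gtf, open("gene_exp.diff") as diff:
#     missing = missing_genes(gtf, diff)
#     print(len(missing))
```

The returned set can then be joined back to the GTF to check exon counts and class_code for each missing gene.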
Does anyone know if there is a known reason why not all genes would be present in the cuffdiff output files?