Hello all,
I am currently using Cufflinks version 2.0.2 to analyze a set of pig RNA-seq data. I run Cufflinks with the -g option, using the Ensembl GTF for the latest build. When I count the unique gene names in the Ensembl GTF, there are 25,009 genes. After running the data through Cufflinks with default settings plus the -g option, the output GTF contains only 23,917 unique Ensembl gene names.
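One way to reproduce these counts and list exactly which genes are missing is to compare the distinct gene identifiers in the two GTF files. The Python sketch below is a minimal, hypothetical helper (the function names are illustrative, not part of Cufflinks or any library). It assumes the identifiers sit in the `gene_id` attribute of column 9; if the Ensembl names are stored under a different attribute in either file (for example `gene_name`), pass that key instead.

```python
import re
from typing import Iterable, Set

# Matches key "value" pairs in the GTF attribute column (column 9).
# Keys must start with a letter or underscore, so "; " separators are skipped.
_ATTR_RE = re.compile(r'([A-Za-z_][\w.]*)\s+"([^"]*)"')


def unique_attribute_values(lines: Iterable[str], key: str = "gene_id") -> Set[str]:
    """Collect the distinct values of one GTF attribute (default: gene_id)."""
    values = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        for k, v in _ATTR_RE.findall(fields[8]):
            if k == key:
                values.add(v)
    return values


def missing_genes(reference_gtf: str, output_gtf: str,
                  ref_key: str = "gene_id", out_key: str = "gene_id") -> Set[str]:
    """Identifiers present in the reference GTF but absent from the Cufflinks output GTF."""
    with open(reference_gtf) as ref, open(output_gtf) as out:
        return unique_attribute_values(ref, ref_key) - unique_attribute_values(out, out_key)
```

For example, `len(missing_genes("ensembl.gtf", "transcripts.gtf"))` (hypothetical file paths) should equal the difference between the two unique-gene counts when both files use the same attribute for the identifiers.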
The documentation says that Cufflinks will include all genes from the supplied GTF in its output.
My question is: why would Cufflinks drop these genes? Are there built-in settings in Cufflinks that would cause it to drop genes from certain regions?
To confirm that these genes do have reads aligning to their regions, I ran htseq-count on the TopHat output with the original Ensembl GTF, and it reports read counts for those genes.
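The cross-check described above can be automated by intersecting the set of genes missing from the Cufflinks output with the genes that htseq-count assigned reads to. This is a minimal sketch, not a definitive method; the helper names are hypothetical. It assumes the usual two-column, tab-separated htseq-count output (identifier, count) and that htseq-count was run with identifiers matching those in the missing set (by default htseq-count groups by `gene_id`). Summary rows such as `__no_feature` are skipped.

```python
from typing import Dict, Iterable, Set


def read_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse htseq-count output: one 'identifier<TAB>count' row per feature."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or fields[0].startswith("__"):
            continue  # skip malformed rows and summary rows like __no_feature / __ambiguous
        counts[fields[0]] = int(fields[1])
    return counts


def expressed_but_missing(counts: Dict[str, int], missing: Set[str],
                          min_count: int = 1) -> Dict[str, int]:
    """Genes absent from the Cufflinks output that still received >= min_count reads."""
    return {g: counts[g] for g in sorted(missing) if counts.get(g, 0) >= min_count}
```

Intersecting with the missing set (rather than filtering the counts file on its own) means any summary row that slips through under a different naming convention is simply ignored.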
For reference, the manual's description of the -g option reads: "Tells Cufflinks to use the supplied reference annotation (GFF) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled."