Hi all,
I'm using Cufflinks to calculate transcript abundances on my human RNA-seq data, but it seems to have a problem matching transcripts in my data to their correct gene IDs from a reference GTF. I generated a RefSeq hg18.gtf annotation file from the UCSC Table Browser and loaded it into Cufflinks for all of my samples. About 60% of the loci it processes are labelled with their RefSeq ID, but many of the rest just get a generic ID such as "CUFF.1", even though the associated genomic loci are present in my annotation.
For example, this is what I see in my genes.fpkm_tracking file for one locus:
CUFF.2 - - CUFF.2 - - chr1:4224-19255 - - OK 47.7646 38.9667 56.5625
But in my annotation that locus is clearly listed as WASH7P:
chr1 hg18_refFlat exon 4225 4692 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 4833 4901 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 5659 5810 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 6470 6628 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 6721 6918 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 7096 7231 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 7469 7605 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 7778 7924 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 8131 8229 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 14601 14754 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
chr1 hg18_refFlat exon 19184 19233 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
Am I using the wrong annotation file format for this?
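In case it helps anyone reproduce the comparison, here is a minimal Python sketch of how the annotated span of a gene can be checked against a Cufflinks locus string. It is not part of Cufflinks; the function names (`gene_spans`, `parse_locus`, `overlaps`) are made up for illustration. It assumes the annotation lines look like the ones pasted above and splits on any whitespace, since the pasted lines are space-separated (a real GTF file is tab-separated, which this also handles).

```python
import re


def gene_spans(gtf_lines):
    """Return {(gene_id, chrom): (min_start, max_end)} over all exon lines."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # First 8 columns are fixed; the 9th holds the attribute string.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None:
            continue
        # Key on chromosome too, in case a gene name appears on several.
        key = (match.group(1), fields[0])
        start, end = int(fields[3]), int(fields[4])
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans


def parse_locus(locus):
    """Split a 'chr1:4224-19255' style locus into (chrom, start, end)."""
    chrom, coords = locus.rsplit(":", 1)
    start, end = coords.split("-")
    return chrom, int(start), int(end)


def overlaps(chrom, start, end, locus):
    """True if the interval shares at least one base with the locus."""
    locus_chrom, locus_start, locus_end = parse_locus(locus)
    return chrom == locus_chrom and start <= locus_end and locus_start <= end


# First and last exon lines from the annotation shown in this post.
example_gtf = [
    'chr1 hg18_refFlat exon 4225 4692 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";',
    'chr1 hg18_refFlat exon 19184 19233 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";',
]
spans = gene_spans(example_gtf)
start, end = spans[("WASH7P", "chr1")]
print(overlaps("chr1", start, end, "chr1:4224-19255"))
```

This only confirms that the coordinates overlap; it says nothing about why Cufflinks assigned a CUFF ID rather than the annotated gene ID.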