Dear seqA community,
I'm assembling transcripts against the mouse reference annotations (*.gtf files) provided by Ensembl, NCBI, and UCSC. Ideally, I would like to use Ensembl, because it annotates genes as protein-coding, non-coding, pseudogenes, etc. But I have a problem with Ensembl: some important transcripts are missing from the database, for example Kcnq1ot1 and Ipw.
Question #1: Why is that? Should I expect these and other similar genes to be included in a future version of Ensembl?
Both RefSeq and UCSC have entries for these genes, but they lack the convenient categorization that Ensembl provides (protein-coding, non-coding, pseudogenes, etc.).
Question #2: I have been unable to find an equivalent categorization file keyed to UCSC or NCBI identifiers. Can someone point me in the right direction?
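To make the Ensembl categorization concrete, here is a minimal sketch that builds a gene-to-biotype table from a GTF, which also shows at a glance whether a gene such as Kcnq1ot1 is annotated at all. It assumes the attribute column (column 9) carries `gene_name` and `gene_biotype` keys, as recent Ensembl GTF releases do; older releases may encode the biotype differently. The function name `biotypes_from_gtf` is hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches key "value" pairs in GTF column 9, e.g. gene_biotype "lincRNA";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def biotypes_from_gtf(lines: Iterable[str]) -> Dict[str, str]:
    """Map gene_name (falling back to gene_id) -> gene_biotype.

    Assumes Ensembl-style attributes; rows without gene_biotype are skipped.
    """
    result: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # not a well-formed GTF row
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        biotype = attrs.get("gene_biotype")
        if biotype is None:
            continue
        key = attrs.get("gene_name", attrs.get("gene_id", ""))
        result[key] = biotype
    return result
```

Usage would be along the lines of `biotypes_from_gtf(open(path))` followed by `.get("Kcnq1ot1")`, where `path` points at the downloaded Ensembl GTF. The same lookup does not carry over to RefSeq or UCSC files unless they happen to use the same attribute keys.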
Thank you for any advice you can give!