**jeppepeppe** · 12-01-2013, 04:05 PM

I would try the new Gencode annotation file for mouse. It should be the most comprehensive annotation out there.

http://www.gencodegenes.org/mouse_stats.html

Sorry, misunderstood.
Not sure why it's not in there.

**sp144** · 12-01-2013, 04:50 PM

Thank you jeppepeppe,

that was a good suggestion, but I checked and both examples are indeed missing. As far as I can tell, my only options are to:
a.) exclude genes not listed in Ensembl, or
b.) use the UCSC annotation and ID converter tools to retrieve the Ensembl annotation matching the UCSC IDs. But this loses ~10% of my data, so it's not ideal:

Id Converters Test

http://www.scribd.com/doc/18966500/Id-Converters-Test

A comparison test of the performance of six tools for ID conversion in human genomics
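Option b.) is essentially a table lookup in which every ID without a partner is dropped, so the data loss is simply the fraction of unmatched IDs. A minimal Python sketch, assuming a UCSC-to-Ensembl lookup table has already been exported from one of the converter tools (the IDs below are placeholders, not real identifiers):

```python
from typing import Dict, List, Tuple


def convert_ids(ucsc_ids: List[str], lookup: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split UCSC IDs into those with an Ensembl partner and those without."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for uid in ucsc_ids:
        ens_id = lookup.get(uid)
        if ens_id is None:
            unmapped.append(uid)  # no partner: this record would be dropped
        else:
            mapped[uid] = ens_id
    return mapped, unmapped


def loss_fraction(ucsc_ids: List[str], lookup: Dict[str, str]) -> float:
    """Fraction of the input IDs that the conversion would lose."""
    if not ucsc_ids:
        return 0.0
    _, unmapped = convert_ids(ucsc_ids, lookup)
    return len(unmapped) / len(ucsc_ids)


# Placeholder IDs for illustration only
ids = ["ucsc_A", "ucsc_B", "ucsc_C", "ucsc_D"]
table = {"ucsc_A": "ens_1", "ucsc_B": "ens_2", "ucsc_C": "ens_3"}
print(loss_fraction(ids, table))  # 0.25
```

A plain dict keeps one Ensembl ID per UCSC ID; if the exported table is one-to-many, a `Dict[str, List[str]]` would be needed instead, and choosing which partner to keep becomes a separate decision.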

**Giulietta EnsemblHelpdesk** · 12-02-2013, 02:51 AM

Hi,

It's good to hear the biotype categorizations are useful to you.

Ensembl will have a more up-to-date mouse gene set than the one on the GENCODE page, because the GENCODE set was taken from a previous Ensembl release. (GENCODE uses the Ensembl genes, i.e. the merged set of Ensembl automatic annotation and Vega/Havana manual annotation.)

We will update the mouse genes in the next release (Ensembl 74, e74), due out this week. This will include updated Vega/Havana manual annotation. I have checked the first gene you mention (KCNQ1OT1) on our test site, and it will be present in the next release.

I hope that helps.

**sp144** · 12-02-2013, 05:44 PM

Thank you, Giulietta!
That is indeed very helpful news and very lucky for me! My data is aligned to mouse mm9 (build 37), however. Will the e74 annotation only be available in mm10/GRCm38 coordinates? Will it be possible to perform a simple liftover back to mm9 coordinates?

I could of course re-align to mm10 (build 38), but for reasons relating to my custom-built pipeline, I'd prefer to stay on mm9 (build 37) if at all possible.
Thank you and best wishes!

PS: On a related note, I'm a bit unclear about why the transcript biotypes and gene biotypes differ. Is it because some transcripts of protein-coding genes are not translated?

**Emily_Ensembl** · 12-03-2013, 01:36 AM

Hi sp144,

We don't update old assemblies with new annotation, so for NCBIM37 you will only see the release 67 annotation from May 2012, as that was the last release on the old assembly.

Gene and transcript biotypes differ because a gene will have multiple transcripts, which will each have their own biotypes. For example, this gene has some coding and some non-coding transcripts.
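Emily's point can be seen directly in a GTF file: the gene biotype is one value per gene, while each transcript carries its own. The Python sketch below groups transcript biotypes by gene. It assumes the attribute names `gene_biotype` and `transcript_biotype`; attribute names have varied between Ensembl releases, so check the column 9 keys in your own file first. The records are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# key "value" pairs in GTF column 9, e.g.  gene_id "G1"; transcript_id "T1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def biotypes_by_gene(gtf_lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Return (gene_id -> gene biotype, gene_id -> {transcript_id -> transcript biotype})."""
    gene_biotype: Dict[str, str] = {}
    tx_biotype: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # every transcript has exon rows, so these are enough
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene_id = attrs["gene_id"]
        gene_biotype[gene_id] = attrs.get("gene_biotype", "unknown")
        tx_biotype[gene_id][attrs["transcript_id"]] = attrs.get("transcript_biotype", "unknown")
    return gene_biotype, dict(tx_biotype)


# Hypothetical records: one protein-coding gene with a non-coding transcript
gtf = [
    '7\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; '
    'gene_biotype "protein_coding"; transcript_biotype "protein_coding";',
    '7\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; '
    'gene_biotype "protein_coding"; transcript_biotype "retained_intron";',
]
genes, transcripts = biotypes_by_gene(gtf)
print(genes)        # {'G1': 'protein_coding'}
print(transcripts)  # {'G1': {'T1': 'protein_coding', 'T2': 'retained_intron'}}
```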

Emily

**Giulietta EnsemblHelpdesk** · 12-03-2013, 03:04 AM

Hello sp144,

To add to Emily's message: yes, you can lift the coordinates of the new annotation over to the older assembly. Ensembl provides an Assembly Converter tool for this:

Ensembl Tools

http://www.ensembl.org/info/docs/tools/index.html
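Under the hood, liftover maps coordinates through blocks of aligned sequence shared by the two assemblies (UCSC distributes these as chain files). The Python sketch below shows only that core idea and is not how the Ensembl Assembly Converter is implemented: it assumes the ungapped blocks have already been parsed into `(source_start, target_start, length)` tuples on the forward strand, and it refuses to lift an interval that falls in a gap or spans two blocks, rather than splitting it as real tools may.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One ungapped aligned block: (source_start, target_start, length).
# Coordinates are 0-based; blocks are sorted and non-overlapping on the source side.
Block = Tuple[int, int, int]


def _block_index(pos: int, blocks: List[Block], starts: List[int]) -> Optional[int]:
    """Index of the block containing pos, or None if pos lies in an unaligned gap."""
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos < blocks[i][0] + blocks[i][2]:
        return i
    return None


def lift_interval(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Lift half-open [start, end) only if it lies entirely inside one aligned block."""
    if end <= start:
        return None
    starts = [b[0] for b in blocks]
    i = _block_index(start, blocks, starts)
    j = _block_index(end - 1, blocks, starts)
    if i is None or i != j:
        return None  # unaligned or split across blocks: report rather than guess
    src, tgt, _ = blocks[i]
    return tgt + (start - src), tgt + (end - src)


# Hypothetical blocks, not taken from a real GRCm38 -> NCBIM37 chain
blocks = [(0, 1000, 100), (150, 1200, 50)]
print(lift_interval(10, 20, blocks))   # (1010, 1020)
print(lift_interval(90, 160, blocks))  # None
```

Returning None for a split interval is a deliberate choice: a feature whose span crosses an assembly change is exactly the case where a lifted annotation deserves a manual check.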

By the way, if you have a list of genes that are not in the current Ensembl database, we'd like you to send them to Vega/Havana. They manually annotate genes, which we then merge into our gene set generated by automatic annotation. The contact email is at this link:

http://www.sanger.ac.uk/resources/databases/vega/

Best wishes,
Giulietta

**sp144** · 12-04-2013, 10:47 AM

Thank you Giulietta and Emily,

I took a look at the new release, but sadly Ipw is not annotated at all, and Kcnq1ot1 is incorrectly annotated as being on the forward strand and consisting of 5 exons; it's actually on the reverse strand and consists of a single exon. I also don't understand why Kcnq1ot1 is capitalized in the GTF.

I'm surprised, given that these genes have long been in RefSeq and UCSC. I'll contact the Vega/Havana people, but I'm guessing these won't be updated until the next Ensembl release. When do you think it will come out? Thank you!
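Discrepancies like this are easy to tabulate from the GTF itself. A minimal Python sketch, assuming the standard nine-column GTF layout with `gene_name` and `transcript_id` attributes; the records below are hypothetical placeholders, not the real Ensembl coordinates. Matching on a case-insensitive name prefix means that split, capitalized entries are caught as well:

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# key "value" pairs in GTF column 9
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def transcript_structure(gtf_lines: Iterable[str], name_prefix: str) -> Dict[str, Tuple[str, int]]:
    """Map transcript_id -> (strand, exon count) for genes whose gene_name starts with name_prefix.

    The match is case-insensitive, so 'Kcnq1ot1' also catches 'KCNQ1OT1_1', 'KCNQ1OT1_2', ...
    """
    prefix = name_prefix.lower()
    strand: Dict[str, str] = {}
    exons = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if not attrs.get("gene_name", "").lower().startswith(prefix):
            continue
        tid = attrs["transcript_id"]
        strand[tid] = cols[6]  # column 7 holds the strand, '+' or '-'
        exons[tid] += 1
    return {tid: (strand[tid], exons[tid]) for tid in exons}


# Hypothetical records for illustration only
gtf = [
    '7\tsrc\texon\t100\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "KCNQ1OT1_1";',
    '7\tsrc\texon\t300\t350\t.\t+\t.\tgene_id "G2"; transcript_id "T2"; gene_name "KCNQ1OT1_2";',
    '7\tsrc\texon\t900\t990\t.\t-\t.\tgene_id "G3"; transcript_id "T3"; gene_name "Kcnq1";',
]
print(transcript_structure(gtf, "Kcnq1ot1"))  # {'T1': ('+', 1), 'T2': ('+', 1)}
```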

**Giulietta** · 12-05-2013, 02:42 AM

I find Kcnq1ot1 in mouse on the forward strand, consisting of a single exon:

http://www.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000098857;r=7:143251367-143251826;t=ENSMUST00000183938

Are we looking at the same gene?

**Emily_Ensembl** · 12-05-2013, 02:44 AM

I find four of them, KCNQ1OT1_1, KCNQ1OT1_2, KCNQ1OT1_3 and KCNQ1OT1_5, all neighbours on the forward strand.

http://www.ensembl.org/Mus_musculus/Location/View?db=core;g=ENSMUSG00000099151;r=7:143241535-143397849;t=ENSMUST00000183763

**sp144** · 12-05-2013, 10:03 AM

Yes, thank you Emily: in the GTF there are 4 entries, neighbors on the forward strand. But in RefSeq and UCSC there is a single one-exon transcript on the reverse strand, hence the name: Kcnq1 "opposite transcript" 1 = Kcnq1ot1.

I emailed the Vega group but have had no response yet.
Thank you.

**afrankish** · 12-09-2013, 09:37 AM

Hi

I'm from the HAVANA group at Sanger and although I haven't yet received your email via Vega, I was alerted to this thread via the Ensembl team.

Neither Ipw nor Kcnq1ot1 had been manually annotated, but this is not entirely surprising, as we have only just started genome-wide manual annotation of non-coding loci in mouse.

I have had the annotation for these loci updated, and both will appear in future releases of GENCODE/Ensembl.

Just to clarify, the GENCODE and Ensembl gene sets are identical and released in sync. For human and mouse, Ensembl essentially displays the GENCODE gene set, which is created by merging manual gene annotation with Ensembl gene predictions. This is well established for human. The Ensembl mouse gene set has been built the same way as the human one for several years, but separate releases of GENCODE mouse annotation have been more limited (GENCODE M1 = Ensembl 65 and GENCODE M2 = Ensembl 74).

Updates to annotation can take some time to appear in new releases of GENCODE/Ensembl. However, you can see updated manual annotation (which will be included in future releases) in the Vega browser: click through 'Configure this page' and then click on the 'Havana update' box in the Genes and transcripts section. This track is updated approximately fortnightly.

I hope this is useful

**sp144** · 12-09-2013, 03:16 PM

Thank you, afrankish; I'm mostly looking for a gtf annotation that includes the very useful Ensembl biotype categories yet captures RefSeq and UCSC gene entries missing from Ensembl. I'm sure updating these transcript annotations is challenging, as they represent a moving target with increasing sequencing depth. I just wish there was a mechanism to "fast-track" entries from other major databases for annotation. Both of these genes have been in RefSeq and UCSC for quite some time.

I will keep an eye out for the next Ensembl release. Thank you to everyone for contributing to this thread; it was my first post on SEQanswers, and I'm impressed that you Vega and Ensembl folks responded so quickly. Thank you!
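The gap sp144 describes can be phrased as a concrete check: which RefSeq/UCSC genes have no Ensembl gene overlapping them on the same strand? The strand test matters here, since an antisense transcript such as Kcnq1ot1 ("opposite transcript") lies within Kcnq1 on the other strand. A brute-force Python sketch, assuming both annotations are on the same assembly and already converted to 0-based, half-open intervals (all intervals below are hypothetical):

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # '+' or '-'
    name: str


def missing_from_reference(query: List[Interval], reference: List[Interval]) -> List[str]:
    """Names of query genes with no same-strand overlap in the reference set."""
    by_key: Dict[Tuple[str, str], List[Interval]] = defaultdict(list)
    for r in reference:
        by_key[(r.chrom, r.strand)].append(r)
    missing = []
    for q in query:
        candidates = by_key.get((q.chrom, q.strand), [])
        # half-open intervals overlap when each starts before the other ends
        if not any(r.start < q.end and q.start < r.end for r in candidates):
            missing.append(q.name)
    return missing


# Hypothetical intervals for illustration only
refseq = [Interval("chr7", 100, 200, "+", "geneA"), Interval("chr7", 500, 600, "-", "geneB")]
ensembl = [Interval("chr7", 150, 250, "+", "ens1"), Interval("chr7", 500, 600, "+", "ens2")]
print(missing_from_reference(refseq, ensembl))  # ['geneB']
```

This compares every pair on a chromosome strand, which is fine for a short gene list. For genome-wide files, `bedtools intersect -a refseq.bed -b ensembl.bed -v -s` reports the same thing (`-v` keeps entries in A with no overlap in B, `-s` requires the same strand). An overlap only shows that some Ensembl feature is present there, not that it is the same gene model.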

**Giulietta** · 12-10-2013, 01:53 AM

Hi sp144,

Just to clarify: in the Ensembl pipeline, Kcnq1ot1 has been annotated from Rfam, which has four separate entries:

RF01946 KCNQ1OT1_1 KCNQ1 overlapping transcript 1 conserved region 1
RF01947 KCNQ1OT1_2 KCNQ1 overlapping transcript 1 conserved region 2
RF01948 KCNQ1OT1_3 KCNQ1 overlapping transcript 1 conserved region 3
RF01950 KCNQ1OT1_5 KCNQ1 overlapping transcript 1 conserved region 5

The Ensembl pipeline's strength lies very much in coding sequences, and we prefer to receive ncRNA annotation from Havana (who manually annotate the genome). As afrankish points out, we merge the Havana manual annotation into the transcript set from the Ensembl automatic pipelines to create the GENCODE set.

We hope to have this annotation for you in the future.

Ensembl/NCBI/UCSC mouse gene annotations for cufflinks
