**simonandrews** · 03-27-2012, 11:48 PM

I don't think there is an easy way to do this from within Ensembl. You can use their BioMart interface to bulk download gene-related information, but this doesn't work for other feature types. Their recommendation is to use their Perl API to pull down this kind of data, but if that's not something you're comfortable with then I guess that's not much help.

You can actually get at this kind of data much more easily from UCSC. Their table browser system allows you to export any of the annotation tracks into a simple text format which should be easy to import into SeqMonk.
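As a rough illustration of how such an export might be tidied up before import, here is a minimal Python sketch. It assumes the Table Browser output is tab-separated, that its first non-empty line is a header beginning with `#`, and that it has columns named `chrom`, `chromStart`, `chromEnd` and `name`; check these against your own download. The function name `convert_ucsc_table` and the `strip_chr_prefix` option are made up for this example, and whether the `chr` prefix needs removing depends on how your genome names its chromosomes.

```python
import csv
import io


def convert_ucsc_table(text, strip_chr_prefix=True):
    """Convert a UCSC Table Browser tab-separated export into
    (chromosome, start, end, name) rows.

    Assumes 0-based, half-open UCSC coordinates and shifts the start
    by one so that positions are 1-based and inclusive.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    col = None
    rows = []
    for fields in reader:
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        if col is None:
            # Assumed header line, e.g. "#bin<TAB>chrom<TAB>chromStart..."
            col = {name.lstrip("#"): i for i, name in enumerate(fields)}
            continue
        chrom = fields[col["chrom"]]
        if strip_chr_prefix and chrom.startswith("chr"):
            chrom = chrom[3:]  # "chr1" -> "1"
        start = int(fields[col["chromStart"]]) + 1
        end = int(fields[col["chromEnd"]])
        rows.append((chrom, start, end, fields[col["name"]]))
    return rows
```

The resulting rows can be written out one per line, tab-separated, giving a plain text file with chromosome, start and end columns.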

As an aside, which genome are you using? CpG islands should be a standard track in the latest releases of genomes which contain this track.

**jjw14** · 03-29-2012, 05:45 AM

Simon,

Thank you for the information. Sorry for the delay in my response.

I agree with you that the UCSC Table Browser is a great resource, and I have used it before for exporting specific tracks, including CpG Islands.

The reason I wanted to get the CpG Island information from Ensembl was that I have imported a custom genome (pig; Sus scrofa 10.2) into SeqMonk by modifying the EMBL formatted files as you had described in your help file "Creating a Custom Genome".

Currently, the UCSC Table Browser supports the Nov. 2009 assembly (SGSC Sscrofa 9.2/Sscrofa2), so I wasn't sure if the CpG Islands exported from UCSC would be compatible with the current S. scrofa 10.2 genome.

I have to admit that the differences in nomenclature for genomes of the same species from NCBI, Ensembl, etc. are still confusing to me, even though I have tried on numerous occasions to determine compatibility. For this reason, I wanted to obtain all data that I am going to put into SeqMonk from the same place. Yes, it's ignorance on my part, but I don't want to risk generating erroneous results.

Thank you for your fast response and advice. If you have any more input, I would be glad to hear it.

jjw

**simonandrews** · 03-29-2012, 05:58 AM

For this you'd have to go to the Ensembl API, though as pre.ensembl isn't in a release yet I'm not actually sure how you'd connect to that database to be able to run queries.

Hopefully the pig assembly will make it into a full Ensembl release soon, at which point we'll add it to our list of supported genomes and you'll have the CpG island tracks present.

**jjw14** · 03-29-2012, 06:42 AM

Thanks, Simon. If I figure out how, I will post the method I used here in case someone else wants to obtain similar data.

jjw

**acongras** · 05-31-2012, 07:11 AM

Hello jjw,

I've just read this:

Originally posted by jjw14:

> I have imported a custom genome (pig; Sus scrofa 10.2) into SeqMonk by modifying the EMBL formatted files as you had described in your help file "Creating a Custom Genome".
>
> jjw

I am currently trying to do the same thing and I need some tips.
I have downloaded the EMBL files (from 0.dat to 7000.dat) into my Genome directory. SeqMonk seems to open them without much trouble, even though each file contains several scaffolds and AC lines.
SeqMonk assigns many of the scaffolds to their specific chromosomes, so the genome is almost recreated. However, some other scaffolds are not assigned to any chromosome, and SeqMonk treats each of them as a very small independent chromosome.

My questions are: do you get the same result? If not, how did you modify the files to get a fully assembled genome?

Thanks for your help.

**simonandrews** · 05-31-2012, 07:32 AM

Which genome are you trying to use? I've just seen that the pig 10.2 assembly is now released into the main Ensembl, so I've just kicked off the processing scripts to add it to the supported genomes in SeqMonk. It should be there late tonight or early tomorrow.

In general you can use the EMBL files exported by Ensembl, but you only want to use the contigs which form part of the main chromosomes. There are a number of short scaffolds which aren't included in the main assembly (normally with names ending in _random), and it is these which will mess up the genome building in SeqMonk because it will treat each of these as a separate chromosome. In the API you can pull down slices only of type 'chromosome', but from the exported EMBL files you'll need to look at the names of the chromosomes and filter out those which aren't actually part of the main assembly.
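To make the name-based filtering concrete, here is a small Python sketch. It is only an illustration under stated assumptions: it supposes that each sequence's `AC` line carries a colon-separated location of the form `type:assembly:name:start:end:strand`, which you should confirm against your own files. The helper names `parse_ac_line` and `is_main_assembly` are invented for this example and are not part of SeqMonk or Ensembl.

```python
def parse_ac_line(line):
    """Return (coord_type, name) from an EMBL 'AC' line, or None if the
    line does not match the assumed 'type:assembly:name:...' layout."""
    if not line.startswith("AC"):
        return None
    value = line[2:].strip().rstrip(";")
    parts = value.split(":")
    if len(parts) < 3:
        return None
    return parts[0], parts[2]


def is_main_assembly(line):
    """True when the AC line describes a sequence of type 'chromosome'
    whose name does not end in '_random'."""
    parsed = parse_ac_line(line)
    if parsed is None:
        return False
    coord_type, name = parsed
    return coord_type == "chromosome" and not name.endswith("_random")
```

A filter built on this would keep only the records whose `AC` line passes `is_main_assembly` and drop the rest before copying the files into the custom genome folder.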

**simonandrews** · 05-31-2012, 11:40 PM

The Sus scrofa 10.2 genome assembly should now be available as a supported genome.

**acongras** · 06-07-2012, 11:35 PM

Thanks for adding this genome, and for your quick answers.

SeqMonk: Export features (e.g. CpG Islands) from Ensembl for import into SeqMonk?
