**Chipper** · 10-09-2010, 03:00 PM

UCSC liftOver

**dawe** · 10-09-2010, 11:18 PM

As long as a liftOver chain file (i.e. the "dictionary") exists for your pair of assemblies, you can use liftOver, either online or by downloading the binary and the chain file. It works on intervals (BED files), so if you need to translate a wiggle file, convert it to bedGraph first.

d
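For the wiggle-to-bedGraph step mentioned above, here is a minimal sketch in Python. It assumes plain-text wiggle input with `fixedStep` or `variableStep` declarations; the function name `wig_to_bedgraph` is made up for illustration and is not part of any UCSC tool. The one thing to watch is the coordinate shift: wiggle positions are 1-based, while bedGraph (like BED) is 0-based and half-open.

```python
def wig_to_bedgraph(lines):
    """Yield bedGraph lines (0-based, half-open) from wiggle lines (1-based).

    Hypothetical helper: handles fixedStep and variableStep blocks only.
    """
    mode = chrom = None
    pos = step = 0
    span = 1
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            parts = line.split()
            mode = parts[0]
            opts = dict(p.split("=", 1) for p in parts[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts.get("step", 1))
            continue
        if mode == "fixedStep":
            # One value per line; the position advances by `step` each time.
            start, value = pos, line
            pos += step
        elif mode == "variableStep":
            # Each data line is "position value".
            start_text, value = line.split()
            start = int(start_text)
        else:
            raise ValueError("data line before a fixedStep/variableStep declaration")
        yield f"{chrom}\t{start - 1}\t{start - 1 + span}\t{value}"


if __name__ == "__main__":
    demo = ["fixedStep chrom=chr1 start=100 step=10 span=5", "1.5", "2.0"]
    for out in wig_to_bedgraph(demo):
        print(out)
```

The resulting bedGraph lines can then be given to liftOver like any other BED-style intervals.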

**malachig** · 10-12-2010, 12:09 PM

To elaborate: to use 'liftOver' you need to download the executable and the right dictionary (chain) file, i.e. the one that corresponds to your current and target genome versions. Links:

Liftover executable & Liftover files.

Find your genome of interest, follow the appropriate 'LiftOver Files' link, then pick the file that corresponds to the two genome builds of interest (e.g. hg18ToHg19.over.chain.gz).
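The chain file names follow a simple pattern: source build, then "To", then the target build with its first letter capitalised. A small sketch of that pattern, assuming it holds for the builds you care about (the helper name `chain_file_name` is hypothetical, and not every pair of builds has a chain file):

```python
def chain_file_name(source_build, target_build):
    """Build the expected chain file name for a pair of UCSC assembly names.

    Hypothetical helper; it only formats a name and does not check that
    the file actually exists on the download server.
    """
    target = target_build[0].upper() + target_build[1:]
    return f"{source_build}To{target}.over.chain.gz"


if __name__ == "__main__":
    print(chain_file_name("hg18", "hg19"))
```

Always confirm the name against the 'LiftOver Files' listing for your source genome before relying on it.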

**hengdai** · 10-12-2010, 01:41 PM

I have a similar question. I ran some sequences through tophat/bowtie against the NCBI reference v37 instead of UCSC hg19, so the output files (SAM, wig, etc.) have IDs like NC_000001 instead of chr1. Unfortunately, I then realized that these IDs don't work with IGV.
I heard that NCBI v37.3 and UCSC hg19 are identical, so can I just use Perl to run a search-and-replace on these text files? (I guess it is similar to liftOver, but I did not see a dictionary for NCBI -> hg.)

Thanks
Heng

**malachig** · 10-12-2010, 02:09 PM

Liftover is primarily concerned with converting the coordinates of features on one genome build to the corresponding coordinates on a different genome build of the same species (or orthologous position on a different species build). The difference in naming of the chromosomes themselves is due to different conventions used by UCSC versus NCBI. You should be able to remap the names (but check to see if they have a one-to-one relationship). You can confirm that your NCBI build corresponds to a particular UCSC build here:
UCSC Releases

When dealing with genome builds from different sources, particularly human, it is important to think about how each source (NCBI, UCSC, Ensembl) deals with the haplotype chromosomes and unassembled contigs (those pieces of the genome that have not yet been assigned to a chromosome). For these sequences, the mapping of names is not always obvious and, worse, there isn't necessarily a one-to-one relationship. For example, NCBI may keep unassembled contigs from chr1 as separate entries, whereas UCSC may place them in a 'chrUn' entry. Thankfully, the bulk of the human build (chromosomes 1-22, X, Y, and the mitochondrial genome) should be consistent and one-to-one. UCSC provides detailed descriptions of the idiosyncrasies of each build on their website under 'assembly details' for each assembly.
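As a rough illustration of remapping names, the sketch below converts human RefSeq chromosome accessions (NC_000001 to NC_000024, with or without a version suffix) to UCSC-style names in the first column of BED-like lines. The function names are made up for this example. Anything that does not match, including haplotypes, unplaced contigs and the mitochondrial accession, is deliberately left unchanged so you can check those by hand against your own files.

```python
import re

# Human chromosome accessions look like NC_000001.10; the version is optional here.
_ACCESSION = re.compile(r"^NC_0000(\d{2})(?:\.\d+)?$")


def ncbi_to_ucsc(name):
    """Map NC_000001..NC_000024 to chr1..chr22, chrX, chrY; leave others as-is."""
    match = _ACCESSION.match(name)
    if match is None:
        return name
    number = int(match.group(1))
    if 1 <= number <= 22:
        return f"chr{number}"
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    return name


def rename_bed_line(line):
    """Rename the chromosome column of one tab-separated BED/bedGraph line."""
    if not line.strip() or line.startswith(("track", "browser", "#")):
        return line
    fields = line.rstrip("\n").split("\t")
    fields[0] = ncbi_to_ucsc(fields[0])
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    print(rename_bed_line("NC_000001.10\t343\t344\n"), end="")
```

For SAM files the same mapping would have to be applied to the `@SQ` header lines and to the reference-name columns rather than to column one only; that part is not shown here.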

**hengdai** · 10-12-2010, 03:12 PM

malachig,

I just compared h_sapiens_37_asm.fa and hg19.fa; they are indeed the same except for the names. Each has 25 FASTA sequences, so they are one-to-one. So I guess the unmapped ones are at least consistent for this version of the human reference genome.

Thanks for the update. I guess I will just run a replace on my text output now, since it took me several days to run my program. (I only have a very limited number of nodes.)

**foxyg** · 01-25-2011, 08:44 AM

I downloaded liftOver and the chain file. The instructions say:

liftOver oldFile map.chain newFile unMapped

If I want to translate a single position, e.g. chr1:344-344, how do I do this? What are oldFile and unMapped?

**d17** · 01-27-2011, 11:40 AM

oldFile is the file of coordinates you want to convert (typically BED).
unMapped is a file that is created when you run liftOver: it contains those features in oldFile that did not lift over to the new coordinates, along with the reason why (e.g. partially deleted).
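For the single-position case, one way to prepare oldFile is to turn the browser-style position into a one-line BED file. This is only a sketch: `position_to_bed` and `liftover_command` are hypothetical helper names, and the file names in the example are placeholders. The key detail is that a position like chr1:344-344 is 1-based and inclusive, while BED is 0-based and half-open, so the start is reduced by one.

```python
import re

_POSITION = re.compile(r"^(\S+):([\d,]+)-([\d,]+)$")


def position_to_bed(position):
    """Convert 'chr1:344-344' (1-based, inclusive) to a BED line (0-based, half-open)."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid range in position: {position!r}")
    return f"{chrom}\t{start - 1}\t{end}"


def liftover_command(old_file, chain_file, new_file, unmapped_file):
    """Return the argument list in the order given by the liftOver usage line."""
    return ["liftOver", old_file, chain_file, new_file, unmapped_file]


if __name__ == "__main__":
    # Write the one-line BED file that serves as oldFile (placeholder file names).
    with open("one_position.bed", "w") as handle:
        handle.write(position_to_bed("chr1:344-344") + "\n")
    print(" ".join(liftover_command(
        "one_position.bed", "hg18ToHg19.over.chain.gz", "lifted.bed", "unmapped.bed")))
```

After running the printed command, the converted interval should be in the new file, or the original line with a reason in the unMapped file if it could not be lifted. Remember to add one to the BED start if you want the result back in 1-based form.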

Translate coordinates between 2 references