  • Translate coordinates between 2 references

    Hi,

    Are there any tools to translate coordinates between two reference FASTA files, such as hg18 and hg19? I need a tool that takes two references, an indel file, and a list of locations in one reference, and returns the corresponding locations in the other reference.

    I would hate to have to write that myself.

    Thanks

  • #2
    UCSC liftOver

    • #3
      As long as there is a liftOver chain file (i.e. the "dictionary") for your pair of assemblies, you can use liftOver, either online or by downloading the binary and the chain file. It works on intervals (BED files), so if you need to translate a wiggle file, convert it to bedGraph first.

      d
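
The wiggle-to-bedGraph step mentioned above can be sketched in a few lines of Python. This is only an illustration, not a UCSC tool: the function name `wig_to_bedgraph` is made up here, and it assumes plain-text wiggle input with `fixedStep` or `variableStep` declarations. Wiggle positions are 1-based, while bedGraph intervals are 0-based and half-open, hence the `- 1` on the start.

```python
def wig_to_bedgraph(lines):
    """Yield bedGraph rows (chrom, start, end, value) from wiggle lines.

    Hypothetical helper: handles fixedStep and variableStep blocks only.
    """
    mode = chrom = None
    span = 1
    pos = step = 0
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if fields[0] in ("fixedStep", "variableStep"):
            # Declaration line, e.g. "fixedStep chrom=chr1 start=101 step=10 span=5"
            opts = dict(f.split("=", 1) for f in fields[1:])
            mode = fields[0]
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts.get("step", 1))
            continue
        if mode == "fixedStep":
            # One value per line; the position advances by `step`.
            yield (chrom, pos - 1, pos - 1 + span, float(fields[0]))
            pos += step
        elif mode == "variableStep":
            # "position value" per line.
            start = int(fields[0])
            yield (chrom, start - 1, start - 1 + span, float(fields[1]))
        else:
            raise ValueError("data line before a fixedStep/variableStep declaration")


if __name__ == "__main__":
    wig = [
        "track type=wiggle_0",
        "fixedStep chrom=chr1 start=101 step=10 span=5",
        "1.5",
        "2.0",
    ]
    for chrom, start, end, value in wig_to_bedgraph(wig):
        print(f"{chrom}\t{start}\t{end}\t{value}")
```

The resulting four-column rows can be written to a file and passed to liftOver like any other BED-style input.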

      • #4
        To elaborate. In order to use 'liftOver' you need to download the executable tool and the right dictionary file (i.e. the one that corresponds to your current and target genome versions). Links:

        Liftover executable & Liftover files.

        Find your genome of interest, then follow the appropriate 'LiftOver Files' link, then find the file that corresponds to the two genome builds of interest (e.g. hg18ToHg19.over.chain.gz)
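
Going by the example name above (hg18ToHg19.over.chain.gz), the chain file name appears to be the source build, then "To", then the target build with its first letter capitalised. A small sketch of that convention follows. The helper names are hypothetical, and the download URL layout is an assumption that should be checked against the UCSC download pages before relying on it.

```python
def chain_file_name(from_build, to_build):
    """Build a chain file name such as 'hg18ToHg19.over.chain.gz'."""
    return f"{from_build}To{to_build[0].upper()}{to_build[1:]}.over.chain.gz"


def chain_file_url(from_build, to_build,
                   base="https://hgdownload.soe.ucsc.edu/goldenPath"):
    """Guess the download URL. ASSUMPTION: chain files live under
    <base>/<from_build>/liftOver/ on the UCSC download server."""
    return f"{base}/{from_build}/liftOver/{chain_file_name(from_build, to_build)}"


if __name__ == "__main__":
    print(chain_file_name("hg18", "hg19"))
    print(chain_file_url("hg18", "hg19"))
```

Note that the chain file is directional: converting hg19 back to hg18 needs a separate hg19ToHg18 file.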

        • #5
          I have a similar question. I aligned some reads with TopHat/Bowtie against the NCBI reference build 37 instead of UCSC hg19, so the files (SAM, wig, etc.) have IDs like NC_000001 instead of chr1. Unfortunately, I then realized that these IDs don't work with IGV.
          I heard that NCBI v37.3 and UCSC hg19 are identical, so can I just use Perl to run a replace on these text files? (I guess it is similar to liftOver, but I did not see a dictionary for NCBI -> hg.)

          Thanks
          Heng

          • #6
            Liftover is primarily concerned with converting the coordinates of features on one genome build to the corresponding coordinates on a different genome build of the same species (or orthologous position on a different species build). The difference in naming of the chromosomes themselves is due to different conventions used by UCSC versus NCBI. You should be able to remap the names (but check to see if they have a one-to-one relationship). You can confirm that your NCBI build corresponds to a particular UCSC build here:
            UCSC Releases

            When dealing with genome builds from different sources, particularly human, it is important to think about how the source (NCBI, UCSC, Ensembl) deals with the haplotype chromosomes and unassembled contigs (those pieces of the genome that still have not been assigned to a chromosome). For these sequences, figuring out the mapping of names is not always obvious and, worse, there isn't necessarily a one-to-one relationship. For example, NCBI may keep unassembled contigs from chr1 as separate entries whereas UCSC may place them in a 'chrUn' entry. Thankfully, the bulk of the human build (corresponding to chromosomes 1-22, X, Y, and the mitochondrial genome) should be consistent and one-to-one. UCSC provides detailed descriptions of the idiosyncrasies of each build on their website under 'assembly details' for each assembly.
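
As a rough sketch of the name remapping with a one-to-one check, the following Python converts RefSeq-style chromosome accessions (NC_000001 through NC_000024, with or without a version suffix) to UCSC-style names. All function names are hypothetical. It assumes the accession number encodes the chromosome (23 for X, 24 for Y), and it deliberately leaves the mitochondrial accession and any contigs unmapped so that they have to be handled explicitly. It only rewrites the first tab-separated column, which suits BED or bedGraph; in a SAM file the names also appear in the @SQ header lines and in columns 3 and 7.

```python
import re

# Matches e.g. "NC_000001" or "NC_000001.10"; captures the chromosome number.
_REFSEQ_CHROM = re.compile(r"^NC_0*(\d+)(?:\.\d+)?$")


def refseq_to_ucsc(name):
    """Return 'chr1'..'chr22', 'chrX', 'chrY', or None if not recognised."""
    m = _REFSEQ_CHROM.match(name)
    if not m:
        return None
    n = int(m.group(1))
    if 1 <= n <= 22:
        return f"chr{n}"
    return {23: "chrX", 24: "chrY"}.get(n)


def build_name_map(names):
    """Map every input name, refusing unknown names and collisions."""
    mapping = {}
    for name in names:
        target = refseq_to_ucsc(name)
        if target is None:
            raise ValueError(f"no UCSC name known for {name!r}")
        mapping[name] = target
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    return mapping


def rename_first_column(lines, mapping):
    """Rewrite the chromosome name in column 1 of tab-separated lines."""
    for line in lines:
        if line.startswith(("#", "track", "browser")):
            yield line  # pass header and comment lines through unchanged
            continue
        chrom, sep, rest = line.partition("\t")
        yield mapping[chrom] + sep + rest


if __name__ == "__main__":
    names = ["NC_000001.10", "NC_000023.10"]
    mapping = build_name_map(names)
    for out in rename_first_column(["NC_000001.10\t10\t20\n"], mapping):
        print(out, end="")
```

Failing loudly on an unknown or duplicated name is the point of the check: a silent global search-and-replace would hide exactly the cases described above.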

            • #7
              malachig,

              I just compared h_sapiens_37_asm.fa and hg19.fa; they are indeed the same except for the names. Each has 25 FASTA sequences, so they are one-to-one. So I guess the unmapped ones are at least consistent for this version of the human reference genome.

              Thanks for the update. I guess I will just run a replace on my text output now, since it took me several days to run my program. (I only have a very limited number of nodes.)
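
One way to make this kind of comparison, sketched here in Python, is to hash each sequence and pair up records with identical digests. This is not necessarily how the comparison above was done. The function names are hypothetical, and the sketch upper-cases the sequence first on the assumption that one file may be soft-masked (lower-case repeats) while the other is not.

```python
import hashlib


def fasta_digests(lines):
    """Return [(name, md5 of the upper-cased sequence)] in file order."""
    records = []
    name, md5 = None, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, md5.hexdigest()))
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            md5 = hashlib.md5()
        elif line and name is not None:
            md5.update(line.upper().encode("ascii"))
    if name is not None:
        records.append((name, md5.hexdigest()))
    return records


def pair_by_sequence(a, b):
    """Map each name in `a` to the names in `b` with an identical digest."""
    by_digest = {}
    for name, digest in b:
        by_digest.setdefault(digest, []).append(name)
    return {name: by_digest.get(digest, []) for name, digest in a}


if __name__ == "__main__":
    ncbi = [">NC_000001.10 example", "ACGT", "acgt"]
    ucsc = [">chr1", "ACGTACGT"]
    print(pair_by_sequence(fasta_digests(ncbi), fasta_digests(ucsc)))
```

For real files, pass open file handles instead of lists. A name that pairs with exactly one name in the other file is a one-to-one match; an empty or multi-entry list flags a sequence that needs a closer look.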

              • #8
                I downloaded liftOver and the chain file. The instructions say

                liftOver oldFile map.chain newFile unMapped

                If I want to translate a single position in BED format, e.g. chr1:344-344, how do I do this? What are oldFile and unMapped?

                • #9
                  Originally posted by foxyg
                  I downloaded liftOver and the chain file. The instructions say

                  liftOver oldFile map.chain newFile unMapped

                  If I want to translate a single position in BED format, e.g. chr1:344-344, how do I do this? What are oldFile and unMapped?

                  oldFile is the file of coordinates you want to convert (typically BED).
                  unMapped is a file that is created when you run liftOver. It contains the features in oldFile that did not lift over to the new coordinates, along with the reason why (e.g. partially deleted).
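
To make the single-position case concrete, here is a minimal Python sketch that turns a region string into a BED line and assembles the liftOver command in the argument order quoted above. It assumes chr1:344-344 is a 1-based, inclusive position as shown in a genome browser. BED is 0-based and half-open, so the start drops by one. The helper names and the file names in the example are placeholders, and the command is only built, not run.

```python
import re


def region_to_bed(region):
    """Convert 'chr1:344-344' (1-based, inclusive) to a BED line."""
    m = re.fullmatch(r"(\S+):(\d+)-(\d+)", region.replace(",", ""))
    if not m:
        raise ValueError(f"cannot parse region {region!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end.
    return f"{chrom}\t{start - 1}\t{end}"


def liftover_command(old_file, chain, new_file, unmapped):
    """Argument list in the order: oldFile map.chain newFile unMapped."""
    return ["liftOver", old_file, chain, new_file, unmapped]


if __name__ == "__main__":
    # Placeholder file names, chosen only for this example.
    with open("old.bed", "w") as handle:
        handle.write(region_to_bed("chr1:344-344") + "\n")
    print(" ".join(liftover_command(
        "old.bed", "hg18ToHg19.over.chain.gz", "new.bed", "unmapped.bed")))
```

The printed command can be run in a shell, or the list can be handed to `subprocess.run(cmd, check=True)` if liftOver is on the PATH. Afterwards new.bed holds the converted interval and unmapped.bed holds anything that failed to convert.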
