  • Converting an early genome assembly to current coordinates?

    Hi there,

    I'm working on a project in which I have aligned my sequence data to MGSCv3, the first build of the mouse genome, which consists of ~250,000 contigs. The project tests whether I can use my sequencing technique to order these contigs and reassemble them into a more 'complete' genome.

    To evaluate this, I would like to measure how accurately I have ordered the MGSCv3 contigs by comparing my ordering with the actual location of each of the ~250k contigs in the latest mouse build (GRCm38/mm10). As a first, 'dirty' approach I took the first 100 nt of each contig and aligned it to the latest build with bwa aln, but I would like more accurate placements.
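The 'dirty' prefix step can be sketched in plain Python as below. It is a minimal illustration, assuming the MGSCv3 contigs sit in one ordinary FASTA file; it streams the records and keeps only the first `n` bases of each, so the result can be handed to an aligner such as bwa. `read_fasta` and `prefix_fasta` are hypothetical helper names, not part of any library.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # drop the leading '>'
        else:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)


def prefix_fasta(records: Iterable[Tuple[str, str]], n: int = 100) -> Iterator[str]:
    """Yield one FASTA record per contig containing only its first n bases."""
    for header, seq in records:
        yield f">{header}\n{seq[:n]}\n"
```

With hypothetical file names, `dst.writelines(prefix_fasta(read_fasta(src)))`, where `src` is the downloaded FASTA opened for reading and `dst` an output file opened for writing, produces the prefix file. Because `n` is a parameter, longer prefixes can be tried without changing the code.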

    My first thought was to look up the current mm10 coordinates for the MGSCv3 accession or GI numbers at NCBI, but I can't find such a table.

    Next I considered liftOver, but UCSC doesn't offer assembly versions that go back far enough (it only supports lifting over from mm7 onward). I then tried BLAT and BLAST, but the online versions couldn't handle the number of records I want to analyze, and I couldn't work out how to set up a local installation to do this.
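If a local BLAST+ run does become possible, summarizing its output is straightforward. The sketch below assumes the contigs were searched with something like `makeblastdb -in GRCm38.fa -dbtype nucl -out grcm38` followed by `blastn -query contigs.fa -db grcm38 -outfmt 6 -out hits.tsv` (file and database names are hypothetical). It keeps the highest-bitscore hit per contig from the 12 default tabular columns and reports its span and strand on the new build. `best_placements` and `Placement` are illustrative names only.

```python
from typing import Dict, Iterable, NamedTuple


class Placement(NamedTuple):
    subject: str     # sseqid, e.g. a chromosome of the new build
    start: int       # 1-based, smaller of sstart/send
    end: int         # 1-based, larger of sstart/send
    strand: str      # "+" if sstart <= send, else "-"
    bitscore: float


def best_placements(lines: Iterable[str]) -> Dict[str, Placement]:
    """Keep the highest-bitscore hit per query from BLAST+ -outfmt 6 output.

    Assumes the 12 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Placement] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # ignore blank and comment lines
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        s_start, s_end, bitscore = int(f[8]), int(f[9]), float(f[11])
        hit = Placement(subject, min(s_start, s_end), max(s_start, s_end),
                        "+" if s_start <= s_end else "-", bitscore)
        # Ties keep the earlier hit; contigs in repeats may need manual review.
        if query not in best or bitscore > best[query].bitscore:
            best[query] = hit
    return best
```

Sorting the resulting placements by subject and start then gives the reference order against which the experimentally derived contig order can be compared.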

    Finally, I've been looking at NCBI Remap, but again the web version cannot handle this many records, and I can't find a way to run it locally. Also, the MGSCv3 identifiers Remap uses differ from the ones I have. In the build I downloaded from NCBI, each FASTA record header has the format

    "gi|20564479|emb|CAAA01000001.1|,9601"

    whereas Remap expects the location in the format

    "chrMmUn_WIFeb01_42457:1 -9600"
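Converting between the two formats splits into a parsing step and a lookup step. The sketch below parses headers like the one above into GI number and accession, and keeps the value after the comma as an opaque string because the post doesn't say what it means. The lookup table `names`, which maps an accession to a Remap-style sequence name, is entirely hypothetical; the thread doesn't identify a source for it. The output `name:1-length` is an assumed region syntax modeled on the example above, without the space, and `length` should come from the actual sequence rather than the header.

```python
from typing import Dict, NamedTuple, Optional


class NcbiId(NamedTuple):
    gi: str          # e.g. "20564479"
    accession: str   # e.g. "CAAA01000001.1"
    suffix: str      # text after the comma; meaning not stated in the post


def parse_ncbi_header(header: str) -> NcbiId:
    """Split a header like 'gi|20564479|emb|CAAA01000001.1|,9601'."""
    ident, _, suffix = header.lstrip(">").strip().partition(",")
    fields = ident.split("|")  # ['gi', '<gi>', '<db>', '<accession>', '']
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"unrecognised header: {header!r}")
    return NcbiId(gi=fields[1], accession=fields[3], suffix=suffix)


def remap_region(accession: str, length: int, names: Dict[str, str]) -> Optional[str]:
    """Build an assumed 'name:1-length' region, or None if no name is known.

    `names` is a hypothetical accession -> sequence-name table.
    """
    name = names.get(accession)
    return None if name is None else f"{name}:1-{length}"
```

Returning `None` for unknown accessions makes it easy to count how many contigs still lack a name mapping before submitting anything.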

    Does anyone here have ideas on how to convert records in bulk from a very early reference assembly to a later version? Or are there any repositories that contain this information?

    Any advice would be greatly appreciated!

  • #2
    LiftOver chain files for older mouse genome builds are available from the UCSC archive: http://genome-archive.cse.ucsc.edu/downloads.html
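To show what such a file encodes, here is a minimal sketch that reads a plain-text (uncompressed) UCSC chain file and lifts a single 0-based position. It assumes the usual liftOver convention that the chain's target side is the old assembly and the query side is the new one. Chains are keyed by the old assembly's sequence names, so contig positions would first have to be expressed in those names. It returns the first chain in file order that covers the position and ignores interval handling and match thresholds, so for real data the UCSC `liftOver` binary (`liftOver input.bed file.over.chain output.bed unmapped.bed`) is the better tool. `parse_chains` and `lift_point` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class Chain:
    t_name: str      # sequence name in the old assembly (target)
    q_name: str      # sequence name in the new assembly (query)
    q_size: int
    q_strand: str    # "+" or "-"
    # Ungapped blocks as (t_start, q_start, size), 0-based
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)


def parse_chains(lines: Iterable[str]) -> List[Chain]:
    chains: List[Chain] = []
    t_pos = q_pos = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            chains.append(Chain(fields[2], fields[7], int(fields[8]), fields[9]))
            t_pos, q_pos = int(fields[5]), int(fields[10])
            continue
        size = int(fields[0])
        chains[-1].blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:
            # dt / dq: gap lengths before the next block in target / query
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return chains


def lift_point(chains: List[Chain], name: str, pos: int) -> Optional[Tuple[str, int, str]]:
    """Map a 0-based position; return (new_name, new_pos, strand) or None."""
    for chain in chains:
        if chain.t_name != name:
            continue
        for t_start, q_start, size in chain.blocks:
            if t_start <= pos < t_start + size:
                q = q_start + (pos - t_start)
                if chain.q_strand == "-":
                    # "-" chains count query positions on the reverse strand
                    q = chain.q_size - 1 - q
                return chain.q_name, q, chain.q_strand
    return None  # position falls in a gap or outside every chain
```

Positions that return `None` correspond to what `liftOver` would write to its unmapped file.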
