  • Mappability or Uniqueness of Reference Genome

    The UCSC genome browser has a track that displays the mappability of short fragments to a reference genome. My question is, "Is there a tool to estimate the mappability of longer read lengths?"

    I am trying to define the parameters needed to accurately align highly homologous gene family members and their respective pseudogenes, whose sequences can be up to 90-95% identical. I want a way to estimate the likelihood that every possible 500 bp sequence maps correctly, and how many mismatches are present.

    Any ideas?

  • #2
    Method for computing "mappable" regions of a genome

    The program vmatch [1] can be used to find all pairs of regions longer than a given length that are similar (or identical) to each other, where similarity can be defined in terms of a maximum edit or Hamming distance. Use the -dbnomatch option to report all regions NOT participating in any such match, i.e. regions that are dissimilar (unique) genome-wide.

    [1] http://www.zbh.uni-hamburg.de/vmatch/
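    To make the idea concrete, here is a minimal brute-force sketch of the same notion on a toy sequence. It is not how vmatch works (vmatch uses index structures and also supports edit distance); it simply checks, for every k-mer, whether any other k-mer in the sequence lies within a chosen Hamming distance. The function names are hypothetical, only the forward strand is considered, and the O(n^2) cost makes it unusable on a real genome.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def unique_kmer_starts(seq: str, k: int, max_mismatches: int) -> list[bool]:
    """For each k-mer start position, return True if no OTHER k-mer in `seq`
    lies within `max_mismatches` Hamming distance of it.

    Assumptions: forward strand only (no reverse complement), no indels,
    brute force O(n^2 * k), so only suitable for toy inputs.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    result = []
    for i, a in enumerate(kmers):
        # A k-mer is "unique" only if every other k-mer is farther away
        # than the allowed number of mismatches.
        unique = all(
            hamming(a, b) > max_mismatches
            for j, b in enumerate(kmers)
            if j != i
        )
        result.append(unique)
    return result


print(unique_kmer_starts("ACGTACGTTT", k=4, max_mismatches=0))
```

    Raising `max_mismatches` makes more k-mers non-unique, which mirrors how a stricter similarity threshold shrinks the set of regions vmatch would report with -dbnomatch.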

    I am unfamiliar with the UCSC mappability track. Can you provide a link to an example of this?

    • #3
      Originally posted by RockChalkJayhawk
      The UCSC genome browser has a track that displays the mappability of short fragments to a reference genome. My question is, "Is there a tool to estimate the mappability of longer read lengths?"

      I am trying to define the parameters needed to accurately align highly homologous gene family members and their respective pseudogenes, whose sequences can be up to 90-95% identical. I want a way to estimate the likelihood that every possible 500 bp sequence maps correctly, and how many mismatches are present.

      Any ideas?
      Follow the same approach as the mappability tracks, but for your 500 bp sequences. Sample 500 bp fragments across the regions of interest and map them with your favorite short-read aligner (with appropriate sensitivity settings). Then filter and process the resulting alignments to generate your own track. If you need help, there are many starving bioinformaticians.
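      The sampling step can be sketched as tiling the region into overlapping fixed-length fragments and writing them as FASTA for the aligner. This is a minimal sketch, not anyone's actual script: the function names, the `name:start-end` ID scheme, and the `offset` parameter (the 0-based genomic start of the extracted region) are all hypothetical choices.

```python
from typing import Iterable, Iterator


def tile_fragments(name: str, seq: str, frag_len: int = 500,
                   step: int = 1, offset: int = 0) -> Iterator[tuple[str, str]]:
    """Yield (fragment_id, fragment_seq) for every window of length `frag_len`,
    taken every `step` bases. step=1 gives every possible fragment.

    IDs use a hypothetical "name:start-end" scheme with 0-based, half-open
    coordinates shifted by `offset` (the region's start on its chromosome).
    """
    for start in range(0, len(seq) - frag_len + 1, step):
        gstart = offset + start
        yield f"{name}:{gstart}-{gstart + frag_len}", seq[start:start + frag_len]


def to_fasta(records: Iterable[tuple[str, str]]) -> str:
    """Render (id, sequence) pairs as FASTA text, one sequence line per record."""
    return "".join(f">{rid}\n{s}\n" for rid, s in records)


print(to_fasta(tile_fragments("chrX", "ACGTACGT", frag_len=4, step=2)))
```

      Encoding the fragment's origin in its ID makes it easy to tell, after alignment, whether a fragment mapped back to where it came from or to a paralog elsewhere.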

      Originally posted by malcook
      I am unfamiliar with the UCSC mappability track. Can you provide a link to an example of this?
      Try the "Mapability" track under "Mapping and Sequencing Tracks".

      • #4
        Hmmm, I don't see any mappability track for either human or mouse at UCSC. (Sounds of poking around.) Oh, I see, it exists for human in hg18 but not in the most recent assembly, hg19. OK. Interesting. Thanks!

        The proposed `vmatch`-based solution produces a whole-genome boolean mappability vector: a base is mappable if all the k-mers (e.g. 36-mers) that span it are sufficiently unique (e.g. no match within an edit distance of 2).
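        The rule above ("a base is mappable if all the k-mers which span it are unique") can be computed in linear time with a prefix sum over per-k-mer uniqueness flags. This is a minimal sketch that assumes the flags have already been computed by some other tool (vmatch, an aligner, or similar); the function name is hypothetical, and the actual UCSC tracks may use a different scoring convention.

```python
def per_base_mappability(kmer_unique: list[bool], k: int) -> list[bool]:
    """kmer_unique[i] says whether the k-mer starting at position i is unique.

    Returns one boolean per base: True only if every k-mer covering that
    base is unique. Bases near the ends are covered by fewer k-mers.
    """
    if not kmer_unique:
        return []
    n_bases = len(kmer_unique) + k - 1
    # prefix[i] = number of NON-unique k-mers among starts 0..i-1
    prefix = [0]
    for u in kmer_unique:
        prefix.append(prefix[-1] + (0 if u else 1))
    result = []
    for pos in range(n_bases):
        # k-mers starting in [first, last] cover this base
        first = max(0, pos - k + 1)
        last = min(pos, len(kmer_unique) - 1)
        non_unique = prefix[last + 1] - prefix[first]
        result.append(non_unique == 0)
    return result


print(per_base_mappability([True, False, True, True], k=3))
```

        A single non-unique k-mer knocks out all k bases it covers, which is why this definition is stricter than simply asking whether a read starting at a given base is unique.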

        • #5
          Originally posted by malcook
          Hmmm, I don't see any mappability track for either human or mouse at UCSC. (Sounds of poking around.) Oh, I see, it exists for human in hg18 but not in the most recent assembly, hg19. OK. Interesting. Thanks!

          The proposed `vmatch`-based solution produces a whole-genome boolean mappability vector: a base is mappable if all the k-mers (e.g. 36-mers) that span it are sufficiently unique (e.g. no match within an edit distance of 2).

          It should not be too hard to generate these tracks using a short-read aligner and a reference genome. The method used to compute each track is described on the track's details page. I find these tracks extremely useful.

          • #6
            What RockChalkJayhawk is asking for is challenging. It is difficult to find a suboptimal alignment that is 5-10% divergent from a 500 bp sequence. I do not know how vmatch works, but it probably does not work well in this case.

            The right aligners for this task are ssaha2 or bwa/bwasw. But even with these tools, there is still a good chance of missing a suboptimal alignment that is 5-10% divergent.
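            One way to see why 5-10% divergence is hard: many aligners start from exact-match seeds, and with m mismatches in an ungapped alignment of length L, the L - m matching bases fall into at most m + 1 runs, so the longest exact run is only guaranteed to be ceil((L - m) / (m + 1)) bp. The sketch below works this pigeonhole bound for 500 bp at the 90-95% identity mentioned in the question. It ignores indels and says nothing about the specific seeding strategies of ssaha2 or bwasw; the function name is hypothetical.

```python
import math


def guaranteed_exact_run(length: int, mismatches: int) -> int:
    """Pigeonhole lower bound on the longest exact-match stretch between two
    equal-length sequences differing at `mismatches` positions (no indels)."""
    matches = length - mismatches
    return math.ceil(matches / (mismatches + 1))


for identity in (0.95, 0.90):
    mm = round(500 * (1 - identity))
    print(f"{identity:.0%} identity: {mm} mismatches, "
          f"longest exact run >= {guaranteed_exact_run(500, mm)} bp")
```

            This prints a guaranteed run of only 19 bp at 95% identity and 9 bp at 90% identity. That is the worst case (evenly spaced mismatches), but it shows how a seed-based search can miss a paralog or pseudogene copy even when the overall identity looks high.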

            • #7
              Mappability

              What I have done so far is isolate the locus I am interested in (either a 75 kb or a 250 kb locus), then write a custom Perl script to generate every possible n-length fragment. I then mapped the fragments back to the entire reference with SeqMap, looking for multiple matches with up to 5 mismatches, and created a custom track to view the results in UCSC. It's probably not the best approach, but it seems to be doing what I want.
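              The last step, turning per-fragment results into a UCSC custom track, can be sketched as writing a bedGraph file. This assumes the aligner output has already been parsed into a dictionary mapping each fragment's 0-based start to the number of reference locations it hit; the parsing of SeqMap output is not shown, and the function name and track name are hypothetical. Each fragment's count is assigned to its start base, which is only one of several possible conventions.

```python
def hits_to_bedgraph(chrom: str, hits_by_start: dict[int, int],
                     track_name: str = "fragment_hits") -> str:
    """Build bedGraph text from per-fragment hit counts.

    hits_by_start maps 0-based fragment start -> number of reference hits.
    Consecutive positions with the same count are merged into one
    half-open interval (chromStart inclusive, chromEnd exclusive).
    """
    lines = [f"track type=bedGraph name={track_name}"]
    run_start = run_end = run_val = None
    for pos in sorted(hits_by_start):
        val = hits_by_start[pos]
        # Extend the current run if this position is adjacent and has the same value
        if run_val is not None and pos == run_end and val == run_val:
            run_end += 1
            continue
        # Otherwise flush the previous run and start a new one
        if run_val is not None:
            lines.append(f"{chrom}\t{run_start}\t{run_end}\t{run_val}")
        run_start, run_end, run_val = pos, pos + 1, val
    if run_val is not None:
        lines.append(f"{chrom}\t{run_start}\t{run_end}\t{run_val}")
    return "\n".join(lines) + "\n"


print(hits_to_bedgraph("chrX", {100: 1, 101: 1, 102: 3, 104: 1}))
```

              A value of 1 then marks fragments that map uniquely, and anything higher highlights stretches where the gene family members or pseudogenes are too similar to distinguish at that fragment length and mismatch threshold.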
