I found this track to be a very important piece of information to keep in mind when mapping reads to the human genome. To be honest, I have not seen it discussed much out there (I may be wrong). I hope it is helpful to someone.
I copy-pasted the most relevant information from the link. It belongs to a track called "Mappability or Uniqueness of Reference Genome" in the UCSC Genome Browser.
Description
To see it properly, just open the Genome Browser and go to your favourite locus; the track should appear in the track list below.
These tracks display the level of sequence uniqueness of the reference hg18 genome. They were generated using different window sizes and high signal will be found in areas where the sequence is unique.
Methods
The Broad alignability track displays whether a region is made up of mostly unique or mostly non-unique sequence. To generate the track, every 36-mer in the genome was marked as "unique" if the most similar 36-mer elsewhere in the genome has at most 2 mismatches, and as "non-unique" otherwise. Position X in the alignable track is marked by 1 if >50% of the bases in [X-200,X+200] are "unique" and by 0 otherwise. Every point in the alignable track has a corresponding position in each of the ChIP signal tracks. The Broad alignability track was generated for the ENCODE project as a tool for development of the Broad Histone tracks.
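The windowing step described above can be sketched roughly as follows. This is only an illustration, not the Broad's actual code: the function name `alignability_track`, the per-base list of "unique" flags used as input, and the way windows are truncated at the ends of the sequence are all assumptions.

```python
def alignability_track(unique, half_window=200):
    """Return a 0/1 value per position from per-base 'unique' flags.

    unique      -- list of booleans, one per base (hypothetical input)
    half_window -- 200 gives the [X-200, X+200] window from the description
    """
    n = len(unique)
    # Prefix sums let us count unique bases in any window in O(1).
    prefix = [0]
    for flag in unique:
        prefix.append(prefix[-1] + (1 if flag else 0))

    track = []
    for x in range(n):
        lo = max(0, x - half_window)          # assumption: clip at the start
        hi = min(n, x + half_window + 1)      # assumption: clip at the end
        n_unique = prefix[hi] - prefix[lo]
        # Strictly more than 50% of the bases in the window must be unique.
        track.append(1 if 2 * n_unique > (hi - lo) else 0)
    return track
```

Note that a window that is exactly half unique gets a 0, because the rule is ">50%".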
The Duke uniqueness tracks display how unique each sequence on the positive strand is, starting at a particular base and of a particular length. Thus, the 20 bp track reflects the uniqueness of all 20-base sequences, with the score being assigned to the first base of the sequence. Scores are normalized to between 0 and 1, with 1 representing a completely unique sequence and 0 indicating that the sequence occurs >4 times in the genome (excluding chrN_random and alternative haplotypes). A score of 0.5 indicates the sequence occurs exactly twice, likewise 0.33 for three times and 0.25 for four times. The Duke uniqueness tracks were generated for the ENCODE project as tools in the development of the Open Chromatin tracks.
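To make the Duke scoring concrete, here is a toy sketch on a short string. It is not the Duke pipeline: the function name `duke_uniqueness` is made up, it only counts k-mers within the string you pass in (the real track counts across the whole genome), and rounding to two decimals is an assumption taken from the 0.33 in the description.

```python
from collections import Counter


def duke_uniqueness(seq, k=20):
    """Score each k-mer start position on the given (positive) strand.

    Returns one score per start position: 1/count for counts of 1 to 4,
    and 0 for k-mers that occur more than 4 times.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)

    scores = []
    for kmer in kmers:
        n = counts[kmer]
        # The score belongs to the first base of the k-mer.
        scores.append(round(1.0 / n, 2) if n <= 4 else 0.0)
    return scores
```

For example, `duke_uniqueness("ACAC", k=2)` scores the two `AC` starts as 0.5 and the single `CA` start as 1.0.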
The Duke excluded regions track displays genomic regions for which mapped sequence tags were filtered out before signal generation and peak calling for Duke/UNC/UTA's Open Chromatin tracks. This track contains problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). The Duke excluded regions track was generated for the ENCODE project.
The Rosetta uniqueness track uses sequence 'tiles' of 35 bp. Each tile was aligned to the genome using the BWA aligner. Tiles that align uniquely and perfectly in hg18 receive a p-value of 1e-37, while those that align perfectly in multiple locations receive a p-value of 0. For each tile, the oligo midpoint coordinate was recorded along with the -log_10 p-value: 37 (unambiguous) to 0 (ambiguous). The Rosetta uniqueness track was generated independently of the ENCODE project.
The UMass uniqueness track displays a uniqueness signal for each base which represents the sum of both plus and minus strand 15-mer occurrences of that particular 5'->3' (plus strand) sequence throughout the genome. Scores are normalized between 0 and 1 by calculating ( 1 / N ), where N is the number of genome-wide occurrences of the 15-mer starting at position X. A score of 1 represents a single genome-wide occurrence of that 15-mer. A 0.5 would represent either 2 plus strand occurrences or 1 plus and 1 minus strand occurrence, and so on. Ratios are rounded to three decimal places. Therefore a 0.000 would represent > 2000 occurrences. A 0 is reserved for a given 15-mer that is either not assembled or contains at least one N at position X. The UMass uniqueness track was generated for the ENCODE project.
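The 1/N calculation over both strands can be sketched like this. Again, this is a toy illustration rather than the UMass code: the names `umass_uniqueness` and `reverse_complement` are made up, counts come only from the string passed in, and the handling of reverse-palindromic k-mers (counted once per site here) is an assumption, since the description does not say how they are treated.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string (N is left as N)."""
    return kmer.translate(_COMPLEMENT)[::-1]


def umass_uniqueness(seq, k=15):
    """Score each k-mer start position as 1/N over both strands."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(km for km in kmers if "N" not in km)

    scores = []
    for kmer in kmers:
        if "N" in kmer:
            scores.append(0.0)  # 0 is reserved for k-mers containing N
            continue
        rc = reverse_complement(kmer)
        # A minus strand occurrence of kmer is a plus strand occurrence
        # of its reverse complement.
        if rc == kmer:
            n = counts[kmer]  # assumption: count a palindromic site once
        else:
            n = counts[kmer] + counts[rc]
        scores.append(round(1.0 / n, 3))
    return scores
```

With `k=2`, the string `"ACGGT"` gives 0.5 for `AC` and `GT` (each is the reverse complement of the other), and 1.0 for `CG` and `GG`.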
Credits
The Broad alignability track was created by the Broad Institute (contact: [email protected]). Data generation and analysis was supported by funds from the NHGRI (the ENCODE project), the Burroughs Wellcome Fund, Massachusetts General Hospital and the Broad Institute.
The Duke uniqueness and Duke excluded regions tracks were created by Terry Furey (contact: [email protected]) and Debbie Winter at Duke University's Institute for Genome Sciences & Policy (IGSP); and Stefan Graf at the European Bioinformatics Institute (EBI). We thank NHGRI for ENCODE funding support.
The Rosetta uniqueness track was created by John Castle at Rosetta Inpharmatics (Merck) (contact: [email protected]), with assistance from Melissa Cline at UCSC.
The UMass uniqueness track was created by Bryan Lajoie (contact: [email protected]) in Job Dekker's Lab at the University of Massachusetts Medical School. Funding Support: NIH grant HG003143 to JD. Keck Distinguished Young Scholar Award to JD. This track was generated as part of the ENCODE project funded by the NHGRI.