Homogeneous amplification is of course expected, in fact required. The real issue is irregular amplification (for some reason one fragment gets amplified 100 times more than the average), which messes up any downstream quantitative analysis that relies on read counts. The idea is that, for a given sequencing depth, getting exactly identical reads by chance is very unlikely in a large genome. Again, exact duplicate removal is standard practice for many in the ChIP-seq community to avoid spurious peak calls. MACS, for instance, which is widely used, has built-in duplicate removal:
Sometimes the same tag can be sequenced repeatedly, more times than expected from a random genome-wide tag distribution. Such tags might arise from biases during ChIP-DNA amplification and sequencing library preparation, and are likely to add noise to the final peak calls. Therefore, MACS removes duplicate tags in excess of what is warranted by the sequencing depth (binomial distribution p-value < 10^-5). For example, for the 3.9 million FoxA1 ChIP-Seq tags, MACS allows each genomic position to contain no more than one tag and removes all the redundancies.
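The binomial rule in that quote can be sketched roughly like this. It's a minimal illustration, not MACS's actual code. The helper names (`max_tags_per_position`, `remove_excess_duplicates`), the representation of tags as `(chrom, pos, strand)` tuples, and the 2.7e9 genome size are all assumptions for the example. It treats each tag as landing on one of `genome_size` positions uniformly at random, then finds the smallest per-position count k whose upper tail P(X > k) drops below the p-value cutoff.

```python
import math
from collections import Counter


def binom_pmf(i, n, p):
    """Binomial probability P(X = i), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def max_tags_per_position(n_tags, genome_size, pvalue=1e-5):
    """Smallest k with P(X > k) < pvalue, X ~ Binomial(n_tags, 1/genome_size).

    Assumption: tags fall uniformly on genome_size positions; strand and
    mappability are ignored. At least one tag per position is always kept.
    """
    p = 1.0 / genome_size
    cdf = 0.0
    k = 0
    while True:
        cdf += binom_pmf(k, n_tags, p)
        if 1.0 - cdf < pvalue:  # upper tail P(X > k)
            return max(k, 1)
        k += 1


def remove_excess_duplicates(tags, max_dup):
    """Keep at most max_dup copies of each identical (chrom, pos, strand) tag."""
    seen = Counter()
    kept = []
    for tag in tags:
        seen[tag] += 1
        if seen[tag] <= max_dup:
            kept.append(tag)
    return kept


if __name__ == "__main__":
    # 3.9 million tags on an assumed ~2.7e9 bp genome
    print(max_tags_per_position(3_900_000, 2.7e9))  # prints 1
```

With 3.9 million tags spread over ~2.7e9 positions, the expected count per position is about 0.0014, so P(X ≥ 2) is roughly 1e-6, already below 1e-5. That gives a cap of one tag per position, the same outcome the quote describes for the FoxA1 data. At much higher depth on a small genome the cap rises above one, which is the point of tying the threshold to sequencing depth rather than always collapsing to a single read.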