**apfejes** · 02-19-2009, 04:31 PM

Interesting question. In ChIP-seq, we often see "odd" stuff, including biases toward clearly unexpected regions of the genome. That often includes large "peaks" in centromeres, or just large stacks of duplicates.

However, while we don't know the sources of all of this "odd" stuff, we can account for most of it with good controls. (I doubt that the fragmentation is completely random, though, regardless of which method you use...)

If you're looking for other sources, many groups do a PCR step on their DNA before sequencing, which might preferentially amplify some fragments. And of course, you are isolating DNA from a large population of cells, so it's possible that you're just getting a lot of pulled-down material from a whole collection of cells where that signal is strong.

Anyhow, I would also suggest that how your pipeline handles the reads makes a difference. You don't specify the aligner or the filtering techniques being used, which makes it really hard to get to the bottom of what you're seeing.

Good luck making sense of your data!

**mingkunli** · 02-26-2009, 06:15 AM

Seems we have a similar problem.
I have several identical reads, and of course they map to the same position.
When I analyze 454 data, I keep one and remove the others, because the duplication is likely caused by some technical problem.
But for Solexa data, I don't know of any reason that would justify removing them.

**dvh** · 02-26-2009, 10:59 AM

Another source: the current human genome sequence is imperfect. There are likely sequences which are in fact repeats but do not appear so in the current genome assembly.

If we see 'read-towers' we regard them as artefacts until proved otherwise.

**ieuanclay** · 03-10-2009, 06:17 AM

During library construction (454/Illumina etc.), almost all protocols have a PCR amplification stage, if only to get enough material to sequence. Unless you are expecting them, I would remove any exactly identical sequence reads if they are going to affect downstream analysis. Removing reads may sound like a bad thing, but we have found that the bias caused by keeping replicated reads can be huge (and muddies an already muddy pool!). So although it is conservative and may remove useful data, without any way to prove the reads come from independent sources, I would always remove them. You might consider barcoding your library when you amplify (easy to do); at least this way, any identical but independently produced sequences will be separable.
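A minimal sketch of the exact-duplicate removal described above, assuming the reads are already parsed into `(read_id, barcode, sequence)` tuples. The function name `remove_exact_duplicates` and the tuple layout are hypothetical, chosen only for illustration. Keying on the barcode as well as the sequence means identical sequences from differently barcoded libraries are kept as separate reads.

```python
def remove_exact_duplicates(reads):
    """Keep the first read seen for each (barcode, sequence) combination.

    `reads` is an iterable of (read_id, barcode, sequence) tuples.
    This layout is an assumption for the sketch, not a standard format.
    """
    seen = set()
    kept = []
    for read_id, barcode, sequence in reads:
        key = (barcode, sequence)
        if key in seen:
            # Identical sequence within the same barcoded library:
            # treated here as a likely PCR duplicate and dropped.
            continue
        seen.add(key)
        kept.append((read_id, barcode, sequence))
    return kept


example_reads = [
    ("r1", "A", "ACGT"),
    ("r2", "A", "ACGT"),  # same barcode and sequence as r1: dropped
    ("r3", "B", "ACGT"),  # same sequence, different barcode: kept
    ("r4", "A", "TTTT"),
]
kept_ids = [read[0] for read in remove_exact_duplicates(example_reads)]
print(kept_ids)  # ['r1', 'r3', 'r4']
```

Without barcodes, the same function can be used by passing a constant barcode for every read, at the cost of also discarding identical reads that arose independently.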

Sorry for the long post...

...

**basickler** · 03-12-2009, 09:26 AM

ieuanclay is correct. The duplication is caused by the library prep steps. We've found that lowering the number of PCR cycles, or doing a two-stage PCR instead, gives less duplication. So basically you get so much sequence that you're seeing two products of a PCR reaction being sequenced.

It only works for paired-end sequencing, but I judge library diversity by looking at the number of identical paired-end reads (same exact start-end for the pair). Whether you want to remove them or not is up to you; for a low-diversity library they can cause spurious SNP calls and such, depending on the algorithm and the PCR fidelity.
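As a rough sketch of that diversity check, assuming each aligned pair has already been reduced to a `(chromosome, start, end)` tuple (the function name `duplicate_pair_fraction` and this layout are hypothetical), the fraction of pairs that repeat an already-seen start-end combination could be computed like this:

```python
from collections import Counter


def duplicate_pair_fraction(pairs):
    """Fraction of read pairs that duplicate an earlier pair's exact span.

    `pairs` is an iterable of (chromosome, start, end) tuples, one per
    aligned read pair. This layout is an assumption for the sketch.
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every copy beyond the first of a given span counts as a duplicate.
    duplicates = total - len(counts)
    return duplicates / total


example_pairs = [
    ("chr1", 100, 300),
    ("chr1", 100, 300),  # same start and end as the first pair
    ("chr1", 150, 320),
    ("chr2", 50, 260),
]
print(duplicate_pair_fraction(example_pairs))  # 0.25
```

A higher fraction suggests a lower-diversity library; what counts as too high is a judgment call that depends on sequencing depth and the experiment.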

And the purity filter doesn't work on alignment, just call quality. Think of it like trimming away the bad phred scores.

**tec** · 09-16-2009, 05:56 AM

duplicates in ChIPSeq

Hello,

I have exactly the same problem but only found this thread just now.

Please look at - http://seqanswers.com/forums/showthread.php?t=2592

Many thanks for your help, it is much appreciated!!!

tec

**tec** · 10-07-2009, 04:30 AM

multiple reads having the same sequence...

Hello all,

The problem with duplicate reads is still keeping me busy.
Therefore we performed a TOPO cloning resequencing check of the library.
Surprisingly, over 75% of the clones were unique, which doesn't correlate with the sequencing run!!!

Does anyone have an idea???

Thanks! tec
