
**Chipper** · 09-05-2011, 05:08 AM

Are you using the multiplexing primers from Illumina? We had similar problems and got a much better yield of unique fragments after switching to full-length adaptors.

**OptimusBrien** · 09-05-2011, 05:12 AM

No, we aren't using the multiplexing primers.

**protist** · 09-05-2011, 05:29 AM

Check out this SEQanswers thread and the threads linked from it:

Removing duplicates is it really necessary? - SEQanswers

http://seqanswers.com/forums/showthread.php?t=6854

& http://ngsbuzz.blogspot.com/2010/10/...explained.html

**mudshark** · 09-05-2011, 05:49 AM

a) What criteria are you using to define "poor results"?

b) I do not agree that duplicated reads are PCR artifacts in general. Did you evaluate your data with and without de-duplication? Did you run a control sample (input)? If so, do you observe high duplication in the control as well?
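To follow the suggestion of comparing the ChIP against its input, you can compute the same duplicate fraction for both libraries. This is a minimal sketch, assuming single-end reads have already been mapped and reduced to (chromosome, 5' position, strand) tuples; the `duplicate_fraction` helper and the toy reads are hypothetical. It is a position-based measure, unlike FastQC's duplication module, which works on read sequences.

```python
from collections import Counter
from typing import Iterable, Tuple

# A mapped single-end read reduced to its 5' end: (chromosome, position, strand).
ReadKey = Tuple[str, int, str]


def duplicate_fraction(reads: Iterable[ReadKey]) -> float:
    """Fraction of reads that repeat an already-seen (chrom, 5' pos, strand) key."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each distinct key keeps one read; every extra copy counts as a duplicate.
    return (total - len(counts)) / total


# Toy data, invented purely for illustration.
chip = [("chr1", 100, "+")] * 4 + [("chr1", 250, "-"), ("chr2", 30, "+")]
input_ctrl = [("chr1", 100, "+"), ("chr1", 400, "-"), ("chr2", 30, "+"), ("chr3", 7, "+")]

print(f"ChIP: {duplicate_fraction(chip):.2f}  input: {duplicate_fraction(input_ctrl):.2f}")
```

One reading consistent with the rest of this thread: if the input is also heavily duplicated, the library prep is the more likely culprit, whereas duplication confined to the ChIP may partly reflect genuine enrichment.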

**niceday** · 09-05-2011, 05:51 AM

It sounds like a ligation buffer problem. If the ligation buffer isn't stored properly or goes through too many freeze-thaw cycles, it stops working. If your ligation efficiency is very low, the complexity of your library is reduced, and after PCR you get duplication because of the small amount of correctly ligated starting material.

I have seen this problem quite a few times.
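The link between ligation efficiency and duplication can be put into rough numbers with an idealized sampling model: if only a fraction of fragments are correctly ligated, the library holds fewer unique molecules, and a fixed number of reads samples each of them more often. This sketch assumes uniform PCR amplification and uniform random sampling of molecules (the Poisson/Lander-Waterman model, similar in spirit to the one behind Picard's EstimateLibraryComplexity); the fragment count, read depth and efficiencies are made-up illustration values.

```python
import math


def expected_duplicate_fraction(unique_molecules: float, reads: float) -> float:
    """Expected fraction of duplicate reads when `reads` are drawn uniformly at
    random (with replacement) from `unique_molecules` distinct fragments."""
    if reads <= 0:
        return 0.0
    # Expected number of distinct molecules seen at least once (Poisson model).
    distinct_seen = unique_molecules * (1.0 - math.exp(-reads / unique_molecules))
    return 1.0 - distinct_seen / reads


# Hypothetical values for illustration only.
input_fragments = 5e7  # ligatable fragments entering the ligation step
read_depth = 2e7       # single-end reads sequenced

for ligation_efficiency in (0.5, 0.1, 0.01):
    complexity = input_fragments * ligation_efficiency  # unique molecules in the library
    dup = expected_duplicate_fraction(complexity, read_depth)
    print(f"ligation efficiency {ligation_efficiency:>5}: "
          f"{complexity:.1e} unique molecules, ~{dup:.0%} duplicates")
```

With these invented numbers, the expected duplicate fraction climbs steeply as ligation efficiency drops, even though sequencing depth is unchanged.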

**simonandrews** · 09-06-2011, 12:00 AM

In general you want to do as few PCR cycles as you can. There's no point in doing 18 cycles if you only need 12 to get enough material. We've seen diversity in libraries dramatically increase from a reduction of only 3-4 PCR cycles.
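The cost of extra cycles compounds. This minimal sketch assumes a constant per-cycle efficiency and no plateau (real reactions deviate from both) and compares the theoretical product of 18 versus 12 cycles; the efficiency values are illustrative.

```python
def amplification_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical fold-amplification after `cycles` of PCR, where `efficiency`
    is the fraction of templates copied per cycle (1.0 = perfect doubling)."""
    return (1.0 + efficiency) ** cycles


for eff in (1.0, 0.9):
    ratio = amplification_fold(18, eff) / amplification_fold(12, eff)
    print(f"efficiency {eff}: 18 cycles give {ratio:.0f}x the product of 12 cycles")
```

The extra six cycles multiply copies of molecules already in the library (64-fold at perfect doubling) without adding any new unique fragments.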

Most of the time, though, the PCR cycles aren't the root cause of your problem. Some other, earlier step in the library prep is normally causing too great a loss of material, which then necessitates the extra PCR in order to make the library. Getting as much material as possible through library construction is the key. I know that many of our scientists have found they were able to eliminate some of the intermediate cleanup steps to reduce material loss, which seems to make a big difference, especially when starting from small amounts of material.
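Because each step keeps only a fraction of the molecules it receives, losses multiply across the prep. This sketch uses hypothetical per-step recovery values (not measured yields for any particular kit or protocol) to show the effect of dropping one intermediate cleanup.

```python
from math import prod
from typing import Sequence


def surviving_molecules(start: float, step_yields: Sequence[float]) -> float:
    """Unique molecules left after a series of steps, where each step retains
    the given fraction of what went in (losses multiply across steps)."""
    return start * prod(step_yields)


# Hypothetical per-step recoveries: three cleanups plus a size selection,
# versus the same prep with one intermediate cleanup removed.
with_all_cleanups = [0.7, 0.7, 0.7, 0.5]
one_cleanup_removed = [0.7, 0.7, 0.5]

start = 1e6  # hypothetical number of ChIP'd fragments
kept_all = surviving_molecules(start, with_all_cleanups)
kept_fewer = surviving_molecules(start, one_cleanup_removed)
print(f"{kept_all:.0f} vs {kept_fewer:.0f} unique molecules "
      f"({kept_fewer / kept_all:.2f}x more without one cleanup)")
```

Removing one step with 70% recovery raises the number of unique molecules by 1/0.7, roughly 1.43-fold, before any PCR happens.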

It's also worth checking that you really have a problem with duplication. FastQC makes a general check for duplication levels, but failing the duplicate sequences test doesn't necessarily indicate a problem. In particular, if you have a relatively small number of highly enriched sites in your ChIP, you may have saturated the set of potential start sites for reads, and will then start creating duplicates. The real problem comes when you see duplicated reads in an enriched region that isn't saturated, since these suggest the duplication is technical rather than biological. Since you said you could see the duplication when you looked at the data, I suspect this is a real problem in your data, but it's worth checking.
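One way to make the saturated-versus-technical distinction concrete is to compare, per enriched region, the number of distinct read start sites you observe with the number you would expect if reads landed at random. This is a rough sketch, assuming reads are placed uniformly over both strands of the region; real ChIP reads cluster around the binding site, so this overestimates the available start sites and should be treated as a coarse screen. The `Region` record, the example regions and the 0.5 cutoff are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical summary of one enriched region (e.g. a called peak)."""
    name: str
    length_bp: int        # width of the region
    reads: int            # reads whose 5' end falls inside the region
    distinct_starts: int  # distinct (position, strand) 5' ends among those reads


def expected_distinct_starts(length_bp: int, reads: int) -> float:
    """Distinct start sites expected if `reads` fall uniformly at random on the
    2 * length_bp possible (position, strand) sites."""
    sites = 2 * length_bp
    return sites * (1.0 - math.exp(-reads / sites))


def observed_to_expected(region: Region) -> float:
    """Observed / expected distinct starts. Near 1: duplication matches what
    saturation alone predicts. Well below 1: likely technical."""
    return region.distinct_starts / expected_distinct_starts(region.length_bp, region.reads)


CUTOFF = 0.5  # hypothetical threshold, not a published standard

regions = [
    Region("peak_A", length_bp=300, reads=5000, distinct_starts=590),  # deeply covered
    Region("peak_B", length_bp=300, reads=200, distinct_starts=40),    # lightly covered
]
for r in regions:
    ratio = observed_to_expected(r)
    verdict = "consistent with saturation" if ratio >= CUTOFF else "likely technical"
    print(f"{r.name}: observed/expected = {ratio:.2f} ({verdict})")
```

Under this model, 5,000 reads on 600 possible start sites in peak_A must produce duplicates, so heavy duplication there is expected; in peak_B, 200 reads should have produced roughly 170 distinct starts, so seeing only 40 points to technical duplication.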

**JohnK@Genome_Quest** · 09-14-2011, 10:12 AM

Originally posted by OptimusBrien

Hey,
I recently got several ChIP-seq datasets back from our collaborators. When I analysed them, I was quite surprised to find that the results were poor, given that the antibodies all work in ChIP and these ChIPs in particular definitely worked: they were QC'd as thoroughly as possible for enrichment at known binding sites, etc.

Anyway, it was suggested that I run the raw sequencing files through FastQC, which I did. I had noticed a high level of read duplication in the libraries during the analysis, and sure enough FastQC picked up on this as well. All of the libraries fail miserably on this metric.

My question is: where does this high level of read duplication come from? Surely it has to be from the PCR amplification in the library prep protocol (I did 18 cycles). Should I expect much better results if I used fewer rounds of PCR, say 12 or 14?

Thanks
Optimus

High duplication in single-end (SE) ChIP-seq libraries is typically a result of PCR. It can directly reflect how difficult it is to isolate your protein via cross-linking, or how effective your antibodies are. Take this example: say you had a very small initial amount of DNA and had to bring your library up to a sufficient amount for sequencing. You bind your antibodies, perform your ChIP, wash, amplify by PCR, and then sequence. Naturally, you'll expect many clones of the same DNA fragment because of the small amount of starting DNA. When you map the reads, you'll see a large number of duplicates, and you must remove them, as they give you uninformative coverage and directly affect your peak-calling ability. As the poster above said, you want as many unique start sites (uniquely mapped reads) as possible. A work-around is to make sure you've ChIP'ed your binding sites effectively, and sometimes this requires changes to your methods and materials.

I've personally seen duplication levels as high as 90% and as low as 40%. In all cases, I remove the duplicates. However, this is all relative. I might consider not removing them if your throughput was very, very low, but this depends on many parameters, and one important thing to consider is the method used to create your inputs/controls. Having an adequate background could also save a low-throughput experiment.
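For single-end reads, duplicates are usually identified by mapping position: reads whose 5' ends fall on the same base and strand. This is a simplified sketch, assuming a hypothetical `Alignment` record (not pysam's or any other library's API) and keeping the highest-MAPQ read at each position. Dedicated tools such as Picard MarkDuplicates or samtools markdup do this directly on BAM files and handle more details, such as clipped bases and quality-based tie-breaking.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, int, bool]  # (chromosome, 5' position, is_reverse)


@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal record for one mapped single-end read."""
    name: str
    chrom: str
    start: int        # 0-based leftmost aligned reference position
    aligned_len: int  # reference bases covered by the alignment
    reverse: bool     # True if mapped to the minus strand
    mapq: int


def five_prime_key(aln: Alignment) -> Key:
    """5' end of the read: leftmost base on the plus strand, rightmost on minus."""
    pos = aln.start + aln.aligned_len - 1 if aln.reverse else aln.start
    return (aln.chrom, pos, aln.reverse)


def deduplicate(alignments: List[Alignment]) -> List[Alignment]:
    """Keep one alignment per 5' key: the highest MAPQ, first seen on ties."""
    best: Dict[Key, Alignment] = {}
    for aln in alignments:
        key = five_prime_key(aln)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return list(best.values())


reads = [
    Alignment("r1", "chr1", 100, 36, False, 30),
    Alignment("r2", "chr1", 100, 36, False, 40),  # same 5' end as r1, higher MAPQ
    Alignment("r3", "chr1", 90, 46, True, 30),    # minus strand, 5' end at 135
    Alignment("r4", "chr1", 100, 36, True, 30),   # minus strand, 5' end also at 135
]
kept = deduplicate(reads)
print([a.name for a in kept])
```

Reverse-strand reads are keyed on their rightmost aligned base, which is their 5' end; keying both strands on the leftmost coordinate would miss minus-strand duplicates whose lengths differ after trimming.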

**ETHANol** · 09-15-2011, 08:23 AM

A lot of great points have been made already in this thread. I'd just like to add that I am able to substantially decrease the number of amplification PCR cycles by using the Kapa HF polymerase instead of Phusion polymerase. I'm not a Kapa salesman; I'm just convinced it is a superior product.

That being said, as noted previously, it is not the root cause of your problem, if it is even a problem at all. While it will not solve your problems, it will push things in the direction you want, i.e. greater efficiency through the library generation process, with reduced GC bias as a bonus.

Thread: Very high duplication of sequences in ChIP-Seq sequencing results