**Ben3** · 12-01-2022, 06:36 AM

vghuici, that is a pretty high duplication rate. Adding more cycles would really only increase the amount of duplication.

What method are you using to detect the duplicates? The method should be examining both the 5' and the 3' of the insert to make sure it's a true duplicate.
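To make the point about checking both ends concrete, here is a minimal sketch. It is a toy model, not how Picard or any other tool is implemented: the `Fragment` record, its fields, and the example coordinates are all made up for illustration. Two fragments count as duplicates only if both ends of the insert, the strand, and (optionally) the UMI all match.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    """Hypothetical record for one aligned read pair (illustration only)."""
    name: str
    chrom: str
    start: int                 # one end of the insert (leftmost aligned base)
    end: int                   # other end of the insert (rightmost aligned base)
    strand: str                # orientation of read 1, "+" or "-"
    umi: Optional[str] = None  # UMI sequence, if the library has one


def mark_duplicates(fragments, use_umi=False):
    """Return the names of fragments flagged as duplicates.

    Fragments are grouped by both insert ends plus strand (and UMI if
    requested). The first fragment seen in each group is kept; the rest
    are flagged.
    """
    groups = defaultdict(list)
    for f in fragments:
        key = (f.chrom, f.start, f.end, f.strand)
        if use_umi:
            key += (f.umi,)
        groups[key].append(f.name)

    duplicates = set()
    for names in groups.values():
        duplicates.update(names[1:])
    return duplicates


def duplication_rate(fragments, use_umi=False):
    """Fraction of fragments flagged as duplicates."""
    return len(mark_duplicates(fragments, use_umi)) / len(fragments)


fragments = [
    Fragment("a", "chr1", 100, 350, "+", "AAC"),
    Fragment("b", "chr1", 100, 350, "+", "AAC"),  # same ends, same UMI
    Fragment("c", "chr1", 100, 350, "+", "GTT"),  # same ends, different UMI
    Fragment("d", "chr1", 100, 420, "+", "AAC"),  # same start, different end
]
```

In this made-up example, fragment `d` shares a start with the others but is never flagged because its other end differs, and fragment `c` is only rescued when UMIs are taken into account.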

**vghuici** · 12-01-2022, 07:31 AM

Hi Ben3, thanks for responding! We constructed our libraries with UMIs, using an NEB kit and adapters. To detect duplicates we use Picard/GATK: UmiAwareMarkDuplicatesWithMateCigar when we consider UMIs, or MarkDuplicates when we do not. By the way, taking UMIs into account makes little difference in our case.

https://gatk.broadinstitute.org/hc/e...icates-Picard-

"The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file."

With 80 ng of library input a high duplication rate could be expected, but I am baffled to see pretty much the same rate with as much as 500 ng.

**Ben3** · 12-01-2022, 01:59 PM

vghuici, I'm more used to diverse samples and fewer duplicates. To help understand this a little better, I've put some useful resources below that will hopefully help you out.

https://biostar.galaxyproject.org/p/28822/index.html
https://www.biostars.org/p/399103/
https://www.biostars.org/p/112588/
https://dnatech.genomecenter.ucdavis...cr-duplicates/
https://bioinformatics.stackexchange...-as-duplicates

**luc** · 12-09-2022, 03:26 PM

Hello vghuici, the bottleneck could be occurring at either the library prep stage or the hybridization capture and re-amplification stage (e.g. the bait concentration could be too low, or the hybridization temperature could be off). Since you have already modified the first part in several ways, perhaps the second part is causing the problem?
More QC data for each stage of the process would be helpful. How many PCR cycles did you run after the capture?
In my eyes, PCR-free libraries are not necessarily beneficial for exome capture. I would suggest running at least 5 PCR cycles before the hybridization capture to enrich for complete Illumina libraries (with both P5 and P7 sequences).

**vghuici** · 12-13-2022, 02:32 AM

Hi Luc, thank you for your comments. The hyb temperature does not seem to be the problem, and we performed the capture under pooling conditions as recommended by IDT (except that we ran 5 cycles after capture instead of the minimum of 6 cycles mentioned in the manual). Your comment that PCR-free library construction may not be an optimal strategy for preserving complexity could well be on target. I have strongly suspected this since we got a similar duplication rate with 80 ng (and 4 cycles) or 500 ng (PCR-free) of library input from the very same sample.
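One way to reason about why 80 ng and 500 ng could give the same duplication rate is a simple complexity model. The sketch below assumes the Lander-Waterman-style relationship that, as I understand it, Picard also uses for its estimated library size: with `N` sequenced read pairs drawn from a library of `X` unique molecules, the expected number of unique pairs is `X * (1 - exp(-N / X))`. The function names are my own, and it ignores optical duplicates and uneven capture. Under this model the duplication rate depends only on the number of unique molecules that survive to the sequencer, so two very different input masses would look the same if both pass through the same downstream bottleneck.

```python
import math


def expected_unique(total_reads, library_size):
    """Expected unique read pairs when sampling total_reads from a library
    of library_size distinct molecules (Lander-Waterman-style model)."""
    return library_size * (1.0 - math.exp(-total_reads / library_size))


def expected_duplication_rate(total_reads, library_size):
    """Expected fraction of read pairs that are duplicates under the model."""
    return 1.0 - expected_unique(total_reads, library_size) / total_reads


def estimate_library_size(total_reads, unique_reads):
    """Invert the model by bisection: find the library size that would yield
    unique_reads unique pairs out of total_reads sequenced pairs."""
    if not 0 < unique_reads < total_reads:
        raise ValueError("need 0 < unique_reads < total_reads")
    # expected_unique() increases with library size, so bracket the root.
    lo, hi = float(unique_reads), 2.0 * unique_reads
    while expected_unique(total_reads, hi) < unique_reads:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_unique(total_reads, mid) < unique_reads:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Feeding the total and unique pair counts from each library into `estimate_library_size` would show whether the 80 ng and 500 ng preps really end up with a similar number of unique molecules after capture.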

Duplication rate too high in WES