We are trying to detect very low-frequency variants in human tissues using WES (UMI-UDI libraries), aiming for a mean depth of 3000x, but we are getting very high duplication rates. In our first experiments we started with very low inputs (20 ng into library construction, 8 PCR cycles) and got a 70% duplication rate. We have now used inputs of 80 ng (4 PCR cycles) and 500 ng (no PCR), followed by 160 ng of each sample into the pool (equimass for 10 samples) for exome capture. For sequencing, the facility loaded only 1/4 of the capture output (around 12 ng) onto a NovaSeq 6000 S2 (850 Gb output). To our surprise, the duplication rate was still around 45-50%, with minimal difference between the 80 ng and 500 ng inputs. Do you have any hint about what could cause this apparent lack of complexity? Do we need more PCR cycles, to pool samples asymmetrically depending on the input, or to load all of the capture onto the sequencer? According to the sequencing service, they are at the technical limit, so it is not possible to load more sample. Any hint will be super welcome, thanks a lot!
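One way to put a number on the "apparent lack of complexity" is the Lander-Waterman relation that, as far as we know, Picard's duplicate metrics use for ESTIMATED_LIBRARY_SIZE: if N read pairs are sequenced and C of them are unique, the library size X satisfies C/X = 1 - exp(-N/X). The Python below is a minimal sketch that solves that equation by bisection and projects the duplicate fraction at higher depth. It assumes every molecule is equally likely to be sequenced (capture libraries are less even than that, so treat the projection as optimistic); the function names and the 100 M pair example are hypothetical, not taken from the thread.

```python
import math


def _lw_residual(x: float, unique: float, total: float) -> float:
    """Lander-Waterman residual: zero when `x` is the library size that would
    yield `unique` distinct molecules after sampling `total` reads."""
    return unique / x - 1.0 + math.exp(-total / x)


def estimate_library_size(total_pairs: int, unique_pairs: int) -> float:
    """Solve unique/X = 1 - exp(-total/X) for X by bisection."""
    if not 0 < unique_pairs < total_pairs:
        raise ValueError("need 0 < unique_pairs < total_pairs (some duplicates)")
    # Search for X as a multiple of the observed unique count.
    lo, hi = 1.0, 100.0
    # The residual is positive at X = unique and negative for very large X;
    # widen the upper bound until the sign change is bracketed.
    while _lw_residual(hi * unique_pairs, unique_pairs, total_pairs) > 0:
        hi *= 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if _lw_residual(mid * unique_pairs, unique_pairs, total_pairs) > 0:
            lo = mid  # root lies at a larger library size
        else:
            hi = mid
    return unique_pairs * (lo + hi) / 2.0


def expected_duplicate_fraction(library_size: float, pairs_sequenced: float) -> float:
    """Fraction of pairs expected to be duplicates if `pairs_sequenced` pairs
    are drawn uniformly at random from `library_size` molecules."""
    unique = library_size * (1.0 - math.exp(-pairs_sequenced / library_size))
    return 1.0 - unique / pairs_sequenced


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (not from the thread):
    # 100 M read pairs with a 50% duplicate rate.
    total, unique = 100_000_000, 50_000_000
    size = estimate_library_size(total, unique)
    print(f"estimated library size: {size:,.0f} molecules")
    for factor in (1, 2, 4):
        frac = expected_duplicate_fraction(size, factor * total)
        print(f"{factor}x the reads -> {frac:.1%} duplicates")
```

Under this uniform-sampling model, a library already showing about 50% duplicates is close to saturation: the estimated size is only about 1.25 times the unique pairs already observed, so doubling the reads adds roughly 20% more unique pairs.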
Hi Ben3, thanks for responding! We constructed our libraries with UMIs, using the NEB kit and adapters. To detect duplicates we use Picard/GATK (UmiAwareMarkDuplicatesWithMateCigar when we take UMIs into account, MarkDuplicates when we don't). By the way, UMIs make little difference in our case.
https://gatk.broadinstitute.org/hc/e...icates-Picard-
"The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file."
While high duplication rates could be expected with 80 ng of input into the library, I am baffled to see pretty much the same with as much as 500 ng.
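To illustrate the position-based logic quoted above, the sketch below groups read pairs by the unclipped 5' coordinate and strand of both mates, optionally adding the UMI to the grouping key. It is a simplified model, not Picard's implementation: the `Read` class is a hypothetical stand-in for an aligned record, UMIs must match exactly (the UMI-aware Picard tool can merge UMIs within a small edit distance), and optical duplicates and library IDs are ignored. It shows that UMIs only change the count when distinct molecules happen to share both 5' ends; if that is rare, UMI-aware and plain marking give similar numbers, which would be consistent with UMIs making little difference here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Read:
    """Minimal stand-in for one aligned mate (hypothetical, not a pysam type)."""
    chrom: str
    start: int           # leftmost aligned base, 1-based
    end: int             # rightmost aligned base, 1-based
    reverse: bool        # True if mapped to the reverse strand
    clip_left: int = 0   # soft-clipped bases before `start`
    clip_right: int = 0  # soft-clipped bases after `end`


def five_prime(read: Read) -> tuple:
    """Unclipped 5' coordinate plus strand: start for forward reads, end for reverse."""
    if read.reverse:
        return (read.chrom, read.end + read.clip_right, "-")
    return (read.chrom, read.start - read.clip_left, "+")


def pair_key(r1: Read, r2: Read, umi: Optional[str] = None) -> tuple:
    """Key shared by pairs that position-based marking would call duplicates.
    Mate ends are sorted so the key does not depend on which mate is read 1."""
    ends = tuple(sorted([five_prime(r1), five_prime(r2)]))
    return ends if umi is None else ends + (umi,)


def count_duplicates(pairs, use_umi: bool) -> int:
    """Number of pairs beyond the first in each key group."""
    groups = defaultdict(int)
    for r1, r2, umi in pairs:
        groups[pair_key(r1, r2, umi if use_umi else None)] += 1
    return sum(n - 1 for n in groups.values())


if __name__ == "__main__":
    fwd = Read("chr1", 1000, 1099, reverse=False)
    rev = Read("chr1", 1200, 1299, reverse=True)
    # Same unclipped 5' end as `fwd` once the 3 soft-clipped bases are added back.
    clipped = Read("chr1", 1003, 1099, reverse=False, clip_left=3)
    pairs = [(fwd, rev, "ACGT"), (clipped, rev, "ACGT"), (fwd, rev, "TTGA")]
    print("position only:", count_duplicates(pairs, use_umi=False))  # 2
    print("position + UMI:", count_duplicates(pairs, use_umi=True))  # 1
```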
vghuici, I'm more used to diverse samples with fewer duplicates, so to understand this a little better I've put some useful resources below that will hopefully help you out.
https://biostar.galaxyproject.org/p/28822/index.html
https://www.biostars.org/p/399103/
https://www.biostars.org/p/112588/
https://dnatech.genomecenter.ucdavis...cr-duplicates/
https://bioinformatics.stackexchange...-as-duplicates
Hello vghuici, the inefficiency bottleneck could be at either the library prep stage or the hybridization capture and re-amplification stage (e.g. bait concentration too low, or hybridization temperature off?). Since you have already modified the first part in several ways, perhaps the second part is causing the problems?
More QC data for each stage of the process would be helpful. How many PCR cycles do you run after the capture?
PCR-free libraries are not necessarily beneficial for exome capture in my eyes. I would suggest running at least 5 PCR cycles before the hybridization capture to enrich for complete Illumina libraries (with both p5 and p7 sequences).
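The enrichment argument above can be sketched with an idealized PCR model: a molecule carrying both adapters is copied from both strands every cycle and grows exponentially, while a molecule with an adapter on one end only can be primed on one strand and gains one new strand per cycle. The Python below assumes 100% efficiency per cycle, and the starting fraction of complete molecules is a hypothetical parameter. Note that it only describes the proportion of sequenceable molecules, not complexity: PCR copies never add new unique molecules.

```python
def complete_fraction(initial_complete: float, cycles: int) -> float:
    """Fraction of strands carrying both adapters after `cycles` of ideal PCR."""
    if not 0.0 <= initial_complete <= 1.0 or cycles < 0:
        raise ValueError("initial_complete must be in [0, 1] and cycles >= 0")
    # Both adapters: both strands are templates, so strand count doubles each cycle.
    complete = initial_complete * 2 * 2 ** cycles
    # One adapter: only one strand can be primed, so one new strand per cycle.
    partial = (1.0 - initial_complete) * (2 + cycles)
    return complete / (complete + partial)


if __name__ == "__main__":
    # Hypothetical starting point: half of the ligated molecules are complete.
    for n in (0, 2, 4, 5, 8):
        print(f"{n} cycles: {complete_fraction(0.5, n):.1%} complete")
```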