  • Duplication rate too high in WES

    We are trying to detect very-low-frequency variants in human tissues using WES (UMI-UDI libraries), aiming for a mean depth of 3000x, but we are getting very high duplication rates. In our first experiments we started with very low inputs, 20 ng into library construction (8 PCR cycles), and got a 70% duplication rate. We have now used inputs of 80 ng (with 4 PCR cycles) and 500 ng (no PCR), followed by pooling 160 ng of each library (equal mass for 10 samples) for exome capture. For sequencing, the facility loaded only 1/4 of the capture output (around 12 ng) onto a NovaSeq 6000 S2 (850 Gb output). The surprise was that the duplication rate was still around 45-50%, with minimal difference between the 80 ng and 500 ng inputs. Do you have any hint about what could cause this apparent lack of complexity? Do we need more PCR cycles, to pool samples asymmetrically depending on the input, or to load all of the sample on the sequencer? According to the sequencing service they are at the technical limit, so it is not possible to load more sample. Any hint will be super welcome, thanks a lot!
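    As a rough back-of-the-envelope check on how many unique template molecules each input can supply per targeted position, the input mass can be converted to haploid genome equivalents using the commonly cited figure of about 3.3 pg per haploid human genome. The conversion and capture efficiencies below are hypothetical placeholders, not measured values from this experiment; the point is only that unique depth per locus is capped by the number of input genome copies that survive prep and capture, not by PCR cycles or sequencing output.

```python
# Back-of-the-envelope ceiling on unique depth per targeted locus.
# ASSUMPTIONS: ~3.3 pg per haploid human genome (commonly cited figure);
# conversion_efficiency and capture_recovery are hypothetical placeholders.

HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome, in pg


def haploid_genome_copies(input_ng: float) -> float:
    """Haploid genome equivalents (copies of each autosomal locus) in input_ng of DNA."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def max_unique_depth(input_ng: float,
                     conversion_efficiency: float,
                     capture_recovery: float) -> float:
    """Upper bound on unique molecules per locus surviving library prep and capture."""
    return haploid_genome_copies(input_ng) * conversion_efficiency * capture_recovery


if __name__ == "__main__":
    # Input masses from the post; 0.3 and 0.5 are illustrative guesses, not measurements.
    for ng in (20, 80, 500):
        copies = haploid_genome_copies(ng)
        ceiling = max_unique_depth(ng, conversion_efficiency=0.3, capture_recovery=0.5)
        print(f"{ng:>4} ng: ~{copies:,.0f} genome copies, unique-depth ceiling ~{ceiling:,.0f}x")
```

    With those placeholder efficiencies the 20 ng library cannot reach 3000x unique depth no matter how deeply it is sequenced, while 500 ng should leave plenty of headroom, so similar duplication rates at 80 ng and 500 ng would suggest the losses happen somewhere after the input step.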

  • #2
    vghuici, that is a pretty high duplication rate. Adding more cycles would really only increase the amount of duplication.

    What method are you using to detect the duplicates? The method should examine both the 5' and the 3' ends of the insert to make sure a read is a true duplicate.
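    The idea of matching both ends of the insert can be sketched as a grouping key built from the 5' coordinate and strand of each mate (the two mates' 5' ends are the two ends of the insert), plus the UMI when one is used. This is a simplified, hypothetical illustration with a made-up `ReadPair` record, not the logic of Picard MarkDuplicates or any other tool, which also deal with soft clipping, read groups, optical duplicates and choosing which read to keep.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal record for an aligned read pair (0-based, end-exclusive)."""
    chrom: str
    r1_start: int
    r1_end: int
    r1_reverse: bool
    r2_start: int
    r2_end: int
    r2_reverse: bool
    umi: Optional[str] = None


def five_prime(start: int, end: int, reverse: bool) -> int:
    # Forward reads start at their leftmost base; reverse reads start at their rightmost base.
    return end - 1 if reverse else start


def duplicate_key(p: ReadPair, use_umi: bool = False) -> tuple:
    # Both insert ends (5' position + strand of each mate), order-independent.
    ends = tuple(sorted([
        (five_prime(p.r1_start, p.r1_end, p.r1_reverse), p.r1_reverse),
        (five_prime(p.r2_start, p.r2_end, p.r2_reverse), p.r2_reverse),
    ]))
    return (p.chrom, ends, p.umi) if use_umi else (p.chrom, ends)


def duplication_rate(pairs: list, use_umi: bool = False) -> float:
    """Fraction of pairs that are not the first member of their duplicate group."""
    if not pairs:
        return 0.0
    groups = defaultdict(int)
    for p in pairs:
        groups[duplicate_key(p, use_umi)] += 1
    return (len(pairs) - len(groups)) / len(pairs)


if __name__ == "__main__":
    a = ReadPair("chr1", 100, 250, False, 300, 450, True, "AAC")
    b = ReadPair("chr1", 100, 240, False, 310, 450, True, "GTT")  # same insert ends, other UMI
    c = ReadPair("chr1", 120, 270, False, 300, 450, True, "AAC")  # different left end
    print(duplication_rate([a, b, c]), duplication_rate([a, b, c], use_umi=True))
```

    Pairs `a` and `b` share both insert ends, so they collapse into one group unless their different UMIs are taken into account.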



    • #3
      Hi Ben3, thanks for responding! We constructed our libraries with UMIs, using the NEB kit and adapters. To detect duplicates we use Picard/GATK: UmiAwareMarkDuplicatesWithMateCigar when we take UMIs into account, or MarkDuplicates when we don't. By the way, using UMIs makes little difference in our case.

      https://gatk.broadinstitute.org/hc/e...icates-Picard-

      "The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file."

      While high duplication rates could be expected with 80 ng of library input, I am baffled to see pretty much the same rate with as much as 500 ng.
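      To put a number on "lack of complexity", the Lander-Waterman relation C = X(1 - e^(-N/X)), with N read pairs sequenced, C unique pairs observed and X unique molecules in the library, can be solved for X. To my understanding this is the equation behind the ESTIMATED_LIBRARY_SIZE value in the Picard MarkDuplicates metrics file, but the code below is an independent bisection sketch, not Picard's implementation, and the read counts in the example are hypothetical.

```python
import math


def lw_unique(library_size: float, n_reads: float) -> float:
    """Expected unique molecules seen after sampling n_reads from library_size molecules."""
    # -expm1(-y) equals 1 - exp(-y) but stays accurate when y is small.
    return library_size * -math.expm1(-n_reads / library_size)


def estimate_library_size(n_reads: float, n_unique: float) -> float:
    """Solve n_unique = X * (1 - exp(-n_reads / X)) for X by bisection."""
    if not 0 < n_unique <= n_reads:
        raise ValueError("need 0 < n_unique <= n_reads")
    if n_unique == n_reads:
        return math.inf  # no duplicates observed: complexity cannot be bounded
    # lw_unique(n_unique, n) < n_unique, so the solution lies above n_unique.
    lo = hi = float(n_unique)
    while lw_unique(hi, n_reads) < n_unique:  # lw_unique increases with X; widen bracket
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if lw_unique(mid, n_reads) < n_unique:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def expected_duplicate_fraction(library_size: float, n_reads: float) -> float:
    """Duplicate fraction expected when sequencing n_reads from a library of given size."""
    return 1.0 - lw_unique(library_size, n_reads) / n_reads


if __name__ == "__main__":
    # Hypothetical counts: 100 M read pairs observed at 50% duplication.
    n, c = 100e6, 50e6
    x = estimate_library_size(n, c)
    print(f"estimated library size: {x / 1e6:.1f} M molecules")
    for depth in (50e6, 100e6, 200e6):
        frac = expected_duplicate_fraction(x, depth)
        print(f"{depth / 1e6:.0f} M pairs -> {frac:.1%} duplicates")
```

      If the estimated library size is small relative to the number of read pairs needed for 3000x, sequencing deeper mainly adds duplicates; comparing the estimate between the 80 ng and 500 ng libraries gives a concrete figure to go with the observation that their duplication rates are similar.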



      • #4
        vghuici, I'm more used to diverse samples with fewer duplicates, so to understand this a little better I've gathered some resources below that will hopefully help you out.

        https://biostar.galaxyproject.org/p/28822/index.html
        https://www.biostars.org/p/399103/
        https://www.biostars.org/p/112588/
        https://dnatech.genomecenter.ucdavis...cr-duplicates/
        https://bioinformatics.stackexchange...-as-duplicates



        • #5
          Hello vghuici, the bottleneck could be occurring either at the library prep stage or at the hybridization capture and re-amplification stage (e.g., bait concentration too low, or hybridization temperature off?). Since you have already modified the first part in several ways, perhaps the second part is causing the problem?
          More QC data for each stage of the process would be helpful. How many PCR cycles did you run after the capture?
          In my view, PCR-free libraries are not necessarily beneficial for exome capture. I would suggest running at least 5 PCR cycles before the hybridization capture to enrich for complete Illumina libraries (with both P5 and P7 sequences).



          • #6
            Hi Luc, thank you for your comments. The hybridization temperature does not seem to be the problem, and we performed the capture under pooled conditions as recommended by IDT (except that we ran 5 cycles after capture instead of the minimum of 6 cycles given in the manual). Your point that PCR-free library construction is not an optimal strategy for preserving complexity could, however, be right on target. I have strongly suspected this since we got a similar duplication rate with 80 ng (4 cycles) and 500 ng (PCR-free) of library input from the very same sample.

