  • Anomilie
    Member
    • Jun 2013
    • 13

    Failed ChIP-seq experiments

    Hi All,

    I was wondering whether you could share your experiences and thoughts on the following scenario.

    Say you are analyzing someone else's data and, as a first port of call, you do some quality checking based on whichever guidelines are available (for example, the ENCODE guidelines for ChIP-seq). During your quality assessment, you find that the data is of extremely poor quality (to continue with the ChIP-seq example, say you observe 85-95% PCR duplication). At this point you think the experiment should probably be repeated, but due to time restrictions you continue with stringent parameters (including the PCR duplicates) and take the overlapping peaks from 2 different peak-calling methods (with IDR < 0.01 across the 2 samples). You present the handful of peaks you found to the owner of the data, but they aren't happy with what they see. They ask you to relax the parameters and flood the system with noise, violating every guideline, just for the sake of having some data on which to base future experiments.
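The "overlapping peaks from 2 different methods" step amounts to an interval intersection. Below is a minimal sketch in Python, assuming peaks are held as hypothetical (chrom, start, end) tuples in half-open, BED-style coordinates. It illustrates only the positional-overlap idea; it is not IDR, which ranks peaks by signal to assess reproducibility between replicates.

```python
from collections import defaultdict

# Hypothetical peak representation: (chrom, start, end), half-open like BED.
Peak = tuple[str, int, int]


def overlapping_peaks(a: list[Peak], b: list[Peak]) -> list[Peak]:
    """Return peaks from caller `a` that overlap at least one peak from caller `b`.

    This only checks positional overlap; it does not compute IDR.
    """
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in a:
        # Half-open intervals [start, end) and [s, e) overlap iff start < e and s < end.
        if any(start < e and s < end for s, e in by_chrom[chrom]):
            kept.append((chrom, start, end))
    return kept


caller_a = [("chr1", 100, 300), ("chr1", 5000, 5200), ("chr2", 40, 90)]
caller_b = [("chr1", 250, 400), ("chr2", 95, 150)]
print(overlapping_peaks(caller_a, caller_b))  # [('chr1', 100, 300)]
```

The linear scan per peak keeps the sketch short; for genome-wide peak sets, a tool such as `bedtools intersect` or a sorted sweep over both lists would be the more usual choice.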

    - Have you come across a scenario like this?
    - What are your thoughts on using less stringent methods on an extremely poor data set?

    Thanks in advance for any responses.
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    Are you sure they're PCR duplicates rather than real duplicates? If so, how do you know?


    • ffinkernagel
      Senior Member
      • Oct 2009
      • 110

      #3
      You know because a library of a million 'effective reads' (after dedup) distributed in the largish genome of a mammal isn't useful, and no antibody is so good that you'll only get the enriched regions. And if they were real, you'd still have many starts in a small region, not a start here, one there, one over yonder...
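A quick back-of-the-envelope calculation makes the point about a million effective reads concrete. The sketch below assumes a genome of about 3 Gb and a 200 bp fragment length; both numbers are illustrative assumptions, not values given in the thread.

```python
# Illustrative assumptions (not from the thread): ~3 Gb mammalian genome,
# ~200 bp ChIP fragments.
GENOME_SIZE_BP = 3_000_000_000
FRAGMENT_LEN_BP = 200


def read_density(effective_reads: int) -> tuple[float, float]:
    """Return (mean bp between read starts, mean fold coverage by fragments)."""
    spacing_bp = GENOME_SIZE_BP / effective_reads
    fold_coverage = effective_reads * FRAGMENT_LEN_BP / GENOME_SIZE_BP
    return spacing_bp, fold_coverage


spacing_bp, fold_coverage = read_density(1_000_000)
print(f"one read start every ~{spacing_bp:,.0f} bp, ~{fold_coverage:.3f}x coverage")
# one read start every ~3,000 bp, ~0.067x coverage
```

On average, that is roughly one read start every 3 kb across the genome.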

      I basically stand my ground. People come to me because I have the experience in these data sets, and my advice is: find the error and repeat the experiment.

      Do not waste time with this unusable data set. Mostly they come around when I start repeating the old 'if your positive control is no different from your negative control you can draw no conclusions from this experiment' mantra.

      If they're desperate or dense enough not to grasp the above point, it's best to cut your losses and suggest they find somebody else to look at their data.


      • Anomilie
        Member
        • Jun 2013
        • 13

        #4
        Thanks ffinkernagel for sharing your experience.

        To answer your question, Brian Bushnell, I assessed PCR duplicates as follows:

        1) In FastQC, look at the Sequence Duplication Levels module; this gives a rough estimate.

        Result: 90-95% duplication level.

        2) On the file of aligned reads (usually a BAM file), calculate the non-redundant fraction (NRF in the ENCODE guidelines): the number of unique genomic positions divided by the total number of uniquely mapped reads.

        Result: 12-17% of reads are non-redundant

        3) Sort the aligned reads (BAM file) by chromosomal position and run both samtools rmdup and Picard MarkDuplicates (with the option REMOVE_DUPLICATES=TRUE). Calculate the percentage of reads remaining relative to the original and check whether samtools and Picard differ.

        Result: 95-99% of reads removed.

        4) Visualize the aligned reads in a genome viewer such as IGV. If the reads stack up on top of each other, with blank spaces between the stacks, rather than overlapping in a staggered, diagonal pattern, you have PCR duplicates.
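The arithmetic behind steps 2 and 3 is simple enough to sketch. The Python below assumes uniquely mapped single-end reads have already been reduced to hypothetical (chrom, five_prime_pos, strand) tuples; reading them out of the BAM file (for example with pysam) is left out, and the function names are illustrative only.

```python
# Hypothetical read representation: (chrom, 5' position, strand) for each
# uniquely mapped single-end read. For reverse-strand reads the 5' position is
# the alignment end, not the leftmost coordinate.
ReadPos = tuple[str, int, str]


def nrf(reads: list[ReadPos]) -> float:
    """Non-redundant fraction: distinct genomic positions / uniquely mapped reads."""
    if not reads:
        return 0.0
    return len(set(reads)) / len(reads)


def percent_removed(reads_before: int, reads_after: int) -> float:
    """Percentage of reads removed by deduplication (e.g. rmdup or MarkDuplicates)."""
    return 100.0 * (reads_before - reads_after) / reads_before


# Eight reads share one start on the + strand; the other two are distinct.
reads = [("chr1", 100, "+")] * 8 + [("chr1", 100, "-"), ("chr2", 5000, "+")]
print(nrf(reads))                           # 0.3
print(percent_removed(1_000_000, 30_000))   # 97.0
```

Counting position and strand together mirrors the usual definition of a duplicate for single-end data; for paired-end data, both fragment ends would normally be used.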
        Last edited by Anomilie; 05-27-2014, 06:42 PM.


        • Chacal
          Junior Member
          • Nov 2014
          • 3

          #5
          Anomilie,

          I don't follow point 4 of your last post. Could you provide an IGV screenshot showing vertically stacked reads versus diagonally stacked reads? I am doing ATAC-seq and I get lots of duplicate reads (50-90%) despite all efforts to optimise cell numbers and the number of PCR cycles for library amplification.

          Thank you.


          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            Point 4 is the most important one, or in my opinion, the only one really able to indicate a difference between PCR and real duplicates. If the coverage is randomly distributed (in terms of start coordinates), then duplication events are implied to be real duplicates that naturally result from high sequencing depth. However, if you have a few stacks in which all the reads line up perfectly with the same start location, and few or no reads starting between the stacks, this implies PCR duplicates.
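Point 4 is usually judged by eye in IGV, but the same idea can be expressed as a rough number. The sketch below compares the number of distinct read start positions in a window with the number expected if the same reads had landed at uniformly random positions. Uniform placement is a simplifying assumption (real ChIP enrichment is not flat), the function names are hypothetical, and the ratio is a hint rather than a statistical test.

```python
def expected_distinct_starts(n_reads: int, window_len: int) -> float:
    """Expected number of occupied start positions if n_reads fall uniformly
    at random (with replacement) on window_len positions."""
    return window_len * (1.0 - (1.0 - 1.0 / window_len) ** n_reads)


def stacking_ratio(starts: list[int], window_len: int) -> float:
    """Observed / expected distinct starts for one strand in one window.

    Values well below 1 mean reads pile up on a few identical starts
    (PCR-duplicate-like stacks); values near 1 mean starts are spread out.
    """
    if not starts:
        return float("nan")
    return len(set(starts)) / expected_distinct_starts(len(starts), window_len)


window = 1000
stacked = [10] * 25 + [200] * 25 + [450] * 25 + [800] * 25  # 100 reads, 4 starts
spread = list(range(0, window, 10))                         # 100 reads, 100 starts
print(round(stacking_ratio(stacked, window), 3))
print(round(stacking_ratio(spread, window), 3))
```

With 100 reads in a 1 kb window, about 95 distinct starts are expected under uniform placement, so the stacked example scores about 0.04 while the evenly spread one scores about 1.05.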


            • Chipper
              Senior Member
              • Mar 2008
              • 323

              #7
              The percentage of duplicated reads is not meaningful without knowing the total number of reads. An excellent ChIP can yield only a few million unique reads; if you sequence it on one lane, most reads will be duplicates. And if you have one good replicate and one that failed, IDR will only give you the common false positives. Just call peaks on unique starts and look at the wiggle tracks in a genome browser.
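Chipper's point about needing the total read count can be made quantitative with a simple sampling model: if each sequenced read is drawn at random, with replacement, from a library of C distinct molecules, the expected number of distinct molecules seen after N reads is C(1 - e^(-N/C)). Picard's library-size estimate is based on this Lander-Waterman-style relationship. The library size and read counts below are illustrative assumptions, not numbers from the thread.

```python
import math


def expected_duplicate_fraction(n_reads: int, library_size: int) -> float:
    """Expected fraction of reads that duplicate an earlier read when n_reads
    are sampled uniformly, with replacement, from library_size distinct molecules."""
    distinct = library_size * (1.0 - math.exp(-n_reads / library_size))
    return 1.0 - distinct / n_reads


# Same hypothetical library (3 million unique molecules) at two sequencing depths.
for n in (5_000_000, 150_000_000):
    frac = expected_duplicate_fraction(n, 3_000_000)
    print(f"{n:>11,} reads -> {frac:.0%} duplicates")
```

Under this model the same library gives about 51% duplicates at 5 million reads and about 98% at 150 million, so a high duplicate percentage by itself does not show that the ChIP failed.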
