  • skip56558
    Junior Member
    • Feb 2012
    • 2

    autocorrelation pattern in ChIP-seq alignments

    Hello,

    We have ChIP-seq data from a single-end run with 35 bp reads. There are a few samples, each made with a different antibody. We aligned the reads and created autocorrelation plots (sometimes called cross-correlation plots) using HOMER and SPP. The DNA fragment length is around 150 bp, so we expected to see a single large peak at 150 bp.

    Some of the samples look as we expect, but some have a large peak at 35 bp, and a small peak at 150 bp. Does this mean that something is wrong with these samples?

    Thanks!
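
For readers who want to see what such a plot measures, here is a minimal sketch of a strand cross-correlation. It is not the HOMER or SPP implementation. The tag positions are synthetic, and the function names, the 150 bp fragment length, and the jitter are assumptions made only for illustration. The idea is to count 5' tag ends per base on each strand, slide the reverse-strand counts toward the forward-strand counts, and record the Pearson correlation at each shift.

```python
import math
import random
from collections import Counter


def strand_cross_correlation(fwd_pos, rev_pos, genome_len, max_shift):
    """Pearson correlation between per-base forward and reverse tag counts,
    pairing forward position p with reverse position p + shift."""
    fwd = Counter(fwd_pos)
    rev = Counter(rev_pos)
    cc = []
    for shift in range(max_shift + 1):
        n = genome_len - shift  # number of position pairs at this shift
        f_items = [(p, c) for p, c in fwd.items() if p < n]
        r_items = [(p, c) for p, c in rev.items() if p >= shift]
        sum_f = sum(c for _, c in f_items)
        sum_r = sum(c for _, c in r_items)
        sum_ff = sum(c * c for _, c in f_items)
        sum_rr = sum(c * c for _, c in r_items)
        sum_fr = sum(c * rev.get(p + shift, 0) for p, c in f_items)
        cov = sum_fr - sum_f * sum_r / n
        var_f = sum_ff - sum_f ** 2 / n
        var_r = sum_rr - sum_r ** 2 / n
        cc.append(cov / math.sqrt(var_f * var_r))
    return cc


def simulate_tags(rng, genome_len=200_000, n_sites=20, tags_per_site=500,
                  frag_len=150, jitter=10.0):
    """Hypothetical enriched sample: forward tags pile up half a fragment
    upstream of each site, reverse tags half a fragment downstream."""
    fwd_pos, rev_pos = [], []
    for _ in range(n_sites):
        site = rng.randrange(1000, genome_len - 1000)
        for _ in range(tags_per_site):
            fwd_pos.append(round(site - frag_len // 2 + rng.gauss(0, jitter)))
            rev_pos.append(round(site + frag_len // 2 + rng.gauss(0, jitter)))
    return fwd_pos, rev_pos, genome_len


rng = random.Random(0)
fwd_pos, rev_pos, genome_len = simulate_tags(rng)
cc = strand_cross_correlation(fwd_pos, rev_pos, genome_len, max_shift=300)
best_shift = max(range(len(cc)), key=cc.__getitem__)
print("shift with the highest correlation:", best_shift)
```

In this enriched toy sample the highest correlation falls close to the simulated fragment length. The sketch does not model mappability or duplicates, so it will not reproduce a read-length peak.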
  • mudshark
    Senior Member
    • Jan 2009
    • 138

    #2
    In fact, it is a cross-correlation, not an autocorrelation.

    As for your question: I have seen this before, and I don't think it is a problem in itself. It probably depends on the 'true' fragment size of your target-bound DNA, the signal-to-noise ratio, and the abundance of target sites. That is, if your signal-to-noise ratio is low and there are only a few target sites, you will get the average fragment size determined by the size-selection step. If you have good signal to noise and the target protein protects 35 bp of DNA, you might get a cross-correlation peak at 35 bp.

    • Chipper
      Senior Member
      • Mar 2008
      • 323

      #3
      It's the other way around: good signal to noise gives a peak at the average fragment size; otherwise the correlation is dominated by a peak at exactly the read length. I'm not sure why, but it has nothing to do with the protein protecting the DNA.

      • cwhelan
        Member
        • Nov 2010
        • 23

        #4
        This very insightful and helpful post by Anshul Kundaje on the MACS mailing list has a really good theory involving the mappability of the genome for why you see this pattern in non-enriched ChIP-seq data sets:

        • skip56558
          Junior Member
          • Feb 2012
          • 2

          #5
          Thank you all for your responses!

          I've looked at the data again, and the best cross-correlation profiles are from the best antibodies, so your explanations make sense.

          I only have one lingering question: is the data from the not-as-good cross-correlation profiles still usable? That is, do we need to repeat those entire experiments, or will MACS be able to identify the real peaks?

          Many thanks!

          skip56558

          • cwhelan
            Member
            • Nov 2010
            • 23

            #6
            In my experience, I have unfortunately not found realistic-looking or usable peaks in these types of data sets. I usually try to examine some of the peaks in a browser. You can tell pretty quickly whether they look like real ChIP-seq peaks, which are strongly enriched compared to the background, or just like slightly higher regions in a noisy background. Another way to check is to run your peaks through an annotation tool like CEAS and look for enrichment in promoter regions.

            My experience is with ChIP-seq for transcription factor binding sites, so that advice might not apply for other types of experiments like histone modifications, though.

            • BAMseek
              Senior Member
              • Apr 2011
              • 124

              #7
              Originally posted by cwhelan
              This very insightful and helpful post by Anshul Kundaje on the MACS mailing list has a really good theory involving the mappability of the genome for why you see this pattern in non-enriched ChIP-seq data sets:

              http://groups.google.com/group/macs-...595465a1f9b212
              Here is some more information from the same author: Phantom Peaks

              I've also noticed the same thing - that there are usually two peaks: one at the read length and one at the average fragment length. I have found that the strength of the fragment length peak compared to the read length peak is usually a good indicator of the signal-to-noise quality and one's ability to detect peaks in the data.

              I've always been under the impression that those peaks at the read length might be caused by PCR duplication, but the above link also has a good idea about biases in mappability.

              Justin
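
The comparison described above, fragment-length peak against read-length peak, can be sketched as a simple ratio. This is only an illustration: `peak_ratio`, the search window, and the two toy profiles are hypothetical, and subtracting the profile minimum as a baseline is an assumption that may differ from how the linked Phantom Peaks material defines its scores.

```python
import math


def peak_ratio(cc, read_len, frag_len, window=5):
    """Height of the fragment-length peak divided by the height of the
    read-length peak, both measured above the minimum of the profile.
    cc[s] is the cross-correlation at a strand shift of s bp."""
    def local_max(center):
        lo = max(0, center - window)
        hi = min(len(cc), center + window + 1)
        return max(cc[lo:hi])

    floor = min(cc)
    read_peak = local_max(read_len) - floor
    if read_peak <= 0:
        raise ValueError("no read-length peak above the profile minimum")
    return (local_max(frag_len) - floor) / read_peak


def toy_profile(frag_height, read_height, max_shift=300):
    """Made-up profile: a flat baseline plus one bump at 150 bp (fragment
    length) and one at 35 bp (read length). Not real data."""
    def bump(shift, center, height, width=10.0):
        return height * math.exp(-((shift - center) ** 2) / (2 * width ** 2))

    return [0.01 + bump(s, 150, frag_height) + bump(s, 35, read_height)
            for s in range(max_shift + 1)]


enriched = toy_profile(frag_height=0.20, read_height=0.05)
noisy = toy_profile(frag_height=0.02, read_height=0.10)

ratio_enriched = peak_ratio(enriched, read_len=35, frag_len=150)
ratio_noisy = peak_ratio(noisy, read_len=35, frag_len=150)
print("enriched-looking profile:", round(ratio_enriched, 2))
print("noisy-looking profile:", round(ratio_noisy, 2))
```

A ratio well above 1 means the fragment-length peak dominates, and a ratio below 1 means the read-length peak dominates, which matches the pattern in the poorer samples discussed in this thread. Any cutoff for "usable" data would have to be chosen from your own experiments.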
