**akundaje** · 10-30-2009, 12:09 AM

IMHO the answer is no. There are several issues I can think of.

First, you cannot compare the number of peaks or get any statistics directly without some estimate of replicate noise. If you had replicate experiments for at least one (ideally both) conditions and you ran peak calls on the replicates, you could get an estimate of the variance of the called peaks.

Secondly, it depends on your p-value/enrichment cutoff. If you are restricting to very strong peaks, then you could potentially compare the numbers. The reason is that as you relax your threshold for calling peaks, the different experiments can bleed in noisy peaks at different rates, so a tiny change in the p-value threshold can cause massive differences in the number of peaks called. For example, I have seen ample cases of biological and technical replicates of the same experiment giving quite different numbers of peaks at the same threshold with the same peak caller. The strongest peaks tend to agree, but as you go down the list the consistency gets worse.

Also, hopefully the same control experiment was used for both; otherwise a head-to-head comparison gets even harder.

Ideally, you want to rank your peaks by their enrichment/p-value and compute rank statistics on that to estimate how different the two experiments are.
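One way to read "rank statistics" here is a Spearman rank correlation between the enrichment scores of the same regions in the two experiments. The sketch below is only an illustration of that idea in plain Python: the function names and the scores are made up, and it assumes you have already scored a shared set of regions in both samples.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(scores_a, scores_b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(scores_a), average_ranks(scores_b))


# Hypothetical enrichment scores for the same five regions in two experiments.
brain = [50.0, 32.0, 20.0, 8.0, 3.0]
heart = [41.0, 35.0, 6.0, 9.0, 2.0]
rho = spearman(brain, heart)
print(round(rho, 3))
```

A value near 1 means the two experiments order the regions similarly; it says nothing about absolute peak counts, which is the point of ranking in the first place.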

**repinementer** · 08-06-2011, 05:00 PM

May I add another question? I have the same scenario: 2 different ChIP-seq datasets from 2 different experiments (one in brain and one in heart). The brain ChIP-seq has 10 million tags and the heart one 4 million tags. I want to plot the raw number of tags around promoters, but with this difference in the number of tags I don't get any pattern, just a flat line at the top and one at the bottom.

I tried to normalize in this way, but it didn't work at all. Any ideas about normalizing ChIP-seq samples with different numbers of tags from 2 different experiments?

position_cDNAnorm = (position_cDNA / sum_cDNA) * average_sum_cDNA

* position_cDNAnorm = normalised cDNA value for specific position and specific DBP
* position_cDNA = cDNA value for specific position and specific DBP
* sum_cDNA = total cDNA count for specific DBP
* average_sum_cDNA = average of total cDNA counts of all DBPs
* DBP = DNA-binding protein (transcription factor)
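For reference, the formula above can be written out as a short Python sketch. The library sizes are the 10 million and 4 million tags mentioned in the post; the per-position counts and the function name are invented for illustration, and the sketch assumes `sum_cDNA` means the total tag count of the whole sample, not just the tags in the plotted window.

```python
def normalize_counts(position_counts, total_tags):
    """Scale each sample's per-position counts to the average library size.

    position_counts: {sample: [count at each position]}
    total_tags:      {sample: total number of tags in that sample}
    """
    average_total = sum(total_tags.values()) / len(total_tags)
    return {
        sample: [c / total_tags[sample] * average_total for c in counts]
        for sample, counts in position_counts.items()
    }


# Library sizes from the post; the per-position counts are hypothetical.
total_tags = {"brain": 10_000_000, "heart": 4_000_000}
position_counts = {"brain": [100, 250, 50], "heart": [40, 100, 20]}

normalized = normalize_counts(position_counts, total_tags)
for sample, values in normalized.items():
    print(sample, [round(v, 2) for v in values])
```

In this toy case the two profiles become identical after scaling, because the raw counts differ only by the ratio of the library sizes. Real samples also differ in background and ChIP efficiency, which a single scaling factor does not correct.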

**ETHANol** · 08-06-2011, 10:08 PM

I completely agree with akundaje. I would like to emphasize his point that even if you have the same number of tags from two biological or even technical replicates and compare them, you will get different peaks called. Replicates will help weed out the borderline peaks. The number of peaks called does not scale linearly with read count.

Since there is this issue with variation in the borderline peaks called at an arbitrary cutoff, I think the thing to do is to run your two ChIP samples through your peak finder as 'treatment' and 'control'. This will identify significant differences between the samples. However, some of these differences may come from differences in chromatin structure and shearing efficiency rather than transcription factor binding, so a second step is required: take your significant differences and intersect them with your list of peaks. You should end up with a list of real differences between the two conditions.
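The second step (intersecting the significant differences with the peak list) could look something like the sketch below. It assumes both lists are available as `(chrom, start, end)` tuples with half-open coordinates; the region values and function names are hypothetical, and the pairwise scan is only meant for small examples.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def differential_peaks(differential_regions, peaks):
    """Keep only the differential regions that overlap a called peak."""
    return [
        region
        for region in differential_regions
        if any(overlaps(region, peak) for peak in peaks)
    ]


# Hypothetical regions from the sample-vs-sample run and from ordinary peak calling.
differential_regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks = [("chr1", 150, 250), ("chr2", 300, 400)]

kept = differential_peaks(differential_regions, peaks)
print(kept)
```

For genome-scale lists a dedicated interval tool such as `bedtools intersect` is the more practical choice; the logic is the same.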

You should still normalize the read counts and get some replicates.

I made a blog post on my new blog on this subject. So here is the shameless link to it:

How to compare transcription factor binding between two ChIP-seq samples

http://ethanomics.wordpress.com/2011/08/07/how-to-compare-transcription-factor-binding-between-two-chip-seq-samples/

This is a question people seem to be having some difficult with, as I’ve seen it asked a few times on SeqAnswers. You have results from two ChIP-seq experiments. For example, you want to know if N…

This seems like a pretty good way to go about addressing the question at hand, but there may be better ways.

**howi** · 08-07-2011, 09:42 AM

In my experience no clear statement can be made without replicates. E.g. we had two replicates with about 4K peaks. The overlap of the peaks was 100. That already tells you quite something about peak calling and its interpretation. After looking at ChIP-seq data from others I experienced the same. But folks tend to pool their replicates before peak calling to get around that. Anyway, if I see people using peak numbers to answer biological questions, the first thing I do is look at the raw data (if it is available). In all but one case I would say that the peak numbers mean nothing.

Another case was the analysis of cells with very low TF protein levels upon treatment (like in a KO situation). Peak calling gave twice as many peaks for that situation as for untreated cells with TF binding and normal protein levels.

I did not find any answers on how to rank my peaks to compare different treatments. For me it worked quite well to plot the tag enrichments (Input, IgG, Treated, Untreated) ±3 kb around my peaks in a heat map and do k-means clustering. That identified strongly enriched sites I can trust.
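As a rough illustration of the clustering step, here is a minimal k-means (Lloyd's algorithm) in plain Python. It is a sketch and not the analysis described above: each profile is assumed to be a list of binned tag counts around one peak, the numbers are invented, and in practice the Input, IgG, Treated and Untreated profiles would be concatenated per peak and clustered with an established library.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Column-wise mean of a non-empty list of equal-length profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def kmeans(profiles, k, iterations=50, seed=0):
    """Lloyd's algorithm: returns (cluster label per profile, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(profiles, k)  # start from k distinct profiles
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assign every profile to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            new_centroids.append(mean_profile(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical binned tag counts around six peaks (five bins each).
profiles = [
    [1, 5, 20, 5, 1], [2, 6, 18, 6, 2], [1, 4, 22, 4, 1],  # clear central enrichment
    [1, 1, 2, 1, 1], [2, 1, 1, 1, 2], [1, 2, 1, 2, 1],     # flat, background-like
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

The cluster numbering itself is arbitrary; what matters is which sites end up grouped with the strongly enriched profiles.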

**emilyjia2000** · 08-22-2011, 01:43 PM

ETHANol,

It is a good solution in a way. But in my case, I would like to compare the two samples to see whether they are similar or different. I guess that needs some statistical calculation.
Any suggestions will be highly appreciated.
Thanks

normalization of ChIP-seq data
