**pmcget** · 04-16-2008, 09:35 AM

It's a bit difficult to tell based on such limited information. It could be a problem with how you are using the aligner, a problem with the library prep, or something else entirely.

Can you tell us what alignment program you are using? E.g. is it ELAND?

If so, do you have any stats from the output? E.g. if you run the following unix command, we can see what the breakdown of unique matches/repetitive matches/no matches is:

cut -f3 my.eland.output.file | sort | uniq -c

You could also check to see if you are getting a lot of the same sequence reads (which might indicate library prep problems):

cut -f2 my.eland.output.file | sort | uniq -c | sort -n
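
To make the first breakdown easier to compare between lanes, the counts can be turned into percentages. This is only a minimal sketch: `match_breakdown` is a hypothetical helper name, and it assumes the output file is tab-delimited with the match code (for example `U0` or `NM`) in column 3, as in the commands above.

```shell
# Hypothetical helper (not part of ELAND): percentage breakdown of column 3.
# Assumes a tab-delimited file with the match code in the third column.
# Usage: match_breakdown my.eland.output.file
match_breakdown() {
    cut -f3 "$1" | sort | uniq -c |
        awk '{ count[$2] = $1; total += $1 }          # $1 = count, $2 = match code
             END {
                 for (code in count)
                     printf "%s\t%d\t%.1f%%\n", code, count[code], 100 * count[code] / total
             }' |
        sort                                          # stable, alphabetical order by code
}
```

Each output line holds the match code, the number of reads, and the share of all reads, separated by tabs.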

**apfejes** · 04-16-2008, 01:14 PM

1-2% aligned is very low. You might also want to verify you're aligning against the correct reference sequence, and to make sure that the tags that don't align aren't just adapter dimers/trimers/etc.
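
One rough way to check for adapter-derived tags is to count how many reads contain the adapter sequence. The sketch below makes several assumptions: `count_adapter_reads` is a hypothetical helper, the reads sit in column 2 of a tab-delimited file, and the adapter sequence is something you supply yourself from your library prep documentation (none is hard-coded here).

```shell
# Hypothetical helper: count reads (column 2) that contain a given adapter sequence.
# Prints "<reads containing adapter><TAB><total reads>".
# Usage: count_adapter_reads my.eland.output.file "$ADAPTER"
count_adapter_reads() {
    awk -F'\t' -v adapter="$2" '
        index($2, adapter) > 0 { hits++ }     # plain substring match, no regex
        END { printf "%d\t%d\n", hits, NR }
    ' "$1"
}
```

It only finds exact, full-length occurrences; adapters that are truncated at the read end or that contain sequencing errors would need a proper trimming tool.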

I would also ask how many reads you received for each lane.

Finally, with ChIP-Seq, the volumes of starting material are very low, so it's possible that you're getting contamination from something else. If you were running a gel to select the desired size range, I would suggest you make sure you run ONLY the ChIP-Seq results on that gel. If you also run a ladder or another experiment on the same gel, you can get a significant amount of contamination. (E.g. we found ladder sequences in our ChIP-Seq experiments, even when separated by 5+ empty wells.)

**Chipper** · 04-16-2008, 01:47 PM

I guess that is why they recommend doing size separation after adaptor ligation now...

What I would do first would be to check the sequences for adaptors and possibly try aligning to some bacteria in case it is contaminated. Also check the reagents so that the beads are not saturated with ssDNA. And if you have reads > 30 bases, try aligning truncated reads (sequencing errors are most common at the ends of reads).
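
Truncating the reads before re-aligning can be done with a one-liner. This is a sketch under assumptions: `truncate_reads` is a hypothetical helper, and the input is either one raw sequence per line or FASTA with each sequence on a single line (header lines starting with `>` are passed through unchanged).

```shell
# Hypothetical helper: keep only the first N bases of each read.
# Assumes one sequence per line; FASTA header lines (">...") are kept as-is.
# Usage: truncate_reads reads.txt 25 > reads.25bp.txt
truncate_reads() {
    awk -v n="$2" '
        /^>/ { print; next }            # leave FASTA headers untouched
             { print substr($0, 1, n) } # reads shorter than N are printed whole
    ' "$1"
}
```

If the truncated reads align at a much higher rate than the full-length ones, that points towards errors at the read ends rather than a problem with the library.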

Where are the aligned reads placed? Are they only in satellite repeats etc., or where you would expect them?

**sblake** · 07-10-2008, 11:45 AM

apfejes - in regards to your contamination issue of the ladder with a sample on the ChIP-seq size selection gel, how do you evaluate what to excise without a ladder? tks

**apfejes** · 07-10-2008, 12:21 PM

Hi sblake,

I was told the people in the lab run the gel with a blue dye that migrates along with fragments of a particular size. They use this as a guide to indicate the approximate position to excise - I understand it took a bit of practice to get the technique right, but once you have it down, it's not too bad.

Offhand, I don't know which dye they use, but I certainly remember the blue dyes from my (long ago) days of running gels. I'd hate to try this on a really small gel, but on a longer one, I don't see this being a problem.

**sblake** · 07-10-2008, 12:45 PM

aha! makes sense, tks

**kmay** · 08-01-2008, 06:36 AM

This low alignment rate puzzles me. elly never replied to this thread.

My first guess was what apfejes took into account: was it the right reference genome? Otherwise the experiment went terribly wrong.

ChIP-seq usually generates rather noisy data. But the noise is in the aligned tags, and often only 4% to 5% of the aligned tags fall into clusters. Here amplification plays a crucial role: one step too many causes quite a headache. However, 4% is still enough to get good results in the end.

Klaus

**elly** · 08-02-2008, 01:18 AM

Hi kmay,

I'm so sorry that I didn't reply to my thread.
I was so busy setting up another application.

I'm sure that my reference genome sequence was correct.

Could you please explain why ChIP-seq usually generates noisy data?
And how can 4~5% aligned data be used for further analysis?
What are the remaining 95~96% of reads?

I haven't solved this problem yet and am still seeking a solution.

Your suggestions and explanation would be so helpful for me.
Thank you in advance.

elly.

**kmay** · 08-02-2008, 03:32 AM

elly,

sorry for being unclear!

I am talking about two steps.
1st: mapping of the reads to the genome
2nd: clustering of the mapped reads from step 1 into regions (clusters) of enriched read density.

We have only example data for a DNAse-seq experiment online, but they might be helpful in explaining the difference.

Step 1 statistics
Step 2 statistics, with arbitrary variation of tag density per bp-window to demonstrate the effects of such. Usually we calculate the significance of tag density based on a Poisson distribution.
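
To illustrate what a Poisson-based significance for tag density can look like, here is a minimal sketch. It is not the method used for the statistics above; it assumes a uniform background in which the expected tags per window is lambda = (mapped tags × window size) / genome size, and `poisson_tail` is a hypothetical helper that returns P(X >= k) for an observed count k.

```shell
# Hypothetical helper: upper-tail Poisson probability P(X >= K) for mean LAMBDA.
# Assumption: background tags are spread uniformly, so LAMBDA is the expected
# number of tags per window under the null model.
# Usage: poisson_tail K LAMBDA
poisson_tail() {
    awk -v k="$1" -v lambda="$2" 'BEGIN {
        term = exp(-lambda)              # P(X = 0)
        cdf = 0
        for (i = 0; i < k; i++) {        # accumulate P(X = 0) ... P(X = k-1)
            cdf += term
            term *= lambda / (i + 1)     # P(X = i+1) from P(X = i)
        }
        p = 1 - cdf
        if (p < 0) p = 0                 # guard against rounding below zero
        printf "%.6g\n", p
    }'
}
```

For example, with 5 million mapped tags, a 200 bp window and a 3 Gb genome, lambda is about 0.33, and `poisson_tail 5 0.33` gives the chance of seeing five or more tags in a window by background alone. Because the result is computed as 1 minus the cumulative sum, very small p-values lose precision; a log-space calculation would be needed for those.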

If step 1 delivers only 5%, something is terribly wrong.
Did you try mapping algorithms other than ELAND? Do mappings with increasingly relaxed criteria:
1, 2, 3... point mutations; indels of 1, 2, 3...; and see how the mapped tag numbers behave.

The 5% I've been talking about corresponds to the number of mapped tags falling into clusters of enriched density from step 2.
The "noisiness" at this stage largely depends on the specificity of the antibody. There is always a lot of unspecific binding carried over. The experimental set-up has another major effect: whether and how you do a control for subtraction,
with an unspecific antibody or just an input control. In our experience the latter shows better results. Last but not least, be very careful with amplifications. Noise gets up to signal levels rather quickly.

5% vs. the remaining 95%: well, I am afraid there is no clear-cut statistical method to decide at the end about the success. It requires some human brain to look at the raw data and clusters in the genome annotation (we do this in ElDorado). You can sign up for free for two weeks and inspect the open chromatin data, or go here to see the DGE results from our Science paper. Go down to "user data" and choose which data you want to see.

Statistics comes into play again at the next step: see which TF-binding sites are over-represented in the clusters (hopefully the one you IPed), whether they are part of a complex model or they are phylogenetically conserved.

Hope this helps!

Cheers

Klaus

**Chipper** · 08-02-2008, 03:39 AM

Hi again,
I guess Klaus is referring to the fact that of all aligned reads only a small percentage occur in peaks of significant enrichment. You will always have some genomic background, and due to the large genome size this will generate a high number of randomly aligned sequences even if you have a good enrichment ratio.

What beads and blocking did you use for ChIP? Are there any possible contaminants like ssDNA?

**Chipper** · 08-02-2008, 03:52 AM

Klaus,

I did not see your answer to elly before. Is the amplification a problem in your data even if you count unique positions only for the uniquely aligned reads?
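
Counting unique positions only can be sketched as follows. The column layout is an assumption made for illustration: a tab-delimited file of uniquely aligned reads with chromosome, position and strand in columns 1 to 3. `collapse_positions` is a hypothetical helper.

```shell
# Hypothetical helper: keep one read per (chromosome, position, strand).
# Assumes a tab-delimited file with those three fields in columns 1-3;
# the first read seen at each position is kept, input order is preserved.
# Usage: collapse_positions unique_hits.txt > unique_positions.txt
collapse_positions() {
    awk -F'\t' '!seen[$1 FS $2 FS $3]++' "$1"
}
```

Comparing the line counts before and after (with `wc -l`) gives a rough idea of how much of the data consists of amplification duplicates.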

**kmay** · 08-03-2008, 07:54 AM

Chipper,

I am not sure whether I understand your question right. The amplification in the wet lab amplifies everything, including unspecific noise. One should expect for later analysis that higher copy number tags get up more quickly than unique signals, and downstream analysis for perfect and unique matches only should eliminate this. At the first step we always look at perfect and unique matches only. However, there is a lot of "noise" (unspecifically bound DNA or otherwise carried-over oligos) which matches perfectly and uniquely, too.

We don't do the wet lab, we only do analyses. And with the many different data sets we have seen so far, we found that amplification seems to be the next crucial step after ab-specificity.

If you are interested, I could bring in our specialist for that.

Klaus

**Susanne** · 08-25-2009, 06:01 AM

Chipper, replying to your post #11 in this thread, how are you able to tell whether or not ssDNA is contaminating the sample?

**Nix** · 08-25-2009, 06:45 AM

Did you by chance use salmon sperm or another DNA as a block or carrier during the ChIP?

(We've had 2 separate groups who did just this. Leads to very low % alignment. Uggg! On the other hand, we're up to 100 million salmon reads if anyone wants to take a go at a de novo assembly.)

ChIP-Seq problems with library generation