
  • ChIP-Seq problems with library generation

    I am trying to sequence ChIPed DNA.
    My DNA was sonicated to 500 bp~1 kbp during the IP step.
    I tried two ways to deal with this size problem.
    One was to fragment the DNA again, to below 200 bp in length, before starting library preparation for Solexa.
    The other was to proceed with the DNA as it was.
    In both cases I successfully obtained enriched, adaptor-modified ChIP DNA.
    But after analyzing the runs, I found that %align is only 1~2% for both methods. (%PF was ~60%, and total throughput was good.)
    Can anyone explain which step could cause this kind of problem?
    Thank you in advance for your help.

  • #2
    It's a bit difficult to tell based on such limited information. It could be a problem with how you are using the aligner, a problem with the library prep, or something else entirely.

    Can you tell us which alignment program you are using? E.g. is it ELAND?

    If so, do you have any stats from the output? E.g. if you run the following Unix command, we can see the breakdown of unique matches/repetitive matches/no matches:

    cut -f3 my.eland.output.file | sort | uniq -c

    You could also check to see if you are getting a lot of the same sequence reads (which might indicate library prep problems):

    cut -f2 my.eland.output.file | sort | uniq -c | sort -n
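For anyone who prefers a script to a pipeline, the two tallies above can be sketched in Python. This is a minimal sketch, assuming (as the `cut` commands do) a tab-separated file with the read sequence in column 2 and the match code in column 3. The example rows and the function name `tally_column` are made up for illustration.

```python
from collections import Counter
import io


def tally_column(lines, column):
    """Count the values found in a 1-based, tab-separated column."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:  # skip short or malformed lines
            counts[fields[column - 1]] += 1
    return counts


# Made-up rows standing in for an aligner output file:
# read name, read sequence, match code.
example = io.StringIO(
    ">r1\tACGTACGT\tU0\n"
    ">r2\tACGTACGT\tNM\n"
    ">r3\tTTTTGGGG\tNM\n"
)
rows = example.readlines()

match_codes = tally_column(rows, 3)  # breakdown of match categories
sequences = tally_column(rows, 2)    # how often each read sequence occurs

print(match_codes.most_common())
print(sequences.most_common(10))     # the ten most repeated reads
```

With a real file, pass an open file handle instead of `rows`. A handful of sequences dominating the second tally would point toward a library prep problem rather than an aligner problem.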


    • #3
      1-2% aligned is very low. You might also want to verify you're aligning against the correct reference sequence, and to make sure that the tags that don't align aren't just adapter dimers/trimers/etc.

      I would also ask how many reads you received for each lane.

      Finally, with ChIP-Seq, the amounts of starting material are very low, so it's possible that you're getting contamination from something else. If you ran a gel to select the desired size range, I would suggest you make sure you run ONLY the ChIP-Seq samples on that gel. If you also run a ladder or another experiment on the same gel, you can get a significant amount of contamination. (E.g. we found ladder sequences in our ChIP-Seq experiments, even when separated by 5+ empty wells.)
      The more you know, the more you know you don't know. —Aristotle
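A quick way to check the adapter dimer suggestion is to count how many reads contain the start of the adapter sequence. The sketch below assumes you supply the adapter sequence for your own kit; the `ADAPTER` string here is a made-up placeholder, not a real adapter, and `adapter_fraction` and `min_match` are illustrative names.

```python
def adapter_fraction(reads, adapter, min_match=12):
    """Return the fraction of reads containing the first `min_match`
    bases of `adapter` (exact substring match, case-insensitive)."""
    seed = adapter[:min_match].upper()
    if not seed:
        raise ValueError("adapter must not be empty")
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if seed in read.upper())
    return hits / len(reads)


# Placeholder only: substitute the adapter sequence from your library kit.
ADAPTER = "GATTACAGATTACA"

example_reads = [
    "GATTACAGATTACATT",  # starts with the placeholder adapter
    "ACGTACGTACGTACGT",
    "TTGATTACAGATTACA",  # adapter after two leading bases
    "CCCCCCCCCCCCCCCC",
]
print(adapter_fraction(example_reads, ADAPTER))
```

An exact substring match will miss adapters carrying sequencing errors, so treat the result as a lower bound. If a large share of the unaligned reads hit, adapter dimers are a likely explanation for the low alignment rate.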


      • #4
        I guess that is why they recommend doing size separation after adaptor ligation now...

        What I would do first is check the sequences for adaptors and possibly try aligning to some bacterial genomes in case the sample is contaminated. Also check the reagents, so that the beads are not saturated with ssDNA. And if your reads are > 30 bases, try aligning truncated reads (sequencing errors are most common at the ends of reads).

        Where are the aligned reads placed? Are they only in satellite repeats etc., or where you would expect them?
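The truncation idea can be tried without re-running the sequencer: trim every read to its first N bases and realign. This is a minimal sketch, assuming the reads are available as standard 4-line FASTQ records; the function names and the example reads are illustrative, and the choice of 25 bases is arbitrary.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        yield stripped[i], stripped[i + 1], stripped[i + 3]


def truncate_reads(records, keep):
    """Keep only the first `keep` bases (and quality values) of each read."""
    for header, seq, qual in records:
        yield header, seq[:keep], qual[:keep]


def format_fastq(records):
    """Turn (header, sequence, quality) tuples back into FASTQ text."""
    for header, seq, qual in records:
        yield f"{header}\n{seq}\n+\n{qual}\n"


# Made-up reads of 36 and 35 bases.
example_lines = [
    "@read1\n", "ACGT" * 9 + "\n", "+\n", "I" * 36 + "\n",
    "@read2\n", "TTGCA" * 7 + "\n", "+\n", "I" * 35 + "\n",
]

truncated = list(truncate_reads(parse_fastq(example_lines), keep=25))
for record in format_fastq(truncated):
    print(record, end="")
```

If the alignment rate rises sharply for the truncated reads, error-prone read ends are part of the problem; if it stays at 1~2%, the reads probably do not come from the reference genome at all.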


        • #5
          apfejes - regarding your issue of the ladder contaminating a sample on the ChIP-Seq size-selection gel, how do you evaluate what to excise without a ladder? tks


          • #6
            Hi sblake,

            I was told the people in the lab run the gel with a blue dye that migrates along with fragments of a particular size. They use this as a guide to indicate the approximate position to excise. I understand it took a bit of practice to get the technique right, but once you have it down, it's not too bad.

            Offhand, I don't know which dye they use, but I certainly remember the blue dyes from my (long ago) days of running gels. I'd hate to try this on a really small gel, but on a longer one, I don't see this being a problem.
            The more you know, the more you know you don't know. —Aristotle


            • #7
              aha! makes sense, tks


              • #8
                This low alignment rate puzzles me. elly never replied to this thread.
                My first guess was what apfejes already raised: was it the right reference genome? Otherwise the experiment went terribly wrong.

                ChIP-seq usually generates rather noisy data. But the noise is in the aligned tags, and often only 4% to 5% of aligned tags fall into clusters. Here amplification plays a crucial role. One step too many causes quite some headache. However, 4% is still enough to get good results in the end.



                • #9
                  Hi kmay,

                  I'm so sorry that I didn't reply to my thread.
                  I was busy setting up another application.

                  I'm sure that my reference genome sequence was correct.

                  Could you please explain why ChIP-seq usually generates noisy data?
                  And how can 4~5% aligned data be used for further analysis?
                  What are the remaining 95~96% of reads?

                  I haven't solved this problem yet and am still seeking a solution.

                  Your suggestions and explanation would be very helpful for me.
                  Thank you in advance.



                  • #10

                    Sorry for being unclear!

                    I am talking about two steps.
                    1st: mapping of the reads to the genome
                    2nd: clustering of the mapped reads from step 1 into regions (clusters) of enriched read density.

                    We only have example data for a DNase-seq experiment online, but they might be helpful in explaining the difference.

                    Step 1 statistics
                    Step 2 statistics, with arbitrary variation of tag density per bp window to demonstrate its effects. Usually we calculate the significance of tag density based on a Poisson distribution.
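To make the Poisson idea concrete, here is a minimal sketch of one common way to score a window. It assumes a uniform background, where the expected tag count per window is total mapped tags × window size / genome size; that background model, the function names, and the numbers are assumptions for illustration, not the exact procedure described above.

```python
import math


def expected_tags(total_mapped, window_bp, genome_bp):
    """Expected tags per window if tags fell uniformly on the genome."""
    return total_mapped * window_bp / genome_bp


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing terms iteratively."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Illustrative numbers: 5 million mapped tags, 200 bp windows, 3 Gbp genome.
lam = expected_tags(5_000_000, 200, 3_000_000_000)
p_value = poisson_sf(5, lam)  # chance of seeing 5 or more tags by luck
print(lam, p_value)
```

Because a genome contains millions of windows, a threshold on this p-value needs a multiple-testing correction, and a control sample usually gives a more realistic background than the uniform assumption.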

                    If step 1 delivers only 5%, something is terribly wrong.
                    Did you try mapping algorithms other than ELAND? Do mappings with increasingly relaxed criteria:
                    1, 2, 3... point mutations, indels of 1, 2, 3... and see how the mapped tag numbers behave.

                    The 5% I've been talking about corresponds to the number of mapped tags falling into clusters of enriched density from step 2.
                    The "noisiness" at this stage largely depends on the specificity of the antibody. There is always a lot of unspecific binding carried over. The experimental set-up has another major effect: whether and how you do a control for subtraction,
                    i.e. an unspecific antibody or just an input control. In our experience the latter gives better results. Last but not least, be very careful with amplification. Noise gets up to signal levels rather quickly.

                    5% vs. the remaining 95%: well, I am afraid there is no clear-cut statistical method to decide about success in the end. It requires some human brain to look at the raw data and clusters in the genome annotation (we do this in ElDorado). You can sign up for free for two weeks and inspect the open chromatin data, or go here to see the DGE results from our Science paper. Go down to "user data" and choose which data you want to see.

                    Statistics comes into play again at the next step: see which TF-binding sites are over-represented in the clusters (hopefully the one you IPed), whether they are part of a complex model or they are phylogenetically conserved.

                    Hope this helps!




                    • #11
                      Hi again,
                      I guess Klaus is referring to the fact that, of all aligned reads, only a small percentage occur in peaks of significant enrichment. You will always have some genomic background, and due to the large genome size this will generate a high number of randomly aligned sequences even if you have a good enrichment ratio.

                      What beads and blocking did you use for ChIP? Are there any possible contaminants like ssDNA?
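The "small percentage of aligned reads in peaks" figure is easy to compute once you have read positions and peak intervals. This is a minimal sketch, assuming reads are reduced to (chromosome, position) pairs and peaks are non-overlapping, half-open (chromosome, start, end) intervals; the function name and data are illustrative.

```python
from bisect import bisect_right


def fraction_in_regions(positions, regions):
    """Fraction of (chrom, pos) reads that fall inside any region.

    `regions` holds non-overlapping (chrom, start, end) intervals,
    with `start` inclusive and `end` exclusive.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in ivs] for c, ivs in by_chrom.items()}

    inside = 0
    total = 0
    for chrom, pos in positions:
        total += 1
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue
        # Last interval whose start is <= pos, if there is one.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            inside += 1
    return inside / total if total else 0.0


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [("chr1", 150), ("chr1", 250), ("chr1", 599),
         ("chr1", 600), ("chr2", 150)]
print(fraction_in_regions(reads, peaks))
```

Note that this measures the second-stage figure (aligned reads in clusters), which is a different quantity from the 1~2% alignment rate reported at the top of the thread.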


                      • #12

                        I did not see your answer to elly before. Is the amplification a problem in your data even if you keep only unique positions among the uniquely aligned reads?


                        • #13

                          I am not sure whether I understand your question correctly. Amplification in the wet lab amplifies everything, including unspecific noise. One would expect higher-copy-number tags to build up more quickly than unique signals, and restricting downstream analysis to perfect and unique matches should eliminate this. As a first step we always look at perfect and unique matches only. However, there is a lot of "noise" (unspecifically bound DNA or otherwise carried-over oligos) that matches perfectly and uniquely, too.
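The "unique positions only" filter asked about above can be sketched as follows. This is a minimal sketch, assuming each uniquely aligned read has been reduced to a (chromosome, start position, strand) tuple; reads sharing all three values are treated as amplification duplicates. The function names and data are illustrative.

```python
def collapse_to_unique_positions(alignments):
    """Keep one read per distinct (chrom, pos, strand), sorted."""
    return sorted(set(alignments))


def duplication_rate(alignments):
    """Fraction of reads removed by collapsing to unique positions."""
    alignments = list(alignments)
    if not alignments:
        return 0.0
    removed = len(alignments) - len(set(alignments))
    return removed / len(alignments)


# Made-up alignments: three copies of one tag plus two distinct tags.
example = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),
    ("chr2", 5, "+"),
]
print(collapse_to_unique_positions(example))
print(duplication_rate(example))
```

Collapsing removes the copy-number inflation from amplification, but as the post above points out, it cannot remove unspecific fragments that happen to align perfectly and uniquely.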

                          We don't do the wet lab work; we only do the analyses. From the many different data sets we have seen so far, amplification seems to be the next most crucial step after antibody specificity.

                          If you are interested, I could bring in our specialist for that.



                          • #14
                            Chipper, replying to your post #11 in this thread, how are you able to tell whether or not ssDNA is contaminating the sample?


                            • #15
                              Did you by chance use salmon sperm or another DNA as a block or carrier during the ChIP?

                              (We've had 2 separate groups who did just this. It leads to very low % alignment. Uggg! On the other hand, we're up to 100 million salmon reads if anyone wants to have a go at a de novo assembly.)

