
  • Tools for finding differential histone modification sites?

    We did ChIP-seq for a few histone modification marks such as H3K4me3 and H3K9me3 in our treatment and control conditions.

    We want to find out which genomic regions have shown modified histone mark binding after the treatment. Any suggestions on choosing the right tool for this purpose?

  • #2
    You could try ChIPDiff



    I haven't used it myself but it is designed for the kind of question you are trying to answer.



    • #3
      It seems ChIPDiff only works well under certain conditions. I'm not sure whether the assumptions of the HMM hold for all of our data.



      • #4
        SICER is designed for ChIP-seq analysis of histones... If you're looking for the difference between two, though, I'm not sure if that'll help.





        Maybe call peaks and use BEDTools to compare their locations?



        • #5
          No. SICER is designed to find peaks/islands. It should not be used to identify differential sites. What I'm referring to here are two ChIP samples, one from our experimental treatment and one from the control. I'm not trying to compare ChIP vs. input and find the enriched histone binding sites.

          It seems that using a peak-finding algorithm won't give very good results, because it relies mostly on the cutoffs used to define peaks. It also won't tell you the difference when there are two overlapping peaks but one is stronger than the other.



          • #6
            Oh. Sorry, I didn't understand.

            One naive approach would be to make a ratio of the data (add 1 to every genomic location to avoid dividing by zero) and then call peaks on that... this would show you where the biggest differences were, at least.
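A minimal sketch of the ratio idea above, assuming the two samples are already available as per-position coverage lists of equal length and comparable sequencing depth. The function name `log2_ratio_track` and the toy data are hypothetical, and taking the log2 of the ratio is an added convenience so that gains and losses are symmetric around zero.

```python
import math

def log2_ratio_track(treatment, control, pseudocount=1.0):
    """Per-position log2 ratio of treatment over control coverage.

    The pseudocount (1 by default, as suggested in the post) is added to
    both tracks so that positions with zero coverage do not divide by zero.
    """
    if len(treatment) != len(control):
        raise ValueError("coverage tracks must have the same length")
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for t, c in zip(treatment, control)
    ]

# Toy coverage values (made up for illustration only).
treatment_cov = [0, 3, 15, 7, 0]
control_cov = [0, 3, 3, 7, 1]
ratios = log2_ratio_track(treatment_cov, control_cov)
print(ratios)
```

Peaks would then be called on the resulting track. Positions where both samples have low coverage give noisy ratios, so some minimum-coverage filter is probably needed in practice.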

            Other people might have better ideas...



            • #7
              I found this old thread through Google and I want to do the same thing: compare differential binding between treated and untreated samples.

              For example, identify where ChIP peaks are gained or lost in the treated sample. Ideally, we would also like to identify if there is simply an increase or decrease in binding beyond a certain threshold (eg. if a particular site has a 3-fold increase or decrease in enrichment (peak height) in the treated sample), even if it's not a complete gain or loss. If it matters, we're not looking at histone modifications (although may in the future).

              Since this thread was posted a year ago, are there any new tools to do this, before I spend more time trying to make one?



              • #8
                Hi,

                a couple of people are using our DESeq tool, originally designed for RNA-Seq, for this purpose, and recently, a first paper appeared that used DESeq for comparative ChIP-Seq:

                Maze I, Feng J, Wilkinson MB, et al. Cocaine dynamically regulates heterochromatin and repetitive element unsilencing in nucleus accumbens.
                PNAS 2011;108(7):3035, http://www.pnas.org/content/108/7/3035.full

                The general idea is to define suitable counting bins, i.e., intervals on the genome that mark a binding region or a single chromatin mark. Then, you make a count table, with one column for each sample and one row for each counting bin, indicating how many reads overlap with the bin.

                To define the counting bins, there are two options:

                - If you work with histone marks that have a fixed relation to the gene body, you may simply use gene models. For example, if you want to know how the H3K4me3 peak that is usually found at the transcription start site of active genes changes, you may simply define one counting bin for each gene and let this bin stretch from, say, 100 bp upstream to 100 bp downstream of the transcription start site.

                - If you look for peaks in intergenic regions, e.g., enhancer binding regions, my suggestion would be to pool the data from all samples, run them through a peak finder and use the called binding regions (intervals) as counting bins.

                In any case, the bins should not be too wide. It seems that you get a better signal-to-noise ratio if you only try to capture the high middle portion of the peak, because then you get the strongest differences if the peak heights differ.
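The count table described above can be sketched in plain Python as follows. This is only an illustration under assumptions: bins and reads are half-open `(chrom, start, end)` intervals, and the names `Bin`, `count_reads_in_bins` and `build_count_table` are hypothetical helpers, not part of DESeq.

```python
from collections import namedtuple

# A counting bin: half-open interval [start, end) on one chromosome.
Bin = namedtuple("Bin", ["chrom", "start", "end"])

def count_reads_in_bins(bins, reads):
    """Return, for each bin, the number of reads overlapping it.

    `reads` is a list of (chrom, start, end) half-open intervals.
    """
    counts = []
    for b in bins:
        n = sum(
            1
            for chrom, start, end in reads
            if chrom == b.chrom and start < b.end and end > b.start
        )
        counts.append(n)
    return counts

def build_count_table(bins, samples):
    """Build a table with one row per bin and one column per sample.

    `samples` maps a sample name to its list of reads.
    """
    names = list(samples)
    per_sample = [count_reads_in_bins(bins, samples[name]) for name in names]
    rows = [list(row) for row in zip(*per_sample)]
    return names, rows

# Toy data (made up): two bins, two samples.
bins = [Bin("chr1", 100, 300), Bin("chr1", 1000, 1200)]
samples = {
    "control": [("chr1", 90, 126), ("chr1", 250, 286), ("chr1", 1100, 1136)],
    "treated": [("chr1", 120, 156), ("chr1", 1010, 1046),
                ("chr1", 1150, 1186), ("chr2", 100, 136)],
}
names, rows = build_count_table(bins, samples)
print(names)
for b, row in zip(bins, rows):
    print(b, row)
```

The nested loop is quadratic in bins times reads, which is fine for a toy example but too slow for real data; an interval index or a sorted sweep would be the usual replacement.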

                If anybody tries this, please let me know; we are currently collecting feedback on how well this works in practice.

                Simon



                • #9
                  Originally posted by Simon Anders
                  - If you look for peaks in intergenic regions, e.g., enhancer binding regions, my suggestion would be to pool the data from all samples, run them through a peak finder and use the called binding regions (intervals) as counting bins.
                  Simon

                  This sounds similar to what I've been trying. I didn't think to pool the samples before calling peaks though. Instead I've been using SISSRS to make a list of the peaks in the two samples then use a Perl script to find all unique peaks (ie. anywhere there is either an overlap between the two lists, or if there's a peak in one file but not the other). Once I have this list of total peaks, the script goes through pre-made .wig files to find the peak height at each site, compare between samples, and based on criteria (eg. peak height, fold change in enrichment), output a list.
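The peak-list step described above could look roughly like this in Python (the original script is in Perl and is not shown in the thread, so this is a sketch of the idea, not that script). Peaks are assumed to be half-open `(chrom, start, end)` tuples, the peak heights are assumed to have been looked up already, and `merge_peaks`, `classify` and the 3-fold default are hypothetical choices.

```python
def merge_peaks(peaks_a, peaks_b):
    """Union of two peak lists, merging overlapping or touching intervals.

    Each peak is a (chrom, start, end) tuple.
    """
    merged = []
    for chrom, start, end in sorted(list(peaks_a) + list(peaks_b)):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def classify(height_untreated, height_treated, min_fold=3.0, pseudocount=1.0):
    """Label a site by the fold change in peak height between samples.

    The pseudocount keeps the ratio finite when a peak is absent in one
    sample (height 0), so complete gains and losses are labelled as well.
    """
    fold = (height_treated + pseudocount) / (height_untreated + pseudocount)
    if fold >= min_fold:
        return "increased"
    if fold <= 1.0 / min_fold:
        return "decreased"
    return "unchanged"

# Toy peak lists (made up for illustration).
untreated = [("chr1", 100, 200), ("chr1", 500, 600)]
treated = [("chr1", 150, 250), ("chr2", 10, 50)]
all_peaks = merge_peaks(untreated, treated)
print(all_peaks)
print(classify(10, 40), classify(40, 10), classify(10, 20))
```

A fixed fold-change cutoff like this says nothing about statistical significance, and the heights should be on a comparable scale between samples before they are compared.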

                  I wonder if scanning through the aligned read file would be faster than the .wig's? I already had the .wig's though so I've been doing it that way. On my 4 year old Core2Duo 2GHz laptop it takes about 30 minutes per sample to call peaks in SISSRS and about 1 hour to run the Perl script with ~35,000 peaks per sample (making the list of peaks takes about 2 minutes, scanning through the .wig's takes the rest of the hour). So far it seems to be working, but it's still a work in progress.



                  • #10
                    I don't believe differential histone modification can be determined purely by comparing counts. Did you normalise your ChIP samples to input control for peak calling? There should at least be some normalisation to correct for systematic bias, but I'm not sure whether linear scaling to the input control is enough. PeakSeq compares samples to input control to correct bias, while the method by Taslim uses a LOESS non-linear regression method to compare any two samples.

                    I think the latter method may be better - as long as there aren't real global differences in histone modification - since the input control is not necessarily the best control for all bias. There may be other technical causes of variation between samples, such as level of ChIP enrichment (signal/noise), that need to be addressed before making conclusions about differential histone modifications.

                    I'd like to hear if anyone has more insight into this question, as it's a problem I'm also currently facing.



                    • #11
                      DESeq only uses linear scaling. From the scatter plots I've looked at so far, this seems to be fine, i.e., a LOESS normalization would not be that different. However, I definitely agree that this is not clear a priori. Checking the normalization with scatter plots is certainly essential.
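For readers unfamiliar with what linear scaling means here: each sample gets a single scaling factor, and its counts are divided by it. The sketch below computes such factors with the median-of-ratios rule described in the DESeq paper, re-implemented in plain Python for illustration. It is not the package's own code, and the function name and toy table are assumptions.

```python
import math
from statistics import median

def size_factors(count_table):
    """One scaling factor per sample via the median-of-ratios rule.

    `count_table` is a list of rows (one per counting bin), each row a
    list with one count per sample. For every bin, each sample's count is
    compared with the geometric mean across samples; a sample's factor is
    the median of these ratios. Bins with a zero count are skipped because
    their geometric mean is zero.
    """
    n_samples = len(count_table[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in count_table:
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no bin has non-zero counts in every sample")
    return [math.exp(median(lr)) for lr in log_ratios]

# Toy table (made up): the second sample has twice the counts of the first.
table = [[10, 20], [50, 100], [7, 14], [0, 5]]
factors = size_factors(table)
normalized = [[c / f for c, f in zip(row, factors)] for row in table]
print(factors)
```

Plotting the normalized counts of one sample against another on a log scale is the scatter-plot check mentioned above: if the cloud bends away from the diagonal, a single factor per sample is not enough.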

                      The other interesting question is what to do with the input. A simple normalization like dividing by the input counts has no bearing on differential occupancy analysis, because if all samples are normalized against the same input sample, the input simply cancels out.
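To spell out the cancellation: write $c_T$ and $c_C$ for the read counts of the treated and control ChIP samples in one bin, and $i$ for the count of the shared input sample in the same bin (these symbols are introduced here for illustration only). Then

```latex
\frac{c_T / i}{c_C / i} = \frac{c_T}{c_C}, \qquad i > 0
```

so the treated/control ratio is the same with or without dividing by the input.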

                      If you only look at one sample, the input helps you distinguish real from false peaks. You consider a peak as false if it appears with similar strength and shape in the input. I'd expect such a false peak to look the same in all ChIP samples, so even if we do not recognize it as false, we will not call it differentially occupied. This justifies, in my view, using the input only for peak finding but not for peak comparison.



                      • #12
                        Given all the limitations of the ChIPSeq technology and software tools, I would rather start with a simple approach: fixed-bin counts of IP/input to identify enriched windows in each condition, followed by fixed-bin counts of treatment/control in all the enriched bins, starting with a bin size of at least 200 bp, better 500 bp.
                        You do not need any software tool for that.
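As a rough sketch of this two-step fixed-bin approach in plain Python: reads are reduced to `(chrom, position)` pairs, and the enrichment cutoff `min_ratio`, the pseudocount and all function names are hypothetical choices that the post does not specify. The sketch also assumes comparable sequencing depth in all samples.

```python
from collections import Counter

def bin_counts(read_starts, bin_size=500):
    """Count reads per fixed-width bin, keyed by (chrom, bin_index)."""
    return Counter((chrom, pos // bin_size) for chrom, pos in read_starts)

def enriched_bins(ip, inp, min_ratio=2.0, pseudocount=1.0):
    """Bins whose IP/input ratio reaches the (assumed) cutoff."""
    return {
        b
        for b in ip
        if (ip[b] + pseudocount) / (inp.get(b, 0) + pseudocount) >= min_ratio
    }

def treatment_vs_control(treat_ip, treat_input, ctrl_ip, ctrl_input,
                         min_ratio=2.0, pseudocount=1.0):
    """Treatment/control ratio for bins enriched in either condition."""
    candidates = (
        enriched_bins(treat_ip, treat_input, min_ratio, pseudocount)
        | enriched_bins(ctrl_ip, ctrl_input, min_ratio, pseudocount)
    )
    return {
        b: (treat_ip.get(b, 0) + pseudocount) / (ctrl_ip.get(b, 0) + pseudocount)
        for b in sorted(candidates)
    }

# Toy read start positions (made up for illustration).
treat_ip = bin_counts([("chr1", p) for p in (10, 20, 30, 40, 50, 60, 70, 600)])
treat_input = bin_counts([("chr1", 15), ("chr1", 610)])
ctrl_ip = bin_counts([("chr1", p) for p in (12, 22, 32, 620)])
ctrl_input = bin_counts([("chr1", 18), ("chr1", 630)])

result = treatment_vs_control(treat_ip, treat_input, ctrl_ip, ctrl_input)
print(result)
```

Here only the first 500 bp bin on chr1 passes the enrichment step in either condition, so it is the only bin for which a treatment/control ratio is reported.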

                        The only problem (and it is a huge one) is a big difference in sequencing depth between treatment and control samples, which requires serious 'normalization'. Gut feeling: linear scaling is not good enough for ChIPSeq, as there is a serious oversampling problem in some genomic regions. Therefore I would only trust such an analysis if you have comparable sequencing depths in all samples.



                        • #13
                          I'm not sure I can follow you here. Why is there a "big difference in sequencing depth between treatment and control samples"? What kind of treatment do you have in mind that would change things so drastically? (We do agree that 'control' means a ChIP sample, not input?)

                          Apart from that, I agree. What you need a non-trivial method for, and what people are using our DESeq package for, is to figure out whether a ratio of treatment/control that differs from 1 is statistically significant.



                          • #14
                            @mudshark

                            When you say 'difference in sequencing depth', do you mean technical variation arising somewhere between the library construction and obtaining mapped reads? Because my other point was that variation could also arise before library construction, from differences in ChIP enrichment between samples. Any normalisation would need to take into account both of these sources of variation.



                            • #15
                              Sorry guys: when I mentioned "big difference in sequencing depth", I was trying to refer to an experimental scenario that has happened to me more than once: varying sequencing depth caused by technical problems (e.g. very uneven read distribution in a multiplex run). Just as an example: 5 million reads in the control IP versus 10 million reads in the treatment IP. I personally consider this impossible to analyze (in a ChIPSeq experiment).

                              Of course there might be variation before library preparation. However, biological effects in chromatin are frequently rather mild, so I fear the technical variation more. Still, any biological variation also has to be considered.
