Originally posted by kentawan
The intensity difference filter is only useful in a case where:
- You have a measure where the size of the measure and the noise are correlated - generally anything which is a direct manipulation of a sequence count
- You expect that the majority of the data will not be changing such that it makes sense to look for outliers after creating a model over the rest of the data.
...and unfortunately neither of these really applies to methylation data.
There are a few different ways to look at methylation data depending on what you're looking for, but a typical recipe we'd use would be something like this:
- Decide where you're going to measure. It almost never makes sense to analyse individual cytosines, as the coverage for each will generally be poor and differences are unlikely to reach significance. If you're targeting something specific like gene bodies, promoters or CpG islands then put probes over those. If not, tile probes of an appropriate size (our default is 3-5kb if we have reasonably good coverage - go larger if your coverage isn't great). Use whichever probe generator is appropriate to make probes over these regions.
- Quantitate your data with the bisulphite quantitation pipeline, using the probes you made in the step above. I'd suggest setting the minimum coverage depth and the measures per feature both to 1 to start with (this isn't the default at the moment, but it will be in the next release).
- Decide on the minimum absolute methylation change you're prepared to care about. In well-measured regions you might find that a change of 1% or less is statistically significant, but you should consider the biological context and whether a change that small is likely to mean anything. We'd normally start by looking for a minimum change of 5-10%.
- Run a contingency-based test (Filtering > Statistical Test > Chi Square > For/Rev) and select the two datasets you want to compare. Set the minimum difference to the value you chose in the previous step. This will give you an initial hit list.
- Select the hit list you got from the contingency test and filter it on your current quantitation with the differences filter (Filter > Value difference > Individual probes). Select both of your datasets in both option lists and set the minimum difference to the same cutoff you chose before.
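To make the logic of the last three steps concrete, here is a minimal Python sketch of what they amount to for a single pair of datasets. This is not SeqMonk's code: the function names, the example counts and the p-value cutoff are all made up for illustration, and it skips the multiple testing correction you would want on real data. It runs a plain Pearson chi-square test on the methylated/unmethylated call counts for each probe, then keeps only probes that also pass the minimum absolute difference.

```python
import math


def chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) on a 2x2 table of methylated/unmethylated call counts.
    Returns (chi2, p_value)."""
    row_a = meth_a + unmeth_a
    row_b = meth_b + unmeth_b
    col_m = meth_a + meth_b
    col_u = unmeth_a + unmeth_b
    if 0 in (row_a, row_b, col_m, col_u):
        # The test is undefined if any margin is empty; report no evidence.
        return 0.0, 1.0
    n = row_a + row_b
    chi2 = n * (meth_a * unmeth_b - unmeth_a * meth_b) ** 2 / (
        row_a * row_b * col_m * col_u
    )
    # For 1 degree of freedom the upper tail of the chi-square
    # distribution is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def percent_methylation(meth, unmeth):
    """Percentage of calls in a probe that were methylated."""
    return 100.0 * meth / (meth + unmeth)


def find_hits(probes, min_diff=10.0, max_p=0.05):
    """Keep probes that pass both the chi-square test and the minimum
    absolute methylation difference (in percentage points)."""
    hits = []
    for name, (meth_a, unmeth_a, meth_b, unmeth_b) in probes.items():
        if meth_a + unmeth_a == 0 or meth_b + unmeth_b == 0:
            continue  # no coverage in one dataset, nothing to compare
        diff = percent_methylation(meth_a, unmeth_a) - percent_methylation(
            meth_b, unmeth_b
        )
        _, p_value = chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b)
        if p_value <= max_p and abs(diff) >= min_diff:
            hits.append((name, diff, p_value))
    return hits


# Invented counts: (meth A, unmeth A, meth B, unmeth B) per probe.
example_probes = {
    "tile_1": (80, 20, 40, 60),          # big change, decent coverage
    "tile_2": (5000, 5000, 5200, 4800),  # 2 point change, huge coverage
    "tile_3": (3, 2, 1, 4),              # big change, almost no coverage
}

for name, diff, p_value in find_hits(example_probes):
    print(f"{name}: difference {diff:+.1f} points, p = {p_value:.2g}")
```

The three example probes show why both filters are needed. tile_2 passes the chi-square test on coverage alone but changes by only 2 percentage points, while tile_3 shows a large apparent change that the test rejects because there are only five calls per dataset. Only tile_1 survives both.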
You should now have a list of probes which show a methylation change which is both statistically and biologically significant. If you used tiled probes then the next step would be to relate the positions you found to biological features, to try to understand why they were selected.
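If you export the hit list and want to do that last step outside the program, relating tiles to features comes down to an interval overlap check. The sketch below assumes you have coordinates for both the hits and an annotation set. The `Region` type, the function names and every coordinate in it are hypothetical, and for genome-sized annotation sets you'd want an indexed lookup rather than this all-against-all loop.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlaps(a, b):
    """True if the two regions share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def annotate(hits, features):
    """Map each hit name to the names of the features it overlaps."""
    return {
        hit.name: [f.name for f in features if overlaps(hit, f)]
        for hit in hits
    }


# Invented coordinates, for illustration only.
hit_tiles = [
    Region("tile_1", "chr1", 10001, 15000),
    Region("tile_7", "chr2", 40001, 45000),
]
annotation = [
    Region("promoter_X", "chr1", 14500, 16500),
    Region("cpg_island_Y", "chr1", 20000, 21000),
    Region("gene_body_Z", "chr3", 40001, 45000),
]

for tile, names in annotate(hit_tiles, annotation).items():
    print(tile, "->", ", ".join(names) if names else "no overlapping feature")
```

Tiles that overlap nothing in your annotation are worth a second look too, since they may sit in unannotated regulatory regions.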
There are many other ways to go about this, but hopefully this will at least get you started.