
  • statistical analysis of 454 bisulfite sequence data - small sample size

    Hello everyone,

    I am trying to figure out how to analyse some bisulfite sequencing data, and I am hoping that someone can suggest how to go about it. I have looked online and in statistics textbooks, but I am totally stumped.

    I have performed 454 BS sequencing of a number of PCR amplicons. I have two different treatment groups, with n=3 biological replicates in each (six sets of read data in total). I want to use two types of statistical analysis to assess differences in methylation between the treatment groups. I would like to test for differences in methylation (1) at individual CpG sites within an amplicon and (2) across each amplicon as a whole. I think that it will be necessary to analyse my results as count data rather than % methylation values, as I have a small sample size and the % methylation values probably do not conform to the assumptions of normality or homogeneity of variance. Similar studies that I have seen in the literature have used a Fisher's exact test for (1) and a negative binomial generalised linear model for (2). However, these studies analysed unreplicated data (where biological replicates were pooled prior to PCR), and as far as I know these statistical tests are unable to accommodate my replicated data. In another post, somebody suggested that the program DESeq could be used for (1). After trying to use DESeq to analyse my data, I realised that this is not possible, as the relatively small number of CpG sites that I have to analyse results in inaccurate mean/dispersion estimates.
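
For reference, the pooled Fisher's exact test used in the unreplicated studies can be sketched in plain Python as below. The read counts are invented for illustration, and the helper names (`pool`, `fisher_exact_two_sided`) are hypothetical. Pooling sums the reads across replicates, so any variation between biological replicates is discarded, which is the limitation described above.

```python
from math import comb

# Made-up counts for one CpG site: (methylated reads, unmethylated reads)
# for each of three biological replicates per treatment group.
control = [(12, 88), (20, 80), (15, 85)]
treated = [(30, 70), (42, 58), (25, 75)]


def pool(replicates):
    """Sum methylated and unmethylated read counts over replicates."""
    meth = sum(m for m, _ in replicates)
    unmeth = sum(u for _, u in replicates)
    return meth, unmeth


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


meth_c, unmeth_c = pool(control)
meth_t, unmeth_t = pool(treated)
p_value = fisher_exact_two_sided(meth_c, unmeth_c, meth_t, unmeth_t)
print(f"pooled Fisher's exact test p-value: {p_value:.3g}")
```

Because the test sees only the pooled table, six replicates with these counts are treated the same as two deeply sequenced samples.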

    If anyone has any idea as to which statistical tests would be appropriate for my data I would be very grateful.

    Thank you in advance

  • #2
    I'd be a bit hesitant to try to shoe-horn this into DESeq or one of the other RNA-seq tools, since the negative binomial distribution doesn't really fit bisulfite sequencing data well. This sort of data is generally handled in one of a few ways:

    (1) Logistic regression (e.g., in methylKit), which you can do easily enough in R.
    (2) Smoothing followed by either a t-test or a Wilcoxon test, similar to how bsseq/BSmooth works.
    (3) Beta-binomial regression (e.g., in BiSeq).
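
To make option (1) concrete, here is a minimal Python sketch of a logistic-regression likelihood-ratio test for a single CpG with one two-level covariate (treatment group). It is not methylKit's implementation, and the function names are hypothetical. With a single group covariate, the fitted probabilities are just the observed group proportions, so no iterative fitting is needed. Note that this plain binomial model takes pooled counts per group and does not account for extra variation between replicates.

```python
from math import erfc, log, sqrt


def binom_loglik(meth, total, p):
    """Binomial log-likelihood, up to a constant that does not depend on p."""
    ll = 0.0
    if meth > 0:
        ll += meth * log(p)
    if total - meth > 0:
        ll += (total - meth) * log(1 - p)
    return ll


def logistic_lrt(meth_a, total_a, meth_b, total_b):
    """Likelihood-ratio test for a group effect at one CpG site.

    Null model: one methylation probability shared by both groups.
    Alternative model: a separate probability for each group.
    Assumes both groups have non-zero read coverage.
    """
    p_a = meth_a / total_a
    p_b = meth_b / total_b
    p_0 = (meth_a + meth_b) / (total_a + total_b)
    ll_alt = binom_loglik(meth_a, total_a, p_a) + binom_loglik(meth_b, total_b, p_b)
    ll_null = binom_loglik(meth_a, total_a, p_0) + binom_loglik(meth_b, total_b, p_0)
    stat = max(0.0, 2 * (ll_alt - ll_null))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Made-up pooled counts: 47 of 300 reads methylated vs 97 of 300.
stat, p_value = logistic_lrt(47, 300, 97, 300)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.3g}")
```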

    I would say that the beta-binomial methods will win out long term, since they are actually able to model the underlying biology. The betareg package from CRAN in R gets you close to this, though note that it fits a beta regression on methylation proportions rather than a beta-binomial model on read counts. The next thing to think about is whether you're interested in single CpGs or whole regions. Most of the packages actually try to find regions, but if you're looking at a small number of amplicons then you're likely to be more interested in single CpGs, so you might just ignore the packages and use betareg.

    I should note that none of these methods is ideal as of yet. New variants seem to appear every month, and I actually have a tweaked version of beta-binomial regression in mind to implement if no one else has already (the downside to new packages appearing every couple of weeks...), so you'll likely find something that works nicely in the not too distant future.
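
As a rough illustration of the beta-binomial idea for a single CpG with replicates, the sketch below writes out the beta-binomial log-likelihood and compares a one-mean model against a two-mean model by a likelihood-ratio test. It is not what BiSeq or any other package does internally. The counts are invented, the function names are hypothetical, and the coarse grid search stands in for a proper numerical optimiser. With only three replicates per group, the chi-square approximation for the p-value should be treated with caution.

```python
from math import erfc, exp, lgamma, sqrt


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(k, n, mu, precision):
    """Log-probability of k methylated reads out of n under a beta-binomial.

    mu is the mean methylation level (0 < mu < 1); precision = alpha + beta,
    so larger values mean less variation between replicates.
    """
    alpha = mu * precision
    beta = (1 - mu) * precision
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def loglik(replicates, mu, precision):
    """Sum the log-likelihood over (methylated, unmethylated) replicate counts."""
    return sum(betabinom_logpmf(m, m + u, mu, precision) for m, u in replicates)


# Coarse parameter grids (an assumption made to keep the sketch short).
MUS = [i / 100 for i in range(1, 100)]
PRECISIONS = [2.0 ** j for j in range(0, 11)]


def best_over_mu(replicates, precision):
    """Best log-likelihood over the grid of means for a fixed precision."""
    return max(loglik(replicates, mu, precision) for mu in MUS)


def betabinom_lrt(group_a, group_b):
    """Approximate likelihood-ratio test for a difference in mean methylation.

    Null: one mean for both groups. Alternative: one mean per group.
    Both models share a single precision parameter.
    """
    ll_null = max(best_over_mu(group_a + group_b, prec) for prec in PRECISIONS)
    ll_alt = max(
        best_over_mu(group_a, prec) + best_over_mu(group_b, prec)
        for prec in PRECISIONS
    )
    stat = max(0.0, 2 * (ll_alt - ll_null))
    # Chi-square survival function with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2))


# Made-up counts for one CpG site, three biological replicates per group.
control = [(12, 88), (20, 80), (15, 85)]
treated = [(30, 70), (42, 58), (25, 75)]
stat, p_value = betabinom_lrt(control, treated)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

Unlike the pooled tests, this keeps each replicate's counts separate, so disagreement between replicates lowers the estimated precision and makes the test more conservative.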

    • #3
      In fact, it turns out that MOABS, which just came out, already implements what I had in mind. You'll have to figure out how to get your data into it, but it's likely to give nice results.

      • #4
        Thank you for your helpful advice, dpryan.
