
  • MBD-seq (or any enrichment-seq) read number <-> biology?

    Let's assume we have the same cell type in conditions A and B.

    A non-sequencing experiment shows that there is overall more DNA methylation in condition A than in condition B.

    Then we perform MBD-seq on conditions A and B to see where this difference in methylation lies in the genome. (To ensure that we used exactly the same amount of starting DNA in A and B, we carefully quantified it using Qubit etc.) We align using BWA. Immediately we see that in all biological replicates, condition A has more uniquely mapped reads (up to 25 %) than condition B. (In some cases there were also more raw sequencing reads in condition A.) We are talking about 16 - 25 million uniquely mapped reads / sample, single end, single sample seq on GAII. The inputs are essentially identical. A first question is: does this higher number of uniquely mapped reads in A reflect the biology of A and B (more overall methylation in A)?

    However, after loading wig files into a browser, we can't see any differentially methylated regions - A and B look like perfect replicates of each other. The differential methylation analysis is currently being run (calling peaks with BALM and using MEDIPS to quantify methylation), however preliminary results show there is not much difference between A and B.

    Does anyone have an explanation for this, assuming that the non-sequencing experiment was valid, and that there indeed is quantitative overall difference in methylation between A and B?

    The only things that come to my mind for now are:

    1. differential methylation between A and B occurs in repetitive DNA sequences, which are excluded from the analysis by BWA?

    2. maybe the MBD protein used for enrichment recognises other cytosine modifications, not only methylation, so the sites that differ between A and B could carry another modification in B that is still recognised by MBD?

    3. Is there a normalisation step in the algorithms used that would divide peak height (= quantity) by the total number of reads (which is higher in A)? If more reads in A reflect the biological presence of more methylation, would dividing each peak quantity by this higher total number of reads diminish the difference between A and B?
    I'm not a computational person, but in my understanding this normalisation step wouldn't affect the quantitative analysis and identification of differentially methylated regions (DMRs) only if we assume that DMRs will behave like a microarray experiment: most of the regions don't change, and only a few do. But is it possible that the change in methylation is more uniformly distributed across the genome so this normalisation is affecting quantitative analysis?
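A minimal sketch of the concern in point 3, using made-up bin counts. All numbers and the helper `counts_per_million` are hypothetical and are not taken from BALM or MEDIPS; the sketch only shows what dividing by total reads does to a uniform change compared with a local one.

```python
def counts_per_million(counts):
    """Scale raw bin counts by the library size (total reads)."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]

# Hypothetical read counts in five genomic bins for condition B.
b_counts = [100, 250, 50, 400, 200]

# Case 1: condition A gains 25% more reads in every bin (uniform change).
a_uniform = [c * 1.25 for c in b_counts]

# Case 2: condition A differs in one bin only (the "microarray-like" case).
a_local = [100, 250, 50, 800, 200]

cpm_b = counts_per_million(b_counts)
uniform_ratios = [a / b for a, b in zip(counts_per_million(a_uniform), cpm_b)]
local_ratios = [a / b for a, b in zip(counts_per_million(a_local), cpm_b)]

print("uniform change, A/B after scaling:", [round(r, 3) for r in uniform_ratios])
print("local change,   A/B after scaling:", [round(r, 3) for r in local_ratios])
```

In the uniform case every A/B ratio comes out as 1 after scaling, so the global difference disappears. In the local case the changed bin still stands out, although the unchanged bins are pushed slightly below 1 because the library size of A went up.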

    We are quite confused by this, as two different experiments that both worked perfectly don't agree: one says there is more methylation overall in A than in B, but then MBD-seq shows that A and B are identical, as if they were replicates of each other. The DNA used in both experiments is exactly the same, so inter-replicate variability cannot explain the discrepancy.

    Many thanks!

  • #2
    The first thing that comes to my mind is exactly what you mentioned: the differential methylation is happening in repetitive regions.

    What was the non-sequencing technology?


    • #3
      The non-sequencing experiment was a dot blot: immobilizing total DNA on a membrane and staining with an anti-methylC antibody. The experiment was very clean and clear. A has more total methylation than B.

      Another question I have, in case those changes occur in repetitive elements: does it mean anything that A has more uniquely mapped reads than B? If all the change were in repetitive regions, those reads would be excluded from the uniquely mapped reads list by BWA, yet there's still a difference in all biological replicates.


      • #4
        A couple comments ...

        How are you going about finding DMRs? It seems that "loading wig files into a browser" may not be very fruitful -- BALM and MEDIPS are geared to absolute methylation, which isn't necessary if you want to go direct to differential methylation. Have you produced an "MA" plot? I do a fair amount of DMR analyses in edgeR (being a co-author), simply taking read densities in bins along the genome or in regions close to TSSs and doing standard count analyses.
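A rough sketch of what the "MA" plot values would be for binned read counts. This is not edgeR (which is an R package); the function `ma_values`, the pseudocount and the example counts are all assumptions made for illustration.

```python
import math

def ma_values(counts_a, counts_b, pseudocount=0.5):
    """Return (M, A) per bin from two vectors of binned read counts.

    M = log2 ratio of library-size-scaled counts (A over B).
    A = average log2 scaled count, i.e. overall read density in the bin.
    """
    lib_a, lib_b = sum(counts_a), sum(counts_b)
    points = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2((a + pseudocount) / lib_a * 1e6)
        log_b = math.log2((b + pseudocount) / lib_b * 1e6)
        points.append((log_a - log_b, 0.5 * (log_a + log_b)))
    return points

# Hypothetical counts in four bins for conditions A and B.
bins_a = [40, 10, 25, 25]
bins_b = [10, 40, 25, 25]

for m, a in ma_values(bins_a, bins_b):
    print(f"M = {m:+.2f}   A = {a:.2f}")
```

Plotting M against A for all bins shows whether the cloud is centred on M = 0 and whether any bins sit clearly away from it. Note that M is computed after library-size scaling, so a shift that is uniform across the genome would not show up here.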

        I think that normalization could be an issue here, especially if there are "uniformly distributed changes" along the genome. This is similar to comparing genomes, say, where one has 2 copies and one has 4 copies throughout most of the genome. Because reads are sampled in proportion to relative abundance, the two would look very similar (in relative read density).
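The copy-number analogy can be sketched with a small simulation. The region weights, read depth and the function `sample_read_fractions` are hypothetical; reads are simply drawn in proportion to the amount of DNA in each region.

```python
import random

def sample_read_fractions(abundance, n_reads, rng):
    """Draw n_reads regions with probability proportional to abundance
    and return the fraction of reads landing in each region."""
    regions = list(range(len(abundance)))
    picks = rng.choices(regions, weights=abundance, k=n_reads)
    return [picks.count(r) / n_reads for r in regions]

rng = random.Random(0)           # fixed seed so the sketch is repeatable
region_size = [1, 3, 2, 5, 4]    # hypothetical relative region sizes
n_reads = 50_000                 # same sequencing depth for both genomes

two_copy = [2 * s for s in region_size]    # genome with 2 copies everywhere
four_copy = [4 * s for s in region_size]   # genome with 4 copies everywhere

frac_two = sample_read_fractions(two_copy, n_reads, rng)
frac_four = sample_read_fractions(four_copy, n_reads, rng)

for r, (f2, f4) in enumerate(zip(frac_two, frac_four)):
    print(f"region {r}: 2-copy {f2:.3f}   4-copy {f4:.3f}")
```

Although the second genome carries twice as much DNA in every region, the read fractions per region differ only by sampling noise, which is why a uniform change can be invisible in relative read density.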

        Do you have any positive/negative controls for your two conditions?



        • #5
          Hi Unununium

          I agree with Mark, you need to use a method that is better geared to identifying differentially bound regions. I personally used the MACS peak finder to identify regions of differential binding. It automatically corrects for library size, which should get around your issue of different read numbers. I should mention that I always disable model building and set the shift size based on the fragmentation pattern of the initial sample, as MACS had some trouble building reliable shift models on my data.

          edgeR, as mentioned by Mark, will however probably give you the more statistically sound answers. Although it is a bit trickier to use, it is definitely worth a try. Maybe have a look at the Bioconductor DiffBind package, which as far as I understand aims to combine the two approaches.



          • #6
            Thanks for the suggestions, that is exactly what we are trying now: a combination of BALM with DiffBind and the edgeR-like approach (counting reads in bins). However, for now we can't seem to find many (or any) differentially methylated regions between the samples.

            Yes we have a control for A and B, apart from Inputs. This control is a baseline state in which perturbations A and B are induced.

            More questions might follow when we finish the initial analysis, especially if we don't identify genomic regions where DMRs happen.


