  • Methylation calling tools, what works well?

    Hi all,

    I'm looking for a stable standalone tool to call methylation from mapped reads, in my case from RRBS. I also need to be able to call non-CpG methylation in some way.

    [Reason: I'm usually using bismark, which I can also recommend, but it is limited to bowtie. My current issue with some RRBS data sets is that they have very poor mapping efficiency in bowtie (after clipping+trimming) but mapping with other tools, especially RRBSMAP, works well. Perhaps this could be due to ambiguous reads which are less of an issue when mapping only to RRBS-relevant fragments of the genome.] from the rrbsmap package does call methylation, but it is unclear to me whether this is just CpG or also non-CpG methylation (there is no distinction and it's not documented).

    Before long trial and error, I'd be very interested in your own experiences with standalone methylation calling after mapping, and what works.


  • #2
    Hi Mixter,

    Just out of interest: were you doing paired-end sequencing with fairly long reads? If so, I can understand why you see a fairly low mapping efficiency with Bowtie: it is caused by Bowtie's behavior of not regarding completely overlapping reads as valid alignments.

    E.g.: a read pair looking like this

    ---------------------------------------> read 1
    <--------------------------------------- read 2

    is regarded as invalid by Bowtie 1.

    As RRBS size-selects for fragments between 40 and 220 bp, and there are indeed even shorter fragments passing the size-selection step, you can expect a sizeable proportion of reads to be completely overlapping after adapter trimming (e.g., trimming a 100 bp paired-end read that was merely sequencing a 40 bp fragment).
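    The overlap argument can be written down as a small check. This is only a sketch under the simplifying assumption that adapter trimming cuts each read back to exactly the fragment length; the function names are made up for illustration and do not come from any aligner.

```python
def trimmed_read_length(read_len: int, fragment_len: int) -> int:
    """Length of a read after ideal adapter trimming.

    Assumption: trimming removes everything past the end of the fragment,
    so a read can never be longer than the fragment it came from.
    """
    return min(read_len, fragment_len)


def pair_fully_overlaps(read_len: int, fragment_len: int) -> bool:
    """True if both mates cover the whole fragment after trimming.

    When the fragment is no longer than the read length, read 1 and read 2
    span the same interval, which is the case described as problematic here.
    """
    return fragment_len <= read_len


# Example from the post: 100 bp paired-end reads on a 40 bp fragment.
print(trimmed_read_length(100, 40))   # both mates end up 40 bp long
print(pair_fully_overlaps(100, 40))   # the pair overlaps completely
print(pair_fully_overlaps(100, 120))  # a 120 bp fragment is not fully covered
```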

    To get the above shown reads to align with Bowtie 1 it is sufficient to trim the reads by 1 bp so that they are not completely contained within each other but only overlap almost entirely, like so:

    -------------------------------------->. read 1
    .<-------------------------------------- read 2

    Alternatively, running Bismark with Bowtie 2 can also handle reads that are completely contained within each other.

    I am only mentioning this because we just had a similar case at our institute where we sequenced a library with a ~120bp mean fragment length with 2x100 bp paired-end reads. After adapter trimming, Bowtie 1 alignments mapped with ~50% efficiency because lots of fragments were sequenced exactly twice by both reads. If these reads were trimmed by 1bp on the 3'end, the mapping efficiency went up to nearly 80%. The effect is probably even more pronounced for RRBS libraries.
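    A minimal sketch of the 1 bp trim on the 3' end, assuming plain four-line FASTQ records (no wrapped sequence lines). The function names are hypothetical and are not part of Bismark or Bowtie; a dedicated trimming tool would normally do this step.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) from a four-line FASTQ.

    Reading stops at end of file or at the first blank header line.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        sep = handle.readline().strip()
        qual = handle.readline().strip()
        yield header, seq, sep, qual


def trim_3prime(seq: str, qual: str, n: int = 1) -> Tuple[str, str]:
    """Remove n bases (and their qualities) from the 3' end of a read."""
    if n <= 0:
        return seq, qual
    return seq[:-n], qual[:-n]


def trim_fastq(src: TextIO, dst: TextIO, n: int = 1) -> None:
    """Write every record from src to dst with n bases trimmed off the 3' end."""
    for header, seq, sep, qual in read_fastq(src):
        seq, qual = trim_3prime(seq, qual, n)
        dst.write(f"{header}\n{seq}\n{sep}\n{qual}\n")


# Usage with in-memory data; real files would be opened with open(...).
source = io.StringIO("@r1\nACGTA\n+\nIIIII\n")
target = io.StringIO()
trim_fastq(source, target, n=1)
print(target.getvalue(), end="")
```

    Both mates of a pair would be passed through the same function so that neither read is fully contained within the other.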

    If you didn't do paired-end alignments, please accept my apologies for this lengthy explanation :P

    As a final remark, in theory it shouldn't be too difficult to adapt Bismark's methylation caller for your initial question. Nevertheless, I am understandably interested in getting to the bottom of why you were dissatisfied with Bismark's mapping results, since one should easily get ~66% mapping efficiency from a good single-end 40 bp library, and this will only get better with longer (good quality) reads or paired-end reads (I recently had 2x40 bp RRBS libraries with 73% mapping efficiency).


    • #3
      Thanks. Even though I'm having these issues with single-end samples, the paired-end information is valuable.

      I have a rather strange case where a medium-quality sequencing run is yielding almost 0.0% mapping efficiency and similar runs only about 10%. I've done proper trimming and adapter clipping before (there are still some overrepresented sequences left, but only from phiX spike-in DNA). The data is already trimmed, but I'm also skipping 5 more of the start bases due to low quality. This is an RRBS digest of the mouse genome with MseI (as can be seen by most reads starting with TAA).
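      The "most reads start with TAA" observation can be checked with a few lines. This is a sketch only; `prefix_fraction` is a hypothetical helper, and the reads would be taken before any 5' bases are skipped.

```python
from typing import Iterable


def prefix_fraction(reads: Iterable[str], prefix: str = "TAA") -> float:
    """Fraction of reads whose sequence starts with the given prefix.

    Comparison is case-insensitive; an empty input gives 0.0.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    wanted = prefix.upper()
    hits = sum(1 for read in reads if read.upper().startswith(wanted))
    return hits / len(reads)


# Toy example: three of the four reads carry the TAA start.
example_reads = ["TAAGGCT", "TAACCGA", "CGGAATT", "taatttg"]
print(prefix_fraction(example_reads))
```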

      I've tried bismark with the latest Bowtie and also Bowtie2 (beta 5) with all options possible for making it less strict (even -e 1000, since phred values are low) but the best I can get is about 4% mapping efficiency. I was quite surprised that rrbsmap 1.6 mapped almost all reads (but its methylation calling script is really sparse and I would need an external tool, perhaps adapt bismark to it if possible).

      I also aligned directly to the GA/CT converted genomes with bowtie with more custom options and got the same low efficiency. So I think it's really bowtie's strictness that makes such a difference here. If you're interested and want to take a look at some sample reads to make a guess about the big discrepancy between bowtie vs. rrbs, feel free: (I always mapped against mm9).
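      For readers unfamiliar with the converted genomes mentioned here, the conversion itself is a simple base substitution. The sketch below shows the idea on a plain string; it is an illustration, not the code used by Bismark or any other tool, and the function names are hypothetical.

```python
# Translation tables for the two in-silico bisulfite conversions.
C_TO_T = str.maketrans("Cc", "Tt")
G_TO_A = str.maketrans("Gg", "Aa")


def convert_c_to_t(seq: str) -> str:
    """Replace every C with T, as seen when reading the converted top strand."""
    return seq.translate(C_TO_T)


def convert_g_to_a(seq: str) -> str:
    """Replace every G with A, the same conversion viewed from the other strand."""
    return seq.translate(G_TO_A)


reference = "ACGTCCGG"
print(convert_c_to_t(reference))
print(convert_g_to_a(reference))
```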

      Btw, apart from this issue, a future option in bismark that would be great for RRBS would be a conversion of only parts of the genome relevant for RRBS (e.g. near MspI or MseI sites) and limited mapping to those regions.
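      To make the idea concrete, here is a rough sketch of an in-silico digest that keeps only fragments in the 40 to 220 bp window mentioned earlier in the thread. It assumes the usual recognition sites (MspI cuts C^CGG, MseI cuts T^TAA) and ignores strand and end-repair details; `SITES` and both function names are made up for this example and are not an existing Bismark option.

```python
import re
from typing import List, Tuple

# Recognition sequence and cut offset within the site (assumed: C^CGG, T^TAA).
SITES = {"MspI": ("CCGG", 1), "MseI": ("TTAA", 1)}


def cut_positions(seq: str, site: str, offset: int) -> List[int]:
    """Positions in seq where the enzyme cuts (0-based, cut is before index)."""
    # The lookahead lets overlapping site occurrences be found as well.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def rrbs_fragments(seq: str, enzyme: str = "MspI",
                   min_len: int = 40, max_len: int = 220) -> List[Tuple[int, int]]:
    """Half-open (start, end) intervals of fragments between adjacent cut sites
    whose length falls inside the size-selection window."""
    site, offset = SITES[enzyme]
    cuts = cut_positions(seq, site, offset)
    return [(start, end) for start, end in zip(cuts, cuts[1:])
            if min_len <= end - start <= max_len]


# Toy sequence: one 54 bp fragment passes, one 304 bp fragment does not.
toy = "A" * 10 + "CCGG" + "A" * 50 + "CCGG" + "A" * 300 + "CCGG" + "A" * 5
print(rrbs_fragments(toy, "MspI"))
```

      The resulting intervals could then be used to restrict genome conversion and mapping to RRBS-relevant regions.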
      Last edited by mixter; 04-06-2012, 11:15 AM.


      • #4
        Hi mixter,

        I came across your post here while doing a Google search.

        Y. Xi and I, as authors of RRBSMAP, developed the mSuite tool, a methylation analysis pipeline. In brief, it calls methylation in the CG/CH/CHG/CHH contexts, reports some statistics, identifies differentially methylated cytosines (DMCs) and differentially methylated regions (DMRs), and associates genome features with methylation.
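        As background on what context-specific calling means, the sketch below classifies a cytosine on the forward strand into the standard sequence contexts (CG, CHG, CHH, where H is A, C, or T). It is not mSuite's code; `cytosine_context` is a hypothetical helper, and cytosines on the reverse strand would need the reverse complement first.

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Return 'CG', 'CHG', 'CHH', or 'unknown' for the cytosine at seq[pos].

    'unknown' is returned when the downstream bases run off the end of the
    sequence or contain N, so the context cannot be decided.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not a cytosine")
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return "unknown"
    return "CHG" if downstream[1] == "G" else "CHH"


for sequence in ("ACGT", "ACAGT", "ACATT", "AC"):
    print(sequence, cytosine_context(sequence, 1))
```

        Non-CpG (CH) methylation is then simply the union of the CHG and CHH classes.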

        I haven't mentioned it anywhere because the modules are not completely wrapped together and I am still preparing the manuscripts for the methods description. But you may use the methylation calling module without too much reading. There are no methods involved.


        Install Boost, Samtools, and Rcpp (an R library) before compiling mSuite.

        Make sure $SAMTOOLS and $BOOST_ROOT are pointing to the correct location.

        For example on my system,
        export SAMTOOLS=/share/apps/samtools/0.1.16
        export BOOST_ROOT=/share/apps/boost/boost-1.46.1
        You may have to export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib if you installed Boost into /usr/local/lib as a superuser.

        You may just hard-code these commands in your ~/.bashrc file.
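        Before running make, the two variables can be sanity-checked. This is an optional sketch and not part of mSuite; `check_build_env` is a hypothetical helper that only tests whether the variables are set and point to existing directories.

```python
import os
from typing import List, Mapping, Optional

# Variables the build instructions above ask for.
REQUIRED_VARS = ("SAMTOOLS", "BOOST_ROOT")


def check_build_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


for problem in check_build_env():
    print(problem)
```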

        Untar mSuite and type make.

        ====following are installation instructions for Boost and Samtools====
        Building mSuite from source
        In order to build mSuite, you must have the Boost C++ libraries (version 1.38 or higher) installed on your system. See below for instructions on installing Boost.

        Installing Boost
        ./b2 install

        Installing the SAM tools

        Download the SAM tools
        Unpack the SAM tools tarball and cd to the SAM tools source directory.
        Build the SAM tools by typing make at the command line.
        Choose a directory into which you wish to copy the SAM tools binary, the included library libbam.a, and the library headers. A common choice is /usr/local/.
        Copy libbam.a to the lib/ directory in the folder you've chosen above (e.g. /usr/local/lib/)
        Create a directory called "bam" in the include/ directory (e.g. /usr/local/include/bam)
        Copy the headers (files ending in .h) to the include/bam directory you've created above (e.g. /usr/local/include/bam)
        Copy the samtools binary to some directory in your PATH.

        If you install SAM tools in your home dir, then just replace the above string '/usr/local/' by your desired installation directory.

        ./configure (--with-bam=/home/dsun/ if you installed bam/*.h to /home/dsun/include/bam/ and libbam.a to /home/dsun/lib/)
        make install


        • #5
          Dear mixter,
          I hope you read this post even though you wrote about this almost 3 years ago.
          I am currently having the same problem as you did, with a non-model species analyzed by RRBS.
          Were you able to solve your low mapping efficiency with Bismark? If so, how did you do it?
          Thanks in advance!

