  • brentp
    Member
    • Apr 2010
    • 72

    MethylCoder: software for bisulfite treated reads

    hi, i have been working on a pipeline that takes bisulfite-treated reads and returns a useful methylation summary and output as simply as possible. i'm posting it here to get feedback. the best summary is on the project page: http://github.com/brentp/methylcode

    it's available for download

    directly from the git repository as: git clone git://github.com/brentp/methylcode.git

    and via tarball: http://github.com/brentp/methylcode/tarball/master


    you'll need:
    * numpy from here: http://sourceforge.net/projects/numpy/files/
    * cython from here: http://pypi.python.org/packages/sour...-0.12.1.tar.gz
    * pyfasta from here: http://pypi.python.org/pypi/pyfasta/
    * bowtie from here: http://bowtie-bio.sourceforge.net/index.shtml

    MethylCoder uses the well-known method of converting all C's to T's in both the reads and the reference in order to map the bisulfite treated reads. Bowtie is used to do the alignments. It requires a FASTQ file for input, but if you have raw reads, you can convert them to FASTQ and use 'I' or whatever for the quality values and adjust the bowtie params and it will work fine.
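To illustrate the C-to-T idea described above, here is a minimal sketch. It is not MethylCoder's code: the function names are made up for this example, a plain exact string search stands in for Bowtie, and only the forward strand is handled. After locating the read in converted space, the original read and reference are compared at each reference C to make a call.

```python
def c_to_t(seq):
    """In-silico bisulfite conversion: replace every C with T."""
    return seq.upper().replace("C", "T")


def find_exact(read, reference):
    """Toy stand-in for the aligner: exact match in C-to-T space (forward strand only)."""
    return c_to_t(reference).find(c_to_t(read))


def call_methylation(read, reference):
    """Map each reference C covered by the read to True (methylated) or False."""
    start = find_exact(read, reference)
    if start < 0:
        return {}  # read did not map
    calls = {}
    for offset, read_base in enumerate(read.upper()):
        pos = start + offset
        if reference[pos].upper() == "C":
            # A C that survived bisulfite treatment was protected by methylation;
            # a T at a reference C means the cytosine was unmethylated.
            calls[pos] = (read_base == "C")
    return calls


# Hypothetical toy data: the read covers reference positions 1-10.
reference = "GATTACAGCCGTACGTTAGC"
read = "ATTATAGCTG"
calls = call_methylation(read, reference)
print(calls)
```

With this toy input the read maps at position 1, and the C at position 8 is called methylated while the Cs at positions 5 and 9 are not.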

    We have been using it in the lab for quite a while. I have tested it against published analyses and other software, and the results match very closely while using less memory and less CPU time. Still, use at your own risk.

    Currently, it does not handle paired-end reads. If someone needs this and provides me with a set of paired-end BS-treated reads, I will likely implement it.
    I would appreciate any feedback in terms of usability or features.

    this work is supported by the fischer lab (http://epmb.berkeley.edu/facPage/dispFP.php?I=8) at uc berkeley but any problems are my fault. please contact me directly with any questions or problems.
  • krobison
    Senior Member
    • Nov 2007
    • 734

    #2
    Please put an entry in the software Wiki for this tool! Otherwise, I'll have to do it :-)

    Comment

    • brentp
      Member
      • Apr 2010
      • 72

      #3
      done. thanks.

      Comment

      • brentp
        Member
        • Apr 2010
        • 72

        #4
        hi, i've updated MethylCoder with the following:

        + supports paired end reads
        + can use either bowtie or gsnap for the aligner
        + can take either fasta or fastq files as input
        + prints a nice, per-chromosome summary along with the per-base text and binary format and the SAM format.
        + better documented analysis scripts for finding differentially methylated regions between 2 runs of the pipeline. (fisher's exact test)
        + full tracking of the command used to generate each output file.
        + growing test suite.
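As a rough illustration of the Fisher's exact test mentioned in the list above, here is a standard-library sketch that compares methylated and unmethylated read counts for one region between two runs. This is not the MethylCoder analysis script; the function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (small tolerance guards against floating-point ties)
    p = sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p)


# Hypothetical counts for one region: (methylated, unmethylated) reads per run
run1 = (18, 2)
run2 = (5, 15)
p_value = fisher_exact_two_sided(run1[0], run1[1], run2[0], run2[1])
print(p_value)
```

A small p-value suggests the methylation level of the region differs between the two runs. When testing many regions, some multiple-testing correction would normally follow.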


        please let me know if you have any questions, comments, or feature requests ( [email protected] )
        code is available at github as before:

        directly from the git repository as: git clone git://github.com/brentp/methylcode.git
        and via tarball: http://github.com/brentp/methylcode/tarball/master

        Comment

        • brentp
          Member
          • Apr 2010
          • 72

          #5
          MethylCoder has been published as a bioinformatics applications note:

          MethylCoder: Software Pipeline for Bisulfite-Treated Sequences
          Brent Pedersen; Tzung-Fu Hsieh; Christian Ibarra; Robert L. Fischer
          Bioinformatics 2011; doi: 10.1093/bioinformatics/btr394

          PDF Link

          Let me know of any questions.

          Comment

          • bisol
            Junior Member
            • Jul 2010
            • 2

            #6
            Hi Brent,

            what are the differences in the alignment between basespace and colorspace data i.e. how do you solve the problem that one can't apply the in-silico conversion of C's to T's in reads for colorspace?

            Comment

            • brentp
              Member
              • Apr 2010
              • 72

              #7
              Originally posted by bisol View Post
              Hi Brent,

              what are the differences in the alignment between basespace and colorspace data i.e. how do you solve the problem that one can't apply the in-silico conversion of C's to T's in reads for colorspace?
              Hi Bisol,
              I basically side-step the problem. I recommend that you do the following:
              1) quality trim your reads
              2) map with methylcoder (+bowtie) allowing 0 (you can also try 1) mismatches.
              3) map the unmapped reads with solid's SOCS tool: http://solidsoftwaretools.com/gf/project/socs/

              MethylCoder does a naive translation of C=>T by converting to base-space, then converting C=>T, then converting back to color-space. So it doesn't solve the problem; it just tries to provide a way to quickly map reads with no errors. I welcome suggestions for improvement in that regard.
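For readers unfamiliar with color space, the round trip described above can be sketched roughly as follows. This is an illustration, not MethylCoder's implementation. It assumes the usual SOLiD two-base encoding, where each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3), and a read given as a primer base plus a string of colors. The function names are hypothetical.

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def decode_colors(primer, colors):
    """Color space -> base space, starting from the known primer base."""
    prev = CODE[primer]
    bases = []
    for color in colors:
        prev ^= int(color)  # next base code = previous base code XOR color
        bases.append(BASES[prev])
    return "".join(bases)


def encode_colors(primer, bases):
    """Base space -> color space, starting from the known primer base."""
    prev = CODE[primer]
    colors = []
    for base in bases:
        colors.append(str(prev ^ CODE[base]))
        prev = CODE[base]
    return "".join(colors)


def naive_c_to_t(primer, colors):
    """Decode to bases, convert C=>T, then re-encode as colors."""
    bases = decode_colors(primer, colors)
    return encode_colors(primer, bases.replace("C", "T"))


# Toy read: primer T followed by the colors for ACGTCA
colors = encode_colors("T", "ACGTCA")
print(colors, naive_c_to_t("T", colors))
```

The round trip is only reliable when every color in the read is correct, which is why the recipe above maps with 0 mismatches first.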

              -Brent

              Comment

              • dychiang
                Junior Member
                • Oct 2009
                • 7

                #8
                Comparison with BisMark?

                Brent,

                Nice software and publication. Have you tried comparing MethylCoder and BisMark on the H1 ES cell line MethylC-seq dataset from Lister et al (2009)?

                Thanks,
                Derek

                Comment

                • brentp
                  Member
                  • Apr 2010
                  • 72

                  #9
                  Originally posted by dychiang View Post
                  Have you tried comparing MethylCoder and BisMark on the H1 ES cell line MethylC-seq dataset from Lister et al (2009)?
                  Hi Derek, there is a comparison to other BS-Seq software here:

                  It uses some Arabidopsis thaliana data and shows time, (approximate) memory use, and reads mapped.

                  Felix Kreuger, one of the authors of BisMark suggested some changes to BisMark parameters that I could use to improve its performance, but I have not yet updated the benchmark with those changes.

                  Comment

                  • fkrueger
                    Senior Member
                    • Sep 2009
                    • 627

                    #10
                    As both MethylCoder and Bismark employ a very similar strategy, I would imagine that the results are very similar. By the way my last name is spelled Krueger :P.

                    Comment

                    • brentp
                      Member
                      • Apr 2010
                      • 72

                      #11
                      Originally posted by fkrueger View Post
                      By the way my last name is spelled Krueger :P.
                      As someone who repeatedly has their last name misspelled, I sincerely apologize.

                      And yes, the results between MethylCoder (with bowtie) and BisMark are quite similar.

                      Comment

                      • dychiang
                        Junior Member
                        • Oct 2009
                        • 7

                        #12
                        Originally posted by fkrueger View Post
                        As both MethylCoder and Bismark employ a very similar strategy, I would imagine that the results are very similar. By the way my last name is spelled Krueger :P.
                        Brent and Felix -- thanks very much for your helpful replies. The epigenetics sequencing community needs some good benchmarks, such as RGASP, to test the plethora of algorithms being developed.

                        Will either or both of you be attending the HiTSeq SIG at ISMB next week? I would be delighted to meet up with you.

                        Comment

                        • fkrueger
                          Senior Member
                          • Sep 2009
                          • 627

                          #13
                          Hi dychiang,
                          I won't be attending the HiTSeq SIG but I'll be at ISMB from Sunday til Wednesday. I'm happy to meet up with you, either drop me an email ([email protected]) or find me at my poster (Poster U59: Analysing allele-specific NGS datasets using ASAP)

                          Comment

                          • brentp
                            Member
                            • Apr 2010
                            • 72

                            #14
                            Hi Derek, I won't be at that conference, but feel free to send me an email.

                            Comment

                            • bisol
                              Junior Member
                              • Jul 2010
                              • 2

                              #15
                              So essentially MethylCoder can map only perfect color space reads which align
                              perfectly against the genome.

                              As it has been pointed out in a previous thread you started
                              (http://seqanswers.com/forums/showthread.php?t=7979), it is generally problematic to
                              do a naive translation of color to base space, then converting C=>T and translating
                              back to color space, as a single measurement error in the color space read will be
                              translated into a false nucleotide sequence. Depending on where in the read this
                              measurement error occurs, the sequence either can't be mapped anymore (which means a
                              low mapping efficiency) or it will map to a wrong position in the genome and thus
                              result in false methylation calls.
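The error propagation described above can be seen in a small sketch. It assumes the usual SOLiD two-base encoding (each color is the XOR of adjacent 2-bit base codes, A=0, C=1, G=2, T=3); the reads and the function name are hypothetical.

```python
BASES = "ACGT"


def decode_colors(primer, colors):
    """Color space -> base space, starting from the known primer base."""
    prev = BASES.index(primer)
    bases = []
    for color in colors:
        prev ^= int(color)  # every base depends on all the colors before it
        bases.append(BASES[prev])
    return "".join(bases)


clean = "313121"  # colors for ACGTCA after primer T
noisy = "323121"  # same read with a single wrong color at index 1

clean_bases = decode_colors("T", clean)
noisy_bases = decode_colors("T", noisy)
mismatches = [i for i, (x, y) in enumerate(zip(clean_bases, noisy_bases)) if x != y]
print(clean_bases, noisy_bases, mismatches)
```

One miscalled color changes every base from that position to the end of the read, so an in-silico C=>T conversion applied to the decoded sequence operates on the wrong bases.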

                              You are now suggesting that all unmapped reads should instead be aligned with
                              SOCS-B, which - even though it is a good tool - is incredibly slow for complex
                              genomes and many reads.

                              Therefore, isn't it a quite bold statement to state in the paper that "MethylCoder
                              is a novel tool that allows ... mapping in both color and nucleotide-space -
                              something that no other BS-Seq software allows"?

                              Comment
