  • MethylCoder: software for bisulfite treated reads

    hi, i have been working on a pipeline that takes bisulfite-treated reads and returns a useful methylation summary and output as simply as possible. i'm posting it here to get feedback. the best summary is on the project page: http://github.com/brentp/methylcode

    it's available for download

    directly from the git repository as: git clone git://github.com/brentp/methylcode.git

    and via tarball: http://github.com/brentp/methylcode/tarball/master


    you'll need:
    * numpy from here: http://sourceforge.net/projects/numpy/files/
    * cython from here: http://pypi.python.org/packages/sour...-0.12.1.tar.gz
    * pyfasta from here: http://pypi.python.org/pypi/pyfasta/
    * bowtie from here: http://bowtie-bio.sourceforge.net/index.shtml

    MethylCoder uses the well-known method of converting all C's to T's in both the reads and the reference in order to map the bisulfite treated reads. Bowtie is used to do the alignments. It requires a FASTQ file for input, but if you have raw reads, you can convert them to FASTQ and use 'I' or whatever for the quality values and adjust the bowtie params and it will work fine.
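The approach described above can be illustrated with a minimal sketch. This is not MethylCoder's code: the function names are hypothetical, `str.find` stands in for the bowtie alignment step, and only the forward strand is handled. It shows the in-silico C-to-T conversion of read and reference, a constant 'I' quality string for reads that have no qualities, and how methylation is called by comparing the original read against the unconverted reference.

```python
def c_to_t(seq):
    """In-silico bisulfite conversion: replace every C with T."""
    return seq.replace("C", "T").replace("c", "t")


def raw_read_to_fastq(name, seq, qual_char="I"):
    """Build a FASTQ record for a raw read using one constant quality character."""
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual_char * len(seq))


def call_methylation(read, ref, pos):
    """Compare the original read with the unconverted reference at offset pos.

    Returns (reference_index, is_methylated) for every reference C the read covers.
    A C in the read means the cytosine was protected (methylated);
    a T means bisulfite converted it (unmethylated).
    """
    calls = []
    for i, base in enumerate(read):
        if ref[pos + i] != "C":
            continue
        if base == "C":
            calls.append((pos + i, True))
        elif base == "T":
            calls.append((pos + i, False))
    return calls


reference = "ACGTCCGATCGA"
read = "TCTGATCG"  # toy bisulfite read covering reference[3:11]

# Hypothetical stand-in for the aligner: exact match of converted read
# against converted reference.
pos = c_to_t(reference).find(c_to_t(read))
calls = call_methylation(read, reference, pos)
fastq_record = raw_read_to_fastq("read1", read)
print(pos, calls)
```

Because both sequences are converted before matching, the read aligns regardless of its methylation state; the methylation information is recovered afterwards from the unconverted sequences.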

    We have been using it in the lab for quite a while. I have tested it against published analyses and other software, and the results match very closely (while using less memory and less CPU time), but use at your own risk.

    Currently, it does not handle paired-end reads. If someone needs this and provides me with a set of paired-end BS-treated reads, I will likely implement it.
    I would appreciate any feedback in terms of usability or features.

    this work is supported by the fischer lab (http://epmb.berkeley.edu/facPage/dispFP.php?I=8) at uc berkeley but any problems are my fault. please contact me directly with any questions or problems.

  • #2
    Please put an entry in the software Wiki for this tool! Otherwise, I'll have to do it :-)



    • #3
      done. thanks.



      • #4
        hi, i've updated MethylCoder with the following:

        + supports paired end reads
        + can use either bowtie or gsnap for the aligner
        + can take either fasta or fastq files as input
        + prints a nice, per-chromosome summary along with the per-base text and binary format and the SAM format.
        + better documented analysis scripts for finding differentially methylated regions between 2 runs of the pipeline. (fisher's exact test)
        + full tracking of the command used to generate each output file.
        + growing test suite.
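The Fisher's exact test mentioned in the list above can be sketched as follows. This is only an illustration of the statistic, not the MethylCoder analysis scripts: the function name, the 2x2 count layout, and the example counts are all assumptions. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Here rows are two pipeline runs and columns are
    (methylated count, unmethylated count) within one region.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Made-up counts for one region: run A has 18 methylated / 2 unmethylated
# cytosine calls, run B has 5 / 15.
p_value = fisher_exact_two_sided(18, 2, 5, 15)
print(p_value)
```

A small p-value suggests the methylated fraction in the region differs between the two runs; in practice the p-values would also need correcting for the number of regions tested.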


        please let me know if you have any questions, comments, or feature requests ( [email protected] )
        code is available at github as before:

        directly from the git repository as: git clone git://github.com/brentp/methylcode.git
        and via tarball: http://github.com/brentp/methylcode/tarball/master



        • #5
          MethylCoder has been published as a bioinformatics applications note:

          MethylCoder: Software Pipeline for Bisulfite-Treated Sequences
          Brent Pedersen; Tzung-Fu Hsieh; Christian Ibarra; Robert L. Fischer
          Bioinformatics 2011; doi: 10.1093/bioinformatics/btr394


          Let me know of any questions.



          • #6
            Hi Brent,

            what are the differences in the alignment between basespace and colorspace data i.e. how do you solve the problem that one can't apply the in-silico conversion of C's to T's in reads for colorspace?



            • #7
              Originally posted by bisol:
              Hi Brent,

              what are the differences in the alignment between basespace and colorspace data i.e. how do you solve the problem that one can't apply the in-silico conversion of C's to T's in reads for colorspace?
              Hi Bisol,
              I basically side-step the problem. I recommend that you do the following:
              1) quality trim your reads
              2) map with methylcoder (+bowtie) allowing 0 (you can also try 1) mismatches.
              3) map the unmapped reads with solid's SOCS tool: http://solidsoftwaretools.com/gf/project/socs/

              MethylCoder does a naive translation of C=>T by converting to base-space, converting C=>T, then converting back to color-space. So it doesn't solve the problem, just tries to provide a solution to quickly map reads with no errors. I welcome suggestions for improvement in that regard.
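A minimal sketch of that round trip, assuming the standard SOLiD two-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent base codes) and a read written as a primer base followed by color digits. The function names are hypothetical and this is not MethylCoder's implementation.

```python
BASES = "ACGT"  # index of each base is its 2-bit code


def decode(cs_read):
    """Translate a color-space read (primer base + color digits) to bases."""
    prev = BASES.index(cs_read[0])
    out = []
    for color in cs_read[1:]:
        prev ^= int(color)  # next base code = previous code XOR color
        out.append(BASES[prev])
    return "".join(out)


def encode(primer, seq):
    """Translate a base sequence back to color-space behind the given primer base."""
    prev = BASES.index(primer)
    colors = []
    for base in seq:
        cur = BASES.index(base)
        colors.append(str(prev ^ cur))
        prev = cur
    return primer + "".join(colors)


def naive_c_to_t(cs_read):
    """Decode to bases, convert C=>T, and re-encode to color-space."""
    bases = decode(cs_read)
    return encode(cs_read[0], bases.replace("C", "T"))


cs_read = "T313121"
print(decode(cs_read))        # base-space sequence
print(naive_c_to_t(cs_read))  # converted read, back in color-space
```

The round trip is only trustworthy when every color call in the read is correct, which is why the recommendation above is to trim by quality and allow 0 (or at most 1) mismatches.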

              -Brent



              • #8
                Comparison with Bismark?

                Brent,

                Nice software and publication. Have you tried comparing MethylCoder and Bismark on the H1 ES cell line MethylC-seq dataset from Lister et al. (2009)?

                Thanks,
                Derek



                • #9
                  Originally posted by dychiang:
                  Have you tried comparing MethylCoder and Bismark on the H1 ES cell line MethylC-seq dataset from Lister et al. (2009)?
                  Hi Derek, there is a comparison to other BS-Seq software here:

                  It uses some Arabidopsis thaliana data and shows time, (approximate) memory use, and reads mapped.

                  Felix Kreuger, one of the authors of BisMark, suggested some changes to the BisMark parameters that could improve its performance, but I have not yet updated the benchmark with those changes.



                  • #10
                    As both MethylCoder and Bismark employ a very similar strategy, I would imagine that the results are very similar. By the way my last name is spelled Krueger :P.



                    • #11
                      Originally posted by fkrueger:
                      By the way my last name is spelled Krueger :P.
                      As someone who repeatedly has their last name misspelled, I sincerely apologize.

                      And yes, the results between MethylCoder (with bowtie) and BisMark are quite similar.



                      • #12
                        Originally posted by fkrueger:
                        As both MethylCoder and Bismark employ a very similar strategy, I would imagine that the results are very similar. By the way my last name is spelled Krueger :P.
                        Brent and Felix -- thanks very much for your helpful replies. The epigenetics sequencing community needs some good benchmarks, such as RGASP, to test the plethora of algorithms being developed.

                        Will either or both of you be attending the HiTSeq SIG at ISMB next week? I would be delighted to meet up with you.



                        • #13
                          Hi dychiang,
                          I won't be attending the HiTSeq SIG but I'll be at ISMB from Sunday til Wednesday. I'm happy to meet up with you, either drop me an email ([email protected]) or find me at my poster (Poster U59: Analysing allele-specific NGS datasets using ASAP)



                          • #14
                            Hi Derek, I won't be at that conference, but feel free to send me an email.



                            • #15
                              So essentially MethylCoder can map only perfect color space reads which align
                              perfectly against the genome.

                              As it has been pointed out in a previous thread you started
                              (http://seqanswers.com/forums/showthread.php?t=7979), it is generally problematic to
                              do a naive translation of color to base space, then converting C=>T and translating
                              back to color space, as a single measurement error in the color space read will be
                              translated into a false nucleotide sequence. Depending on where in the read this
                              measurement error occurs, the sequence either can't be mapped anymore (which means a
                              low mapping efficiency) or it will map to a wrong position in the genome and thus
                              result in false methylation calls.
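The error propagation described above can be shown with a short sketch, assuming the standard SOLiD two-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent base codes). The function name and the toy read are hypothetical.

```python
BASES = "ACGT"  # index of each base is its 2-bit code


def decode_colors(cs_read):
    """Naively translate a color-space read (primer base + color digits) to bases."""
    prev = BASES.index(cs_read[0])
    out = []
    for color in cs_read[1:]:
        prev ^= int(color)  # every later base depends on all earlier colors
        out.append(BASES[prev])
    return "".join(out)


true_read = "T313121"
bad_read = "T303121"  # same read with one miscalled color (second color: 1 -> 0)

true_bases = decode_colors(true_read)
bad_bases = decode_colors(bad_read)

# Positions where the naive translation of the erroneous read is wrong.
mismatches = [i for i, (x, y) in enumerate(zip(true_bases, bad_bases)) if x != y]
print(true_bases, bad_bases, mismatches)
```

One miscalled color changes every base downstream of it, so an error near the start of a read corrupts nearly the whole translated sequence, while an error near the end corrupts only the last few bases.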

                              You are now suggesting that all unmapped reads should instead be aligned with
                              SOCS-B, which - even though it is a good tool - is incredibly slow for complex
                              genomes and many reads.

                              Therefore, isn't it a quite bold statement to state in the paper that "MethylCoder
                              is a novel tool that allows ... mapping in both color and nucleotide-space -
                              something that no other BS-Seq software allows"?
