alignment of bisulfite treated reads


  • #16
    From the gsnap paper, it seems also a decent open-source tool. I have not tried, though.



    • #17
      Originally posted by lh3
      From the gsnap paper, it seems also a decent open-source tool. I have not tried, though.

      Thanks for the heads-up on GSNAP. I just had a look at the paper and it looks very nice, particularly if they release a colorspace version. I am stuck with SOLiD colorspace data at present, and I ended up using SHRiMP with a hypermethylated genome (so C's in CpG context are retained) to match on.

      Re: GSNAP bisulfite seq
      In bisulfite mode the program produces two new hash tables, one with C-to-T substitutions and the other having G-to-A substitutions. From the paper: "When GSNAP processes a bisulfite read, it performs a C-to-T substitution of each 12-mer in the read to check against the C-to-T hash table, and a G-to-A substitution of each 12-mer in the reverse complement of the read to check against the G-to-A hash table."
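The two-table idea quoted above can be sketched in a few lines of Python. This is only a toy illustration of the quoted description, not GSNAP's code: the function names (`build_tables`, `seed_hits`), the dict-based tables and the test sequences are all assumptions made for the example.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_tables(genome, k=12):
    """Index each genomic k-mer twice: once C-to-T converted, once G-to-A converted."""
    ct_table, ga_table = {}, {}
    for pos, kmer in enumerate(kmers(genome, k)):
        ct_table.setdefault(kmer.replace("C", "T"), []).append(pos)
        ga_table.setdefault(kmer.replace("G", "A"), []).append(pos)
    return ct_table, ga_table


def seed_hits(read, ct_table, ga_table, k=12):
    """Return (read_offset, genome_pos) seed hits for the read and for its reverse complement."""
    fwd = [(i, pos)
           for i, km in enumerate(kmers(read, k))
           for pos in ct_table.get(km.replace("C", "T"), [])]
    rev = [(i, pos)
           for i, km in enumerate(kmers(revcomp(read), k))
           for pos in ga_table.get(km.replace("G", "A"), [])]
    return fwd, rev


genome = "ACGTTCAGGCTAACGGTACCATTG"          # made-up sequence
ct_table, ga_table = build_tables(genome)
read = genome[3:19].replace("C", "T")        # fully converted forward-strand read
print(seed_hits(read, ct_table, ga_table))
```

Both tables are in a three-letter alphabet, so seeding alone cannot distinguish a genomic C from a genomic T; any such distinction has to happen when the candidate alignment is scored.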

      So, essentially, it creates an in silico bisulfite-converted "hypomethylated" genome and then looks for seed matches with in silico "hypomethylated" reads, so all seed matching is in a three-base space with no C's present at all. BSMAP is a little cannier. Reads don't have C's removed. Instead, C's in the read are matched only to C's in the reference, while T's in the read can be matched to either C's or T's in the reference. Another way of thinking about this is that T's in Illumina reads are converted to Y's and matched against a standard (not in silico bisulfite-converted) reference genome. In this respect the C's present in the read help to eliminate more dubious alignment candidates, so the match is slightly more information-dense than purely three-base matching. An interesting effect is that improperly bisulfite-converted material (that containing many unconverted C's) will align just as well as properly converted material. That means more work in downstream filtering, perhaps, but a better estimate of bisulfite conversion than just adding up all the C's in reads mapped to mitochondrial DNA.
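As a rough sketch of the asymmetric matching rule described here (read C only against reference C, read T against reference C or T), assuming forward-strand reads and ungapped comparison. The function names and sequences are invented for illustration and are not taken from BSMAP.

```python
def bs_match(read_base, ref_base):
    """Read T may sit on reference C or T; every other read base must match exactly."""
    if read_base == "T":
        return ref_base in ("C", "T")
    return read_base == ref_base


def bs_mismatches(read, ref):
    """Number of positions that violate the asymmetric rule (ungapped, equal length)."""
    return sum(not bs_match(r, g) for r, g in zip(read, ref))


def unconverted_fraction(read, ref):
    """Fraction of reference C positions where the read still shows C.

    This cannot tell methylation apart from incomplete conversion; it only counts.
    """
    at_ref_c = [r for r, g in zip(read, ref) if g == "C" and r in ("C", "T")]
    return sum(r == "C" for r in at_ref_c) / len(at_ref_c) if at_ref_c else 0.0


ref = "ACGTCCGA"                  # made-up reference window
converted = "ATGTTTGA"            # every C read as T
unconverted = "ACGTCCGA"          # no C converted at all

# Both reads align without mismatches, as the post points out.
print(bs_mismatches(converted, ref), bs_mismatches(unconverted, ref))
print(unconverted_fraction(converted, ref), unconverted_fraction(unconverted, ref))
```

Because the unconverted read scores as well as the converted one, any filtering on conversion rate has to be done after alignment, for example with a per-read fraction like the one above.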
      Last edited by sci_guy; 03-22-2010, 03:02 PM.



      • #18
        @sci_guy

        Yes, BSMAP is better in mapping strategy, although I do not know how much practical improvement this may lead to. It would be good to see a head-to-head comparison. Thanks for the information.



        • #19
          @lh3. I'm going to a workshop over the next couple of days. It seems somebody else in my organisation has been using BSMAP with Arabidopsis bisulphite-Seq data. Below is their talk abstract. BSMAP would be particularly good for plant genomes considering all the CNG and CNN methylation. I'll see if I can get any slides.

          "Hua Ying (CSIRO)
          Approaches to mapping high-throughput bisulfite sequencing reads: High-throughput bisulfite sequencing is an attractive approach for analyzing genome-wide methylation patterns at a single-base-pair resolution. Although combining bisulfite conversion and high-throughput sequencing is increasingly widespread, its analysis is still problematic and limited to a few publications. A major challenge is the alignment of bisulfite-converted short reads to the reference genome due to increased search space and reduced sequence complexity as a result of the bisulfite conversion. Here, we took advantage of a recently published mapping algorithm BSMAP and demonstrated that BSMAP is more effective than previously used methods. By applying a two-step mapping strategy, we successfully mapped more than 90% of bisulfite short reads to the Arabidopsis genome."



          • #20
            thanks sci_guy
            --
            bioinfosm



            • #21
              Hua used an interesting recursive strategy to map more reads back to the Arabidopsis genome. After aligning, she took the unmapped reads and chopped off the first base and the last few bases. Then, with recursive rounds of aligning and progressively chopping off more 3' end bases, she got 90% of reads to map. It seems the reads mapped back in the 2nd and later rounds were actually meaningful. Quite impressive.
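A minimal sketch of that trim-and-retry loop, under stated assumptions: the aligner is passed in as a hypothetical callable `align(seq)` that returns a position or `None`, and the trim size (`step`) and length cut-off (`min_len`) are made-up parameters, since the post does not give the actual values.

```python
def iterative_trim_map(reads, align, step=3, min_len=20):
    """Map reads, retrying unmapped ones after trimming.

    reads: dict of name -> sequence.
    align: callable(seq) -> position or None (stand-in for a real aligner).
    Returns dict of name -> (position, sequence_that_mapped, round_number).
    """
    if step < 1 or min_len < 1:
        raise ValueError("step and min_len must be at least 1")
    mapped = {}
    pending = dict(reads)
    round_no = 0
    while pending:
        still_unmapped = {}
        for name, seq in pending.items():
            hit = align(seq)
            if hit is not None:
                mapped[name] = (hit, seq, round_no)
                continue
            # First retry drops the first base and the last `step` bases;
            # later retries only shorten the 3' end further.
            trimmed = seq[1:-step] if round_no == 0 else seq[:-step]
            if len(trimmed) >= min_len:
                still_unmapped[name] = trimmed
        pending = still_unmapped
        round_no += 1
    return mapped


# Toy usage with an exact-match "aligner" on a made-up genome.
genome = "ACGTTGCATGCCATGGATCCGTTAGCAAGTCCATGA"

def exact_align(seq):
    pos = genome.find(seq)
    return pos if pos >= 0 else None

reads = {"good": genome[5:30], "bad": "A" + genome[5:27] + "TTT"}
print(iterative_trim_map(reads, exact_align, step=3, min_len=10))
```

Keeping the round number with each hit makes it easy to check afterwards whether the later-round alignments look as meaningful as the first-round ones.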

              I also found out that Stuart Stephen from the CSIRO plant industry group has baked up a really nice aligner that is robust to bisulfite. The paper is coming soon...



              • #22
                I'm reading the GSNAP paper more thoroughly now as it looks really good for a project I'm involved with - variant detection in a region of linkage.

                The last sentence of the introduction is: "The data structures in GSNAP allow it to align BS-seq reads with explicit detection of genomic-T to read-C mismatches, against either a reference sequence or a SNP-tolerant reference space."

                From my interpretation GSNAP will penalise improperly converted bisulfite reads, but will not make use of the "C" information present in the read, while BSMAP will happily align improperly converted reads but can make use of "C" information.



                • #23
                  Originally posted by sci_guy
                  From my interpretation GSNAP will penalise improperly converted bisulfite reads, but will not make use of the "C" information present in the read, while BSMAP will happily align improperly converted reads but can make use of "C" information.
                  The way I read it, they both function similarly in this respect. GSNAP hashes with a reduced alphabet, but will only allow C->T changes when it actually assesses the alignments. So they are both making use of reference C information, but neither of them will know the difference between methylation and incomplete conversion.

                  As far as I can tell from the papers, they should theoretically have the same sensitivity and specificity with respect to bisulfite changes.



                  • #24
                    Suppose the original genomic sequence is ACGTTCA and another position has sequence ATGTTCA. The 2nd C is unmethylated. One of the possible reads you can get is ACGTTtA. According to sci_guy's description, BSMAP prefers ACGTTCA in mapping, but gsnap regards the alignment ambiguous.



                    • #25
                      Originally posted by lh3
                      Suppose the original genomic sequence is ACGTTCA and another position has sequence ATGTTCA. The 2nd C is unmethylated. One of the possible reads you can get is ACGTTtA. According to sci_guy's description, BSMAP prefers ACGTTCA in mapping, but gsnap regards the alignment ambiguous.
                      I tried this in GSNAP (you have to pad everything to reach min lengths) and it chose ACGTTCA.

                      I think the confusion is coming from that last sentence of the intro that sci_guy quoted. When they say "explicit detection", I think they just meant that it can tell T->C apart from C->T and treat T->C appropriately as an error.
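To make this concrete with lh3's example, here is a toy seed-then-verify sketch: candidates are found in a reduced (C-to-T) alphabet, then scored so that genome-C/read-T counts as a conversion and anything else, including genome-T/read-C, counts as an error. It is an illustration of the reasoning in this thread, assuming forward-strand, ungapped comparison; the function names are invented and this is not GSNAP's implementation.

```python
def classify(read, ref):
    """Return (conversions, errors) for an ungapped, equal-length comparison."""
    conversions = errors = 0
    for r, g in zip(read, ref):
        if r == g:
            continue
        if g == "C" and r == "T":
            conversions += 1      # consistent with bisulfite conversion
        else:
            errors += 1           # includes genome-T to read-C
    return conversions, errors


def resolve(read, candidates):
    """Seed in the three-letter alphabet, then pick the candidate with the fewest errors."""
    reduced = lambda s: s.replace("C", "T")
    seeds = [name for name, ref in candidates.items() if reduced(ref) == reduced(read)]
    scores = {name: classify(read, candidates[name]) for name in seeds}
    best = min(scores, key=lambda name: scores[name][1]) if scores else None
    return seeds, scores, best


candidates = {"site_1": "ACGTTCA", "site_2": "ATGTTCA"}
seeds, scores, best = resolve("ACGTTTA", candidates)
print(seeds)    # both sites match in the reduced alphabet
print(scores)   # site_1 has no errors; site_2 has one genome-T/read-C error
print(best)     # site_1
```

In this toy version the ambiguity exists only at the seeding stage; once the C in the read is scored against the T at the second site, the first site wins, which is in line with what was observed when the example was run through GSNAP.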



                      • #26
                        Yes, this makes sense. Thank you, ondovb.



                        • #27
                          Originally posted by ondovb
                          I tried this in GSNAP (you have to pad everything to reach min lengths) and it chose ACGTTCA.
                          Cool. Thanks for that.

                          There's nothing like empirical data to disprove an hypothesis



                          • #28
                            Originally posted by sci_guy
                            Thanks! I'll take a look. I have more SOLiD data coming my way soon.
                            hi sci_guy,
                            any news on this? are there any short-read aligners other than SOCS2 capable of mapping BS reads in colorspace?
                            best, volks



                            • #29
                              Originally posted by volks
                              hi sci_guy,
                              any news on this?
                              Not that I've heard of.



                              • #30
                                has anyone tried BSSeeker?

                                From what I can tell it is a Bowtie wrapper, but I'm not sure how it compares to the others discussed here.
                                --
                                Jeremy Leipzig
                                Bioinformatics Programmer
