alignment of bisulfite treated reads

  • alignment of bisulfite treated reads

    Hi,

    I would like to know if any of the available next-gen alignment algorithms like maq, bwa, bowtie or others are able to align bisulfite treated reads from a methylation-seq experiment.

    This is a rather tricky alignment because it requires that C's in the reference sequence be allowed to align against T's in the bisulfite-treated reads, without a penalty.

    Maybe one possibility is to use alignment algorithms with a custom scoring matrix?
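One way to picture the custom-scoring idea is a toy scoring function in which a reference C paired with a read T counts as a match. This is only a sketch: `bisulfite_score` and its parameters are hypothetical names, not an option of maq, bwa or bowtie, and it only handles ungapped, forward-strand comparisons (reads from the opposite strand would show G-to-A changes instead).

```python
def bisulfite_score(ref: str, read: str, match: int = 1, mismatch: int = -1) -> int:
    """Score an ungapped, equal-length alignment of a read against the reference.

    A reference C paired with a read T is scored as a match, because
    bisulfite treatment converts unmethylated C to T. The reverse case
    (reference T, read C) is still a mismatch.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have the same length")
    score = 0
    for r, q in zip(ref.upper(), read.upper()):
        if r == q or (r == "C" and q == "T"):
            score += match
        else:
            score += mismatch
    return score


# Reference C vs read T is free; reference T vs read C is penalised.
print(bisulfite_score("ACGT", "ATGT"))
print(bisulfite_score("ATGT", "ACGT"))
```

The asymmetry is the point: a symmetric C/T wildcard would also forgive real T-to-C differences.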

  • #2
    I am aware that novoalign has a bisulphite alignment function built in, but I am not sure about its performance.


    • #3
      Originally posted by fadista
      Hi,

      I would like to know if any of the available next-gen alignment algorithms like maq, bwa, bowtie or others are able to align bisulfite treated reads from a methylation-seq experiment.

      This is a rather tricky alignment because it requires that C's in the reference sequence be allowed to align against T's in the bisulfite-treated reads, without a penalty.

      Maybe one possibility is to use alignment algorithms with a custom scoring matrix?
      I have tried BFAST and MAQ (and BWA) to do this. For BFAST, there are details in the reference manual.


      • #4
        novoalign can do bisulfite sequencing alignment, but novoalign is not free of charge.


        • #5
          If you're using Illumina, the easiest (bias-free) way is to preprocess your bisulphite reads to convert C's to T's (remembering where they are) and align them to a reference with all C's changed to T's. Then write a script to introduce the C's back in, or relate these as tables in a database.
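A minimal sketch of that convert, align, restore round trip follows. The helper names (`convert_c_to_t`, `restore_cs`) and the toy sequences are made up for illustration, and an exact substring search stands in for a real aligner.

```python
from typing import Dict, List, Tuple


def convert_c_to_t(seq: str) -> Tuple[str, List[int]]:
    """Return the C-to-T converted sequence and the 0-based positions that held a C."""
    seq = seq.upper()
    c_positions = [i for i, base in enumerate(seq) if base == "C"]
    return seq.replace("C", "T"), c_positions


def restore_cs(converted: str, c_positions: List[int]) -> str:
    """Put the remembered C's back into a converted read."""
    bases = list(converted)
    for i in c_positions:
        bases[i] = "C"
    return "".join(bases)


reference = "ACGTCCGATCGA"
read = "TTCGAT"  # reference[3:9] after bisulfite; only one cytosine stayed C

converted_ref, _ = convert_c_to_t(reference)
converted_read, read_c_positions = convert_c_to_t(read)

# Toy stand-in for a real aligner: exact search in the converted reference.
pos = converted_ref.find(converted_read)

# Re-introduce the C's and call each reference cytosine covered by the read.
restored = restore_cs(converted_read, read_c_positions)
calls: Dict[int, str] = {}
if pos >= 0:
    for offset, base in enumerate(restored):
        if reference[pos + offset] == "C":
            calls[pos + offset] = "methylated" if base == "C" else "unmethylated"

print(pos, calls)
```

Because both the read and the reference are fully converted before the search, the read lands at the same place whether its cytosines were methylated or not, which is what makes the approach bias-free.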

          As for SOLiD, all this is horrible in colorspace. If you're trying to avoid alignment bias due to methylation differences SOLiD has some bioinformatic issues. You're required to permute the reference a great deal or slacken up the mismatches allowed, sorting out the noise later down the track. If you convert SOLiD reads back into basespace you'll pay a fairly reasonable price - any errors in the read will frameshift base calls 3' to the error <Grumble> <Grumble>

          Nils, have you tried aligning SOLiD bisulphite reads?


          • #6
            Originally posted by sci_guy
            If you're using Illumina, the easiest (bias-free) way is to preprocess your bisulphite reads to convert C's to T's (remembering where they are) and align them to a reference with all C's changed to T's. Then write a script to introduce the C's back in, or relate these as tables in a database.

            As for SOLiD, all this is horrible in colorspace. If you're trying to avoid alignment bias due to methylation differences SOLiD has some bioinformatic issues. You're required to permute the reference a great deal or slacken up the mismatches allowed, sorting out the noise later down the track. If you convert SOLiD reads back into basespace you'll pay a fairly reasonable price - any errors in the read will frameshift base calls 3' to the error <Grumble> <Grumble>

            Nils, have you tried aligning SOLiD bisulphite reads?
            I don't share your disdain for colorspace since it is quite powerful. For example the false positive rate for SNPs is a lot lower since you need two specific errors next to each other to get a SNP.
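Both colorspace points raised here can be checked with a toy encoder. The sketch assumes the usual SOLiD di-base scheme, in which the color of two adjacent bases is the XOR of their 2-bit codes (A=0, C=1, G=2, T=3); `encode` and `decode` are illustrative helpers, not part of any SOLiD tool.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq):
    """Colors for each pair of adjacent bases (XOR of the 2-bit base codes)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Rebuild basespace from a known first base and a list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


ref = "ACGTACGT"
snp = "ACGAACGT"  # single base change (T to A) at index 3

# A true SNP changes two adjacent colors.
changed = [i for i, (x, y) in enumerate(zip(encode(ref), encode(snp))) if x != y]
print(changed)

# A single color error, decoded to basespace, corrupts every base 3' of it.
colors = encode(ref)
colors[2] ^= 1  # simulate one miscalled color
decoded = decode(ref[0], colors)
mismatches = [i for i, (x, y) in enumerate(zip(ref, decoded)) if x != y]
print(decoded, mismatches)
```

The first result is why isolated color errors are easy to tell apart from SNPs; the second is why converting reads back to basespace before alignment is costly.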

            Anyhow, under the current chemistry, bisulphite sequencing would be difficult from a bioinformatic perspective for longer reads (>=50bp) on the SOLiD platform. You could consider a targeted pulldown on the Illumina platform to make up for the lower capacity (per dollar).


            • #7
              Bis-Seq relies upon counting the C's vs T's in aligned reads so for an unbiased statistic you want alignment potential of a bisulphite-treated DNA read to be equivalent regardless of C density.
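The statistic being described is, per reference cytosine, the fraction of aligned reads that still show a C. A small sketch, with `methylation_level` as a hypothetical helper that takes the read bases piled up over one site:

```python
from collections import Counter
from typing import Iterable, Optional


def methylation_level(bases_at_site: Iterable[str]) -> Optional[float]:
    """Fraction of C among the C and T calls covering one reference cytosine.

    Other bases (sequencing errors, SNPs) are ignored. Returns None when
    no read carries a C or T at the site.
    """
    counts = Counter(base.upper() for base in bases_at_site)
    informative = counts["C"] + counts["T"]
    if informative == 0:
        return None
    return counts["C"] / informative


print(methylation_level("CCCTT"))  # 3 C's out of 5 informative reads
```

If reads with many C's align less often than reads with many T's, the numerator is undercounted, which is the bias this post is concerned with.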

              With SOLiD you really want to align to a hypomethylated genome (no C's) and a hypermethylated genome (C's remain at CpG sites), since preprocessing the reads to convert C's to T's in colorspace is not possible. Reads with intermediate levels of methylation will be regarded as having SNPs in the alignment pipeline (two colorspace changes in a row). So, if your read has a fair number of CpG sites (say a read at a CpG island) and it goes over your alignment mismatch threshold, it won't align even though it is a perfectly good read. This creates a confounder: there is lowered alignment potential to high density CpG regions within the genome and to CpG sites near high population frequency SNPs or INDELs. You can counter this by relaxing the number of mismatches allowed (and introducing false positive alignments) or by aligning to a number of permuted bisulphite references. Preprocessed reads with Illumina have none of these issues. If you have a plant genome with CNG and CNN methylation then SOLiD is not a wise choice at all.
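The two references can be sketched in basespace as below, before any colorspace encoding. The function names are illustrative, only the forward strand is handled, and the toy read shows why partial methylation costs mismatches against both references.

```python
def unmethylated_reference(seq: str) -> str:
    """Every C converted to T (no methylation anywhere)."""
    return seq.upper().replace("C", "T")


def cpg_methylated_reference(seq: str) -> str:
    """C kept only when followed by G (all CpG sites methylated), else C to T."""
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        out.append("T" if base == "C" and not in_cpg else base)
    return "".join(out)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


genome = "ACGTCCGATCA"
hypo = unmethylated_reference(genome)
hyper = cpg_methylated_reference(genome)

# Read with the first CpG methylated and the second one not.
intermediate_read = "ACGTTTGATTA"
print(hypo, hyper)
print(hamming(intermediate_read, hypo), hamming(intermediate_read, hyper))
```

Each CpG whose state disagrees with the chosen reference uses up one mismatch here (and two adjacent color changes in colorspace), so CpG-dense reads hit the threshold first.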

              I'm not some sort of Illumina fan-boy. I originally chose SOLiD owing to error checking built into colorspace and the increased number of reads per dollar. However for a second experiment I've swapped to Illumina owing to the potential alignment bias issue and Illumina's increases in bandwidth later this year.


              • #8
                Originally posted by nilshomer
                I don't share your disdain for colorspace since it is quite powerful. For example the false positive rate for SNPs is a lot lower since you need two specific errors next to each other to get a SNP.

                Anyhow, under the current chemistry, bisulphite sequencing would be difficult from a bioinformatic perspective for longer reads (>=50bp) on the SOLiD platform. You could consider a targeted pulldown on the Illumina platform to make up for the lower capacity (per dollar).
                Have you compared the same sample sequenced on Illumina vs SOLiD? Personally I am quite platform agnostic now that they have comparable levels of throughput and read length. However, unless somebody has sequenced the same sample on both platforms, I am still undecided as to which gives the best combination of cost vs read length vs throughput.

                However, I definitely agree with sci_guy: SOLiD colorspace is currently quite useless for bisulfite sequencing. This can be overcome bioinformatically (at a high computational cost), but no one has attempted it yet.


                • #9
                  We just released SOCS version 2, which has a mode that is fully bisulfite-tolerant for SOLiD data. It's available at:

                  http://solidsoftwaretools.com/gf/project/socs/

                  It will take longer than using a standard algorithm with converted genomes (due to the complexity of the problem), but there won't be any bias in the results.


                  • #10
                    bsmap is another tool. I have used it on bisulphite reads and it seems to work well.
                    --
                    bioinfosm


                    • #11
                      Originally posted by ondovb
                      We just released SOCS version 2, which has a mode that is fully bisulfite-tolerant for SOLiD data.
                      Thanks! I'll take a look. I have more SOLiD data coming my way soon.


                      • #12
                        Originally posted by bioinfosm
                        bsmap is another tool. I have used it on bisulphite reads and it seems to work well
                        I saw Wei Li talk about BSMAP at the AACR 2010 Cancer Epigenetics meeting. It was a nice talk. I like their use of what cytosines are present in the read to extract as much information as possible without creating bias.

                        It's probably the best Illumina bisulfite aligner out there at the moment.


                        • #13
                          Originally posted by sci_guy
                          I saw Wei Li talk about BSMAP at the AACR 2010 Cancer Epigenetics meeting. It was a nice talk. I like their use of what cytosines are present in the read to extract as much information as possible without creating bias.

                          It's probably the best Illumina bisulfite aligner out there at the moment.
                          That's interesting to know. Would it be possible for you to share the talk or slides?
                          --
                          bioinfosm


                          • #14
                            novoalign and gsnap (http://www.gene.com/share/gmap/) also do bisulfite alignment. As far as I know, all existing programs for bisulfite alignment take a very similar strategy.


                            • #15
                              I don't have access to the slides but the material is covered essentially in their BSMAP paper.

                              lh3 - Yes, I forgot about Novoalign. I should qualify my statement and suggest that BSMAP is perhaps the best free bisulfite aligner out there at present.
