alignment of bisulfite treated reads

lh3 replied

03-25-2010, 03:36 PM
Yes, this makes sense. Thank you, ondovb.
ondovb replied

03-25-2010, 01:28 PM
Originally posted by lh3:

Suppose the original genomic sequence is ACGTTCA and another position has sequence ATGTTCA. The 2nd C is unmethylated. One of the possible reads you can get is ACGTTtA. According to sci_guy's description, BSMAP prefers ACGTTCA in mapping, but gsnap regards the alignment ambiguous.

I tried this in GSNAP (you have to pad everything to reach min lengths) and it chose ACGTTCA.

I think the confusion is coming from that last sentence of the intro that sci_guy quoted...when they say "explicit detection", I think they just intended that to mean it can tell T->C apart from C->T, and treat T->C appropriately as an error.
lh3 replied

03-25-2010, 11:37 AM
Suppose the original genomic sequence is ACGTTCA and another position has sequence ATGTTCA. The 2nd C is unmethylated. One of the possible reads you can get is ACGTTtA. According to sci_guy's description, BSMAP prefers ACGTTCA in mapping, but gsnap regards the alignment ambiguous.
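
The example above can be checked by hand with a minimal Python sketch. This is not code from BSMAP or GSNAP; the function names `to_three_letter` and `asymmetric_mismatches` are made up for illustration. It only contrasts a pure three-letter comparison (all C's collapsed to T's) with a comparison that keeps the C's in the read, and it considers forward-strand reads only.

```python
def to_three_letter(seq: str) -> str:
    """Collapse C onto T, as in a fully in silico converted sequence."""
    return seq.upper().replace("C", "T")


def asymmetric_mismatches(read: str, ref: str) -> int:
    """Count mismatches when a read T may pair with a reference C or T,
    but a read C may pair only with a reference C."""
    count = 0
    for r, g in zip(read.upper(), ref.upper()):
        if r == g:
            continue
        if r == "T" and g == "C":
            continue  # consistent with bisulfite conversion of an unmethylated C
        count += 1
    return count


read = "ACGTTtA"  # lower-case t marks the converted, unmethylated C
loci = {"locus_1": "ACGTTCA", "locus_2": "ATGTTCA"}

# Three-letter comparison: which loci are indistinguishable from the read?
three_letter_hits = [
    name for name, ref in loci.items()
    if to_three_letter(ref) == to_three_letter(read)
]

# Asymmetric comparison: mismatch count per locus.
asym = {name: asymmetric_mismatches(read, ref) for name, ref in loci.items()}

print(three_letter_hits)
print(asym)
```

Under the three-letter comparison both loci tie, while the asymmetric count is 0 for ACGTTCA and 1 for ATGTTCA, because the retained C in the read cannot pair with the reference T.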
ondovb replied

03-24-2010, 02:07 PM
Originally posted by sci_guy:

From my interpretation GSNAP will penalise improperly converted bisulfite reads, but will not make use of the "C" information present in the read, while BSMAP will happily align improperly converted reads but can make use of "C" information.

The way I read it, they both function similarly in this respect. GSNAP hashes with a reduced alphabet, but will only allow C->T changes when it actually assesses the alignments. So they are both making use of reference C information, but neither of them will know the difference between methylation and incomplete conversion.
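
A rough sketch of that two-stage reading, in Python, might look like the following. It is an illustration of the idea only, not GSNAP's implementation: a sliding-window scan stands in for the hash lookup, the toy reference and the names `seed_then_verify` and `bisulfite_compatible` are assumptions, and only forward-strand reads are handled.

```python
def three_letter(seq: str) -> str:
    """Reduced alphabet used for candidate finding: C is collapsed onto T."""
    return seq.upper().replace("C", "T")


def bisulfite_compatible(read: str, window: str) -> bool:
    """True if every difference is a reference C that appears as T in the read."""
    return all(
        r == g or (r == "T" and g == "C")
        for r, g in zip(read.upper(), window.upper())
    )


def seed_then_verify(read: str, reference: str):
    """Stage 1: candidates in the reduced alphabet. Stage 2: full-alphabet check."""
    n = len(read)
    reduced_read = three_letter(read)
    candidates = [
        i for i in range(len(reference) - n + 1)
        if three_letter(reference[i:i + n]) == reduced_read
    ]
    verified = [
        i for i in candidates
        if bisulfite_compatible(read, reference[i:i + n])
    ]
    return candidates, verified


# Toy reference containing ACGTTCA at offset 2 and ATGTTCA at offset 13.
reference = "GGACGTTCAGGGGATGTTCAGG"
candidates, verified = seed_then_verify("ACGTTTA", reference)
print(candidates, verified)
```

Both offsets survive the reduced-alphabet stage, but only offset 2 survives verification, since the read C cannot be explained by a reference T. Nothing in either stage can tell whether a retained C reflects methylation or incomplete conversion.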

As far as I can tell from the papers, they should theoretically have the same sensitivity and specificity with respect to bisulfite changes.
sci_guy replied

03-22-2010, 03:14 PM
I'm reading the GSNAP paper more thoroughly now as it looks really good for a project I'm involved with - variant detection in a region of linkage.

The last sentence of the introduction is: "The data structures in GSNAP allow it to align BS-seq reads with explicit detection of genomic-T to read-C mismatches, against either a reference sequence or a SNP-tolerant reference space."

From my interpretation GSNAP will penalise improperly converted bisulfite reads, but will not make use of the "C" information present in the read, while BSMAP will happily align improperly converted reads but can make use of "C" information.
sci_guy replied

03-22-2010, 02:59 PM
Hua used an interesting recursive strategy to map more reads back to the Arabidopsis genome. After aligning, she took the unmapped reads and chopped off the first base and the last few bases; then, with recursive rounds of aligning and progressively chopping off more 3' end bases, she got 90% of reads to map. It seems the reads mapped in the 2nd and later rounds were actually meaningful. Quite impressive.
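
The trimming loop can be sketched roughly as below. This is a guess at the general shape, not Hua's actual pipeline: the number of bases removed per round, the minimum length, and the `align` callback (here a toy exact-match search) are all assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple


def map_with_trimming(
    read: str,
    align: Callable[[str], Optional[int]],
    trim_5p: int = 1,        # assumed: drop the first base once
    trim_3p_step: int = 3,   # assumed: bases removed from the 3' end per round
    min_length: int = 20,    # assumed: stop before reads get too short
) -> Tuple[Optional[int], str]:
    """Try the full read, then retry progressively shorter versions."""
    hit = align(read)
    if hit is not None:
        return hit, read

    trimmed = read[trim_5p:]
    while len(trimmed) - trim_3p_step >= min_length:
        trimmed = trimmed[:-trim_3p_step]
        hit = align(trimmed)
        if hit is not None:
            return hit, trimmed
    return None, read


# Toy stand-in for a real aligner: exact substring search.
reference = "TTGACCATGGCTAAGCTTGGATCCGTACGATCGATTAGC"


def toy_align(seq: str) -> Optional[int]:
    pos = reference.find(seq)
    return pos if pos != -1 else None


# A read with a bad first base and a low-quality 3' tail.
read = "G" + reference[6:22] + "NNNNN"
hit, used = map_with_trimming(read, toy_align, min_length=10)
print(hit, used)
```

A real run would pass a bisulfite-aware aligner as `align` and would probably also require the hit to be unique, since shorter reads map less specifically.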

I also found out that Stuart Stephen from the CSIRO Plant Industry group has baked up a really nice aligner that is robust to bisulfite. The paper is coming soon...
bioinfosm replied

03-22-2010, 06:18 AM
thanks sci_guy
sci_guy replied

03-15-2010, 07:17 PM
@lh3. I'm going to a workshop over the next couple of days. It seems somebody else in my organisation has been using BSMAP with Arabidopsis bisulphite-Seq data. Below is their talk abstract. BSMAP would be particularly good for plant genomes considering all the CNG and CNN methylation. I'll see if I can get any slides.

"Hua Ying (CSIRO)
Approaches to mapping high-throughput bisulfite sequencing reads: High-throughput bisulfite sequencing is an attractive approach for analyzing genome-wide methylation patterns at a single-base-pair resolution. Although combining bisulfite conversion and high-throughput sequencing is increasingly widespread, its analysis is still problematic and limited to a few publications. A major challenge is the alignment of bisulfite-converted short reads to the reference genome due to increased search space and reduced sequence complexity as a result of the bisulfite conversion. Here, we took advantage of a recently published mapping algorithm BSMAP and demonstrated that BSMAP is more effective than previously used methods. By applying a two-step mapping strategy, we successfully mapped more than 90% of bisulfite short reads to the Arabidopsis genome."
lh3 replied

03-15-2010, 06:19 PM
@sci_guy

Yes, BSMAP is better in mapping strategy, although I do not know how much practical improvement this may lead to. It would be good to see a head-to-head comparison. Thanks for the information.
sci_guy replied

03-15-2010, 05:47 PM
Originally posted by lh3:

From the gsnap paper, it seems also a decent open-source tool. I have not tried, though.

Thanks for the heads-up on GSNAP. I just had a look at the paper. It looks very nice, particularly if they release a colorspace version. I am stuck with SOLiD colorspace data at present, and I ended up using SHRiMP with a hypermethylated genome (so C's in CpG context are retained) to match against.

Re: GSNAP bisulfite seq
In bisulfite mode the program produces two new hash tables, one with C-to-T substitutions and the other having G-to-A substitutions. From the paper: "When GSNAP processes a bisulfite read, it performs a C-to-T substitution of each 12-mer in the read to check against the C-to-T hash table, and a G-to-A substitution of each 12-mer in the reverse complement of the read to check against the G-to-A hash table."
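
As a rough illustration of the quoted passage, a toy version of the two tables and the two lookups could be written like this in Python. It is a sketch under assumptions, not GSNAP's data structure: plain dictionaries replace its hash tables, the genome is a made-up 40-mer, and the names `build_table` and `seed_hits` are invented here. Only the 12-mer size comes from the paper quote.

```python
from collections import defaultdict

K = 12  # 12-mer size, as quoted from the paper

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_table(genome: str, old: str, new: str, k: int = K):
    """Index every k-mer of the genome after substituting `old` with `new`."""
    converted = genome.upper().replace(old, new)
    table = defaultdict(list)
    for i in range(len(converted) - k + 1):
        table[converted[i:i + k]].append(i)
    return table


def seed_hits(read: str, ct_table, ga_table, k: int = K):
    """Look up C-to-T k-mers of the read and G-to-A k-mers of its reverse complement."""
    fwd = read.upper().replace("C", "T")
    rev = revcomp(read).replace("G", "A")
    hits = []
    for i in range(len(fwd) - k + 1):
        hits += [("C-to-T", pos - i) for pos in ct_table.get(fwd[i:i + k], [])]
    for i in range(len(rev) - k + 1):
        hits += [("G-to-A", pos - i) for pos in ga_table.get(rev[i:i + k], [])]
    return hits


genome = "ACGTTGCACGATCGGATCCTAGCTAGGCTTACGATCGTAC"  # made-up toy sequence
ct_table = build_table(genome, "C", "T")
ga_table = build_table(genome, "G", "A")

# Fully converted read from the top strand, starting at offset 5.
top_read = genome[5:20].replace("C", "T")
# Fully converted read from the bottom strand, covering offsets 10-24.
bottom_read = revcomp(genome[10:25]).replace("C", "T")

top_hits = seed_hits(top_read, ct_table, ga_table)
bottom_hits = seed_hits(bottom_read, ct_table, ga_table)
print(sorted(set(top_hits)), sorted(set(bottom_hits)))
```

Each reported hit is a table name plus the implied start offset of the read on the top strand. In a small genome like this one, spurious hits in the other table are possible, which is why a verification step would follow in practice.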

So, essentially it creates an in silico bisulfite-converted ("hypomethylated") genome and then looks for seed matches using in silico "hypomethylated reads". All seed matching is therefore in a three-base space with no C's present at all. BSMAP is a little cannier. Reads don't have C's removed. Instead, C's in the read are matched only to C's in the reference, while T's in the read can be matched to either C's or T's in the reference. Another way of thinking about this is that the T's in Illumina reads are converted to Y's (C or T) and the reads are matched against a standard (not in silico bisulfite-converted) reference genome. In this respect the C's present in the read help to eliminate more dubious alignment candidates, so the match is slightly more information-dense than pure three-base matching. An interesting effect is that improperly bisulfite-converted material (reads containing many unconverted C's) will align just as well as properly converted material. That perhaps means more work in downstream filtering, but it gives a better estimate of bisulfite conversion than just adding up all the C's in reads mapped to mitochondrial DNA.
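
The "T's become Y's" view can be written down in a few lines of Python. This is only a sketch of the matching rule as described here, not BSMAP's code; the names `mask_read` and `count_mismatches` are hypothetical, reads are assumed to contain only A, C, G and T, and only forward-strand reads are considered.

```python
# IUPAC code Y stands for "C or T"; the other entries are the plain bases.
IUPAC = {"A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "Y": {"C", "T"}}


def mask_read(read: str) -> str:
    """Replace every T in the read with Y, leaving C's untouched."""
    return read.upper().replace("T", "Y")


def count_mismatches(masked_read: str, ref_window: str) -> int:
    """Count reference bases not allowed by the masked read at each position."""
    return sum(
        g not in IUPAC[r]
        for r, g in zip(masked_read, ref_window.upper())
    )


ref = "GACCGTCATG"           # unconverted reference window (toy sequence)
fully_converted = "GATTGTTATG"   # every C read as T
unconverted = "GACCGTCATG"       # no C converted at all
t_to_c_error = "GACCGCCATG"      # reference T read as C: a real mismatch

results = {
    name: count_mismatches(mask_read(read), ref)
    for name, read in [
        ("fully_converted", fully_converted),
        ("unconverted", unconverted),
        ("t_to_c_error", t_to_c_error),
    ]
}
print(results)
```

The fully converted and the unconverted read both score 0 mismatches, which is the "improperly converted material aligns just as well" effect, while the T-to-C change is counted as 1 mismatch.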

Last edited by sci_guy; 03-22-2010, 03:02 PM.
lh3 replied

03-15-2010, 03:37 PM
From the gsnap paper, it seems also a decent open-source tool. I have not tried, though.
sci_guy replied

03-15-2010, 02:24 PM
I don't have access to the slides but the material is covered essentially in their BSMAP paper.

lh3 - Yes, I forgot about Novoalign. I should qualify my statement and suggest that BSMAP is perhaps the best free bisulfite aligner out there at present.
lh3 replied

03-15-2010, 08:50 AM
novoalign and gsnap (http://www.gene.com/share/gmap/) also do bisulfite alignment. So far as I know, all existing programs for bisulfite alignment take a very similar strategy.
bioinfosm replied

03-15-2010, 08:20 AM
Originally posted by sci_guy:

I saw Wei Li talk about BSMAP at the AACR 2010 Cancer Epigenetics meeting. It was a nice talk. I like their use of what cytosines are present in the read to extract as much information as possible without creating bias.

It's probably the best Illumina bisulfite aligner out there at the moment.

That's interesting to know. Is it possible for you to share that talk/slides?
sci_guy replied

03-12-2010, 09:30 PM
Originally posted by bioinfosm:

bsmap is another tool. I have used it on bisulphite reads and it seems to work well

I saw Wei Li talk about BSMAP at the AACR 2010 Cancer Epigenetics meeting. It was a nice talk. I like their use of what cytosines are present in the read to extract as much information as possible without creating bias.

It's probably the best Illumina bisulfite aligner out there at the moment.