From the GSNAP paper, it also seems to be a decent open-source tool. I have not tried it, though.
-
Originally posted by lh3: "From the gsnap paper, it seems also a decent open-source tool. I have not tried, though."
Thanks for the heads-up on GSNAP. I just had a look at the paper and it looks very nice, particularly if they release a colorspace version. I am stuck with SOLiD colorspace data at present, so I ended up using SHRiMP with a hypermethylated genome (so C's in CpG context are retained) to match against.
Re: GSNAP bisulfite seq
In bisulfite mode the program produces two new hash tables, one with C-to-T substitutions and the other having G-to-A substitutions. From the paper: "When GSNAP processes a bisulfite read, it performs a C-to-T substitution of each 12-mer in the read to check against the C-to-T hash table, and a G-to-A substitution of each 12-mer in the reverse complement of the read to check against the G-to-A hash table."
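To make the quoted description concrete, here is a minimal Python sketch of the two-table seed lookup. It is not GSNAP's code: the function names, the plain dictionary index, and the exact-match lookup are my own assumptions, and only the 12-mer length and the C-to-T / G-to-A conversions come from the paper quote.

```python
from collections import defaultdict

K = 12  # seed length taken from the paper quote

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_index(genome, src, dst, k=K):
    """Hash every k-mer of the in silico converted genome to its start positions."""
    converted = genome.replace(src, dst)
    index = defaultdict(list)
    for i in range(len(converted) - k + 1):
        index[converted[i:i + k]].append(i)
    return index


def seed_hits(read, ct_index, ga_index, k=K):
    """Look up converted k-mers of the read in both tables.

    Offsets of "GA" hits are relative to the reverse complement of the read.
    """
    fwd = read.replace("C", "T")
    rev = reverse_complement(read).replace("G", "A")
    hits = []
    for offset in range(len(read) - k + 1):
        for pos in ct_index.get(fwd[offset:offset + k], []):
            hits.append(("CT", offset, pos))
        for pos in ga_index.get(rev[offset:offset + k], []):
            hits.append(("GA", offset, pos))
    return hits


# Toy usage with a made-up genome and a read carrying two converted C's.
genome = "TTACGGATCCGATTACGCAGGTTACA"
ct_index = build_index(genome, "C", "T")
ga_index = build_index(genome, "G", "A")
print(seed_hits("ACGGATTTGATTACGC", ct_index, ga_index))
```

Note that every C in the read is converted before lookup, which is why this style of seeding works in a three-letter alphabet.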
So, essentially it creates an in silico bisulfite-converted (hypomethylated) genome and then looks for seed matches with in silico "hypomethylated reads". All seed matching is therefore in a three-base space with no C's present at all.

BSMAP is a little cannier. Reads don't have their C's removed. Instead, C's in the read are matched only to C's in the reference, while T's in the read can be matched to either C's or T's in the reference. Another way of thinking about this is that Illumina reads have T's converted to Y's and are matched against a standard (not in silico bisulfite converted) reference genome. In this respect the C's present in the read help to eliminate more dubious alignment candidates, so it is a slightly more information-dense match than purely three-base matching.

An interesting effect is that improperly bisulfite converted material (that containing many unconverted C's) will align just as well as properly converted material. That means more work in downstream filtering perhaps, but a better estimate of bisulfite conversion than just adding up all the C's in reads mapped to mitochondrial DNA.

Last edited by sci_guy; 03-22-2010, 03:02 PM.
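A rough sketch of the asymmetric matching rule as I have described it for BSMAP: a read T is compatible with a reference C or T, and every other read base must match exactly. This is only an illustration of the rule, not BSMAP's implementation. The function names are made up, only the forward strand is scanned, and no other mismatches are tolerated.

```python
def bs_compatible(read_base, ref_base):
    """A read T may derive from a reference C or T; other bases must match exactly."""
    if read_base == "T":
        return ref_base in ("C", "T")
    return read_base == ref_base


def asymmetric_hits(read, reference):
    """Start positions where every read base is bisulfite-compatible with the reference."""
    n = len(read)
    return [
        i
        for i in range(len(reference) - n + 1)
        if all(bs_compatible(r, g) for r, g in zip(read, reference[i:i + n]))
    ]


# Toy usage: converted and unconverted versions of the same fragment both hit,
# while a read with a C opposite a reference T does not.
reference = "GGACGTTCAGG"
print(asymmetric_hits("ACGTTTA", reference))
print(asymmetric_hits("ACGTTCA", reference))
print(asymmetric_hits("ACGCTCA", reference))
```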
-
@lh3. I'm going to a workshop over the next couple of days. It seems somebody else in my organisation has been using BSMAP with Arabidopsis bisulphite-Seq data. Below is their talk abstract. BSMAP would be particularly good for plant genomes considering all the CNG and CNN methylation. I'll see if I can get any slides.
"Hua Ying (CSIRO)
Approaches to mapping high-throughput bisulfite sequencing reads: High-throughput bisulfite sequencing is an attractive approach for analyzing genome-wide methylation patterns at a single-base-pair resolution. Although combining bisulfite conversion and high-throughput sequencing is increasingly widespread, its analysis is still problematic and limited to a few publications. A major challenge is the alignment of bisulfite-converted short reads to the reference genome due to increased search space and reduced sequence complexity as a result of the bisulfite conversion. Here, we took advantage of a recently published mapping algorithm BSMAP and demonstrated that BSMAP is more effective than previously used methods. By applying a two-step mapping strategy, we successfully mapped more than 90% of bisulfite short reads to the Arabidopsis genome."
-
Hua used an interesting recursive strategy to map more reads back to the Arabidopsis genome. After aligning, she took the unmapped reads and chopped off the first base and the last few bases; then, with recursive rounds of aligning and progressively chopping off more 3' end bases, she got 90% of reads to map. It seems the reads mapped in the 2nd and later rounds were actually meaningful. Quite impressive.
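For anyone wanting to try something similar, the trim-and-remap loop might look roughly like the sketch below. This is my reading of the strategy, not Hua's actual pipeline: the `align` callable is a hypothetical stand-in for whatever aligner you run (it should return a hit or `None`), and `trim_3` and `min_len` are placeholder values, since the talk did not give exact numbers.

```python
def trim_and_remap(reads, align, trim_3=3, min_len=20):
    """Re-align unmapped reads after progressively trimming them.

    reads   : dict of read name -> sequence
    align   : hypothetical callable, sequence -> hit or None
    trim_3  : bases removed from the 3' end per round (assumed value)
    min_len : stop trimming below this length (assumed value)
    """
    mapped, unmapped = {}, {}
    for name, seq in reads.items():
        candidate = seq
        first_trim = True
        while True:
            hit = align(candidate)
            if hit is not None:
                # Record the (possibly trimmed) sequence that mapped.
                mapped[name] = (candidate, hit)
                break
            # The first base is dropped only in the first trimming round;
            # later rounds remove further 3' bases only.
            start = 1 if first_trim else 0
            trimmed = candidate[start:len(candidate) - trim_3]
            first_trim = False
            if len(trimmed) < min_len:
                unmapped[name] = seq
                break
            candidate = trimmed
    return mapped, unmapped
```

Looping per read is equivalent to running whole rounds over the unmapped pool; with a real aligner you would batch each round into a single run instead.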
I also found out that Stuart Stephen from the CSIRO plant industry group has baked up a really nice aligner that is robust to bisulfite. The paper is coming soon...
-
I'm reading the GSNAP paper more thoroughly now, as it looks really good for a project I'm involved with: variant detection in a region of linkage.
The last sentence of the introduction is: "The data structures in GSNAP allow it to align BS-seq reads with explicit detection of genomic-T to read-C mismatches, against either a reference sequence or a SNP-tolerant reference space."
From my interpretation GSNAP will penalise improperly converted bisulfite reads, but will not make use of the "C" information present in the read, while BSMAP will happily align improperly converted reads but can make use of "C" information.
-
Originally posted by sci_guy: "From my interpretation GSNAP will penalise improperly converted bisulfite reads, but will not make use of the "C" information present in the read, while BSMAP will happily align improperly converted reads but can make use of "C" information."
As far as I can tell from the papers, they should theoretically have the same sensitivity and specificity with respect to bisulfite changes.
-
Suppose the original genomic sequence is ACGTTCA and another position has sequence ATGTTCA. The 2nd C is unmethylated. One of the possible reads you can get is ACGTTtA (lowercase t marks the converted C). According to sci_guy's description, BSMAP prefers ACGTTCA in mapping, but gsnap regards the alignment as ambiguous.
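The example above can be checked with a few lines of Python. The two match functions are simplified stand-ins for the behaviours described in this thread (full three-letter conversion versus asymmetric T matching), not the actual GSNAP or BSMAP code, and the site names are made up.

```python
def three_letter_match(read, ref):
    """Compare read and reference after converting every C to T in both."""
    return read.upper().replace("C", "T") == ref.replace("C", "T")


def asymmetric_match(read, ref):
    """Read T matches reference C or T; every other read base must match exactly."""
    read = read.upper()
    return len(read) == len(ref) and all(
        g in "CT" if r == "T" else r == g for r, g in zip(read, ref)
    )


candidates = {"site_1": "ACGTTCA", "site_2": "ATGTTCA"}
read = "ACGTTtA"  # lowercase t marks the converted C

three_letter = [n for n, ref in candidates.items() if three_letter_match(read, ref)]
asymmetric = [n for n, ref in candidates.items() if asymmetric_match(read, ref)]

print(three_letter)  # ['site_1', 'site_2'] -> ambiguous
print(asymmetric)    # ['site_1'] -> the retained C rules out site_2
```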
-
Originally posted by lh3: "Suppose the original genomic sequence is ACGTTCA and another position has sequence ATGTTCA. The 2nd C is unmethylated. One of the possible reads you can get is ACGTTtA. According to sci_guy's description, BSMAP prefers ACGTTCA in mapping, but gsnap regards the alignment ambiguous."
I think the confusion is coming from that last sentence of the intro that sci_guy quoted. When they say "explicit detection", I think they just intended that to mean it can tell T->C apart from C->T, and treat T->C appropriately as an error.