From the gsnap paper, it seems also a decent open-source tool. I have not tried, though.
-
Originally posted by lh3: From the gsnap paper, it seems also a decent open-source tool. I have not tried, though.
Thanks for the heads-up on GSNAP. I just had a look at the paper and it looks very nice, particularly if they release a colorspace version. I am stuck with SOLiD colorspace data at present, so I ended up using SHRiMP with a hypermethylated genome (so C's in CpG context are retained) to match against.
Re: GSNAP bisulfite seq
In bisulfite mode the program produces two new hash tables, one with C-to-T substitutions and the other having G-to-A substitutions. From the paper: "When GSNAP processes a bisulfite read, it performs a C-to-T substitution of each 12-mer in the read to check against the C-to-T hash table, and a G-to-A substitution of each 12-mer in the reverse complement of the read to check against the G-to-A hash table."
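The two-table lookup quoted above can be sketched roughly as follows. This is only a toy illustration of the idea, not GSNAP's actual data structures: the function names (`build_index`, `candidate_starts`) and the dictionary-based index are assumptions made for the example, and only the 12-mer seed length comes from the paper quote.

```python
from collections import defaultdict

K = 12  # seed length quoted from the paper
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_index(genome, src, dst, k=K):
    """Hypothetical hash table: converted k-mer -> genome start positions."""
    converted = genome.replace(src, dst)
    index = defaultdict(list)
    for pos in range(len(converted) - k + 1):
        index[converted[pos:pos + k]].append(pos)
    return index


def candidate_starts(read, ct_index, ga_index, k=K):
    """Seed lookup in three-base space.

    C-to-T seeds of the read are checked against the C-to-T table;
    G-to-A seeds of the read's reverse complement are checked against
    the G-to-A table. Returns candidate alignment start positions.
    """
    fwd = read.replace("C", "T")
    rev = reverse_complement(read).replace("G", "A")
    hits = {"CT": set(), "GA": set()}
    for offset in range(len(read) - k + 1):
        for pos in ct_index.get(fwd[offset:offset + k], ()):
            hits["CT"].add(pos - offset)
        for pos in ga_index.get(rev[offset:offset + k], ()):
            hits["GA"].add(pos - offset)
    return hits
```

Build the two tables once with `build_index(genome, "C", "T")` and `build_index(genome, "G", "A")`, then pass each read to `candidate_starts`.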
So, essentially, it creates an in silico bisulfite-converted ("hypomethylated") genome and then looks for seed matches with in silico "hypomethylated reads". All seed matching is therefore in a three-base space with no C's present at all.
BSMAP is a little cannier. Reads don't have their C's removed. Instead, C's in the read are matched only to C's in the reference, while T's in the read can be matched to either C's or T's in the reference. Another way of thinking about this is that the T's in Illumina reads are converted to Y's and matched against a standard (not in silico bisulfite-converted) reference genome. The C's present in the read therefore help to eliminate more dubious alignment candidates, so the match is slightly more information-dense than pure three-base matching. An interesting effect is that improperly bisulfite-converted material (containing many unconverted C's) will align just as well as properly converted material. That means more work in downstream filtering, perhaps, but also a better estimate of bisulfite conversion than just adding up all the C's in reads mapped to mitochondrial DNA.
Last edited by sci_guy; 03-22-2010, 03:02 PM.
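A minimal sketch of the asymmetric matching rule described in this post, assuming a forward-strand read already placed against an equal-length reference window. The function names are made up for illustration and this is not BSMAP's implementation; it only shows why a read C carries information and how unconverted C's could be tallied after alignment.

```python
def bs_mismatches(read, ref):
    """Count mismatches, letting a read T pair with a reference C or T.

    A read C is only compatible with a reference C, so retained C's
    still discriminate between candidate loci.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")
    mismatches = 0
    for r, g in zip(read, ref):
        if r == g:
            continue
        if r == "T" and g == "C":
            continue  # consistent with conversion of an unmethylated C
        mismatches += 1
    return mismatches


def conversion_counts(read, ref):
    """At reference C positions, count read T's (converted) and read C's."""
    converted = sum(1 for r, g in zip(read, ref) if g == "C" and r == "T")
    unconverted = sum(1 for r, g in zip(read, ref) if g == "C" and r == "C")
    return converted, unconverted
```

Because unconverted reads still align under this rule, something like `conversion_counts` could feed a per-read or per-library conversion estimate during downstream filtering.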
-
@lh3. I'm going to a workshop over the next couple of days. It seems somebody else in my organisation has been using BSMAP with Arabidopsis bisulphite-Seq data. Below is their talk abstract. BSMAP would be particularly good for plant genomes considering all the CNG and CNN methylation. I'll see if I can get any slides.
"Hua Ying (CSIRO)
Approaches to mapping high-throughput bisulfite sequencing reads: High-throughput bisulfite sequencing is an attractive approach for analyzing genome-wide methylation patterns at a single-base-pair resolution. Although combining bisulfite conversion and high-throughput sequencing is increasingly widespread, its analysis is still problematic and limited to a few publications. A major challenge is the alignment of bisulfite-converted short reads to the reference genome due to increased search space and reduced sequence complexity as a result of the bisulfite conversion. Here, we took advantage of a recently published mapping algorithm BSMAP and demonstrated that BSMAP is more effective than previously used methods. By applying a two-step mapping strategy, we successfully mapped more than 90% of bisulfite short reads to the Arabidopsis genome."
-
Hua used an interesting recursive strategy to map more reads back to the Arabidopsis genome. After aligning, she took the unmapped reads and chopped off the first base and the last few bases; then, with recursive rounds of aligning and progressively chopping off more 3' end bases, she got 90% of reads to map. It seems the reads mapped back in the 2nd and later rounds were actually meaningful. Quite impressive.
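The trim-and-realign loop might look something like the sketch below. Everything specific here is an assumption: `align` stands in for whatever aligner is used (any callable returning a hit or `None`), and the trim sizes and minimum length are placeholder values, since the talk's actual parameters are not given above.

```python
def map_with_trimming(reads, align, min_len=20, head_trim=1, tail_step=3):
    """Iteratively realign unmapped reads after trimming.

    reads: dict of read name -> sequence.
    align: hypothetical callable, sequence -> hit or None.
    Round 0 uses full-length reads. Reads that fail lose `head_trim`
    bases at the 5' end (once) and `tail_step` bases at the 3' end
    (every round) until they map or drop below `min_len`.
    """
    if tail_step < 1 or min_len < 1:
        raise ValueError("tail_step and min_len must be at least 1")
    mapped = {}
    pending = dict(reads)
    round_no = 0
    while pending:
        still_unmapped = {}
        for name, seq in pending.items():
            hit = align(seq)
            if hit is not None:
                mapped[name] = (round_no, seq, hit)
                continue
            # Trim the 5' base only after the first failed round.
            start = head_trim if round_no == 0 else 0
            trimmed = seq[start:-tail_step]
            if len(trimmed) >= min_len:
                still_unmapped[name] = trimmed
        pending = still_unmapped
        round_no += 1
    return mapped
```

Recording the round in which each read mapped makes it easy to check afterwards whether the later-round alignments look as meaningful as the first-round ones.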
I also found out that Stuart Stephen from the CSIRO plant industry group has baked up a really nice aligner that is robust to bisulfite. The paper is coming soon...
-
I'm reading the GSNAP paper more thoroughly now as it looks really good for a project I'm involved with: variant detection in a region of linkage.
The last sentence of the introduction is: "The data structures in GSNAP allow it to align BS-seq reads with explicit detection of genomic-T to read-C mismatches, against either a reference sequence or a SNP-tolerant reference space."
From my interpretation GSNAP will penalise improperly converted bisulfite reads, but will not make use of the "C" information present in the read, while BSMAP will happily align improperly converted reads but can make use of "C" information.
-
Originally posted by sci_guy: From my interpretation GSNAP will penalise improperly converted bisulfite reads, but will not make use of the "C" information present in the read, while BSMAP will happily align improperly converted reads but can make use of "C" information.
As far as I can tell from the papers, they should theoretically have the same sensitivity and specificity with respect to bisulfite changes.
-
Suppose the original genomic sequence is ACGTTCA and another position has sequence ATGTTCA. The 2nd C is unmethylated. One of the possible reads you can get is ACGTTtA. According to sci_guy's description, BSMAP prefers ACGTTCA in mapping, but gsnap regards the alignment ambiguous.
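This example can be worked through in a few lines, following sci_guy's description of the two approaches. It is a toy comparison at the matching-rule level only, with made-up helper names, and says nothing about how either program scores its final alignments.

```python
def three_letter(seq):
    """Collapse C into T, as in a pure three-base comparison."""
    return seq.replace("C", "T")


def asymmetric_mismatches(read, ref):
    """Mismatches when a read T may pair with a reference C or T,
    but a read C must pair with a reference C."""
    return sum(
        1
        for r, g in zip(read, ref)
        if r != g and not (r == "T" and g == "C")
    )


read = "ACGTTTA"  # the read ACGTTtA, with the converted base in upper case
loci = {"locus_1": "ACGTTCA", "locus_2": "ATGTTCA"}

# Three-base comparison: both loci collapse to the same string as the read.
three_letter_hits = [
    name for name, seq in loci.items() if three_letter(seq) == three_letter(read)
]
# -> ["locus_1", "locus_2"], i.e. ambiguous

# Asymmetric comparison: the retained C at position 2 rules out locus_2.
asymmetric_scores = {
    name: asymmetric_mismatches(read, seq) for name, seq in loci.items()
}
# -> {"locus_1": 0, "locus_2": 1}
```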
-
Originally posted by lh3: Suppose the original genomic sequence is ACGTTCA and another position has sequence ATGTTCA. The 2nd C is unmethylated. One of the possible reads you can get is ACGTTtA. According to sci_guy's description, BSMAP prefers ACGTTCA in mapping, but gsnap regards the alignment ambiguous.
I think the confusion is coming from that last sentence of the intro that sci_guy quoted. When they say "explicit detection", I think they just mean that it can tell T->C apart from C->T, and treat T->C appropriately as an error.