Why did we develop Bis-SNP?
Identification and proper handling of SNPs in Bisulfite-seq data are important for accurate quantification of methylation levels, especially because C>T is the most common substitution in the human population (65% of all SNPs in dbSNP) and these substitutions usually occur in the CpG context. SNP identification is also required for sequence-dependent allele-specific methylation analysis.
SNP calling on bisulfite sequencing data has significant complications. First, reads from the two genomic strands are no longer complementary after bisulfite conversion, yet conventional SNP calling algorithms assume complementarity. Second, true (evolutionary) C>T SNPs in the sample cannot be distinguished from C>T substitutions caused by bisulfite conversion, and can thus be misidentified as unmethylated Cs.
Currently, there is no other publicly available tool for SNP calling in Bisulfite-seq data. We implemented and tested the methods described in currently published methylome papers. None of their SNP detection methods works well without additional matched non-bisulfite sequencing data from the same sample/strain.
We have therefore developed a new tool, called Bis-SNP, for accurate SNP and methylation analysis of BS-seq data. Bis-SNP is a software package written mainly in Java and based on the GATK map-reduce framework. All associated files can be downloaded at:
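The strand asymmetry described above can be illustrated with a small sketch. It is only a toy model, not Bis-SNP code: it assumes complete bisulfite conversion, a directional library, and bases reported in top-strand coordinates (as in a BAM file). The function names `observed_base` and `strand_signature` are hypothetical. Under these assumptions, an unmethylated C and a true T allele look identical in top-strand reads but differ in bottom-strand reads.

```python
def observed_base(top_base, read_strand, methylated=False):
    """Base seen in a read, reported in top-strand coordinates.

    top_base:    the sample's base at this position on the top strand
    read_strand: "+" for reads derived from the top strand, "-" for the bottom
    methylated:  whether the cytosine on the read's strand of origin is protected
    """
    # An unmethylated C on the top strand is converted and read as T.
    if read_strand == "+" and top_base == "C" and not methylated:
        return "T"
    # A top-strand G is a C on the bottom strand; once converted it
    # appears as A when reported in top-strand coordinates.
    if read_strand == "-" and top_base == "G" and not methylated:
        return "A"
    # Every other base is unaffected by bisulfite treatment.
    return top_base


def strand_signature(top_base):
    """(top-strand read base, bottom-strand read base) for an unmethylated site."""
    return observed_base(top_base, "+"), observed_base(top_base, "-")


# Unmethylated C allele: T in top-strand reads, C in bottom-strand reads.
print(strand_signature("C"))
# True T allele (C>T SNP): T in reads from both strands.
print(strand_signature("T"))
```

In this toy model the bottom-strand reads carry the information that separates a converted C from a genuine T, which is why a strand-aware model is needed.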
How does it work?
Bis-SNP uses Bayesian inference to evaluate a model of strand-specific base calls and base call quality scores, along with prior information on population SNP frequencies, experiment-specific bisulfite conversion efficiency, and site-specific DNA methylation estimates.
Bis-SNP also enables base call quality score recalibration for Bisulfite-seq, an addition that has greatly improved SNP calling in the non-bisulfite context. Because very few bisulfite mapping tools can currently perform gapped alignment to detect indels, which leads to many false SNP calls around indels, Bis-SNP also enables local indel realignment for Bisulfite-seq. Bis-SNP is open-source and based on the Genome Analysis Toolkit (GATK) framework, in order to take advantage of the parallel Map-Reduce computation strategy and provide practical execution times.
Bis-SNP accepts either single-end or paired-end mapped Bisulfite-seq/NOMe-seq/RRBS data in the form of BAM files, and outputs SNP and methylation information in the standard VCF format and in bed/bedDetail/bedGraph/wig formats.
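To make the idea of a strand-aware Bayesian model more concrete, here is a heavily simplified sketch. It is not the Bis-SNP implementation: the function names, the three-genotype prior, and the fixed `methylation` and `conversion` values are all assumptions made for illustration, and the methylation level is passed in rather than estimated per site. Each read is a tuple of (base in top-strand coordinates, Phred quality, strand).

```python
import math


def phred_to_error(qual):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-qual / 10)


def emit_probs(allele, strand, methylation, conversion):
    """Distribution of the error-free read base for one allele on one strand."""
    p_conv = (1.0 - methylation) * conversion  # chance a cytosine is converted
    if allele == "C" and strand == "+":
        return {"C": 1.0 - p_conv, "T": p_conv}
    if allele == "G" and strand == "-":
        return {"G": 1.0 - p_conv, "A": p_conv}
    return {allele: 1.0}


def read_likelihood(base, qual, strand, allele, methylation, conversion):
    """P(observed base | allele), mixing conversion and sequencing error."""
    err = phred_to_error(qual)
    total = 0.0
    for true_base, p in emit_probs(allele, strand, methylation, conversion).items():
        total += p * ((1.0 - err) if base == true_base else err / 3.0)
    return total


def genotype_posteriors(reads, priors, methylation=0.7, conversion=0.99):
    """Posterior over diploid genotypes such as "CC", "CT", "TT"."""
    log_post = {}
    for genotype, prior in priors.items():
        lp = math.log(prior)
        for base, qual, strand in reads:
            # Each read comes from either allele with probability 1/2.
            lik = 0.5 * sum(
                read_likelihood(base, qual, strand, allele, methylation, conversion)
                for allele in genotype
            )
            lp += math.log(lik)
        log_post[genotype] = lp
    # Normalise in log space for numerical stability.
    max_lp = max(log_post.values())
    unnorm = {g: math.exp(lp - max_lp) for g, lp in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


# Hypothetical priors for a reference-C site (illustrative values only).
priors = {"CC": 0.998, "CT": 0.001, "TT": 0.001}

# Top-strand reads show T, bottom-strand reads show C: consistent with an
# unmethylated homozygous C rather than a SNP.
converted_c = [("T", 30, "+")] * 10 + [("C", 30, "-")] * 10
print(genotype_posteriors(converted_c, priors, methylation=0.0))

# Reads from both strands show T: consistent with a homozygous C>T SNP.
true_snp = [("T", 30, "+")] * 10 + [("T", 30, "-")] * 10
print(genotype_posteriors(true_snp, priors, methylation=0.0))
```

The point of the sketch is that the same top-strand evidence leads to different genotype posteriors once the bottom-strand reads are modeled separately.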
Bis-SNP can call and summarize methylation for any user-specified cytosine context (CpG, CHH, CHG, GCH, etc.), which allows it to be applied widely to different kinds of bisulfite-treated sequencing data, e.g. Bisulfite-seq/NOMe-seq/RRBS.
Bis-SNP provides a set of Perl scripts that simplify output file format conversion and running the whole genotyping and methylation calling pipeline.
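As an illustration of what a cytosine context means, the sketch below checks whether a cytosine in a sequence matches a context pattern written with IUPAC codes (H stands for A, C, or T). The function name `matches_context` and the zero-based `c_index` argument, which marks where the cytosine sits inside the pattern, are assumptions for this example and do not describe how Bis-SNP takes its options.

```python
# IUPAC nucleotide codes needed for common cytosine contexts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "ACT",   # any base except G
    "N": "ACGT",  # any base
}


def matches_context(seq, pos, pattern, c_index=0):
    """Return True if the cytosine at seq[pos] sits in the given context.

    pattern: context written with IUPAC codes, e.g. "CG", "CHH", "CHG", "GCH"
    c_index: zero-based position of the cytosine inside the pattern
    """
    start = pos - c_index
    if start < 0 or start + len(pattern) > len(seq):
        return False  # the context window runs off the sequence
    window = seq[start:start + len(pattern)]
    return all(base in IUPAC[code] for base, code in zip(window, pattern))


seq = "TGCATCGACTA"
print(matches_context(seq, 5, "CG"))              # C followed by G
print(matches_context(seq, 8, "CHH"))             # C followed by T, A
print(matches_context(seq, 2, "GCH", c_index=1))  # C preceded by G, followed by A
```

Contexts such as GCH, where the cytosine is not the first base, are the reason the position of the cytosine within the pattern has to be given explicitly in this sketch.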
How well does Bis-SNP perform?
We validated the specificity and sensitivity of SNP detection by comparing Bisulfite-seq calls with Illumina 1M SNP array data from the same sample. At the default threshold (Phred-scaled score > 20) and the test sample's sequencing depth (30X), Bis-SNP detected 92.21% of heterozygous SNPs with a 0.14% false positive rate (90.88% sensitivity for C/T SNPs with a 0.16% false positive rate; 98.51% sensitivity for non-C/T SNPs with a 0.16% false positive rate). For a single sample at 10X sequencing depth, it still detected 80% of the heterozygous SNPs and 98% of homozygous cytosines at FDR < 0.05.
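For readers unfamiliar with these metrics, the following sketch shows how a Phred-scaled threshold and the sensitivity and false positive rate figures are conventionally computed. It assumes the standard definitions (Q = -10 log10 P; sensitivity = TP / (TP + FN); false positive rate = FP / (FP + TN)), which may differ in detail from the definitions used in the validation. The counts in the example are made up for illustration and are not the study's data.

```python
def phred_to_error_probability(q):
    """Phred-scaled score Q corresponds to error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_threshold(q, threshold=20):
    """Keep a call only if its Phred-scaled score exceeds the threshold."""
    return q > threshold


def sensitivity(true_positives, false_negatives):
    """Fraction of real SNPs that were detected."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives, true_negatives):
    """Fraction of non-SNP sites that were wrongly called as SNPs."""
    return false_positives / (false_positives + true_negatives)


# A score of 20 corresponds to a 1-in-100 chance that the call is wrong.
print(phred_to_error_probability(20))

# Hypothetical counts, for illustration only.
print(sensitivity(true_positives=900, false_negatives=100))
print(false_positive_rate(false_positives=2, true_negatives=998))
```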
We show that Bis-SNP is a practical tool that can both (1) improve DNA methylation calling accuracy by detecting SNPs at cytosines and adjacent positions and (2) identify heterozygous SNPs that can be used to investigate mono-allelic DNA methylation and polymorphisms in cis-regulatory sequences.
Publication:
Liu Y, Siegmund KD, Laird PW, Berman BP. Bis-SNP: combined DNA methylation and SNP calling for Bisulfite-seq data. Genome Biology 2012 Jul 11;13(7):R61.