

Bis-SNP: An accurate SNP and methylation calling tool for Bisulfite-Seq/NOMe-seq/RRBS




  • Bis-SNP: An accurate SNP and methylation calling tool for Bisulfite-Seq/NOMe-seq/RRBS

    Why did we develop Bis-SNP?

    Identification and proper handling of SNPs in Bisulfite-seq data are important for accurate quantification of methylation levels, especially given that C>T is the most common substitution in the human population (65% of all SNPs in dbSNP) and that these substitutions usually occur in the CpG context. Identifying SNPs is also required for sequence-dependent allele-specific methylation analysis.
    SNP calling from bisulfite sequencing data has significant complications. First, reads from the two genomic strands are not complementary, whereas complementarity is assumed by all SNP calling algorithms. Second, true (evolutionary) C>T SNPs in the sample cannot be distinguished from C>T substitutions caused by bisulfite conversion, and can thus be misidentified as unmethylated Cs.

    Currently, there is no other publicly available tool for SNP calling in Bisulfite-seq data. We implemented and tested all the methods described in currently published methylome papers. None of their SNP detection methods works well without additional matched non-bisulfite sequencing data from the same sample/strain.

    We have therefore developed a new tool, called Bis-SNP, for accurate SNP and methylation analysis of BS-seq data. Bis-SNP is a software package written mainly in Java and based on the GATK map-reduce framework. All associated files can be downloaded at:


    How does it work?

    Bis-SNP uses Bayesian inference to evaluate a model of strand-specific base calls and base call quality scores, along with prior information on population SNP frequencies, experiment-specific bisulfite conversion efficiency, and site-specific DNA methylation estimates.
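
To make the idea concrete, here is a toy Python sketch of strand-aware Bayesian genotyping at a single reference-C position. It is not Bis-SNP's actual model or code: the function names, the three-genotype space, the priors, and the default methylation and conversion values are all assumptions chosen for illustration. Bases are given in forward-strand coordinates, and only reads from the original top strand are assumed to show bisulfite C>T conversion at this position.

```python
import math

def phred_to_error(qual):
    """Convert a Phred-scaled base quality into an error probability."""
    return 10 ** (-qual / 10)

def emission_probs(allele, top_strand, methylation, conversion):
    """P(error-free read base | allele), in forward-strand coordinates."""
    if allele == "T":
        return {"T": 1.0}  # a true T reads as T on either strand
    if top_strand:
        # An unmethylated C on the top strand is converted and read as T.
        p_converted = (1.0 - methylation) * conversion
        return {"C": 1.0 - p_converted, "T": p_converted}
    return {"C": 1.0}  # bottom-strand reads are unaffected at this position

def read_likelihood(base, qual, top_strand, allele, methylation, conversion):
    """P(observed base | allele), mixing in sequencing error."""
    err = phred_to_error(qual)
    total = 0.0
    for true_base, p in emission_probs(allele, top_strand, methylation, conversion).items():
        total += p * ((1.0 - err) if base == true_base else err / 3.0)
    return total

def genotype_posteriors(reads, priors, methylation=0.7, conversion=0.99):
    """Posterior over diploid genotypes such as 'CC', 'CT', 'TT'.

    reads: iterable of (base, phred_quality, is_top_strand) tuples.
    priors: dict mapping genotype string to prior probability (assumed values).
    """
    log_post = {}
    for genotype, prior in priors.items():
        lp = math.log(prior)
        for base, qual, top_strand in reads:
            # Each read comes from either allele with probability 1/2.
            lik = sum(
                0.5 * read_likelihood(base, qual, top_strand, allele, methylation, conversion)
                for allele in genotype
            )
            lp += math.log(lik)
        log_post[genotype] = lp
    # Normalize in log space for numerical stability.
    peak = max(log_post.values())
    unnorm = {g: math.exp(lp - peak) for g, lp in log_post.items()}
    norm = sum(unnorm.values())
    return {g: v / norm for g, v in unnorm.items()}
```

In this sketch, T calls seen only on top-strand reads favor an unmethylated homozygous C, while T calls that also appear on bottom-strand reads favor a real C>T SNP, which is the strand asymmetry that conventional SNP callers do not exploit.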

    It also enables base call quality score recalibration in Bisulfite-seq, an addition that has greatly improved SNP calling in the non-bisulfite context. Because very few Bisulfite-seq mapping tools can currently perform gapped alignment to detect indels, which leads to many false SNPs around indels, Bis-SNP also enables local indel realignment in Bisulfite-seq. Bis-SNP is open-source and based on the Genome Analysis Toolkit (GATK) framework, in order to take advantage of the parallel Map-Reduce computation strategy and provide practical execution times.

    Bis-SNP accepts either single-end or paired-end mapped Bisulfite-seq/NOMe-seq/RRBS data in the form of BAM files, and outputs SNP and methylation information using standard VCF formats and bed/bedDetail/bedGraph/wig formats.

    Bis-SNP can call and summarize methylation for any user-provided cytosine context (CpG, CHH, CHG, GCH, etc.), which makes it widely adaptable to different kinds of bisulfite-treated sequencing data, e.g. Bisulfite-seq/NOMe-seq/RRBS.

    Bis-SNP provides a set of Perl scripts that make it easy to handle output file format conversion and to run the whole genotyping and methylation calling pipeline.

    Bis-SNP performance?

    We have validated the specificity and sensitivity of SNP detection using Bisulfite-seq and an Illumina 1M SNP array on the same sample. At the default threshold (Phred-scaled score > 20) and the test sample's sequencing depth (30X), it detected 92.21% of heterozygous SNPs with a 0.14% false positive rate (90.88% sensitivity for C/T SNPs with a 0.16% false positive rate, 98.51% sensitivity for non-C/T SNPs with a 0.16% false positive rate). For a single sample at 10X sequencing depth, it could still detect 80% of the heterozygous SNPs and 98% of homozygous cytosines at FDR < 0.05.
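
For readers unfamiliar with Phred scaling, the short sketch below shows what a threshold of 20 corresponds to, assuming the score follows the usual convention Q = -10 * log10(P), where P is the probability that the call is wrong. The helper names are hypothetical and are not part of Bis-SNP.

```python
import math

def phred_to_probability(score):
    """Probability that a call is wrong, given its Phred-scaled score."""
    return 10 ** (-score / 10)

def probability_to_phred(prob):
    """Phred-scaled score for a given probability of a wrong call."""
    return -10 * math.log10(prob)

def passes_threshold(score, threshold=20):
    """True if the score is strictly above the threshold (20 mirrors the default quoted above)."""
    return score > threshold
```

Under that convention, a score of 20 corresponds to an error probability of 0.01, so requiring a score above 20 keeps only calls estimated to be wrong less than 1% of the time.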

    We show that Bis-SNP is a practical tool that can both (1) improve DNA methylation calling accuracy by detecting SNPs at cytosines and adjacent positions and (2) identify heterozygous SNPs that can be used to investigate mono-allelic DNA methylation and polymorphisms in cis-regulatory sequences.


    Liu Y, Siegmund KD, Laird PW, Berman BP. Bis-SNP: combined DNA methylation and SNP calling for Bisulfite-seq data. Genome Biology 2012 Jul 11;13(7):R61.
    USC Epigenome Center

  • #2
    I have some PCR/amplicon-derived NGS data for a single small gene region. It is 100 bp PE Illumina data from only the original top bisulphite-converted strand.

    I assume I am unable to use Bis-SNP to call CpG SNPs, as I have no data from the original bottom bisulphite-converted strand, but am I able to use Bis-SNP to call SNPs at non-CpG bases? If not, can you suggest any apps that would let me get this non-CpG SNP data at the same time as the methylation data?



    • #3
      Yes, in your case you can call all non-C/T SNPs and some of the C/T SNPs (when the C is on the reverse strand).
      USC Epigenome Center


      • #4
        I have some huge BAM files I want to process using Bis-SNP. I set the -nt flag to 2 in order to use only 2 CPUs. When I run Bis-SNP as shown in section 3.3 of the manual with the additional flag -nt 2, it still uses all available CPUs.
        Is there another way to reduce the CPU usage?



        • #5
          That sounds weird.
          Could you please post your run log in our Google group? I will look at it and give you the solution there. Thanks!

          USC Epigenome Center


          • #6

            I am getting two different sets of SNP results from the same BAM file when I use two different versions. I am very confused about which one I should use for my further analysis.

            Thank You


            • #7
              Apologies. Do you mean the newest 1.0 version? I just migrated it to the new GATK framework and have not had time to benchmark its performance yet.
              USC Epigenome Center


              • #8
                Yes. And when I intersect the files generated by the two versions, I get only 5700 common SNPs.
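
For anyone wanting to reproduce this kind of comparison, the following is a minimal plain-Python sketch of intersecting two uncompressed VCF files by (chromosome, position, REF, ALT). The function names and the `skip_non_variant` option are assumptions for illustration; this is not part of Bis-SNP, and it does not normalize multi-allelic or differently represented variants.

```python
def load_variant_keys(vcf_path, skip_non_variant=True):
    """Collect (chrom, pos, ref, alt) keys from an uncompressed VCF file."""
    keys = set()
    with open(vcf_path) as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            # In VCF, an ALT of "." means no alternate allele at this site.
            if skip_non_variant and alt == ".":
                continue
            keys.add((chrom, int(pos), ref, alt))
    return keys

def compare_call_sets(path_a, path_b):
    """Count variant records shared by, and unique to, two VCF files."""
    calls_a = load_variant_keys(path_a)
    calls_b = load_variant_keys(path_b)
    return {
        "shared": len(calls_a & calls_b),
        "only_a": len(calls_a - calls_b),
        "only_b": len(calls_b - calls_a),
    }
```

Counting the records unique to each file, rather than only the overlap, shows whether one version mostly adds calls or whether the two call sets genuinely disagree.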


                • #9
                  The latest version's output also contains more SNPs than the previous one. I don't know the reason and am very confused about which file I should use for further analysis.


                  • #10
                    One more thing: the raw VCF file generated by the latest version is very large (gigabytes in size), while the previous version produced raw VCF files of 80-90 MB.

                    Thank You


                    • #11

                      I ran Bis-SNP 1.0.0 from the command line using the following command:
                      (Java version: JDK8)
                      java -Xmx10g -jar BisSNP-1.0.0.jar
                      -R mm10.fa
                      -I file.bam
                      -T BisulfiteCountCovariates
                      -knownSites dbSNP-150.vcf
                      -cov ReadGroupCovariate
                      -cov QualityScoreCovariate
                      -cov CycleCovariate
                      -recalFile File.csv

                      and got the following result:
                      ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
                      ERROR This means that one or more arguments or inputs in your command are incorrect.
                      ERROR The error message below tells you what is the problem.
                      ERROR If the problem is an invalid argument, please check the online documentation guide
                      ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
                      ERROR Visit our website and forum for extensive documentation and answers to
                      ERROR commonly asked questions https://software.broadinstitute.org/gatk
                      ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
                      ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: BisulfiteCountCovariates

                      I can't find the BisulfiteCountCovariate.java walker in the BisSNP files. Could that be at least part of the problem?

                      Thanks for helping

