This post announces the release of BSMAP v2.4, a powerful bisulfite mapping program. This version substantially improves run-time performance while maintaining the high accuracy and flexibility of previous versions.
Using 8 threads, BSMAP-2.4 can map 28M 76nt paired-end WGBS reads to the human genome in about 2 hours (allowing up to 6 mismatches), including the genome indexing time. Memory usage is ~9GB. (CPU: Intel Xeon X5690)
We tested the mapping accuracy using simulated bisulfite reads. BSMAP has significantly higher mapping accuracy than most other bisulfite mapping programs, especially for reads with more than 3 mismatches.
BSMAP is freely available at http://code.google.com/p/bsmap/
Main features of BSMAP-2.4:
1. Reads are mapped directly to the original reference genome sequence; there is no need to preprocess the reads and reference genome to convert C to T.
2. Supports both whole genome bisulfite sequencing (WGBS) mode and reduced representation bisulfite sequencing (RRBS) mode. In RRBS mode, reads are guaranteed to be mapped to digestion sites to increase accuracy. The digestion site information can also be changed to support different digestion enzymes.
3. Supports both the "Lister protocol" (sequencing the 2 forward strands only) and the "Cokus protocol" (sequencing all 4 bisulfite-converted strands).
4. Supports trimming adapter sequences and low-quality nucleotides from the 3' end of reads.
5. Allows a trade-off among speed, memory usage, and mapping sensitivity. For the human genome, RRBS mode uses ~3GB. In WGBS mode, typical memory usage is ~9GB, but it can be as low as 5GB.
6. Allows alignment for other nucleotide transitions; for example, it can be set to detect the A=>I(G) transition in RNA editing.
7. Includes a downstream script to extract methylation ratios from mapping results.
8. FASTA/FASTQ/SAM format input, text/SAM output. Single-end and paired-end mapping. Read length up to 144nt, with a maximum of 15 mismatches allowed.
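To make features 1, 6 and 7 more concrete, here is a minimal Python sketch of the underlying idea. It is an illustration only and not BSMAP's actual implementation: the function names count_mismatches and methylation_ratio, and the C/(C+T) definition of the ratio, are assumptions made for this example. It shows how a read can be compared against the unconverted reference while tolerating one nucleotide transition (C=>T for bisulfite data, or A=>G for RNA editing), and how a per-site methylation ratio could be derived from read counts.

```python
def count_mismatches(read, ref, transition=("C", "T")):
    """Count mismatches between a read and an equal-length reference window.

    A reference base equal to transition[0] that appears in the read as
    transition[1] is treated as a match (e.g. an unmethylated C read as T).
    Hypothetical helper for illustration; not part of BSMAP.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")
    ref_base, read_base = transition
    mismatches = 0
    for r, g in zip(read.upper(), ref.upper()):
        if r == g:
            continue  # exact match
        if g == ref_base and r == read_base:
            continue  # tolerated transition, not counted as a mismatch
        mismatches += 1
    return mismatches


def methylation_ratio(c_count, t_count):
    """Fraction of reads showing C at a reference cytosine.

    Assumes the ratio is defined as C / (C + T); returns None when no
    reads cover the site. Hypothetical helper for illustration.
    """
    total = c_count + t_count
    if total == 0:
        return None
    return c_count / total


if __name__ == "__main__":
    ref = "ACGTCCGA"
    read = "ATGTTCGA"  # two reference Cs were read as T
    print(count_mismatches(read, ref))                 # bisulfite-aware count
    print(count_mismatches(ref, read))                 # T=>C is not tolerated
    print(count_mismatches("GCGA", "ACGA", ("A", "G")))  # A=>G, as in RNA editing
    print(methylation_ratio(3, 1))
```

Note that the comparison is deliberately asymmetric: a reference C seen as T in the read is tolerated, but a reference T seen as C is still counted as a mismatch. This sketch covers reads from the forward strand only; reads from the reverse strand would need the complementary transition (G=>A).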
We encourage you to try this new version for mapping short bisulfite reads.
Any comments/suggestions/bug reports will be appreciated.
Thank you,
Yuanxin Xi