**fkrueger** · 06-17-2010, 07:41 AM

Thanks to the feedback I have received so far, I have fixed some of the initial release bugs.

I also changed the output format for both single-end and paired-end alignments slightly so that it is more concise and smaller. More details can be found at http://seqanswers.com/wiki/Custom_Bismark_output_format.

I'd like to point out that for the most reliable output I recommend specifying the "--best" option for the Bowtie alignments. This will, however, take more time to complete. (For most of the applications we have used Bismark for so far, the difference made by "--best" was only marginal, at less than 1% difference in mapped reads, so running it with the default parameters gives a 2-3x faster alignment altogether.)

I am still happy for any kind of feedback!

**zee** · 06-17-2010, 07:24 PM

Is there support for the SAM/BAM alignment format in Bismark? If so, you could just plug and play with other bisulfite aligners.

**fkrueger** · 06-18-2010, 02:24 AM

Bismark is not merely a bisulfite alignment application; it is also a methylation caller. To perform the methylation calls correctly, it needs to know the conversion state of both the bisulfite read and the genome, as this determines whether we have to look for C->T or G->A substitutions and whether we need to look upstream or downstream to determine whether the C was in CpG context.
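To illustrate the strand-dependent lookup described above, here is a minimal Python sketch (not Bismark's own code) that classifies a genomic cytosine as CpG, CHG or CHH. The function name `cytosine_context` is hypothetical, and it assumes an uppercase A/C/G/T/N genome string with 0-based positions: a C on the top strand is classified by looking downstream, while a G (a C on the bottom strand) is classified by looking upstream and complementing.

```python
from typing import Optional

# Complement table for the bases that can occur in the genome.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def cytosine_context(genome: str, pos: int) -> Optional[str]:
    """Classify the cytosine at 0-based `pos` as 'CpG', 'CHG' or 'CHH'.

    A 'C' is a top-strand cytosine, so its context is read downstream.
    A 'G' marks a bottom-strand cytosine, so its context is read upstream
    and complemented. Returns None for A/T/N, or when the context is cut
    off by the end of the sequence or contains an N.
    """
    base = genome[pos]
    if base == "C":
        context = genome[pos:pos + 3]
    elif base == "G":
        # Reverse-complement the (up to) three bases ending at pos.
        context = genome[max(0, pos - 2):pos + 1][::-1].translate(COMPLEMENT)
    else:
        return None
    if len(context) >= 2 and context[1] == "G":
        return "CpG"
    if len(context) < 3 or "N" in context[1:]:
        return None
    return "CHG" if context[2] == "G" else "CHH"
```

For example, in the toy sequence `CGACAGCTT` the C at position 3 is in CHG context on the top strand, and the G at position 5 marks a bottom-strand cytosine that is also in CHG context.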

To our knowledge, most methylation mappers only align reads in a C->T converted form and do not perform all four possible alignments in parallel (CT converted reads to CT converted genome, GA converted reads to CT converted genome, CT converted reads to GA converted genome and GA converted reads to GA converted genome), which makes this four-alignment output unique to Bismark. As the SAM/BAM output of other mapping programs can't supply this information, I don't see an easy way to mix in the output of other aligners and perform the methylation calls in Bismark.
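The four read/genome combinations can be sketched in a few lines of Python. This is an illustration only, not Bismark's implementation: the helper names are hypothetical, and exact substring matching stands in for Bowtie (no mismatches, no reverse complement, no quality scores).

```python
from itertools import product
from typing import Dict, List, Tuple


def c_to_t(seq: str) -> str:
    """C->T conversion: every C becomes T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """G->A conversion: every G becomes A."""
    return seq.upper().replace("G", "A")


CONVERSIONS = {"CT": c_to_t, "GA": g_to_a}


def four_alignment_inputs(read: str, genome: str) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Build (converted read, converted genome) for all four combinations,
    keyed by (read conversion, genome conversion)."""
    return {
        (r, g): (CONVERSIONS[r](read), CONVERSIONS[g](genome))
        for r, g in product(CONVERSIONS, repeat=2)
    }


def matching_combinations(read: str, genome: str) -> List[Tuple[str, str]]:
    """Return the combinations in which the converted read occurs in the
    converted genome (exact substring search as a stand-in for Bowtie)."""
    return [
        combo
        for combo, (conv_read, conv_genome) in four_alignment_inputs(read, genome).items()
        if conv_read in conv_genome
    ]
```

With the toy genome `ATCGACTTGCA` and a top-strand bisulfite read `TCGATTTG` (its CpG cytosine methylated, the other C converted to T), only the `("CT", "CT")` combination matches, which is what tells the caller to look for C->T substitutions on the top strand.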

Conversely, even though it might be possible to modify the code so that the four instances of Bowtie report SAM output instead (or to convert the output to SAM/BAM format later on), this might not make any difference to the downstream methylation call, as that is based not on the alignment format but on the read/genome conversion state. The Bismark output format now contains only essential information, and as far as we are aware, the SAM format does not offer a way to store methylation information as such.

In our experience, wet-lab scientists want to know the methylation state of cytosines at base-pair resolution rather than getting read alignments, which do not necessarily tell them much about the underlying methylation levels. Thus, the final format that is ultimately handed out to the researchers contains only the chromosome/position of each individual C and whether it was methylated or not.

In summary, Bismark uses a unique logic for its alignment and methylation calling procedure, which doesn't allow much plug and play with the output of other aligners.

**olivertam** · 06-18-2010, 07:53 AM

Although I agree that methylation states can't be stored in SAM/BAM, we have found (in our limited experience) that people wish to visualize both the mappings and the methylation status in a system such as the UCSC genome browser. While I'm aware of your program (SeqMonk) and fully agree with your approach of providing visualization through that means, it might be worthwhile to provide the mapping results obtained by Bismark in SAM/BAM format through a simple conversion script that generates the necessary information (including the CIGAR string showing conversion relative to the genome), thus giving people the option to visualize their results in a system of their choice (one that can handle BAM, for example).

On an unrelated note:
How do you present methylation status to the wet-lab biologists? One of the ways we analyze our data is to calculate the bisulfite conversion rate by identifying C --> T conversions in non-CpG context, to get an idea of the quality of the bisulfite conversion. We also like to give methylation "levels" per CpG based on genomic position, and I'm wondering if you have a good way of converting your output into these values, or if it's completely embedded in SeqMonk. Our ultimate goal is to show methylation as a BEDGraph-like track over the full genome.

Thanks for a great program. Looking forward to seeing the results.

**fkrueger** · 06-18-2010, 08:23 AM

Hi Oliver,

thanks for your comments. I am aware that many people use the UCSC genome browser, so I am going to look into the option of generating a more universal output in SAM/BAM format.

After an analysis is completed, Bismark gives a methylation call summary, which provides a very general overview of the methylation state in CpG and non-CpG context and might look like this:

```
Final Cytosine Methylation Report
=================================
Total number of C's analysed: 170715821
Total methylated C's in non-CpG context: 10243954
Total methylated C's in CpG context: 8576784
Total C to T conversions in non-CpG context: 116666488
Total C to T conversions in CpG context: 35228595

C methylated but not in CpG context: 8.1%
C methylated in CpG context: 19.6%
```

This can be used to estimate the bisulfite conversion rate.
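As a worked example of how the summary percentages follow from the counts, here is a short Python sketch using the numbers from the report above. The conversion-rate line rests on an assumption that is not stated in the report: if non-CpG cytosines are essentially unmethylated in the sample (not true for every organism, e.g. plants), every retained non-CpG C reflects incomplete conversion, so 100% minus the non-CpG methylation gives a lower-bound estimate of the conversion rate.

```python
def percent_methylated(methylated: int, unmethylated: int) -> float:
    """Percentage of calls where the C was retained (methylated) rather than
    converted to T (unmethylated)."""
    total = methylated + unmethylated
    return 100.0 * methylated / total if total else 0.0


# Counts copied from the example report above.
non_cpg = percent_methylated(10243954, 116666488)
cpg = percent_methylated(8576784, 35228595)

# Assumption: non-CpG methylation is ~0, so retained non-CpG Cs are
# conversion failures and this is a lower bound on the conversion rate.
conversion_rate = 100.0 - non_cpg

print(f"C methylated but not in CpG context: {non_cpg:.1f}%")
print(f"C methylated in CpG context: {cpg:.1f}%")
print(f"Estimated bisulfite conversion rate: >= {conversion_rate:.1f}%")
```

This reproduces the 8.1% and 19.6% in the report and gives an estimated conversion rate of at least 91.9% under the stated assumption.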

But, as you correctly guessed, SeqMonk provides us with all the data quantitation and calculations we can currently think of. This includes filtering for a minimum number of reads per C, calculating methylation levels (e.g. as % methylated), trends over certain features (e.g. CpG islands), etc.

I'll probably put up a quick example file of a methylation analysis next week to illustrate some of these aspects. Our goal is that the wet-lab scientists can, after the initial Bismark analysis, play around with their data in SeqMonk as much as they like, which takes work off our shoulders and gives them the feeling that they are analysing their data themselves rather than just handing it over to the Bioinformatics department.

**fkrueger** · 06-20-2010, 11:21 AM

I have made a few screenshots of different ways to analyse and present methylation data. We basically use SeqMonk for most of our graphical as well as quantitative tasks as it offers a large repertoire of useful tools to handle this kind of data.

All graphical results can be exported into annotated report files, which can be used for plotting or in further analyses. Just have a look at the attached example file. (The data shown was taken from Meissner et al., Nature, 2008, PMID: 18600261; GSE11034.)

Attached Files

presentation of methylation.ppt (151.0 KB, 576 views)

**olivertam** · 06-24-2010, 06:38 AM

Hi,

I have attached a beta version of a Perl script that converts Bismark single-end mapping output to SAM format.

Usage:

```
bismark_to_SAM.pl -c [chrom sizes file] -i [bismark mapping output] -o [SAM output]

-c [chrom sizes file]        - file containing length of chromosomes/sequences used for Bowtie mapping
-i [bismark mapping output]  - file containing bismark mapping output
-o [SAM output]              - name for output file in SAM format (default: [input].sam)
```

The chrom sizes file should be in the following format:

```
<chr> TAB <length> TAB <2bit file>
```

This file can typically be obtained from the UCSC genome browser downloads if you're using a model organism genome. However, as long as the file has the chromosome/sequence name in column 1 and the length in column 2, the program should work. Please ensure that the chromosome names in the chrom sizes file are the same as the names that Bismark outputs (i.e. the names of the sequences you mapped against).
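The chrom sizes file maps directly onto the `@SQ` reference lines of a SAM header (`SN` = sequence name, `LN` = length), which is why the names must match the ones Bismark reports. The following Python sketch shows that step only; it is not taken from bismark_to_SAM.pl, and the function names are hypothetical. Columns beyond the second (such as the 2bit file) are ignored.

```python
from typing import Iterable, List, Tuple


def read_chrom_sizes(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse a chrom sizes file: name in column 1, length in column 2.

    Extra columns are ignored, as are blank or incomplete lines.
    """
    sizes = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue
        sizes.append((fields[0], int(fields[1])))
    return sizes


def sam_header(sizes: List[Tuple[str, int]]) -> str:
    """Build one SAM @SQ header line per reference sequence."""
    return "".join(f"@SQ\tSN:{name}\tLN:{length}\n" for name, length in sizes)


# Toy input; a real file would list every sequence used for the Bowtie mapping.
example = ["chr1\t1000\tgenome.2bit\n", "chr2\t500\n"]
print(sam_header(read_chrom_sizes(example)), end="")
```

A read whose reference name does not appear among these `@SQ` lines would produce an invalid SAM record, which is one practical reason to check the names before converting.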

Please let me know if there's an issue.

Cheers,
Oliver

Attached Files

bismark_to_SAM.pl (6.3 KB, 163 views)

**fkrueger** · 08-03-2010, 07:27 AM

Thanks to feedback we received both via email and at ISMB in Boston, we have worked on some bug fixes and suggestions for Bismark, which are now available for download from http://www.bioinformatics.bbsrc.ac.uk/projects/.

The following features received some attention:

Bismark Genome Preparation

- If the specified genome directory already contains a bisulfite genome folder, all contents of this folder will be removed before a new bisulfite genome is created and indexed

- The genome indexer now converts DNA ambiguity codes into N's before making the bisulfite genomes (anything other than C, A, T or G will appear as N afterwards)

- The indexer now also handles FastA files with multiple sequence entries, in addition to (a list of) FastA files in the specified genome folder
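The two genome-preparation changes above (multi-entry FastA input and ambiguity codes written as N) can be sketched as a single parsing step. This is a minimal Python illustration, not Bismark's Perl code; the function name `read_multi_fasta` is hypothetical, and taking the first word of the header as the sequence name is an assumption.

```python
import re
from typing import Dict, Iterable, List, Optional

# Anything other than A, C, G or T (IUPAC codes such as R, Y, K, or N itself).
NON_ACGT = re.compile(r"[^ACGT]")


def read_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse a (multi-)FastA stream into {sequence name: sequence}.

    The name is the first whitespace-delimited word of each '>' header.
    Sequences are uppercased and every non-ACGT character becomes N.
    """
    sequences: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            sequences[name] = []
        elif name is not None:
            sequences[name].append(NON_ACGT.sub("N", line.upper()))
    return {n: "".join(parts) for n, parts in sequences.items()}
```

After this step every sequence contains only A, C, G, T and N, so the subsequent C->T and G->A conversions of the genome never see an ambiguity code.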

Methylation Extractor

- Fixed a bug whereby the single-end strand-specific output mixed up two of the four possible strands. Also, the --ignore <int> option previously offset some of the positions of the methylation calls by the specified <int>. Both features should now work as intended

- For paired-end alignments with rather short fragment lengths, it is theoretically possible for read 1 and read 2 to cover overlapping stretches of sequence. In order not to score the methylation calls for overlapping sequence twice, we added an option (--no_overlap) that scores overlapping methylation calls only from the first read of a given alignment
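The rule behind --no_overlap can be sketched as follows. This is a hypothetical Python illustration of the idea, not the methylation extractor's code: each read's calls are assumed to be a mapping from genomic position to a call character (the `M`/`U` encoding for methylated/unmethylated is made up for the example), and read 1's covered interval is assumed to be known.

```python
from typing import Dict, Tuple

# Hypothetical representation: genomic position -> methylation call character.
Calls = Dict[int, str]


def merge_pair_calls(read1: Calls, read1_span: Tuple[int, int], read2: Calls) -> Calls:
    """Combine the calls of a read pair without double counting.

    read1_span is the inclusive (start, end) genomic interval covered by
    read 1. Calls from read 2 inside that interval are dropped, so any
    overlapping position is scored from read 1 only.
    """
    start, end = read1_span
    merged = dict(read1)
    for pos, call in read2.items():
        if not start <= pos <= end:
            merged[pos] = call
    return merged
```

For example, if read 1 covers positions 100 to 120 and both reads report a call at position 110, only read 1's call at 110 is kept, while read 2's call at 125 (outside the overlap) is still counted.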

Further comments and suggestions are most welcome!

**natstreet** · 10-06-2010, 10:13 AM

I'm interested to know whether anyone has looked further into getting data from Bismark into different genome browsers. I have used Bismark with paired-end reads and I need to visualise the data in GBrowse. Usually I do this using a BAM file. The bismark2sam Perl script attached earlier in this thread was designed for single-end reads, and when I tried it by treating my data as single-end, I ran into a problem because it didn't like the non-ACGTN characters in my reference genome.

As well as getting the mapping results into GBrowse, I also need to supply the methylated sequence of regions found to be differentially methylated. For non-BS data I generate a pileup file and call a consensus, but that relies on having a SAM file. How are other people handling this, i.e. giving the lab people the BS-converted sequence for subsequent PCR confirmation etc.?
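One way to approach the question above, without going through a pileup, is to derive the expected post-bisulfite sequence of a region from the reference plus the positions judged methylated. The Python sketch below is a hypothetical illustration of that idea (it is not an existing Bismark feature): it covers the top strand only, assumes you have already decided which cytosines count as methylated (for example by a % methylation threshold), and treats every other C as converted to T.

```python
from typing import Set


def bisulfite_top_strand(region: str, region_start: int, methylated: Set[int]) -> str:
    """Expected top-strand sequence of a region after bisulfite treatment.

    region       -- reference sequence (top strand) beginning at region_start
    methylated   -- genomic positions of cytosines judged to be methylated

    Methylated cytosines are protected and stay C; all other cytosines
    read as T after conversion and PCR. Other bases are unchanged.
    """
    out = []
    for offset, base in enumerate(region.upper()):
        if base == "C" and region_start + offset not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)
```

For instance, for the toy region `ACGTCCGA` starting at position 1000 with only the C at 1001 methylated, the expected converted sequence is `ACGTTTGA`. The bottom strand would need the analogous treatment of its own cytosines.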

I also really like the earlier suggestion of a bedGraph (or, preferably, BigWig) file showing % conversion/methylation across chromosomes.

**olivertam** · 10-06-2010, 10:34 AM

Hi Natstreet,

I have included an updated version of the script that handles any degenerate nucleotide in the reference. I'm still assuming that the reads (from the Illumina output) contain only A, C, T, G or N.
As for making the tool handle paired-end Bismark output, it's yet to be done. I'm afraid that I don't have much experience with paired-ends output, but if you have some Bismark paired-ends output that you don't mind using as a test dataset, I'd be happy to try and make it work.

Please let me know if there are problems with the new script.

Cheers,
Oliver

Attached Files

bismark_to_SAM.pl (6.5 KB, 140 views)

**olivertam** · 10-06-2010, 11:31 AM

To follow up on the BEDGraph/BigWig idea, we have developed a workaround for this. Again, this is based on single-end analysis, so I haven't tested it with paired-end data.

1) Run the methylation extractor on the Bismark output with the '--comprehensive' option ('--merge_non_CpG' is optional)
2) Run the following script (genome_methylation_bismark2bedGraph.pl). This script sorts the methylation extractor output, then parses the results to generate an "overall methylation level" as a BEDGraph file, with one sampled cytosine site per line.
3) Use the bedGraphToBigWig program (available online, I believe) to convert the BEDGraph to BigWig.
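The aggregation in step 2 can be sketched in Python. This is not Oliver's Perl script, only a minimal illustration of the same idea under stated assumptions: each extractor line is assumed to be tab-separated as read ID, methylation state (`+` methylated, `-` unmethylated), chromosome, 1-based position and call; the function name `methylation_bedgraph` is hypothetical; and the output follows the bedGraph convention of a 0-based start and exclusive end, sorted by chromosome and position as bedGraphToBigWig expects.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def methylation_bedgraph(lines: Iterable[str], cutoff: int = 0) -> List[str]:
    """Aggregate per-read methylation calls into bedGraph lines.

    Each output line is chrom, start (0-based), end, percent methylated,
    one line per sampled cytosine. Positions seen fewer than `cutoff`
    times are skipped (the default of 0 means no threshold).
    """
    counts: DefaultDict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header lines and anything not matching the assumed layout.
        if len(fields) < 5 or fields[1] not in ("+", "-") or not fields[3].isdigit():
            continue
        chrom, pos = fields[2], int(fields[3])
        counts[(chrom, pos)][0 if fields[1] == "+" else 1] += 1
    out = []
    for (chrom, pos), (meth, unmeth) in sorted(counts.items()):
        total = meth + unmeth
        if total < cutoff:
            continue
        out.append(f"{chrom}\t{pos - 1}\t{pos}\t{100.0 * meth / total:g}")
    return out
```

Python's default string ordering matches a C-locale sort of the chromosome names, which is the ordering bedGraphToBigWig is usually fed.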

Here's the usage for the genome_methylation_bismark2bedGraph.pl

```
Usage: genome_methylation_count.pl (--cutoff [threshold] ) [Bismark methylation caller output] > [output]

--cutoff [threshold] - The minimum number of times a methylation state was
                       seen for that nucleotide before its methylation
                       percentage is reported.
                       Default is no threshold
```

The output file is a tab-delimited BedGraph file with the following information:

```
<Chromosome> <Start Position> <End Position> <Methylation Percentage>
```

Bismark methylation caller (v0.2.0 or later) should produce three output files
(CpG, CHG and CHH) when using the "--comprehensive" option
(Two files if using the "--merge_non_CpG" option).
To count both CpG and Non-CpG, combine the output files.

Bismark methylation caller (v0.1.5 or earlier) should produce two output files
(CpG and Non-CpG) when using the "--comprehensive" option.
To count both CpG and Non-CpG, combine the two output files.

Let me know if you have any questions, issues or bugs (since this is a workaround).

Cheers,
Oliver

Attached Files

genome_methylation_bismark2bedGraph.pl (4.4 KB, 178 views)

**natstreet** · 10-07-2010, 02:21 AM

Thanks for the replies, it's very much appreciated (and shows the power of this forum!). I'll test both scripts today. I can re-map the data as single ends anyway because I make no use of the fact that it's paired - this was just the easiest way to increase the sequencing output.

> **olivertam:** I'm afraid that I don't have much experience with paired-ends output, but if you have some Bismark paired-ends output that you don't mind using as a test dataset, I'd be happy to try and make it work.

I've put an example Bismark paired-end output file here. If you have a chance to take a look, that would be great.

**olivertam** · 10-07-2010, 06:57 AM

What organism is this?
If possible, could you provide all the chromosome names that you used for mapping, plus their length?

Thanks heaps.

Cheers,
Oliver

**natstreet** · 10-07-2010, 09:18 AM

Sorry - I should have realised some basic info would help!

The data is from Arabidopsis thaliana. I've just added a file called length.tab, containing the chromosome lengths, to the same FTP location.

I've tried the bismark2sam and genome_methylation2bed scripts and both seem to be working perfectly. I used bedGraph2BigWig and the track looks great in GBrowse.

Do you also have any advice about the best way to extract the consensus BS-treated sequence from the bismark results files? I was thinking to use samtools after making a pileup file but I haven't actually tried it yet.

Again, all your help is much appreciated.

Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News