Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

gene_x replied

11-17-2014, 10:27 PM
bismark report

Hi,

I'm wondering if I run several instances of bismark alignment separately and then later merged the bam files together. I'd like to get a report on the final bam file, like the PE report file. It's required to run bismark2report command, so I'm wondering if there is a program to run just that, say, given a bam file and get the necessary info from it as a stand-alone command, just like bismark2report? Currently, I don't see an easy way to get this.. Could it be a good addition to the next release?

Thanks.
Leave a comment:
fkrueger replied

10-23-2014, 06:29 AM
That's a perfectly fine answer as far as I am concerned, doesn't require me to fix anything :P Good luck!
Leave a comment:
AnneH replied

10-23-2014, 06:09 AM
Hi Felix,

now that is embarassing, it's true, I used a slightly different reference genome...
My bad!

Thanks for the fast reply though!
Leave a comment:
fkrueger replied

10-23-2014, 02:48 AM
Hi Anne,

The length of the genome sequences is not changed at any point during the genome preparation, the only thing that does happen is that nucleotides get replaced. The length of the chromosome sequences should be printed to the SAM header (the @SQ lines, eg:
@SQ SN:1 LN:197195432
@SQ SN:10 LN:129993255
@SQ SN:11 LN:121843856
@SQ SN:12 LN:121257530
...).

If these sequences don't match up in CLC, could it be that it is using slighlty different reference sequences and/or genome build?
Leave a comment:
AnneH replied

10-23-2014, 02:32 AM
Hi,

I ran Bismark on some of my data and tried subsequently to import the SAM files into CLC genomics workbench.
I encountered some problems since the length of the sequences in the reference genome (which I also have in the CLC and which should be coupled with the SAM file that is to be imported) doesn't match with the length of the sequences reported in the SAM file.

I guess this has something to do with the genome preparation step.
Is there some way to avoid the sequence length to be changed in this process?
Leave a comment:
fkrueger replied

10-01-2014, 04:42 AM
Multithreading the methylation extractor

We have just released a new version of Bismark (v0.13.0), which is available from the Babraham Bioinformatics website. This version adds a couple of useful options and changes some default behavior. Perhaps most notably the methylation extractor may now optionally be run in a multithreaded manner which greatly reduces its processing time (in a preliminary benchmark the elapsed time went down almost linearly when more cores were being used for this process, see below for more details). Here is a list of all changes:

o Bismark: Fixed renaming issue for SAM to BAM files (which would have replaced any occurrence of sam in the file name, e.g. sample1_... instead of the file extension .sam)

o Methylation Extractor: Added new option '--multicore INT' to set the number of cores to be used for the methylation extraction process. If system resources are plentiful this is a viable option to speed up the extraction process (we observed a near linear speed increase for up to 10 cores specified). Please note that a typical process of extracting a BAM file and writing out '.gz' output streams will in fact use ~3 cores per value of --multicore INT specified (1 for the methylation extractor itself, 1 for a Samtools stream, 1 for a GZIP stream), so --multicore 10 is likely to use around 30 cores of system resources. This option has no bearing on the speed of the bismark2bedGraph or genome-wide cytosine report processes

o Methylation Extractor: Added two new options '--ignore_3prime INT' (for single-end alignments and Read 1 of paired-end alignments) and '--ignore_3prime_r2 INT' (for Read 2 of paired-end alignments) to remove positions that display a methylation call bias on the 3' end of reads

o Methylation Extractor: The option --no_overlap is now the default for paired-end data. One may explicitly choose to include overlapping data with the option '--include_overlap'

o Methylation Extractor: The splitting report will now be written out by default (previously optional --report)

o Methylation Extractor: In paired-end mode, read-pairs which had been skipped because either read was shorter than a specified (very high) value of '--ignore' or '--ignore_r2' will now have the information of the other read extracted if it meets the length criteria (if applicable). Thanks to Andrew Dei Rossi for contributing a patch

o bismark2bedGraph: Fixed the location of the sorting directory which could have failed if an output directory had been specified
Leave a comment:
gene_x replied

09-15-2014, 10:13 AM
Originally posted by dpryan View Post

You can probably skip the sorting/merging and just use "samtools cat". The alignments will then be in the same order as they would have had you merged prior to alignment and this will be significantly faster.

Good to know that. Thanks!
Leave a comment:
dpryan replied

09-15-2014, 10:09 AM
You can probably skip the sorting/merging and just use "samtools cat". The alignments will then be in the same order as they would have had you merged prior to alignment and this will be significantly faster.
Leave a comment:
gene_x replied

09-15-2014, 07:38 AM
Originally posted by fkrueger View Post

If done correctly either way is fine. I find it easier to merge FastQ up front because that way you can set off a pipeline without having to intervene until you get the final reports.

Hi, Felix,

I've tried a few different ways of doing it but the resulting .zero.cov file still has methylation calls on non-CpG sites. I couldn't figure out how this could happen..

Can you take a look at my pipeline and see if there is anything suspicious?

1. Run adapter trimming and PE bismark(bowtie1) mapping on individual lanes

Code:

bismark -n 2 -l 50 --chunkmbs 1024 -X 800 -un --ambiguous --bam $index -1 lane1_read1.fastq -2 lane1_read2.fastq

2. Sort bam files based on read name

Code:

samtools sort -n lane1.bam lane1_sort

Do step 1 and step 2 for all lanes.

3. Merge all sorted bam files

Code:

samtools merge all_lanes.bam lane1_sort.bam lane2_sort.bam ...

4. remove duplication

Code:

deduplicate_bismark -p --bam all_lanes.bam

5. extract methylation calls

Code:

bismark_methylation_extractor -p --no_overlap --ignore 3 --ignore_r2 3 --bedGraph --counts --zero_based --report all_lanes.deduplicated.bam

Does this look right to you?

Also I'm wondering if you actually have tried to map individual lanes first and them merge the bam files etc? Is it possible that there might be some hidden bug in the process?

Thank you very much!
Leave a comment:
jnhutchinson replied

09-15-2014, 06:37 AM
Originally posted by fkrueger View Post

Hi John,
To see if it is the difference of the captured regions I would suggest looking at non-CG context only in CpG islands. This could be done with a subset of your reads (maybe methylation calls from some 10M reads), import them into SeqMonk, design probes over CGIs and then look at the average methylation (shouldn't take more than 5 mins to find out). Alternatively you could try to look at overall methylation levels on the mitochondrium which is normally very lowly methylated, I am not sure however how well the MT is covered in an RRBS setting... Basically whatever lowest average methylation level you find in any larger aggregate of regions can be considered the upper limit of bisulfite conversion error.

Thanks Felix, I'll give those a try. Our RRBS protocol is modified to (ideally) capture non-CpG island regions so I'll look into the chrM methods first.
Leave a comment:
fkrueger replied

09-13-2014, 02:20 AM
If done correctly either way is fine. I find it easier to merge FastQ up front because that way you can set off a pipeline without having to intervene until you get the final reports.
Leave a comment:
gene_x replied

09-12-2014, 12:51 PM
Originally posted by fkrueger View Post

Hi gene_x,

not quite sure what is going here or which genome you are aligning your reads to, but for at least human or mouse chromosome 2 the first couple of megabases are masked by Ns, and there is no way that Bismark would map any reads to these sequences or extract methylation calls from it....

Right, there should not be methylation calls on those regions.

A related question, if you have multiple lanes of data (fastq), do you run alignment on each individual lane first and them merge the resulting bam file or do you merge all fastq first and then do a big alignment on the merged fast file?
Leave a comment:
fkrueger replied

09-12-2014, 11:30 AM
Hi John,
To see if it is the difference of the captured regions I would suggest looking at non-CG context only in CpG islands. This could be done with a subset of your reads (maybe methylation calls from some 10M reads), import them into SeqMonk, design probes over CGIs and then look at the average methylation (shouldn't take more than 5 mins to find out). Alternatively you could try to look at overall methylation levels on the mitochondrium which is normally very lowly methylated, I am not sure however how well the MT is covered in an RRBS setting... Basically whatever lowest average methylation level you find in any larger aggregate of regions can be considered the upper limit of bisulfite conversion error.
Leave a comment:
jnhutchinson replied

09-12-2014, 07:33 AM
Using fill-in position to determine bisulfite conversion efficiency?

Has anyone here actually tried this?

I'm seeing some strange RRBS results with high non-CpG methylation levels (~4%) in my samples. I suspect it's due to poor(er) conversion, but my client thinks it's due to the non-standard areas we selected for RRBS (larger fragments targetted to non-CpG island areas). I could use a more objective measure of conversion to help decide the issue.

Would appreciate any and all pointers/scripts to get this to work.

Best,
John
Leave a comment:
fkrueger replied

09-12-2014, 05:02 AM
Hi gene_x,

not quite sure what is going here or which genome you are aligning your reads to, but for at least human or mouse chromosome 2 the first couple of megabases are masked by Ns, and there is no way that Bismark would map any reads to these sequences or extract methylation calls from it....
Leave a comment:

Previous 1 5 12 13 14 15 16 17 18 25 34 template Next

Best Practices for Single-Cell Sequencing Analysis

by seqadmin

While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
- Channel: Articles
06-06-2024, 07:15 AM
Latest Developments in Precision Medicine

by seqadmin

Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

Somatic Genomics
“We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
- Channel: Articles
05-24-2024, 01:16 PM

Topics	Statistics	Last Post
The Adaptation of the Cell Cycle in Multiciliated Cells by seqadmin Started by seqadmin, Yesterday, 06:58 AM	0 responses 13 views 0 likes	Last Post by seqadmin Yesterday, 06:58 AM
New Method for DNA Sequence Amplification by seqadmin Started by seqadmin, 06-06-2024, 08:18 AM	0 responses 20 views 0 likes	Last Post by seqadmin 06-06-2024, 08:18 AM
New Tools Enhance Single-Molecule DNA Analysis with Minimal Samples by seqadmin Started by seqadmin, 06-06-2024, 08:04 AM	0 responses 18 views 0 likes	Last Post by seqadmin 06-06-2024, 08:04 AM
SIX2 Protein Identified as a Key Player in Prostate Cancer Treatment Resistance by seqadmin Started by seqadmin, 06-03-2024, 06:55 AM	0 responses 13 views 0 likes	Last Post by seqadmin 06-03-2024, 06:55 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News