Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

fkrueger replied

06-13-2014, 11:26 PM
Hi Reco,
RRBS reads should start (and end) at MspI restriction sites in the genome. The read distribution is therefore governed by the length of the genomic MspI fragments, and whether reads overlap or not depends on the read length you used. If you would generate an annotation track of all theoretical MspI fragments in the genome (which are typically in the size range of 70-220bp) you should see that all reads line up nicely with these fragments. RRBS is very different to 'normal' sequencing, and you wouldn't necessarily expect reads to form contigs.
Leave a comment:
reco replied

06-13-2014, 04:29 PM
low coverage of mapping of reads from RRBS library

The problem is with low coverage (NOT a significant overlapping between the single-end reads) with RRBS library. Bismark was run with the options -N 1 -L 20 --bowtie2 with an outcome of 40% mapping efficiency (Uniquely mapped) and about 25% Ambiguous reads in all the 7 samples in the RRBS Experiment. Adapters were removed before the mapping but it turned out to be that mapping efficiency was the same even without removal.

In all locations that were checked there is often multiple copies of a read aligning at a location but not a overlap between different reads (such as formation of a contig). So the mapping seems to be erroneous somehow.

Thanks.

Last edited by reco; 06-13-2014, 04:32 PM.
Leave a comment:
fkrueger replied

06-11-2014, 11:46 AM
Originally posted by reco View Post

Please set the number of multiseed mismatches for Bowtie 2 with '-N <int>' (where <int> can be 0 or 1

As the error message hinted already, -N can only be 0 or 1, but not 3 (nor 5...). Cheers, Felix

Last edited by fkrueger; 06-13-2014, 11:27 PM. Reason: The closing bracket was in fact there, so not a bug...
Leave a comment:
reco replied

06-11-2014, 10:32 AM
I seem to be having a rare problem with bismark; it might not be accepting the Bowtie2 option, the multiseed alignment mismatch Number, greater than 1 (one). Following is the command line run and the reported error,

bismark -L 10 -N 3 -p 4 --un --non_bs_mm --ambiguous --bowtie2 Sequence/Chromosomes/ SRR001725.fastq -o Cleandata/Output/OutputL15

ERROR reported : Please set the number of multiseed mismatches for Bowtie 2 with '-N <int>' (where <int> can be 0 or 1

Thanks.
Leave a comment:
fkrueger replied

06-11-2014, 01:31 AM
Even if you wanted to use the --no-mixed option you would still have to supply paired-end files that have the same number of sequences in Read 1 and Read 2. Bowtie 2 would then report singleton alignments if a read pair fails to align in paired-end mode. This is however not supported by the current way Bismark works, but you could run Bismark in paired-end mode first, write out unmapped reads using --unmapped and then run it again in SE mode for R1 (default) and R2 (--pbat mode) individually if you really wanted to.

First you need to get your paired-end files fixed though. The easiest way would be to use Trim Galore (with --paired) or another adapter/quality trimmer that is paired-end aware and does not leave orphan singleton reads in the FastQ file. Cheers, Felix
Leave a comment:
sunggong replied

06-11-2014, 01:21 AM
how to disable '--no-mixed'

Hi,
Is it possible to disable '--no-mixed' option? I wanted to rescue some short reads filtered in previous bismark run, but not all reads are paired. With '--no-mixed' on, bowtie2 spits following errors:
Error, fewer reads in file specified with -1 than in file specified with -2
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT)

If '--no-mixed' cannot be disabled, I might need to remove those problematic orphan reads.

Thanks.
Leave a comment:
blakeoft replied

06-09-2014, 06:01 AM
With the help of Felix, I determined that the problem had to do with the way I saved the test dataset. To anyone who also wants to use the test data, I recommend right clicking the link to the test dataset, and selecting "Save Link As..." instead of opening the link, selecting all text and pasting it into a word processor.

Thank you Felix!
Leave a comment:
fkrueger replied

06-05-2014, 06:27 AM
Originally posted by blakeoft View Post

I've downloaded the test dataset from the Bismark homepage and tried aligning the reads. The mapping efficiency is 0%. Here's my command line:

Code:

bismark -n 1 -l 50 /home/user/genome/hg19 test_dataset.fastq

Any idea why none of the reads mapped?

Can you please link or send me the text you see on your screen to [email protected]?

Cheers, Felix
Leave a comment:
blakeoft replied

06-05-2014, 06:25 AM
I've downloaded the test dataset from the Bismark homepage and tried aligning the reads. The mapping efficiency is 0%. Here's my command line:

Code:

bismark -n 1 -l 50 /home/user/genome/hg19 test_dataset.fastq

Any idea why none of the reads mapped?
Leave a comment:
fkrueger replied

06-05-2014, 02:12 AM
Hi Andrew,

thanks for this, it looks alright to me so I already changed it in the current development version. I wanted to change a few things regarding the NM, MD and XX tags in the SAM output, but once this is done I'll put out a new release.

Cheers, Felix
Leave a comment:
adeirossi replied

06-04-2014, 05:01 PM
Hi Felix,

On the topic of bismark_methylation_extractor, I have run into an error when I try to run it on BAM files with very few reads:

$ perl bismark_methylation_extractor --paired-end --no_overlap --ignore 2 --ignore_r2 33 --output $outDir --report --samtools_path /mnt/ngs/analysis/software/samtools-0.1.19 --gzip --bedGraph --buffer_size 4G 2_267_5_CTTGTA_021.bam
*** Bismark methylation extractor version v0.12.2 ***

Summarising Bismark methylation extractor parameters:
===============================================================
Bismark paired-end SAM format specified (default)
First 2 bp will be disregarded when processing the methylation call string of Read 1
First 33 bp will be disregarded when processing the methylation call string of Read 2
Output path specified as: meth/ctl_results/A73L06H9O/2_267_5_CTTGTA_021/

...

Processed 8 lines from meth/ctl_results/A73L06H9O/2_267_5_CTTGTA_021/2_267_5_CTTGTA_021.bam in total
Total number of methylation call strings processed: 16

Determining maximum read lengths for M-Bias plots
Maximum read length of Read 1: 73
Maximum read length of Read 2: 0

No data sets or points at /mnt/ngs/analysis/software/bismark/bismark_v0.12.2/bismark_methylation_extractor line 649, <IN> line 20.

The script dies when plot() fails to generate a read 2 M-bias plot. Since the absence of M-bias plots does not particularly bother me, I came up with this patch (also see attached) so that plot() failure causes a warning instead of a fatal error:

Code:

--- a/bismark_v0.12.2/bismark_methylation_extractor +++ b/bismark_methylation_extractor-0.12.2-mod @@ -558,11 +558,14 @@ sub produce_mbias_plots{ $graph1->set_legend('CpG methylation','CHG methylation','CHH methylation','CpG total calls','CHG total calls','CHH total calls'); - my $gd1 = $graph1->plot(\@mbias_read1) or die $graph1->error; + if (my $gd1 = $graph1->plot(\@mbias_read1)) { + open (MBIAS_G1,'>',$mbias_graph_1) or die "Failed to write to file for M-bias plot 1: $!\n\n"; + binmode MBIAS_G1; + print MBIAS_G1 $gd1->png; + } else { + warn "WARNING: Cannot generate read 1 M-bias plot: " . $graph1->error; + } - open (MBIAS_G1,'>',$mbias_graph_1) or die "Failed to write to file for M-bias plot 1: $!\n\n"; - binmode MBIAS_G1; - print MBIAS_G1 $gd1->png; } if ($paired){ @@ -646,11 +649,13 @@ sub produce_mbias_plots{ ) or die $graph2->error; $graph2->set_legend('CpG methylation','CHG methylation','CHH methylation','CpG total calls','CHG total calls','CHH total calls'); - my $gd2 = $graph2->plot(\@mbias_read2) or die $graph2->error; - - open (MBIAS_G2,'>',$mbias_graph_2) or die "Failed to write to file for M-bias plot 2: $!\n\n"; - binmode MBIAS_G2; - print MBIAS_G2 $gd2->png; + if (my $gd2 = $graph2->plot(\@mbias_read2)) { + open (MBIAS_G2,'>',$mbias_graph_2) or die "Failed to write to file for M-bias plot 2: $!\n\n"; + binmode MBIAS_G2; + print MBIAS_G2 $gd2->png; + } else { + warn "WARNING: Cannot generate read 2 M-bias plot: " . $graph2->error; + } } }

There may be a more elegant way to avoid the error, but this patch at least allowed the extractor to get past the plotting step and generate bedGraph output, so I thought I'd share. I'd be curious to hear your thoughts.

Thanks,
Andrew
Attached Files

bismark_methylation_extractor-0.12.2-mod.pl (182.2 KB, 30 views)
Leave a comment:
blancha replied

06-02-2014, 06:52 AM
Hi Felix,

Looks great. Thanks.
I'll try it out.

Thanks.

Alexis
Leave a comment:
fkrueger replied

06-02-2014, 05:01 AM
Originally posted by blancha View Post

Hi,

Would it be possible to add an option to bismark_methylation_extractor to generate a .cov file with 0-based start coords and 1-based end coords, in the same format as the .bedGraph file?

That way, I could do a downstream analysis with bedtools directly.
After extracting the methylation counts, I could also generate the bigWig files without first having to convert the start genomic coordinates.

There is a --zero_based parameter, but my understanding of the documentation is that it only applies to the genome-wide cytosine methylation report. I'm also not sure if the end-coords will still be 1-based. If I use this option, will the report have 0-based start coords and 1-based end-coords, or will both start and end coords be 0-based?

It seems to me that it would be nice to have the option to generate all the files with 0-based start coordinates and 1-based genomic coordinates, to respect the bedtools and internal UCSC format. Otherwise, having files in different formats becomes quite confusing, and requires an extra step to convert the genomic coordinates to the desired format.

Thank you.

I have just changed the '--zero_based' option of the methylation extractor to write out an additional coverage file (ending in .zero.cov) which uses the UCSC zero-based, half-open standard. The cytosine report only writes out a single position (as it a single cytosine), which uses 1-based coords by default but can be changed to 0-based if desired. Please find the files attached, I hope this is what you were after.
Attached Files

zero-based-half-open.zip (40.0 KB, 31 views)
Leave a comment:
fkrueger replied

05-30-2014, 11:38 AM
Hi blancha, that sounds like a sensible suggestion, I'll look into this when I am back from my holidays.
Leave a comment:
blancha replied

05-30-2014, 10:13 AM
Hi,

Would it be possible to add an option to bismark_methylation_extractor to generate a .cov file with 0-based start coords and 1-based end coords, in the same format as the .bedGraph file?

That way, I could do a downstream analysis with bedtools directly.
After extracting the methylation counts, I could also generate the bigWig files without first having to convert the start genomic coordinates.

There is a --zero_based parameter, but my understanding of the documentation is that it only applies to the genome-wide cytosine methylation report. I'm also not sure if the end-coords will still be 1-based. If I use this option, will the report have 0-based start coords and 1-based end-coords, or will both start and end coords be 0-based?

It seems to me that it would be nice to have the option to generate all the files with 0-based start coordinates and 1-based genomic coordinates, to respect the bedtools and internal UCSC format. Otherwise, having files in different formats becomes quite confusing, and requires an extra step to convert the genomic coordinates to the desired format.

Thank you.
Leave a comment:

Previous 1 10 17 18 19 20 21 22 23 30 34 template Next

Latest Developments in Precision Medicine

by seqadmin

Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

Somatic Genomics
“We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
- Channel: Articles
05-24-2024, 01:16 PM

Topics	Statistics	Last Post
SIX2 Protein Identified as a Key Player in Prostate Cancer Treatment Resistance by seqadmin Started by seqadmin, 06-03-2024, 06:55 AM	0 responses 12 views 0 likes	Last Post by seqadmin 06-03-2024, 06:55 AM
Genetic Mosaicism More Prevalent Than Previously Thought by seqadmin Started by seqadmin, 05-30-2024, 03:16 PM	0 responses 26 views 0 likes	Last Post by seqadmin 05-30-2024, 03:16 PM
Comprehensive Sequencing of Great Ape Sex Chromosomes Yields Insights into Evolution and Genetic Variability by seqadmin Started by seqadmin, 05-29-2024, 01:32 PM	0 responses 29 views 0 likes	Last Post by seqadmin 05-29-2024, 01:32 PM
New Toolkit Enhances Plant Mitochondrial Genome Research by seqadmin Started by seqadmin, 05-24-2024, 07:15 AM	0 responses 216 views 0 likes	Last Post by seqadmin 05-24-2024, 07:15 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News