Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

dpryan replied

03-17-2015, 12:22 AM
That's correct. You essentially act as though you have a single-end dataset. If the mapping efficiency jumps to a more reasonable level when doing that, then either the fastq files are out of sync or there's something weird with fastq_2.fq.
barbarian replied

03-17-2015, 12:20 AM
Oh, do you mean using only the first file, not together with the second file?
dpryan replied

03-17-2015, 12:19 AM
"by itself", not "to itself", big difference. This is purely to diagnose the cause of the low mapping efficiency.
barbarian replied

03-17-2015, 12:17 AM
If mapping fastq_1.fq to itself, is there any biological meaning behind that? Will the result still represent the actual methylation condition? Thank you.
dpryan replied

03-17-2015, 12:13 AM
I just replied to you on Biostars, but producing 1 BAM file from paired-end reads is the appropriate result. The reads from each file are indicated appropriately in the BAM format.
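As a rough illustration of how a single paired-end BAM keeps track of which FastQ file each read came from: the SAM/BAM FLAG field marks read 1 with bit 0x40 and read 2 with bit 0x80, while bit 0x1 marks the record as paired. The minimal Python sketch below decodes those bits; the function name `mate_of` is hypothetical and not part of Bismark or Samtools.

```python
def mate_of(flag: int) -> str:
    """Report which mate a SAM/BAM record is, using the standard FLAG bits."""
    if not flag & 0x1:   # 0x1: template has multiple segments, i.e. paired-end
        return "unpaired"
    if flag & 0x40:      # 0x40: first segment in the template (file given with -1)
        return "read1"
    if flag & 0x80:      # 0x80: last segment in the template (file given with -2)
        return "read2"
    return "unknown"     # paired bit set, but neither mate bit is set


# 99 = 0x1 + 0x2 + 0x20 + 0x40 and 147 = 0x1 + 0x2 + 0x10 + 0x80 form a typical mate pair
for flag in (99, 147, 0):
    print(flag, mate_of(flag))
```

So both mates live in the one BAM file, and tools downstream tell them apart by these bits rather than by file name.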

The low mapping efficiency is a different question then. There are a number of likely causes of that, the most common being fastq files that are out of sync. Try mapping fastq_1.fq by itself and see if the mapping efficiency jumps up.
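One quick way to test the out-of-sync hypothesis before re-running Bismark is to compare the read IDs of the two files record by record. The Python sketch below assumes uncompressed FastQ with exactly four lines per record and treats the first whitespace-separated token of each header (minus a trailing /1 or /2) as the read ID; the function names are hypothetical.

```python
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of every record in an uncompressed 4-line-per-record FastQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header line of each record, e.g. "@SRR123.1 1 length=100"
                rid = (line[1:].split() or [""])[0]
                if rid.endswith(("/1", "/2")):  # older Illumina-style mate suffixes
                    rid = rid[:-2]
                yield rid


def first_mismatch(path1, path2):
    """Return (record_number, id1, id2) for the first out-of-sync pair, or None if in sync."""
    pairs = zip_longest(read_ids(path1), read_ids(path2))  # None pads the shorter file
    for n, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return n, id1, id2
    return None
```

For example, `print(first_mismatch("reads_1.fq", "reads_2.fq"))` (hypothetical file names) prints `None` when every pair lines up, and otherwise the position and IDs of the first pair that does not.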
barbarian replied

03-16-2015, 05:38 PM
Hello Felix,

Thank you for your response. I have installed Samtools. I found another problem. On Biostars I learned that I should generate two separate FastQ files for paired-end reads, so I used fastq-dump --split-files to extract them from my SRA file. I got two FastQ files and there seems to be no problem so far (before, I had dumped everything into a single file and Bismark reported a duplicate ID error). The only problem is that I got only one BAM file from Bismark. Its name is the same as the first FastQ file with a .bam extension, so I assumed there should also be a BAM file named after the second file, but there is only one, for the first FastQ file. Is this normal or wrong? I used bismark <options> -1 first_1.fq -2 second_2.fq. The result is only first_1.bam.

This is my final alignment report:

Sequence pairs analysed in total: 29829521
Number of paired-end alignments with a unique best hit: 4156425
Mapping efficiency: 13.9%
Sequence pairs with no alignments under any condition: 23649277
Sequence pairs did not map uniquely: 2023819
Sequence pairs which were discarded because genomic sequence could not be extracted: 0

The mapping efficiency is really low. What do you think caused it?

For the Bismark example data, I got this result:
Final Alignment report
======================
Sequences analysed in total: 10000
Number of alignments with a unique best hit from the different alignments: 4732
Mapping efficiency: 47.3%
Sequences with no alignments under any condition: 4279
Sequences did not map uniquely: 989
Sequences which were discarded because genomic sequence could not be extracted: 0

So I think my human genome reference is not bad.
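The percentages in both reports follow directly from the counts: mapping efficiency is the number of unique best hits divided by the total analysed, and in both reports the unique, unaligned, ambiguous and discarded counts add up exactly to the total. A small Python check, assuming those four categories are mutually exclusive (which both reports are consistent with); `mapping_efficiency` is a hypothetical helper, not a Bismark tool:

```python
def mapping_efficiency(total, unique, unaligned, ambiguous, discarded=0):
    """Return mapping efficiency (%) from Bismark report counts, checking that they add up."""
    # Assumption: every analysed sequence (pair) falls into exactly one category
    if unique + unaligned + ambiguous + discarded != total:
        raise ValueError("report categories do not sum to the total")
    return round(100 * unique / total, 1)


# Paired-end run from the post above
print(mapping_efficiency(29829521, 4156425, 23649277, 2023819))  # 13.9
# Bismark example data
print(mapping_efficiency(10000, 4732, 4279, 989))  # 47.3
```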

Last edited by barbarian; 03-16-2015, 05:44 PM.
fkrueger replied

03-16-2015, 10:44 AM
Ah good, that explains it. As I said a few posts before, --gzip was a corner case that wasn't handled properly, so the merging was not intended to go wrong... If you use the development version I attached in the last post, --gzip should be working now.
chxu02 replied

03-16-2015, 10:15 AM
Sorry Felix, my bad. I checked my run history yesterday and found that I had used --gzip. But why did it happen, if the purpose of --gzip is just to compress the temporary conversion files?
fkrueger replied

03-16-2015, 06:20 AM
Originally posted by chxu02:

Hi Felix,

I'm reporting a bug in v0.14.0. When I ran bismark --multicore on gzipped FastQ files, Bismark failed at the end to merge the separate files into one. The files were initially named *.fastq.gz_*, but at the end of the run Bismark tried to merge files named *.fastq_*, which obviously failed. Hope it helps.

Youyou

Hmm, in case you didn't use --gzip, I don't think I quite understand the error you are reporting. Running files ending in either .fastq or .fastq.gz works fine for me here. Would you mind sending me the entire error message you are seeing by email?

Attached is the latest development version of Bismark which should also understand the option --gzip.
Attached file: bismark_0.14.1_devel.zip (72.0 KB)
chxu02 replied

03-16-2015, 05:49 AM
I didn't because the manual says --gzip is for zipping the temp files, not for unzipping the input fastq.gz files. And without --gzip, the running was successful until the end.
fkrueger replied

03-16-2015, 05:39 AM
Have you specified --gzip for this run?
chxu02 replied

03-16-2015, 05:36 AM
Hi Felix,

I'm reporting a bug in v0.14.0. When I ran bismark --multicore on gzipped FastQ files, Bismark failed at the end to merge the separate files into one. The files were initially named *.fastq.gz_*, but at the end of the run Bismark tried to merge files named *.fastq_*, which obviously failed. Hope it helps.

Youyou
fkrueger replied

03-16-2015, 02:32 AM
Hi Barbarian,

I am afraid the problem you are seeing is indeed caused by the fact that you don't have Samtools installed. In a bid to get multicore processing working in a reasonable time I assumed that everyone is running Samtools already so it is currently not designed to deal with sam.gz files.

Just as a heads-up, the initial release (v0.14.0) also doesn't deal correctly with some corner cases such as the --gzip option for temp files (which is already fixed in the development version), and the -B (basename) option (which is yet to be looked at).

So, bottom line: if you install Samtools and don't try to run all corner cases at the same time (--gzip, -B), it should work nicely.
barbarian replied

03-15-2015, 05:18 PM
Hello, I am using the latest Bismark version. I tried to execute the multicore command

bismark --multicore 4 <genome> -n 1 <filename>

The result is 4 split sam.gz files (I haven't installed Samtools). After that, I tried to call the methylation extractor

bismark_methylation_extractor -p --comprehensive *.sam.gz

The parameter *.sam.gz is meant to select all 4 sam.gz files, but this failed. Which one should I use when calling the methylation extractor? Thank you.

Note:
I notice there is one .bam file. I don't know how this file was generated because I haven't installed Samtools, but this BAM file is only 16 KB in size and contains nothing.
I also notice that the alignment was done as single-end, but my data is paired-end. How can I set up the alignment so that it runs in paired-end mode?

Last edited by barbarian; 03-15-2015, 05:53 PM.
fkrueger replied

03-05-2015, 02:34 PM
Bismark finally supporting parallel alignments

We would like to announce that a new version of Bismark (v0.14.0) has just been released. This version adds a parallelization switch to the Bismark alignment step, and also changes a couple of other issues detailed below:

o Bismark: Finally added parallelization to the Bismark alignment step using the option '--multicore int', which sets the number of parallel instances of Bismark to be run concurrently. At least in this first distribution, this is achieved by forking the Bismark alignment step very early on so that each individual Spawn of Bismark (SoB?) processes only every n-th sequence (n being set by --multicore). Once all processes have completed, the individual BAM files, mapping reports, and unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated by running Bismark conventionally with a single instance.

If system resources are plentiful, this is a viable option to speed up the alignment process (we observed a near-linear speed increase for up to --multicore 8 tested so far). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie/Bowtie2, Samtools, gzip etc.) and ~10-16GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --multicore specified will effectively lead to a linear increase in compute and memory requirements, so --multicore 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB of RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned...

o Bismark: Changed the default output to BAM. SAM output may be requested using the option --sam

o Bismark: No longer generates a piechart (.png) with the alignment stats. bismark2report generates a much nicer report anyway

o Methylation Extractor: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required. In some instances files containing e.g. -1-2 in their filename might previously have been identified as paired-end incorrectly

o deduplicate_bismark: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required

o deduplicate_bismark: Added option --version so that Clusterflow can report a version number

o bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well

o coverage2cytosine: Fixed a typo in the shebang which prevented coverage2cytosine from running
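To illustrate why the whitespace requirement in the @PG header check matters, here is a minimal Python sketch that only treats -1 and -2 as paired-end options when they appear as standalone, whitespace-delimited tokens. This is not Bismark's own (Perl) code; the function names and the example header lines are hypothetical.

```python
def is_paired_end_naive(pg_line: str) -> bool:
    """Old-style check: any occurrence of '-1' and '-2' counts, even inside a file name."""
    return "-1" in pg_line and "-2" in pg_line


def is_paired_end(pg_line: str) -> bool:
    """Stricter check: '-1' and '-2' must be standalone, whitespace-delimited tokens."""
    tokens = pg_line.split()
    return "-1" in tokens and "-2" in tokens


# Hypothetical @PG lines: a single-end run on a file whose name contains "-1-2",
# and a genuine paired-end run
se = '@PG\tID:Bismark\tVN:v0.14.0\tCL:"bismark genome sample-1-2.fq"'
pe = '@PG\tID:Bismark\tVN:v0.14.0\tCL:"bismark genome -1 r_1.fq -2 r_2.fq"'

print(is_paired_end_naive(se), is_paired_end(se))  # True False
print(is_paired_end_naive(pe), is_paired_end(pe))  # True True
```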

Even though we have tried out several corner cases this release is still somewhat experimental and we would appreciate any comments! Bismark can be downloaded from the Babraham Bioinformatics website.
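The work split described for --multicore (each instance processes only every n-th sequence, and the per-instance outputs are merged at the end) can be mimicked in a few lines of Python. This is a toy model, not Bismark's implementation: `align` stands in for the real alignment, the merge is a plain concatenation (the announcement does not say how read order is handled), and the per-instance figures of ~5 cores and ~10 GB are assumptions back-calculated from the ~20 cores / ~40 GB quoted for --multicore 4 on GRCm38.

```python
from itertools import islice


def shard(records, k, n):
    """Records seen by instance k (0-based) of n: indices k, k+n, k+2n, ..."""
    return list(islice(records, k, None, n))


def run_multicore(records, n, align):
    """Align every shard separately, then merge the per-instance results into one list."""
    per_instance = [[align(r) for r in shard(records, k, n)] for k in range(n)]
    return [hit for chunk in per_instance for hit in chunk]  # merge by concatenation


def estimate_resources(n, cores_per_instance=5, gb_per_instance=10):
    """Linear scaling: n instances need about n times the cores and memory of one,
    while the ideal wall time drops to 1/n of a single run."""
    return n * cores_per_instance, n * gb_per_instance, 1 / n


reads = [f"read{i}" for i in range(10)]
print(shard(reads, 1, 4))  # ['read1', 'read5', 'read9']
merged = run_multicore(reads, 4, str.upper)
print(sorted(merged) == sorted(r.upper() for r in reads))  # True
print(estimate_resources(4))  # (20, 40, 0.25)
```

The ideal 1/n wall time (25% for n = 4) is a lower bound; the ~25-30% observed in practice reflects the extra merging and shared I/O.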