Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

fkrueger replied

03-05-2015, 02:52 AM
Originally posted by Dipro

Hi Felix,

I am using the latest version of Bismark (v0.13.1) and have encountered some problems with the methylation extractor. First, I noticed that the first line of the coverage2cytosine script is '2#!/usr/bin/perl'. I assume this is a typo, so I changed it to '#!/usr/bin/perl'.

Running the following command:

bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder

I keep getting the following error:

'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'

I double-checked my folder and could not find a .cov file. I thought the bismark2bedGraph script was supposed to generate a .cov file, so I am unsure what might have gone wrong. Could I please get some help with this issue?

Thanks.

Hi Dipro,

This was indeed a typo. It will be fixed in the next release, which is due out today or tomorrow (and will finally support parallel alignments, so stay tuned!).

A couple of things about the command you used:

bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder

'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'
Sorry if this is a stupid question, but did you replace '/path/to/file' with a valid path to the file on your system?

-s: not necessary (will be determined automatically)
-o /requires/path/to/output/folder
--samtools_path /requires/path/to/samtools/executable
--counts: not necessary (used by default)
--remove_spaces: only use this if really necessary, will otherwise cost time and temporary space
--buffer_size: requires input, e.g. 10G
--genome_folder /requires/path/to/genome/folder

An input file is also required.

If you still struggle, could you send me the on-screen output via email? That would make spotting mistakes in the command much easier. Cheers, Felix
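As a rough illustration of the points above, the following Python sketch assembles the command as an argument list so that every option that needs a value gets one. All paths and the input file name are hypothetical placeholders, and 10G is simply the example buffer size mentioned above; only the option names themselves come from this thread.

```python
import shlex

# Hypothetical placeholder paths: replace these with real locations on your system.
output_dir = "/path/to/output/folder"
samtools = "/path/to/samtools/executable"
genome_dir = "/path/to/genome/folder"
input_file = "/path/to/sample_bismark_bt2.sam"

# -s and --counts are left out (determined automatically / on by default),
# and --remove_spaces is left out because it is only needed in special cases.
cmd = [
    "bismark_methylation_extractor",
    "-o", output_dir,
    "--samtools_path", samtools,
    "--bedGraph",
    "--buffer_size", "10G",
    "--cytosine_report",
    "--genome_folder", genome_dir,
    input_file,
]

# Print a shell-quoted version that can be copied into a terminal.
print(shlex.join(cmd))
```

Printing the command before running it makes it easy to check that no option is missing its argument.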
Dipro replied

03-05-2015, 01:05 AM
Hi Felix,

I am using the latest version of Bismark (v0.13.1) and have encountered some problems with the methylation extractor. First, I noticed that the first line of the coverage2cytosine script is '2#!/usr/bin/perl'. I assume this is a typo, so I changed it to '#!/usr/bin/perl'.

Running the following command:

bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder

I keep getting the following error:

'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'

I double-checked my folder and could not find a .cov file. I thought the bismark2bedGraph script was supposed to generate a .cov file, so I am unsure what might have gone wrong. Could I please get some help with this issue?

Thanks.
fkrueger replied

03-02-2015, 10:15 AM
The default mode would pick the first alignment per position, while representative would always pick the most highly amplified and thus most biased one.
chxu02 replied

03-02-2015, 10:11 AM
That matches my understanding, which is why I chose to run it in --representative mode. I am not sure whether picking at random would introduce any bias.
fkrueger replied

03-02-2015, 10:05 AM
The default command would pick a random one for that location, the representative command would take the most highly amplified one. Both would only leave a single alignment for the position.
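To make the difference concrete, here is a toy Python sketch of the two strategies. It is not how deduplicate_bismark is implemented: the record layout, the choice of (chromosome, start, end, strand) as the duplicate key, and reading "most highly amplified" as "most frequent methylation call string" are all assumptions made for illustration.

```python
from collections import Counter

# Toy alignments: (chromosome, start, end, strand, methylation_call_string)
alignments = [
    ("chr1", 100, 150, "+", "Z..z"),
    ("chr1", 100, 150, "+", "z..z"),
    ("chr1", 100, 150, "+", "z..z"),
    ("chr1", 200, 250, "-", "Z..Z"),
]

def position_key(aln):
    # Alignments with the same key count as duplicates, whatever their methylation.
    return aln[:4]

def dedup_first(alns):
    # Default-like behaviour: keep whichever alignment is seen first per position.
    kept = {}
    for aln in alns:
        kept.setdefault(position_key(aln), aln)
    return list(kept.values())

def dedup_most_frequent(alns):
    # Representative-like behaviour: keep the most common call string per position.
    # Every alignment has to be counted before anything can be emitted.
    by_pos = {}
    for aln in alns:
        by_pos.setdefault(position_key(aln), Counter())[aln[4]] += 1
    return [key + (calls.most_common(1)[0][0],) for key, calls in by_pos.items()]

first = dedup_first(alignments)
representative = dedup_most_frequent(alignments)
```

In both cases the fragments at chr1:100-150 collapse to a single alignment even though their methylation states differ; the two strategies only disagree on which one survives.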
chxu02 replied

03-02-2015, 10:00 AM
If two fragments have the same ends but distinct methylation state, would they be regarded as duplicates by this command?
fkrueger replied

03-02-2015, 09:46 AM
Yes that might be the case since in representative mode it first slurps the entire file into memory.

In any case you should not be using --representative anyway unless you want to find the most highly represented PCR artefact in your data. Maybe I should simply remove it as an option because people keep getting it wrong. I would suggest rerunning it in the default mode.
chxu02 replied

03-02-2015, 09:11 AM
Hi Felix,
Sorry to keep bothering you, but I have hit one more obstacle. I'm running

Code:

deduplicate_bismark -p --representative XXX.sam

I tried both the SAM file (~40 GB) generated by Bismark and the sorted BAM generated by samtools. In both cases, "skipping SAM header lines" was printed quickly, but nothing happened after that, even after 24 hours of running. My PC has 32 GB of RAM and 8 cores. Is that insufficient for running deduplicate_bismark in representative mode?
fkrueger replied

02-25-2015, 01:41 PM
Originally posted by chxu02

Hi Felix,
PS: Does OT necessarily mean + (Watson) strand?

Yes, that's right.
fkrueger replied

02-25-2015, 01:40 PM
If you limit the input to CpG context it should be fairly quick: just run bismark2bedGraph CpG*, followed by coverage2cytosine on the resulting .cov file. You could then use either a clever awk command or a tiny script like the one attached to keep only the positions that were covered.
Attached file: filter_covered_positions.pl (824 Bytes)
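For readers without the attachment, the filtering step could look roughly like the Python sketch below. It is not the attached Perl script, whose contents are not shown here. It assumes the cytosine report is tab-separated with the columns chromosome, position, strand, count methylated, count unmethylated, C-context and trinucleotide context; check this against your own output before relying on it. The function name and the example lines are made up.

```python
import io

def covered_by_strand(lines):
    """Keep positions with at least one call and group them by strand."""
    result = {"+": [], "-": []}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, pos, strand = fields[0], int(fields[1]), fields[2]
        meth, unmeth = int(fields[3]), int(fields[4])
        # A position counts as covered if any read made a call there.
        if meth + unmeth > 0 and strand in result:
            result[strand].append((chrom, pos, meth, unmeth))
    return result

# Made-up report lines standing in for a real cytosine report file.
example = io.StringIO(
    "chr1\t10\t+\t3\t1\tCG\tCGA\n"
    "chr1\t11\t-\t0\t0\tCG\tCGT\n"
    "chr1\t25\t-\t2\t2\tCG\tCGG\n"
)
split = covered_by_strand(example)
```

With a real report, pass an open file handle instead of the StringIO object. Because the report is read line by line, memory use stays small even for a genome-wide file.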
chxu02 replied

02-25-2015, 01:00 PM
Hi Felix,
Is there a way to separate cytosines into those on the + strand and those on the - strand? The "-/+" in the Bismark methylation extractor output files specifies methylation state rather than strand. I think generating the optional genome-wide cytosine report and then grepping would do the job, but that will generate tons of data... Any ideas?

PS: Does OT necessarily mean + (Watson) strand?

Last edited by chxu02; 02-25-2015, 01:09 PM.
fkrueger replied

02-10-2015, 01:36 AM
Hi kentawan,
unless you are interested in looking at events for top and bottom strand separately you can indeed just merge the two outputs. You can then use this output to find DMRs.

Some tools even go one step further and recommend merging the top- and bottom-strand information of each CpG dinucleotide. If you wanted to do this, you could use the --merge_CpG option of the coverage2cytosine script (version 0.13.1 required; see the latest release notes).
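The idea behind merging the two strands of a CpG can be sketched in a few lines of Python. This is only an illustration of the concept, not the --merge_CpG implementation: the record layout, the function name and the 1-based coordinates are assumptions.

```python
def merge_cpg_strands(records):
    """Pool counts for the two cytosines of each CpG dinucleotide.

    records: iterable of (chrom, pos, strand, meth, unmeth) for CpG cytosines,
    with 1-based positions (an assumption of this sketch).
    """
    merged = {}
    for chrom, pos, strand, meth, unmeth in records:
        # The bottom-strand cytosine of a CpG sits one base after the
        # top-strand cytosine, so shift it back to share the same key.
        start = pos if strand == "+" else pos - 1
        m, u = merged.get((chrom, start), (0, 0))
        merged[(chrom, start)] = (m + meth, u + unmeth)
    return merged

# Made-up counts: one CpG covered on both strands, one on the top strand only.
records = [
    ("chr1", 10, "+", 3, 1),
    ("chr1", 11, "-", 2, 2),
    ("chr1", 40, "+", 0, 5),
]
merged = merge_cpg_strands(records)
```

Pooling the strands roughly doubles the coverage per CpG, which is why some DMR callers recommend it, at the cost of hiding any strand-specific differences.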

Regarding the question of whether 56.6% mapping efficiency is 'normal': this is difficult to answer, as it depends very much on what you have done. Among the factors that play a role are: genome used (repeat content), read length, rigor of adapter/quality trimming, single-end vs. paired-end, library strategy (directional, PBAT, enrichment, shotgun), contamination, etc. As a guideline, for 100 bp single-end shotgun reads you would probably expect efficiencies of ~70-75% against the mouse genome and ~80-85% against the human genome (in Bowtie 2 mode).
kentawan replied

02-09-2015, 07:43 PM
Hi All,

I have gotten my final methyl_extractor output files based on my bismark output files which have a mapping efficiency of 56.6% (is this normal?).

Right now I am stuck on the research relevance of OB and OT CpG methylation. If I just want to search for DMRs, can I merge OB and OT together to make a total CpG file?

many thanks in advance.
jeni replied

01-29-2015, 11:48 PM
Thanks....
fkrueger replied

01-29-2015, 09:35 AM
I would certainly interpret it that way, yes. You could potentially also look at spike-in controls, but since 0.8% is your lowest value, the efficiency must have been >= 99.1%.