Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

fkrueger replied

02-01-2016, 03:47 AM
Hi Magdalena,

Code:

bismark -1 Non-overlappling_reads_1.fastq -2 Non-overlappling_reads_2.fastq Merged_overlappling_reads.fastq

does not work the way you think it will, as it would really only do PE alignments of the non-overlapping reads given with -1 and -2. So if you wanted to split this up manually (but why?), you can run the PE alignment on the non-overlapping reads first and the SE alignment on the overlapping reads, and then run a PE and an SE methylation extraction separately. If you wanted to, you could then merge the data again at the bismark2bedGraph step: just feed in all the CpG* files from both the PE and SE mapping.

I am not quite sure, however, whether merging and making things complicated isn't doing exactly what the methylation extractor is doing anyway: mapping non-overlapping reads and getting the information from both reads, and only getting the information once from overlapping reads because of the --no_overlap option (isn't this the same as your 'merging' step?)
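
For reference, the manual split described above could be sketched roughly as follows. This is only an outline under several assumptions: GENOME_DIR, PE_BAM and SE_BAM are placeholders (the actual output file names depend on the Bismark version), the run helper is hypothetical, and by default the script only prints the commands rather than executing them.

```shell
#!/usr/bin/env bash
# Sketch only: PE and SE runs kept separate, merged at the bedGraph step.
# GENOME_DIR, PE_BAM and SE_BAM are placeholders, not names Bismark guarantees.
GENOME_DIR="${GENOME_DIR:-/path/to/genome_folder}"
PE_BAM="${PE_BAM:-pe_alignment.bam}"   # whatever the PE run actually writes
SE_BAM="${SE_BAM:-se_alignment.bam}"   # whatever the SE run actually writes

# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

pe_se_workflow() {
    # 1) paired-end alignment of the non-overlapping pairs
    run bismark "$GENOME_DIR" -1 Non-overlappling_reads_1.fastq -2 Non-overlappling_reads_2.fastq
    # 2) single-end alignment of the merged reads
    run bismark "$GENOME_DIR" Merged_overlappling_reads.fastq
    # 3) methylation extraction, once per library type
    run bismark_methylation_extractor -p "$PE_BAM"
    run bismark_methylation_extractor -s "$SE_BAM"
    # 4) one bedGraph from all CpG* files of both extractions
    run bismark2bedGraph -o merged_PE_SE.bedGraph CpG*
}

pe_se_workflow
```

Setting DRY_RUN=0 would execute the commands instead of printing them; it is worth checking each one against the help text of the installed version first.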
MagdalenaZ replied

02-01-2016, 03:05 AM
Mapping SE and PE reads

Hi,

I have PE reads, some 20% of which overlap. I usually overlap these reads before mapping, so that I have the following files:

Non-overlappling_reads_1.fastq
Non-overlappling_reads_2.fastq
Merged_overlappling_reads.fastq

I would submit those reads to bismark as:

bismark -1 Non-overlappling_reads_1.fastq -2 Non-overlappling_reads_2.fastq Merged_overlappling_reads.fastq

...but then my resulting BAM file contains both SE and PE reads, so do I use the -p or -s flag on bismark_methylation_extractor?

Also, from the results it doesn't really look like the Merged_overlappling_reads.fastq actually gets read.

Or I could run bismark AND bismark_methylation_extractor twice, once for SE and once for PE, but then at what point do I merge the results?
Last option: just not do the overlap, but feed in all PE reads as they are, use bismark_methylation_extractor --include_overlap, and hope that all will be well. But then I lose the SE reads.

So many options, but hopefully only one optimal solution!
Very grateful for your advice!

Cheers,
Magdalena
fkrueger replied

02-01-2016, 01:32 AM
Thanks for reporting this, I have filed an issue on the Bismark GitHub page and will address it as soon as I find some time. Cheers, Felix
biocomputer replied

01-31-2016, 06:51 PM
When I try to run bismark2bedGraph from a directory that doesn't directly contain the input file it fails to find the input file, even though I've specified the path to the input file. For example, if I'm in the directory that contains the input file and use this command it works properly (note that I've taken the output from the methylation extractor and split it by chromosome for batch processing but the same thing happens if I use the original unsplit file):

bismark2bedGraph -o ./output/chr10.bg ./CpG_chr10.txt

If I move up a directory and run this command it doesn't work:

bismark2bedGraph -o ./directory/output/chr10.bg ./directory/CpG_chr10.txt

The program ends with:

Using the following files as Input:
CpG_chr10.txt

Writing bedGraph to file: ./directory/output/chr10.bg.gz
Also writing out a coverage file including counts methylated and unmethylated residues to file: ./directory/output/chr10.bg.gz.bismark.cov.gz

Couldn't find file 'CpG_chr10.txt': No such file or directory
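
Until this is fixed, one possible workaround is to run the command from the directory that holds the input file, so that only the bare file name is passed. A minimal sketch, assuming a bash shell; run_next_to_input is a made-up helper and not part of Bismark:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command from the directory that holds the input
# file, passing only the bare file name as the last argument.
run_next_to_input() {
    local input="$1"
    shift
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$(dirname "$input")" && "$@" "$(basename "$input")" )
}

# Intended use (not executed here); the -o path is then relative to ./directory:
# run_next_to_input ./directory/CpG_chr10.txt bismark2bedGraph -o ./output/chr10.bg
```

Because the cd happens inside a subshell, the working directory of the calling script stays where it was.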
fkrueger replied

01-09-2016, 05:55 AM
Originally posted by Tlexander

I think there is a bug in the --remove_spaces option of bismark_methylation_extractor.

The error is as follows:

Changed directory to /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/
Now replacing whitespaces in the sequence ID field of the Bismark methylation extractor output /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt prior to bedGraph conversion

Couldn't write to file /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt.spaces_removed.txt: No such file or directory
Finished BedGraph conversion ...

Thanks for reporting this, could you please also post the exact command you used when you called the methylation extractor so I can reproduce it more easily?
Tlexander replied

01-08-2016, 09:49 AM
Bismark Bug v0.14.4 --remove_spaces in bismark_methylation_extractor.

I think there is a bug in the --remove_spaces option of bismark_methylation_extractor.

The error is as follows:

Changed directory to /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/
Now replacing whitespaces in the sequence ID field of the Bismark methylation extractor output /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt prior to bedGraph conversion

Couldn't write to file /media/LTS_33T/SG_LTS33T/monod/haib/methyfreq/CpG_context_ENCFF000LWP_trimmed.fq_bismark_bt2.txt.spaces_removed.txt: No such file or directory
Finished BedGraph conversion ...
bmartinez replied

12-14-2015, 04:08 PM
problems with bismark2bedGraph and coverage2cytosine to get methylations extracted

Hi Felix and everyone,

Thanks a lot for your help, Felix, the problem is solved.
However, I have an additional issue related to the sorting of the CpG_context and Non_CpG_context files on my operating system. I am reporting it in case someone else is affected and to ask for your opinion.

I get an error when sorting these files (this is done within the script bismark2bedGraph). The line that does the sorting in the script is the following:

open $ifh, "sort -S $sort_size -T $sort_dir -k3,3V -k4,4n $in |" or die "Input file could not be sorted. $!\n";

I have worked around the issue by sorting with just "3,3". However, I am still trying to confirm that this does not affect the final results. If anyone can give a hint on this, I would greatly appreciate it.
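
For anyone hitting the same error, a possible fallback is sketched below. It rests on an assumption, since the error message itself is not shown: that the local sort does not support the V (version sort) modifier, which is a GNU extension. The sort_calls function is a hypothetical name, and the three input lines are made-up data in the five-column layout of the extractor output. The position key (-k4,4n) is kept in both branches so that calls stay ordered by position within each scaffold.

```shell
#!/usr/bin/env bash
# Sketch: use version sort (V) on the scaffold column only if this `sort`
# supports it; otherwise fall back to a plain sort on that column.
# Column 3 = chromosome/scaffold, column 4 = position.
sort_calls() {
    if printf 'a\n' | sort -V >/dev/null 2>&1; then
        sort -k3,3V -k4,4n "$@"
    else
        sort -k3,3 -k4,4n "$@"
    fi
}

# Tiny made-up example in the extractor's tab-separated layout.
printf 'r1\t+\tscaffold.s2\t30\tZ\nr2\t-\tscaffold.s2\t4\tz\nr3\t+\tscaffold.s1\t7\tZ\n' | sort_calls
```

Without version sort the scaffolds come out in plain lexical rather than natural order, which changes the order of the output lines but keeps the lines of each scaffold together.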

Begoña
fkrueger replied

12-09-2015, 04:00 AM
It appears that GZIP-compressed input files were streamed directly into the Unix sort command (e.g. when using the option --scaffolds/--gazillion), but sort cannot read compressed files and would thus produce an empty output. I have opened an issue on GitHub for that (https://github.com/FelixKrueger/Bismark/issues/9) and fixed the way GZIP-compressed files are streamed to sort, and it seems to work fine on my end. The latest version can be cloned straight from GitHub.
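
The general idea of decompressing before sorting can be illustrated in the shell as below. This is a sketch of the principle only and not the actual change made in bismark2bedGraph; sorted_stream is a hypothetical name, the demo file is made-up data, and a plain -k3,3 key is used instead of version sort to keep the example portable.

```shell
#!/usr/bin/env bash
# Sketch: decompress on the fly when the input is gzipped, then sort.
# sort itself only ever sees plain text on its standard input.
sorted_stream() {
    case "$1" in
        *.gz) gunzip -c "$1" ;;
        *)    cat "$1" ;;
    esac | sort -k3,3 -k4,4n
}

# Made-up two-line example written to a temporary gzipped file.
tmp="$(mktemp -d)"
printf 'r1\t+\tchr2\t5\tZ\nr2\t-\tchr1\t9\tz\n' | gzip -c > "$tmp/CpG_demo.txt.gz"
sorted_stream "$tmp/CpG_demo.txt.gz"
rm -r "$tmp"
```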
fkrueger replied

12-08-2015, 12:58 PM
Hi Begona,

It looks like the bismark2bedGraph step is somehow failing, can you check the error logs (or what appears on screen) to see what is going wrong exactly?

The CpG report simply puts the coverage file into genomic context, so if the coverage file is empty then the CpG report will show 0 0 only as well.

I'm happy to look at this in more detail; you can also send me an email with the error logs. Best, Felix
bmartinez replied

12-08-2015, 12:34 PM
problems with bismark2bedGraph and coverage2cytosine to get methylations extracted

Hi everyone,

I am analysing WGBS data with Bismark v0.14.5. I have trimmed the data, aligned it with Bowtie and deduplicated it with no issues. However, I am having problems extracting the methylation calls. My genome is in 47,100 scaffolds. The command I am using is:

bismark_methylation_extractor -p --comprehensive --merge_non_CpG --samtools_path /opt/samtools-0.1.19 --genome_folder /path_to_genome/Bisulfite_Genome_BowtieOne --buffer_size 10G --report --bedGraph --cytosine_report --scaffolds --gzip --multicore 3 -o /path/file.fastq.gz_bismark_pe.deduplicated.sam

I get proper CpG_context_file.fastq.gz_bismark_pe.deduplicated.txt.gz and Non_CpG_context_file.fastq.gz_bismark_pe.deduplicated.txt.gz files, see the first lines for an example:

Bismark methylation extractor version v0.14.4
HWI-ST539:249:C7BDRACXX:6:1101:1460:1903_1:N:0:GCCAAT + scaffold.s31344 999331 Z
HWI-ST539:249:C7BDRACXX:6:1101:1460:1903_1:N:0:GCCAAT - scaffold.s31344 999181 z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT - scaffold.s10570 115561 z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115578 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115590 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115616 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115624 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115639 Z
HWI-ST539:249:C7BDRACXX:6:1101:1586:1973_1:N:0:GCCAAT + scaffold.s10570 115632 Z

However, the output is not converted properly into bed files (I get an empty file), and the cytosine report I get looks like this, with no methylation at all (the count columns, 4 and 5, are all 0's):

scaffold.s00001 45 + 0 0 CG CGC
scaffold.s00001 46 - 0 0 CG CGT
scaffold.s00001 49 + 0 0 CG CGG
scaffold.s00001 50 - 0 0 CG CGT
scaffold.s00001 1095 + 0 0 CG CGT
scaffold.s00001 1096 - 0 0 CG CGG
scaffold.s00001 1481 + 0 0 CG CGC
scaffold.s00001 1482 - 0 0 CG CGG
scaffold.s00001 1560 + 0 0 CG CGC
scaffold.s00001 1561 - 0 0 CG CGG

I have tried to do things step by step, but I get the same result. I have been working on this for more than a week now and I cannot find where the error is. Could someone help me with this?

Thanks a lot in advance!

Begoña
fkrueger replied

11-09-2015, 12:25 AM
Originally posted by daanum

Hi,

I am still unable to run the bismark_genome_preparation step.
I get the error "Command not found".
Any idea? I have been trying since yesterday and am not sure what I am doing wrong.

I admire your perseverance but you might want to consider doing a basic Linux operation tutorial, I think you might benefit.

Here you've got a couple of options:
1) either you move to the folder containing the Bismark installation and then run ./bismark_genome_preparation (./ tells the shell to look for the script in the current directory)
2) you can type /path/to/Bismark/bismark_genome_preparation which should work from anywhere.
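
As an alternative to the two options above, the Bismark folder can be appended to PATH for the current session so that the scripts are found by name. A small sketch, assuming bash; add_to_path is a hypothetical helper and /path/to/Bismark is a placeholder for wherever the folder was extracted:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a directory to PATH unless it is already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

# Placeholder location; replace with the real Bismark folder.
add_to_path /path/to/Bismark

# Report whether the shell can now find the script by name.
command -v bismark_genome_preparation \
    || echo "bismark_genome_preparation not found; check the folder"
```

The change only lasts for the current shell session; adding the same lines to a shell start-up file would make it permanent.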
daanum replied

11-09-2015, 12:04 AM
Hi,

I am still unable to run the bismark_genome_preparation step.
I get the error "Command not found".
Any idea? I have been trying since yesterday and am not sure what I am doing wrong.
fkrueger replied

11-07-2015, 01:27 PM
Bismark just needs to be extracted as is outlined step by step in the manual (http://www.bioinformatics.babraham.a...User_Guide.pdf). I believe Bowtie 2 also only needs to be unzipped, then either you place the bowtie2 executable in the PATH (just google how to do this), or you specify the path with --path_to_bowtie in Bismark.

All other steps, including the genome preparation, are explained in detail in the manual, this protocol, or this methylation analysis course. The genome preparation command is:

Code:

bismark_genome_preparation [options] <path_to_genome_folder>

Good luck, Felix
daanum replied

11-07-2015, 01:12 PM
genome preparation

Hi,
I am trying to run the Bismark genome preparation but am unable to do so.
I have the unzipped bismark v 14.5 folder on the server, the unzipped bowtie-2.2.2.6 folder, and the genome files for human GRCh38, with all three folders inside one folder. Do I need to run any installation step for Bismark/Bowtie before I run the genome preparation?

I am new to methylation analysis, so it would be great if you could help.

thanks in advance.
fkrueger replied

09-01-2015, 11:16 AM
Oh it seems I need to update the manual because we very recently changed the default aligner to Bowtie 2, and the command in the manual still refers to bowtie1 (if you use --bowtie1 you can use the command as in the manual). I'll have this changed soon, thanks for spotting this.

If you want to run the test dataset just leave out all options and try using the defaults. Best, Felix
