If you limit the input to CpG context it should be fairly quick, so just run bismark2bedGraph CpG*, followed by coverage2cytosine on the .cov file. You could then either use a clever awk command or a tiny script like the one attached to keep only the positions that were actually covered.
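For readers without the attachment, the filtering step could look roughly like the sketch below. It is not the attached script. It assumes the coverage2cytosine report is tab-separated with the columns chromosome, position, strand, count methylated, count unmethylated, C-context and trinucleotide context; the function name filter_covered and the min_coverage parameter are made up for illustration.

```python
def filter_covered(report_lines, min_coverage=1):
    """Yield report lines whose methylated + unmethylated count reaches min_coverage.

    Assumed column layout (tab-separated):
    chromosome, position, strand, count methylated, count unmethylated,
    C-context, trinucleotide context.
    """
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        try:
            coverage = int(fields[3]) + int(fields[4])
        except ValueError:
            continue  # skip lines whose count columns are not integers
        if coverage >= min_coverage:
            yield line


# Small made-up example: the middle position has no coverage and is dropped.
example = [
    "chr1\t100\t+\t3\t1\tCG\tCGA\n",
    "chr1\t101\t-\t0\t0\tCG\tCGT\n",
    "chr1\t250\t+\t0\t2\tCG\tCGG\n",
]
covered = list(filter_covered(example))
```

On a real report you would pass an open file handle instead of the example list and write the yielded lines to a new file.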
-
Hi Felix,
I'm sorry to keep bothering you, but I have hit one more obstacle. I'm running
Code: deduplicate_bismark -p --representative XXX.sam
-
Yes, that might be the case, since in representative mode it first slurps the entire file into memory.
You should not be using --representative anyway unless you want to find the most highly represented PCR artefact in your data. Maybe I should simply remove it as an option because people keep getting it wrong. I would suggest rerunning it in the default mode.
-
Hi Felix,
I am using the latest version of Bismark (v0.13.1) and have encountered some problems with the methylation extractor. First, I noticed that the first line of the coverage2cytosine script is '2#!/usr/bin/perl'. I assume this is a typo, so I changed it to '#!/usr/bin/perl'. Running the following command:
bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder
I keep getting the following error:
'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'
I double-checked my folder and could not find a .cov file. I thought the bismark2bedGraph script was supposed to generate a .cov file, so I am unsure what might have gone wrong. Could I please get some help with this issue?
Thanks.
-
This was indeed a typo. It will be fixed in the next release, which is due out today or tomorrow (and will finally support parallel alignments, so stay tuned!).
A couple of things about the command you used:
bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder
'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'
Sorry if it is a stupid question, but did you replace '/path/to/file' with a valid path to the file on your system?
-s: not necessary (will be determined automatically)
-o /requires/path/to/output/folder
--samtools_path /requires/path/to/samtools/executable
--counts: not necessary (used by default)
--remove_spaces: only use this if really necessary, will otherwise cost time and temporary space
--buffer_size: requires input, e.g. 10G
--genome_folder /requires/path/to/genome/folder
an input file is required as well
If you still struggle can you just send me the onscreen-text via email? This would make spotting mistakes in the command much easier. Cheers, Felix
-
Bismark finally supporting parallel alignments
We would like to announce that a new version of Bismark (v0.14.0) has just been released. This version adds a parallelization switch to the Bismark alignment step, and also addresses a couple of other issues detailed below:
o Bismark: Finally added parallelization to the Bismark alignment step using the option '--multicore <int>', which sets the number of parallel instances of Bismark to be run concurrently. At least in this first distribution this is achieved by forking the Bismark alignment step very early on so that each individual Spawn of Bismark (SoB?) processes only every n-th sequence (n being set by --multicore). Once all processes have completed, the individual BAM files, mapping reports, and unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated running Bismark conventionally with only a single instance.
If system resources are plentiful this is a viable option to speed up the alignment process (we observed a near-linear speed increase for up to --multicore 8, the highest value tested so far). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie/Bowtie2, Samtools, gzip etc.) and ~10-16GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Compute and memory requirements increase linearly with the value given to --multicore, so --multicore 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB of RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned...
o Bismark: Changed the default output to BAM. SAM output may be requested using the option --sam
o Bismark: No longer generates a piechart (.png) with the alignment stats. bismark2report generates a much nicer report anyway
o Methylation Extractor: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required. In some instances files containing e.g. -1-2 in their filename might previously have been identified as paired-end incorrectly
o deduplicate_bismark: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required
o deduplicate_bismark: Added option --version so that Clusterflow can report a version number
o bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well
o coverage2cytosine: Fixed a typo in the shebang which prevented coverage2cytosine from running
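To make the --multicore description above a bit more concrete, here is a rough sketch of the two ideas: handing every n-th sequence to a different instance, and resources growing linearly with the number of instances. This is not Bismark's actual code. The function names are hypothetical, and the per-instance defaults of 5 cores and 10 GB are assumptions chosen only to reproduce the ~20 cores / ~40GB figure quoted for --multicore 4.

```python
def split_round_robin(records, n):
    """Assign record i to instance i % n, so each instance sees every n-th record."""
    if n < 1:
        raise ValueError("n must be at least 1")
    chunks = [[] for _ in range(n)]
    for index, record in enumerate(records):
        chunks[index % n].append(record)
    return chunks


def estimate_resources(multicore, cores_per_instance=5, mem_gb_per_instance=10):
    """Linear estimate of cores and memory (GB); the per-instance defaults are assumptions."""
    return multicore * cores_per_instance, multicore * mem_gb_per_instance


# Ten made-up read names split across four instances.
reads = ["read%d" % i for i in range(1, 11)]
chunks = split_round_robin(reads, 4)

# Rough resource estimate for --multicore 4 under the assumed per-instance cost.
cores, mem_gb = estimate_resources(4)
```

With real numbers for your aligner and genome plugged in, the same multiplication gives a quick check of whether a given --multicore value fits on your machine.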
Even though we have tried out several corner cases, this release is still somewhat experimental and we would appreciate any comments! Bismark can be downloaded from the Babraham Bioinformatics website.
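The stricter paired-end detection mentioned in the changelog above can be illustrated with a small sketch. This is not the code used by the methylation extractor or deduplicate_bismark. The @PG lines below are invented examples, and the "naive" check is only a stand-in showing how a filename containing -1-2 could trigger a false positive when surrounding whitespace is not required.

```python
import re


def looks_paired_naive(pg_line):
    """Illustrative loose check: any occurrence of -1 and -2 counts."""
    return "-1" in pg_line and "-2" in pg_line


def looks_paired_strict(pg_line):
    """Require whitespace before and after -1 and -2, as the changelog describes."""
    has_mate1 = re.search(r"\s-1\s", pg_line) is not None
    has_mate2 = re.search(r"\s-2\s", pg_line) is not None
    return has_mate1 and has_mate2


# Invented header lines; the exact @PG layout is an assumption.
single_end = '@PG\tID:Bismark\tVN:v0.14.0\tCL:"bismark /genomes/mouse sample-1-2.fastq.gz"'
paired_end = '@PG\tID:Bismark\tVN:v0.14.0\tCL:"bismark /genomes/mouse -1 s_R1.fastq.gz -2 s_R2.fastq.gz"'

naive_on_single = looks_paired_naive(single_end)
strict_on_single = looks_paired_strict(single_end)
strict_on_paired = looks_paired_strict(paired_end)
```

The single-end example is flagged as paired-end by the loose check but not by the strict one, while the genuine paired-end line still passes the strict check.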
-
Hello, I am using the most recent Bismark release. I tried to run the alignment with the multicore option:
bismark --multicore 4 <genome> -n 1 <filename>
The result is four split sam.gz files (I haven't installed samtools). After that, I tried to call the methylation extractor:
bismark_methylation_extractor -p --comprehensive *.sam.gz
The *.sam.gz argument is meant to select all four sam.gz files, but this failed. Which file should I pass to the methylation extractor? Thank you.
Note:
I noticed there is one .bam file. I don't know how this file was generated because I haven't installed samtools, but it is only 16 kb in size and appears to be empty.
I also noticed that the alignment was done assuming single-end data, but my data is paired-end. How can I set the alignment to use paired-end mode?
Last edited by barbarian; 03-15-2015, 05:53 PM.
-
Hi Barbarian,
I am afraid the problem you are seeing is indeed caused by the fact that you don't have Samtools installed. In a bid to get multicore processing working in a reasonable time I assumed that everyone is running Samtools already so it is currently not designed to deal with sam.gz files.
Just as a heads-up, the initial release (v0.14.0) also doesn't deal correctly with some corner cases such as the --gzip option for temp files (which is already fixed in the development version), and the -B (basename) option (which is yet to be looked at).
So bottom line: if you install Samtools and don't try to run all the corner cases at the same time (--gzip, -B), it should work nicely.
-
Hi Felix,
I'm reporting a bug in v0.14.0. When I used gzipped FastQ input to run bismark --multicore, Bismark failed at the end to merge the separate files into one. The intermediate files were initially named *.fastq.gz_*, but at the end of the run Bismark tried to merge files named *.fastq_*, which obviously failed. Hope it helps.
Youyou