Originally posted by fkrueger:
If done correctly, either way is fine. I find it easier to merge the FastQ files up front, because that way you can set off a pipeline without having to intervene until you get the final reports.
I've tried a few different ways of doing it, but the resulting .zero.cov file still has methylation calls at non-CpG sites. I can't figure out how this could happen.
Can you take a look at my pipeline and see if there is anything suspicious?
1. Run adapter trimming and paired-end Bismark (Bowtie 1) mapping on individual lanes
Code: bismark -n 2 -l 50 --chunkmbs 1024 -X 800 --un --ambiguous --bam $index -1 lane1_read1.fastq -2 lane1_read2.fastq
2. Sort each lane's BAM file by read name
Code: samtools sort -n lane1.bam lane1_sort
3. Merge all sorted BAM files
Code: samtools merge all_lanes.bam lane1_sort.bam lane2_sort.bam ...
4. Deduplicate the merged BAM file
Code: deduplicate_bismark -p --bam all_lanes.bam
5. Run the methylation extractor on the deduplicated BAM file
Code: bismark_methylation_extractor -p --no_overlap --ignore 3 --ignore_r2 3 --bedGraph --counts --zero_based --report all_lanes.deduplicated.bam
Also, have you actually tried mapping individual lanes first and then merging the BAM files? Is it possible that there is a hidden bug somewhere in that process?
Thank you very much!
Multithreading the methylation extractor
We have just released a new version of Bismark (v0.13.0), which is available from the Babraham Bioinformatics website. This version adds a couple of useful options and changes some default behavior. Perhaps most notably the methylation extractor may now optionally be run in a multithreaded manner which greatly reduces its processing time (in a preliminary benchmark the elapsed time went down almost linearly when more cores were being used for this process, see below for more details). Here is a list of all changes:
o Bismark: Fixed a renaming issue when converting SAM to BAM files (it would have replaced any occurrence of 'sam' in the file name, e.g. in sample1_..., instead of only the file extension .sam)
o Methylation Extractor: Added new option '--multicore INT' to set the number of cores to be used for the methylation extraction process. If system resources are plentiful this is a viable option to speed up the extraction process (we observed a near linear speed increase for up to 10 cores specified). Please note that a typical process of extracting a BAM file and writing out '.gz' output streams will in fact use ~3 cores per value of --multicore INT specified (1 for the methylation extractor itself, 1 for a Samtools stream, 1 for a GZIP stream), so --multicore 10 is likely to use around 30 cores of system resources. This option has no bearing on the speed of the bismark2bedGraph or genome-wide cytosine report processes
o Methylation Extractor: Added two new options '--ignore_3prime INT' (for single-end alignments and Read 1 of paired-end alignments) and '--ignore_3prime_r2 INT' (for Read 2 of paired-end alignments) to remove positions that display a methylation call bias on the 3' end of reads
o Methylation Extractor: The option --no_overlap is now the default for paired-end data. One may explicitly choose to include overlapping data with the option '--include_overlap'
o Methylation Extractor: The splitting report will now be written out by default (previously optional --report)
o Methylation Extractor: In paired-end mode, read-pairs which had been skipped because either read was shorter than a specified (very high) value of '--ignore' or '--ignore_r2' will now have the information of the other read extracted if it meets the length criteria (if applicable). Thanks to Andrew Dei Rossi for contributing a patch
o bismark2bedGraph: Fixed the location of the sorting directory which could have failed if an output directory had been specified
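Because each --multicore worker is described above as using about three cores when reading a BAM file and writing .gz output, a --multicore value can be derived from the number of cores you are willing to give the job. A minimal sketch under that rule of thumb; the function name suggest_multicore is hypothetical, and the default relies on GNU coreutils' nproc, which may not be available on every system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a value for
# bismark_methylation_extractor --multicore, assuming ~3 cores per worker
# (extractor + Samtools stream + GZIP stream), as described for v0.13.0.
suggest_multicore() {
  # Cores you are willing to use; defaults to all cores visible to this shell.
  local available=${1:-$(nproc)}
  local workers=$(( available / 3 ))
  # Always run at least one worker.
  if (( workers < 1 )); then
    workers=1
  fi
  echo "$workers"
}
```

For example, suggest_multicore 30 prints 10, in line with the note that --multicore 10 uses around 30 cores. The result could then be passed on as bismark_methylation_extractor -p --multicore "$(suggest_multicore 24)" sample.bam (sample.bam is a placeholder).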
Hi,
I ran Bismark on some of my data and subsequently tried to import the SAM files into CLC Genomics Workbench.
I ran into problems because the lengths of the sequences in the reference genome (which I also have in CLC, and which has to be paired with the SAM file being imported) don't match the sequence lengths reported in the SAM file.
I guess this has something to do with the genome preparation step.
Is there a way to prevent the sequence lengths from being changed in this process?
Hi Anne,
The length of the genome sequences is not changed at any point during the genome preparation; the only thing that happens is that nucleotides get replaced (C to T and G to A in the bisulfite-converted genome copies). The lengths of the chromosome sequences should be printed to the SAM header in the @SQ lines, e.g.:
@SQ SN:1 LN:197195432
@SQ SN:10 LN:129993255
@SQ SN:11 LN:121843856
@SQ SN:12 LN:121257530
...
If these lengths don't match up in CLC, could it be that CLC is using slightly different reference sequences and/or a different genome build?
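One way to test this is to compare the @SQ names and lengths in the Bismark SAM header with lengths computed from the FASTA file loaded into CLC. A minimal bash/awk sketch; the function names are hypothetical, it assumes a tab-delimited SAM header (in a real SAM file the @SQ fields are tab-separated), and it takes the first word of each FASTA header line as the sequence name. For BAM output, pipe samtools view -H file.bam into sam_seq_lengths instead of passing a file.

```bash
#!/usr/bin/env bash
# Print "name<TAB>length" for every @SQ line of a SAM header (file or stdin).
sam_seq_lengths() {
  awk -F'\t' '$1 == "@SQ" {
    name = ""; len = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) name = substr($i, 4)
      else if ($i ~ /^LN:/) len = substr($i, 4)
    }
    print name "\t" len
  }' "${1:--}"
}

# Print "name<TAB>length" for every FASTA record (name = first word of header).
fasta_seq_lengths() {
  awk '
    /^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore stray whitespace/CRs
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Show names/lengths that differ; exit status is non-zero on any mismatch.
compare_lengths() {
  diff <(sam_seq_lengths "$1" | sort) <(fasta_seq_lengths "$2" | sort)
}
```

Any line printed by compare_lengths points to a sequence whose name or length differs between the two sources, which would indicate a different reference or genome build.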
bismark report
Hi,
Suppose I run several instances of Bismark alignment separately and later merge the BAM files together. I'd like to get a report on the final BAM file, like the PE report file, which is required to run the bismark2report command. Is there a program that takes a BAM file and extracts the necessary information from it as a stand-alone command, just like bismark2report? Currently I don't see an easy way to do this. Could it be a useful addition to the next release?
Thanks.
low mapping rate in SR reads
Hi,
We recently encountered very poor mapping efficiency for our RRBS libraries sequenced in HiSeq Rapid mode (1x51 bp). Using the default parameters in Bismark we only get 9-45% mapping efficiency. We do observe that about 10% of the reads have NN bases among the first 3 bases. Could this affect the mapping rate, and do you have any suggestions for increasing the efficiency?
In reply to Wonghe:
trim_galore --clip_r1 2 --rrbs file.fastq.gz
More information on the recommended RRBS trimming can be found here.
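To see how many reads are affected before and after trimming, the reads with an N among their first few bases can be counted directly from the FASTQ file. A minimal sketch, assuming standard four-line FASTQ records without wrapped sequence lines; the function name count_leading_n is hypothetical, and the default 3-base window simply mirrors the observation in the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count reads with at least one N among their first
# bases (default: 3), for plain or gzip-compressed FASTQ.
# Assumes 4-line FASTQ records (sequence on line 2 of each record).
count_leading_n() {
  local fastq=$1 window=${2:-3}
  local reader=(cat)
  if [[ $fastq == *.gz ]]; then
    reader=(gzip -dc)
  fi
  "${reader[@]}" "$fastq" | awk -v w="$window" '
    NR % 4 == 2 {
      total++
      if (substr($0, 1, w) ~ /[Nn]/) with_n++
    }
    END {
      pct = total ? 100 * with_n / total : 0
      printf "%d of %d reads (%.1f%%) have an N in the first %d bases\n", with_n, total, pct, w
    }'
}
```

Running count_leading_n file.fastq.gz on the raw file and again on the Trim Galore output shows whether the 5' clipping removed the problematic bases.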
In reply to gene_x:
What you could do in the meantime, though, is sum up the numbers from two or more report files (e.g. in Excel) and then provide this 'merged' report file to bismark2report using the option --alignment_report FILE. I am still aiming to implement an option that will run multiple alignments and merge them automatically once everything has completed, but I wasn't planning to write something to merge reports at the moment (it sounds trivial but might take quite some time).
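The manual summing described above can also be scripted. The sketch below is a hypothetical bash/awk helper, not part of Bismark: it assumes each report line has the form "label:<TAB>value", adds up whole-number values with the same label across all input files, and otherwise keeps the first file's lines. Percentages such as the mapping efficiency cannot be summed; they are left unchanged and listed on stderr so they can be recalculated by hand before the file is passed to bismark2report.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Bismark): merge Bismark alignment reports by
# summing whole-number "label:<TAB>value" lines across all input files.
# The first file provides the layout; non-numeric lines are copied from it.
sum_bismark_reports() {
  awk -F'\t' '
    FNR == 1 { nfiles++; split("", occ) }   # new file: reset per-label counters
    {
      # Key on label plus occurrence number, so a repeated label stays separate.
      k = $1 SUBSEP (++occ[$1])
      numeric = (NF == 2 && $2 ~ /^[0-9]+$/)
      if (numeric) total[k] += $2
      if (nfiles == 1) {
        n++; line[n] = $0; key[n] = k; isnum[n] = numeric; label[n] = $1; val[n] = $2
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        if (isnum[i]) printf "%s\t%.0f\n", label[i], total[key[i]]
        else print line[i]
        # Percentages cannot be summed; flag them for manual recalculation.
        if (val[i] ~ /%$/) print "recalculate by hand: " label[i] > "/dev/stderr"
      }
    }
  ' "$@"
}
```

Usage, with placeholder file names: sum_bismark_reports lane1_PE_report.txt lane2_PE_report.txt > merged_PE_report.txt, then fix the flagged percentages and run bismark2report --alignment_report merged_PE_report.txt.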