This may become moot as bowtie2 can now support large genomes so separate alignments for different strands of the genome could be a thing of the past and simplify bisulfite alignment quite a bit! Of course this will take some work upgrading the various bisulfite alignment packages to be compatible with the new bowtie.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
If you need MAPQ values then you have a number of different options, including:
- BSmooth: Like bismark, it's a perl wrapper around bowtie2. It's always been slower than bismark when I've used it. It simply passes through the MAPQ of the best alignment, rather than recalculating it.
- bwa-meth: A python wrapper around bwa mem that happens to work like frozenlyse just mentioned (i.e., it concatenates two versions of the converted genome together and then aligns against that in one go). Only supports directional and paired-end libraries.
- Bison: A C wrapper around bowtie2 that otherwise functions similar to bismark, but recalculates MAPQ scores according to alignments to different strands using the same MAPQ algorithm in bowtie2. The setup can be non-trivial, so I wouldn't bother for a small one-off dataset (you might try bwa-meth in that case, unless you have a non-directional library).
There are a number of others out there in addition to these, of course.Last edited by dpryan; 03-13-2014, 01:52 AM.
Comment
-
Devon, can the calculation of MAPQ be summarized in a straight forward formula (described in detail in the Bowtie2 documentation?) or does one have to delve into the source code to find it? It shouldn't be hard to implement since all the mapping values are available and being processed anyway...
Comment
-
You have to go through the source code to find it, unfortunately. It would seem straight-forward enough to add to bismark (more or less, there are some oddities so implementing it in perl might produce ever so slightly different results, though they'd be close enough). You can find a C version of the algorithm here (starting on line 109). It's never been clear to me how they came up with that MAPQ algorithm, so don't ask me to explain why it is the way it is
Comment
-
glad to see bismark outputting the sorted header, that does simplify things for downstream analyses.
what filtering to bismark/bison methylation extractors do?
Comment
-
I can't speak to what bismark does at the moment, but bison filters by (1) read/pair MAPQ (2) base phred score and (3) position in a read (to ignore biased regions). There's also some filtering in paired-end reads if the pairs disagree on a methylation call (this is relatively rare).
Edit: I forgot to mention that flagged PCR duplicates are ignored.
Comment
-
Hi Felix,
A seemingly harmless error is raised right at the end of running the methylation extractor with the --mbias_only flag (v0.10.1):
bismark_methylation_extractor -p --mbias_only sampleName.bam
# Various
# Lines
# Of
# Output
Failed to read from methylation extractor output file CpG_OT_sampleName.txt: No such file or directory
As far as I can tell nothing has actually gone wrong and there are no "orphan" files left behind by the methylation extractor.
Cheers,
Pete
Comment
-
We just released a new version of Bismark (v0.11.1) which adds the following functionality/fixes:
o Bismark: The option --pbat now also works for use with Bowtie 2, in both single-end and paired-end mode. The only limitation to this is that it only works with FastQ files and uncompressed temporary files
o Bismark: Changed the order in which the @SQ lines are written out to the SAM/BAM header from random to the same order they are being read in from the genomes folder (or the order of the files in which they occur within a multi-FastA file)
o Bismark: Included a new option '-B/--basename basename' for output files instead of deriving these names from the input file. --basename takes precedence over the option --prefix.
o Bismark: Unmapped or ambiguous files now end in .fq or.fa for FastA or FastQ files, respectively (instead of .txt files)
o Methylation extractor: willl no longer attempt to delete unused files if --mbias_only was speficied
o Methylation extractor: Added a test to see if a file that does not end in .bam is in fact a BAM file, and if this succeeds open the file using Samtools view
Bismark can be downloaded here: www.bioinformatics.babraham.ac.uk/projects/
Comment
-
Hi Felix,
I took a crack at adding MAPQ values to Bismark. Following Devon's suggestion, I tried to apply the same MAPQ calculation used in bowtie2. I only made changes to paired-end mode.
The attached Perl script was modified from bismark-0.11.1. Let me know what you think if you have time and interest.
Cheers,
AndrewAttached Files
Comment
-
Originally posted by adeirossi View PostHi Felix,
I took a crack at adding MAPQ values to Bismark. Following Devon's suggestion, I tried to apply the same MAPQ calculation used in bowtie2. I only made changes to paired-end mode.
The attached Perl script was modified from bismark-0.11.1. Let me know what you think if you have time and interest.
Cheers,
Andrew
Thanks a lot for having a go at this, I'll try to have a look next week and implement it for the next release. Let's hope this serves as an incentive for me to extend it to single-end mode as well.
Best, Felix
Comment
-
Hi Andrew,
Thanks very much for your initiative with the MAPQ values! I have reviewed your suggestions and extended their implementation to single-end mode (for use with Bowtie 2 only). I have uploaded a tentative version 0.12.1 (attached), would you mind testing it to see if it works as expected?
Cheers, FelixAttached Files
Comment
-
Hi Felix,
Thank you for taking a look and extending the MAPQ value reporting to single-end mode so quickly. I have tested your bismark.pl script in both paired- and single-end modes and it works as expected.
There is, however, one minor issue that does not affect the SAM output. (It was not caused by the recent changes; it happens in v0.11.1 as well.) Occasionally I see warnings like this:
Use of uninitialized value in numeric ne (!=) at ./bismark line 3201, <$__ANONIO__> line 578.
Chromosomal sequence could not be extracted for SEQUENCER12:47:H7FFHADXX:1:1206:6069:83277_1:N:0:AGATGT J02459.1 107
The issue occurs in paired-end mode when read 2 maps to the first base of a chromosome (in this case, lambda DNA). When this happens, extract_corresponding_genomic_sequence_paired_ends_bowtie2() returns without defining $methylation_call_params->{$identifier}->{unmodified_genomic_sequence_1}, which is then tested on line 3201, resulting in the "uninitialized value" warning and the printing of position_1 instead of position_2. I added one line to your bismark.pl script (see attached) to fix this behavior; now only (what I assume to be) the intended warning is printed, showing that position_2 is equal to 1:
Chromosomal sequence could not be extracted for SEQUENCER12:47:H7FFHADXX:1:1206:6069:83277_1:N:0:AGATGT J02459.1 1
Can you confirm that this change is an improvement?
Thanks,
AndrewAttached Files
Comment
Latest Articles
Collapse
-
by seqadmin
Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...-
Channel: Articles
10-18-2024, 07:11 AM -
-
by seqadmin
Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.
Nobel Prize for MicroRNA Discovery
This week,...-
Channel: Articles
10-07-2024, 08:07 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks
by seqadmin
Started by seqadmin, Yesterday, 05:31 AM
|
0 responses
10 views
0 likes
|
Last Post
by seqadmin
Yesterday, 05:31 AM
|
||
Started by seqadmin, 10-24-2024, 06:58 AM
|
0 responses
20 views
0 likes
|
Last Post
by seqadmin
10-24-2024, 06:58 AM
|
||
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types
by seqadmin
Started by seqadmin, 10-23-2024, 08:43 AM
|
0 responses
48 views
0 likes
|
Last Post
by seqadmin
10-23-2024, 08:43 AM
|
||
Started by seqadmin, 10-17-2024, 07:29 AM
|
0 responses
58 views
0 likes
|
Last Post
by seqadmin
10-17-2024, 07:29 AM
|
Comment