I have an RNA data set with a read length of 76 bp. I want to allow more mismatches when aligning with BWA. How many mismatches does BWA allow with the default settings, and which parameter(s) should I change if I want to allow, e.g., twice as many mismatches? I have been playing around with aln -n, -l and -M without any success.
-
Are you aligning against a transcript database? If not, you might consider using a splice-aware aligner such as TopHat or STAR.
-
If you are certain that BWA is your only option, the parameters are pretty clear:
Options: -n NUM max #diff (int) or missing prob under 0.02 err rate (float) [0.04]
-o INT maximum number or fraction of gap opens [1]
-e INT maximum number of gap extensions, -1 for disabling long gaps [-1]
-i INT do not put an indel within INT bp towards the ends [5]
-d INT maximum occurrences for extending a long deletion [10]
-l INT seed length [32]
-k INT maximum differences in the seed [2]
-M INT mismatch penalty [3]
-O INT gap open penalty [11]
-E INT gap extension penalty [4]
-L log-scaled gap penalty for long deletions
As far as I understand, allowing more mismatches (-n) cannot result in fewer reads being aligned.
-
Thanks, folks. Yes, I am almost 100 percent sure that BWA is my only option. However, I am really a newbie to BWA, so I'm not sure I understand your post. Most of the parameter settings you list are defaults, right?
E.g. -n is 0.04 by default, and I thought this was one of the parameters I should change to let BWA align with more mismatches? Sorry, but can you explain again which parameters are defaults and which ones I should change?
-
Defaults are given in brackets [].
For starters, I would disable gapped alignment (-o 0), keep the seed length at 32 with up to two mismatches in the seed (-l 32, -k 2), and try various values for the overall number of mismatches (e.g. -n 3 to 6). A higher -n should give you more aligned reads.
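A minimal sketch of that sweep in Python. It only assembles the `bwa aln` command lines from the -n, -o, -l and -k flags in the usage text quoted above; the index prefix `ref.fa` and the read file `reads.fq` are hypothetical placeholders, and actually running the commands (e.g. with `subprocess.run`) is left to you.

```python
# Build (but do not run) `bwa aln` command lines for a sweep of -n values.
# Assumption: "ref.fa" (index prefix) and "reads.fq" are hypothetical paths.


def bwa_aln_cmd(max_diff: int, ref: str = "ref.fa", reads: str = "reads.fq",
                gap_opens: int = 0, seed_len: int = 32,
                seed_diff: int = 2) -> list[str]:
    """Return the argv list for one `bwa aln` run."""
    return [
        "bwa", "aln",
        "-n", str(max_diff),   # integer form: absolute max #diff, not a probability
        "-o", str(gap_opens),  # 0 disables gapped alignment
        "-l", str(seed_len),   # seed length
        "-k", str(seed_diff),  # max differences allowed in the seed
        ref, reads,
    ]


if __name__ == "__main__":
    # Try -n 3 to 6; each run's alignments go to its own .sai file.
    for n in range(3, 7):
        print(" ".join(bwa_aln_cmd(n)), f"> reads.n{n}.sai")
```

Passing -n as a whole number matters: per the help text, an integer is read as a maximum number of differences, while a fraction such as the default 0.04 is read as a probability threshold.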
-
OK, thanks. I will try the guidelines you have given me.
So I should concentrate on changing -n (the one set to 0.04 by default)? I will try setting it between 3 and 6. How should this parameter be set if I want to allow, e.g., twice as many mismatches per read as the default?
I have read somewhere that it is a good idea to also disable seeding by setting -l to a large value (e.g. 10000) when allowing more mismatches, but I don't know whether I should do this.
-
Hi, Karenj,
I did some tests.
First, if you don't set -n yourself, the default (0.04) is converted into a maximum number of differences that depends on read length. You can see this at the beginning of the output:
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
My reads are 83 bp, so max_diff = 4. If I run with -n 8 or -n 16, I see more reads mapped.
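The per-length table above can be reproduced with a short Python sketch. It assumes, based on the -n help text ("missing prob under 0.02 err rate"), that the number of errors in a read of length L is modelled as Poisson with mean 0.02 * L, and that BWA picks the smallest k for which the chance of more than k errors drops below the -n threshold. This is a reconstruction, not BWA's own code. For the 76 bp reads in the question it gives 4 (76 bp lies between the 64 bp and 93 bp rows above), so twice the default would be -n 8.

```python
# Sketch: how a fractional -n (default 0.04) may map to a per-read max #diff.
# Assumption (a reconstruction, not BWA's code): errors per read are
# Poisson(0.02 * read_len); pick the smallest k >= 1 with P(X > k) < -n.
import math

AVG_ERR = 0.02  # per-base error rate named in the -n help text


def max_diff(read_len: int, threshold: float = 0.04, err: float = AVG_ERR) -> int:
    """Smallest k >= 1 such that P(Poisson(read_len * err) > k) < threshold."""
    lam = read_len * err
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for k in range(1, 1000):
        term *= lam / k    # P(X = k) computed from P(X = k - 1)
        cdf += term
        if 1.0 - cdf < threshold:
            return k
    raise ValueError("read too long for this sketch")


def transitions(lo: int = 17, hi: int = 250,
                threshold: float = 0.04) -> list[tuple[int, int]]:
    """(read length, max_diff) pairs wherever max_diff changes, like the log."""
    table: list[tuple[int, int]] = []
    last = None
    for length in range(lo, hi + 1):
        k = max_diff(length, threshold)
        if k != last:
            table.append((length, k))
            last = k
    return table


if __name__ == "__main__":
    for length, k in transitions():
        print(f"[bwa_aln] {length}bp reads: max_diff = {k}")
    print("76bp reads:", max_diff(76))
```

Lowering the threshold (e.g. `max_diff(76, 0.01)`) shows how a smaller fractional -n raises the allowed number of differences, which is the alternative to passing an integer directly.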
-l changes the seed length. Changing it didn't seem to work: it ran about 100 times slower and mapped fewer reads. -k changes the number of mismatches allowed within the seed; setting it to a large number didn't work either.
There are many more parameters you can change, e.g. -o, -e, -i, -d, -M, -O and -E, but you do need to understand what each one does.
The point of BWA, though, is to align low-error reads very fast. If you adjust any of the parameters above, it may align some difficult reads, but the run time becomes significantly LONGER. You might be better off using BWA for a first round of alignment and then aligning the unmapped reads with another tool (as many re-aligners do).
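For that two-pass approach, here is a minimal Python sketch of pulling the unmapped reads out of BWA's SAM output (e.g. after `bwa samse`) as FASTQ for a second aligner. It relies only on the standard SAM FLAG bits 0x4 (unmapped) and 0x10 (reverse-complemented), assumes SEQ and QUAL are present rather than `*`, and the function names and file names are hypothetical.

```python
# Extract unmapped reads from a plain-text SAM file as FASTQ for re-alignment.
# Assumes SEQ and QUAL are present (not "*"); names here are hypothetical.
from typing import Iterable, Iterator

UNMAPPED = 0x4  # FLAG bit: segment unmapped
REVERSE = 0x10  # FLAG bit: SEQ is stored reverse-complemented

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record (4 lines) per unmapped SAM record."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & UNMAPPED:
            continue  # mapped in the first pass; nothing to redo
        if flag & REVERSE:
            # Restore the read's original orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{qname}\n{seq}\n+\n{qual}\n"


def write_unmapped(sam_path: str, fastq_path: str) -> None:
    """Write every unmapped read in sam_path to fastq_path."""
    with open(sam_path) as sam, open(fastq_path, "w") as out:
        out.writelines(unmapped_fastq(sam))
```

Usage, with your own file names: `write_unmapped("aln.sam", "unmapped.fq")`, then feed `unmapped.fq` to the second aligner.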