**mastal** · 05-25-2014, 10:45 AM

--no-1mm-upfront

Below is an excerpt from the Bowtie manual:

http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#alignment-options

"By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed."

I don't think you can tell Bowtie to find alignments with exactly 1 or 2 mismatches; I think you can only set the maximum number of mismatches to allow.

**carolW** · 05-25-2014, 10:55 AM

So are you confirming that --no-1mm-upfront should be used as --no-1mm-upfront 1 or --no-1mm-upfront 2? Or should N and L be used?

Once bowtie reports reads that aligned > 1 times, how can I separate reads that aligned exactly once from those that aligned > 1 times?

Thanks

**dpryan** · 05-25-2014, 11:09 AM

It's just "--no-1mm-upfront" (it doesn't take an argument).
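As a minimal sketch of where the bare flag sits on a command line, here is one way to assemble the arguments in Python. The index basename and file names are placeholders I made up for illustration, and the `-N`/`-L` values are only examples:

```python
def bowtie2_command(index, reads, out_sam, seed_mismatches=0, seed_length=20):
    """Build a bowtie2 argument list with the upfront 1-mismatch search disabled."""
    return [
        "bowtie2",
        "--no-1mm-upfront",          # bare flag: no value follows it
        "-N", str(seed_mismatches),  # mismatches allowed within a seed
        "-L", str(seed_length),      # seed length
        "-x", index,                 # basename of the bowtie2 index (placeholder)
        "-U", reads,                 # unpaired reads file (placeholder)
        "-S", out_sam,               # SAM output file (placeholder)
    ]


cmd = bowtie2_command("genome_index", "reads.fastq", "aligned.sam")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` if bowtie2 is on your PATH.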

Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
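To make the MAPQ filtering concrete, here is a rough sketch that works on plain SAM text, where MAPQ is the fifth tab-separated column. The example records and the threshold of 10 are made up for illustration:

```python
def filter_by_mapq(sam_lines, min_mapq=10):
    """Yield header lines plus alignments whose MAPQ (SAM column 5) is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ; pass them through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:
            yield line


# Made-up SAM records: one header, one MAPQ 42 alignment, one MAPQ 1 alignment.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(filter_by_mapq(example, min_mapq=10))
print(len(kept))
```

In practice `samtools view -q 10` applies the same kind of cutoff directly to SAM/BAM files.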

**carolW** · 05-25-2014, 11:43 AM

But --no-1mm-upfront attempts to find 0 or 1 mismatches. How about 2 mismatches?

I meant reads that map to repetitive regions and align > 1 times, because in the stats report > 50% of my reads aligned > 1 times. So the MAPQ value is heuristic. Within a given interval, how do I choose the best value?

**mastal** · 05-25-2014, 03:54 PM

Originally posted by carolW:

> but -no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?

No, --no-1mm-upfront disables bowtie's default behaviour (which is to look for end-to-end alignments with 0 or 1 mismatches before trying the multiseed heuristic).
You can set -N 2 if you want to allow up to 2 mismatches in the seed region.

**carolW** · 05-26-2014, 01:25 AM

When I set -N 2, I get error message:

Error: -N was set to 2, but cannot be set greater than 1
Error: Encountered internal Bowtie 2 exception (#1)

Is there any other parameter that should be set, too?

**dpryan** · 05-26-2014, 01:31 AM

Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
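If what you want is a cap on mismatches over the whole alignment, one option is to post-filter the SAM output. The sketch below assumes the records carry bowtie2's `XM:i` optional field (the number of mismatches in the alignment); the function names, the example records and the limit of 2 are my own for illustration. Note that `XM:i` counts mismatches only, not gaps.

```python
def mismatch_count(sam_line):
    """Return the XM:i value of a SAM record, or None if the tag is absent."""
    # Optional fields start at column 12 (index 11).
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("XM:i:"):
            return int(field[len("XM:i:"):])
    return None


def keep_up_to(sam_lines, limit):
    """Keep headers and alignments with at most `limit` mismatches overall."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        xm = mismatch_count(line)
        # Records without XM:i (e.g. unaligned reads) are dropped here.
        if xm is not None and xm <= limit:
            kept.append(line)
    return kept


# Made-up records with 0 and 3 mismatches respectively.
records = [
    "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0\tXM:i:0",
    "read2\t0\tchr1\t200\t23\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:-18\tXM:i:3",
]
passed = keep_up_to(records, limit=2)
print(len(passed))
```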

**carolW** · 05-28-2014, 12:59 AM

Originally posted by dpryan:

> It's just "--no-1mm-upfront" (it doesn't take an argument).
>
> Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.

How can we judge whether a threshold is reasonable? Does it depend on the data? All info is welcome.

**dpryan** · 05-28-2014, 01:31 AM

The MAPQ relates to the probability that the alignment is correct, so just pick a value that you're happy with depending on your downstream applications. For RNAseq, I usually use a threshold of 5, since there's enough coverage that a small amount of error won't have any considerable effect. For bisulfite sequencing data, on the other hand, I've found that a MAPQ threshold of 10 is usually the sweet spot, since there's less coverage per site, so one can't accept as much error. For variant calling, many of the callers utilize MAPQ and Phred scores in their call algorithms, so you may either not bother filtering or might just remove the highly unreliable alignments, which for bowtie2 are those with MAPQ of 0 or 1.

If you're looking for some objectively perfect filtering algorithm, there is none. It's just a question of how much error your requirements can accept.
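For a feel of what a threshold means, the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong. Under that definition (which bowtie2's scores only approximate, so treat the numbers as rough), a small sketch of the conversion:

```python
def mapq_to_error_probability(mapq):
    """Convert a Phred-scaled MAPQ to the nominal probability the position is wrong."""
    return 10 ** (-mapq / 10)


# Nominal error probabilities for the MAPQ values mentioned in this thread.
for q in (1, 5, 10, 42):
    print(q, round(mapq_to_error_probability(q), 5))
```

So a cutoff of 10 nominally tolerates about a 10% chance of a wrong position per alignment, and a cutoff of 5 about 32%.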

**carolW** · 05-28-2014, 06:08 AM

So it seems to be easy with my data, as I have only 0, 1 and 42. 0 must correspond to reads that did not align, as there is a u in the strand column. 1 must be ambiguous, or aligned > 1 times, and 42 unambiguous, or aligned exactly once.

**dpryan** · 05-28-2014, 06:17 AM

Yeah, life is easy when you have just 3 values. A value of 42 is given when there's a perfect match and there's no valid next-best alignment. If you played with --score-min then you'd eventually get a larger variety of MAPQ scores, though that'd just overcomplicate your life.

**dpryan** · 05-28-2014, 06:22 AM

BTW, there are actually 5 ways in which bowtie2 will yield a MAPQ of 0, only one of which is due to a read not being mapped (it's an unreliable alignment in any case). It's actually possible to have a "unique" alignment with a MAPQ of 0, assuming the definition of "unique" is having only one valid alignment given the --score-min and penalty settings.
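If you want to see how your own MAPQ values break down without lumping unmapped reads in with mapped MAPQ 0 alignments, a quick tally along these lines may help. It relies on the SAM flag bit 0x4 marking unmapped reads; the records are made up for illustration:

```python
from collections import Counter


def tally_mapq(sam_lines):
    """Count records per MAPQ value, reporting unmapped reads (flag 0x4) separately."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        counts["unmapped" if flag & 0x4 else mapq] += 1
    return counts


# Made-up records: a confident hit, a mapped read with MAPQ 0, and an unmapped read.
sam = [
    "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t16\tchr1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
tally = tally_mapq(sam)
print(dict(tally))
```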

**Lv Ray** · 06-30-2014, 09:49 PM

agree with you

**gongjing** · 12-23-2014, 12:16 AM

Originally posted by dpryan:

> Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).

So, what is the right way to set the overall number of permitted mismatches when mapping to a reference genome index with bowtie2? Looking forward to your answer!

bowtie number of mismatches and multiple aligned reads
