
**tongsandiego** · 12-05-2013, 09:13 AM

Does anybody have any clue?

**westerman** · 12-05-2013, 12:40 PM

No idea.

Ignore the following; I've just left it in for historical purposes and to remind myself how stupid I can be... It is interesting that the MD field is different from the CIGAR field. As per the Bowtie2 manual, the MD field ought to match the CIGAR string, which it obviously does not: '12T13' vs. 26M.

Out of stupidity mode, here is the rest of my original comment:

Out of curiosity, and perhaps to help troubleshooting, what does the reference look like at the match position?

**tongsandiego** · 12-05-2013, 12:57 PM

26M in a CIGAR string means 26 alignment matches, each of which can be either a match or a mismatch. So the CIGAR string is consistent with the MD field.
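To make the CIGAR/MD relationship concrete, here is a small sketch (plain Python, no SAM library assumed) that computes how many reference bases a CIGAR string spans and what an MD:Z tag says about those bases. The function names are hypothetical. M, D, N, = and X consume the reference while I, S, H and P do not, and MD describes only reference-consuming positions, so both should give 26 for this read, with MD additionally revealing one mismatch.

```python
import re
from typing import Tuple


def cigar_ref_length(cigar: str) -> int:
    """Reference bases spanned by a CIGAR string (M, D, N, =, X consume the reference)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")


def md_summary(md: str) -> Tuple[int, int]:
    """Return (reference bases covered, mismatches) described by an MD:Z tag."""
    covered = mismatches = 0
    # Each token is a run of matches, a deletion (^ plus deleted reference bases),
    # or a single mismatched reference base.
    for run, deletion, base in re.findall(r"(\d+)|\^([A-Z]+)|([A-Z])", md.upper()):
        if run:
            covered += int(run)
        elif deletion:
            covered += len(deletion)  # deleted reference bases are not mismatches
        else:
            covered += 1
            mismatches += 1
    return covered, mismatches


print(cigar_ref_length("26M"), md_summary("12T13"))  # 26 (26, 1)
```

Both describe the same 26 reference positions; CIGAR's M simply does not distinguish matches from mismatches, which is why the mismatch only shows up in MD (and in XM/NM).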

**westerman** · 12-05-2013, 01:13 PM

Ah, quite correct. It must be the end of a long day; I'm getting dangerous when I don't think fast enough. Anyway, I am as mystified as you are. If I have time (hah!), I'll try out your command myself and see if 'playing around' reveals anything. Thanks again for the correction.

**dpryan** · 12-05-2013, 01:20 PM

This is human sequence, I take it? If no one comes up with the reason beforehand, I might play around with the bowtie2 source code tomorrow to see why this is happening. I imagine this sort of issue affects more than a few people, especially since even the default settings shouldn't allow this!

**tongsandiego** · 12-05-2013, 02:50 PM

Yes, it is human sequence, and I used hg19.

**gringer** · 12-05-2013, 07:50 PM

Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.
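The "filter on XM/NM" route can be sketched as a plain-text SAM filter. Per the Bowtie2 manual, XM:i is the number of mismatches in the alignment and NM:i is the edit distance (substitutions plus indels), both written only for aligned reads. The function below is a hypothetical helper, not part of bowtie2 or samtools; it keeps header lines and drops unmapped records (FLAG bit 0x4) as well as any alignment whose chosen tag is not 0.

```python
def zero_mismatch_alignments(sam_lines, tag="XM"):
    """Yield SAM header lines and aligned records whose <tag>:i: value is exactly 0."""
    prefix = tag + ":i:"
    for line in sam_lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 4:  # FLAG bit 0x4: read unmapped
            continue
        # Optional fields start at column 12 and look like "XM:i:1".
        tags = {f[:5]: f[5:] for f in fields[11:]}
        if tags.get(prefix) == "0":
            yield line
```

For example, `sys.stdout.writelines(zero_mismatch_alignments(sys.stdin))` could sit at the end of a pipe such as `samtools view -h aln.bam | python filter.py` (the script name is illustrative). Use `tag="NM"` to also exclude reads with indels.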

**jp.** · 12-05-2013, 11:03 PM

Can bowtie handle Ns in .fq files and remove them, since such reads will not match hg19 and so will be removed automatically? If so, why bother trimming and removing Ns with another program? Would it be enough to just remove the adapter sequence... or to just define the sequence length in bowtie, so that in this case we don't even need trimming?
I am totally confused, since I haven't touched this field for a year.
Would somebody like to answer?
Thanks in advance,
jp.


**gringer** · 12-06-2013, 12:50 AM

> Can bowtie handle N seqs in .fq files and remove them because these will not be matched with hg19 so will be automatically removed.

This question should really be posted in a new thread, but given that it's marginally related...

Bowtie2 can handle Ns in the map index and in the reads, and happily align any base at that location. They're not removed, but are probably treated in a similar way to a read with a very low Q score. It may also "correct" a read mapping to a non-N position for the read record in the SAM output.

[FWIW, Bowtie v1 can't handle Ns. I think it will replace Ns with As when doing indexing and alignment]

**dpryan** · 12-06-2013, 02:06 AM

Originally posted by gringer:

> Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.

Yeah, but the mismatch is in the seed region.

**dpryan** · 12-06-2013, 02:10 AM

Originally posted by jp.:

> Can bowtie handle N seqs in .fq files and remove them because these will not be matched with hg19 so will be automatically removed. If so, then why go for trimming and removing N using other program. If we just remove adaptor seq then will be okay... or just define seq length in bowtie and , in this case, we dont even need trimming ?
> I am totally confused, since I didnt touch this field for 1 year.
> May somebody like to answer ?
> thanks in advance
> jp.

See the --np and --n-ceil options for how bowtie2 handles Ns. By default, Ns decrease the alignment score and reads with too many Ns will be skipped altogether. If you have Ns at one end of a read, then you might as well trim them off.
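The --n-ceil value (like --score-min) is a bowtie2 "function option": a type letter (C constant, L linear, S square root, G natural log) plus a constant A and coefficient B, evaluated at the read length x as f(x) = A + B·g(x). The sketch below evaluates such specs for the 26bp read in this thread using the defaults documented in the Bowtie2 manual (--n-ceil L,0,0.15; end-to-end --score-min L,-0.6,-0.6). The helper name is hypothetical, and how bowtie2 rounds a fractional ceiling to an integer is not modeled here, so treat the results as approximate cutoffs.

```python
import math


def bt2_func(spec: str, read_len: int) -> float:
    """Evaluate a bowtie2-style function option such as 'L,0,0.15' at a read length."""
    parts = spec.split(",")
    kind = parts[0]
    a = float(parts[1])
    b = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":  # constant: B is ignored
        return a
    if kind == "L":  # linear
        return a + b * read_len
    if kind == "S":  # square root
        return a + b * math.sqrt(read_len)
    if kind == "G":  # natural log
        return a + b * math.log(read_len)
    raise ValueError(f"unknown function type: {kind}")


read_len = 26
n_ceil = bt2_func("L,0,0.15", read_len)        # default --n-ceil, roughly 3.9 Ns
min_score = bt2_func("L,-0.6,-0.6", read_len)  # default end-to-end --score-min, roughly -16.2
strict = bt2_func("C,0,0", read_len)           # gringer's suggestion: any penalty fails
print(n_ceil, min_score, strict)
```

With the default --np 1, each N costs one point, and a single mismatch costs at most 6 under the default --mp 6,2, so a 26bp read with one mismatch stays well above the default minimum of about -16.2. Only something like --score-min C,0,0 rejects it on score alone.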

**gringer** · 12-06-2013, 05:34 AM

Originally posted by dpryan:

> Yeah, but the mismatch is in the seed region.

Bowtie2 seeds across the entire read length. From the Bowtie2 manual:

> Bowtie 2 begins by extracting substrings ("seeds") from the read and its reverse complement and aligning them in an ungapped fashion with the help of the FM Index. This is "multiseed alignment" and it is similar to what Bowtie 1 does, except Bowtie 1 attempts to align the entire read this way.

Although now I notice that you've got a 26bp read and a 22bp seed, so any seed will overlap the mismatch. Thinking again about jp.'s question, perhaps there is an N (or other ambiguous base) at that position in the reference sequence. Otherwise, yes, very odd.
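The overlap argument is simple arithmetic: a 22bp ungapped seed fits inside a 26bp read at only 5 start offsets (0 to 4), and every one of them covers read positions 4 through 21. The sketch enumerates all offsets, which is a superset of whatever interval bowtie2 actually samples, and checks the mismatch at 0-based position 12 implied by MD:Z:12T13.

```python
def seed_windows(read_len: int, seed_len: int) -> range:
    """All start offsets at which an ungapped seed of seed_len fits inside the read."""
    return range(read_len - seed_len + 1)


read_len, seed_len = 26, 22
mismatch_pos = 12  # 0-based: MD:Z:12T13 means 12 matching bases, then the mismatch

covering = [o for o in seed_windows(read_len, seed_len)
            if o <= mismatch_pos < o + seed_len]
print(len(covering), len(seed_windows(read_len, seed_len)))  # 5 5
```

Because position 12 sits near the middle of the read, the same holds when it is counted from the other end (position 13), so seeds taken from the reverse complement cannot avoid it either.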

**dpryan** · 12-06-2013, 05:50 AM

Yeah, if the read were long enough that the mismatch could not be in the seed then that would make total sense. There are no Ns in the reference in that area (the sequence there is "ttaaaggaaccctgagagatatttca"). My guess at the moment is that either the scoring matrix that's fed to al.exactSweep() isn't set properly or the output of that (which contains whether a seed maps with 0, 1, or 2 mismatches) just isn't being dealt with properly. I guess it'd be faster to just email Ben Langmead :P
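To illustrate why -N 0 should have rejected this alignment, the sketch counts ungapped mismatches for each 22bp seed of a read placed against the reference quoted above. The actual read base is not given in the thread, so READ is hypothetical: the reference with position 12 (the T reported in MD:Z:12T13) replaced by A. This only compares against this one locus and does not model bowtie2's FM-index search or its `exactSweep()` logic.

```python
REF = "ttaaaggaaccctgagagatatttca".upper()  # reference sequence quoted in the thread
# Hypothetical read: the reference with the base at position 12 changed.
READ = REF[:12] + "A" + REF[13:]


def seed_mismatches(read: str, ref: str, seed_len: int) -> list:
    """Ungapped mismatch count for each seed offset, compared at the same offset in ref."""
    return [
        sum(r != g for r, g in zip(read[o:o + seed_len], ref[o:o + seed_len]))
        for o in range(len(read) - seed_len + 1)
    ]


counts = seed_mismatches(READ, REF, 22)
print(counts)                       # [1, 1, 1, 1, 1]
print(any(c == 0 for c in counts))  # False: no seed matches this locus exactly
```

Every seed carries the mismatch, so with zero seed mismatches allowed none of them should anchor an alignment here, which is consistent with the behavior being a bug.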

**dpryan** · 12-06-2013, 07:36 AM

While I haven't traced things completely through the code, I can't see that bowtie2 reliably follows the -N option. It sets it internally and does some computation that depends on it, but it doesn't seem to flag a read as unalignable when -N 0 is used and there are no perfect seeds. Presumably the easiest fix would be to flag a read as unmapped in multiseedSearchWorker if bestmin > 0 when multiseedMms == 0. Either way, this is a bug and should be reported (in fact, I've just done so).


Bowtie2.1.0 gave mismatch with -N=0 option
