  • Bowtie2.1.0 gave mismatch with -N=0 option

    Dear all,

    I am using Bowtie2.1.0 to analyze reads from Illumina machines. The parameters I used are --end-to-end -D 5 -R 1 -N 0 -L 22 -i S,0,2.50.

    One of my reads is TTAAAGGAACCCAGAGAGATATTTCA, and Bowtie gave me
    HWI-ST1225 0 chr2 227981508 24 26M * 0 0 TTAAAGGAACCCAGAGAGATATTTCA BBBFFFFFFFFFFIIFFFIIIIIFII AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:12T13 YT:Z:UU

    Since I set the maximum number of mismatches in seed alignment to 0 (-N 0) and the seed length to 22 (-L 22), I didn't expect any mismatch within 22 bases. However, Bowtie gave me MD:Z:12T13. Did I misinterpret the Bowtie options?

    Thank you so much.

  • #2
    Does anybody have any clue?


    • #3
      No idea.

      Ignore the following, I just left it in for historical purposes and to remind myself how stupid I can be ... It is interesting that the MD field is different from the CIGAR field. As per the Bowtie2 manual: The MD field ought to match the CIGAR string. Which it obviously does not. '12T13' vs 26M.

      Out of stupidity mode, the rest of my original comment ....

      Out of curiosity, and perhaps to help troubleshooting, what does the reference look like at the match position?
      Last edited by westerman; 12-05-2013, 01:17 PM. Reason: Stupidity


      • #4
        26M in the CIGAR string means 26 aligned positions, each of which may be a sequence match or a mismatch. So the CIGAR string is consistent with the MD field.
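For anyone who wants to check this kind of thing mechanically, here is a minimal sketch (not from any of the posters) that compares the read length implied by a CIGAR string with the length implied by an MD string. The helper names are made up for illustration, and the MD parser assumes there are no deletions (no `^` runs) in the field.

```python
import re

def cigar_query_length(cigar):
    """Read bases consumed by a CIGAR string (M, I, S, = and X consume the read)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MIS=X")

def md_mismatches(md):
    """Return ([(0-based offset, reference base), ...], total length).

    Assumption: the MD string contains no deletions ('^' runs).
    """
    mismatches = []
    offset = 0
    for run, base in re.findall(r"(\d+)|([A-Z])", md):
        if run:
            offset += int(run)               # a run of matching bases
        else:
            mismatches.append((offset, base))  # one mismatching position
            offset += 1
    return mismatches, offset

cigar, md = "26M", "12T13"
mismatches, md_length = md_mismatches(md)
print(cigar_query_length(cigar), md_length, mismatches)
```

Both lengths come out as 26, with a single mismatch at 0-based offset 12 where the reference has a T, so 26M and 12T13 describe the same alignment.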


        • #5
          Ah, so correct. Must be the end of a long day. I'm getting dangerous in not thinking fast enough. Anyway I am as mystified as you are. If I have time (hah!) I'll try out your command myself and see if 'playing around' reveals anything. Once again thanks for the correction.


          • #6
            This is human sequence I take it? I might play around with the bowtie2 source code tomorrow to see why this is happening if no one comes up with the reason beforehand. I imagine that this sort of issue affects more than a few people, especially since even the default settings shouldn't allow this!


            • #7
              Yes, it is human sequence, and I used hg19.


              • #8
                Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.
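To make the scoring side of this concrete, here is a rough sketch that works through the numbers for the read in the first post. It assumes Phred+33 qualities, the documented defaults of --mp 6,2 and --score-min L,-0.6,-0.6 for end-to-end mode, and the mismatch penalty formula from the Bowtie2 manual. The function names are hypothetical, and the SAM line is split on whitespace because the forum paste lost its tabs.

```python
import math

def min_score(func, read_len):
    """Evaluate a bowtie2-style function string such as 'L,-0.6,-0.6' or 'C,0,0'."""
    kind, a, b = func.split(",")
    a, b = float(a), float(b)
    if kind == "C":
        return a                          # constant
    if kind == "L":
        return a + b * read_len           # linear
    if kind == "S":
        return a + b * math.sqrt(read_len)
    if kind == "G":
        return a + b * math.log(read_len)
    raise ValueError(f"unknown function type: {kind}")

def mismatch_penalty(qual_char, mx=6, mn=2):
    """MN + floor((MX - MN) * min(Q, 40) / 40), assuming Phred+33 encoding."""
    q = ord(qual_char) - 33
    return mn + math.floor((mx - mn) * (min(q, 40) / 40))

def optional_tags(sam_line):
    """Map two-letter tag names to their values (fields 12 onward of a SAM record)."""
    return {t[:2]: t[5:] for t in sam_line.split()[11:]}

sam = ("HWI-ST1225 0 chr2 227981508 24 26M * 0 0 "
       "TTAAAGGAACCCAGAGAGATATTTCA BBBFFFFFFFFFFIIFFFIIIIIFII "
       "AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:12T13 YT:Z:UU")
fields = sam.split()
tags = optional_tags(sam)
score = int(tags["AS"])

print(mismatch_penalty(fields[10][12]))           # penalty for the base at offset 12
print(score >= min_score("L,-0.6,-0.6", 26))      # default threshold
print(score >= min_score("C,0,0", 26))            # strict threshold
print(int(tags["NM"]) == 0)                       # post-hoc filter on NM
```

The mismatching base has quality F (Q37), which gives a penalty of 5 and explains AS:i:-5. That score clears the default threshold of about -16.2 but not a threshold of 0, so --score-min C,0,0 or a filter on NM/XM would both drop this alignment.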


                • #9
                  Can Bowtie handle reads containing Ns in .fq files and remove them? Since these will not match hg19, will they be removed automatically? If so, why trim and remove Ns with another program? Would it be enough to just remove the adaptor sequence, or to just define the sequence length in Bowtie, in which case we would not even need trimming?
                  I am totally confused, since I haven't touched this field for a year.
                  Would somebody like to answer?
                  Thanks in advance,
                  jp.

                  Originally posted by gringer
                  Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.


                  • #10
                    Can Bowtie handle reads containing Ns in .fq files and remove them? Since these will not match hg19, will they be removed automatically?
                    This question should really be posted in a new thread, but given that it's marginally related...

                    Bowtie2 can handle Ns in the map index and in the reads, and will happily align any base at that location. They're not removed, but are probably treated in a similar way to a read base with a very low Q score. It may also "correct" a read mapping to a non-N position for the read record in the SAM output.

                    [FWIW, Bowtie v1 can't handle Ns. I think it will replace Ns with As when doing indexing and alignment]


                    • #11
                      Originally posted by gringer
                      Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.
                      Yeah, but the mismatch is in the seed region.


                      • #12
                        Originally posted by jp.
                        Can Bowtie handle reads containing Ns in .fq files and remove them? Since these will not match hg19, will they be removed automatically? If so, why trim and remove Ns with another program? Would it be enough to just remove the adaptor sequence, or to just define the sequence length in Bowtie, in which case we would not even need trimming?
                        I am totally confused, since I haven't touched this field for a year.
                        Would somebody like to answer?
                        Thanks in advance,
                        jp.
                        See the --np and --n-ceil options for how bowtie2 handles Ns. By default, Ns decrease the alignment score and reads with too many Ns will be skipped altogether. If you have Ns at one end of a read, then you might as well trim them off.
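As a rough illustration of the two points above, the sketch below trims Ns from the ends of a read and then checks the remaining N count against a linear ceiling. It assumes the documented default of --n-ceil L,0,0.15; the function names and the example read are made up, and the comparison is my reading of "reads exceeding this ceiling are filtered out" in the manual, not a copy of Bowtie2's code.

```python
def trim_terminal_ns(seq, qual):
    """Strip runs of N from both ends of a read, keeping the quality string in sync."""
    start = len(seq) - len(seq.lstrip("Nn"))
    end = len(seq.rstrip("Nn"))
    return seq[start:end], qual[start:end]

def n_ceiling(read_len, a=0.0, b=0.15):
    """Linear ceiling a + b * read_len (assumed default: L,0,0.15)."""
    return a + b * read_len

def exceeds_n_ceiling(seq):
    """True if the read has more Ns than the ceiling allows for its length."""
    return seq.upper().count("N") > n_ceiling(len(seq))

# Hypothetical read with Ns at both ends and one in the middle
seq, qual = "NNACGTNACGTN", "!!IIII!IIII!"
trimmed_seq, trimmed_qual = trim_terminal_ns(seq, qual)
print(trimmed_seq, trimmed_qual)
print(exceeds_n_ceiling(seq), exceeds_n_ceiling(trimmed_seq))
```

In this toy case the untrimmed read has 4 Ns against a ceiling of 1.8 and would be skipped, while the trimmed read has 1 N against a ceiling of 1.35 and would still be considered for alignment.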


                        • #13
                          Originally posted by dpryan
                          Yeah, but the mismatch is in the seed region.
                          Bowtie2 seeds across the entire read length:

                          Bowtie 2 begins by extracting substrings ("seeds") from the read and its reverse complement and aligning them in an ungapped fashion with the help of the FM Index. This is "multiseed alignment" and it is similar to what Bowtie 1 does, except Bowtie 1 attempts to align the entire read this way.
                          Although now I notice that you've got a 26bp read, and a 22bp seed, so any seed will overlap with the mismatch. Thinking again about jp.'s question, perhaps there is an N (or other ambiguous base) at that position in the reference sequence. Otherwise, yes, very odd.
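The overlap argument is easy to verify with a few lines. This sketch enumerates every possible placement of a 22bp seed on a 26bp read and checks whether it covers the mismatch at 0-based offset 12 (from MD:Z:12T13). It deliberately ignores the -i interval function, so it lists all candidate placements and not the ones Bowtie2 would actually pick.

```python
def seed_windows(read_len, seed_len):
    """All half-open [start, end) placements of a seed on a read."""
    return [(s, s + seed_len) for s in range(read_len - seed_len + 1)]

read_len, seed_len = 26, 22
mismatch_pos = 12                          # forward-strand offset from MD:Z:12T13
rc_mismatch_pos = read_len - 1 - mismatch_pos  # same base on the reverse complement

windows = seed_windows(read_len, seed_len)
covering_fw = [w for w in windows if w[0] <= mismatch_pos < w[1]]
covering_rc = [w for w in windows if w[0] <= rc_mismatch_pos < w[1]]

# Positions shared by every window: from the largest start to the smallest end
core = (max(w[0] for w in windows), min(w[1] for w in windows))
print(len(windows), len(covering_fw), len(covering_rc), core)
```

All 5 placements cover the mismatch on either strand, and every seed shares the core interval [4, 22), so no 22bp seed from this read can be free of the mismatch.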


                          • #14
                            Yeah, if the read were long enough that the mismatch could not be in the seed then that would make total sense. There are no Ns in the reference in that area (the sequence there is "ttaaaggaaccctgagagatatttca"). My guess at the moment is that either the scoring matrix that's fed to al.exactSweep() isn't set properly or the output of that (which contains whether a seed maps with 0, 1, or 2 mismatches) just isn't being dealt with properly. I guess it'd be faster to just email Ben Langmead :P
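For completeness, a quick sketch comparing the read against the reference sequence quoted above. It assumes a forward-strand, ungapped alignment (flag 0 and 26M in the SAM record), and the helper name is made up for illustration.

```python
def md_from_ungapped(read, ref):
    """Build an MD-style string for an ungapped, same-length read/reference pair."""
    parts, run = [], 0
    for read_base, ref_base in zip(read.upper(), ref.upper()):
        if read_base == ref_base:
            run += 1
        else:
            parts.append(str(run))   # close the current run of matches
            parts.append(ref_base)   # MD records the reference base
            run = 0
    parts.append(str(run))
    return "".join(parts)

read = "TTAAAGGAACCCAGAGAGATATTTCA"
ref = "ttaaaggaaccctgagagatatttca"

diffs = [(i, g.upper(), r) for i, (r, g) in enumerate(zip(read, ref)) if r != g.upper()]
print(diffs)
print(md_from_ungapped(read, ref))
```

The only difference is at 0-based offset 12 (reference T, read A), which reproduces MD:Z:12T13 and confirms that the reference has an ordinary base there, not an N.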


                            • #15
                              While I haven't traced things completely through the code, I can't see that bowtie2 reliably follows the -N option. It sets it internally and does some computation dependent upon it, but it seems not to mark a read as unalignable if -N 0 is used and there are no perfect seeds. The easiest fix would presumably be to flag a read as unmapped in multiseedSearchWorker if bestmin > 0 and multseedMms == 0. Either way, this is a bug and should get reported (in fact, I've just done so).
