  • bowtie number of mismatches and multiple aligned reads

    Hi
    Should the --no-1mm-upfront parameter be used with bowtie2 to allow exactly 1 versus 2 mismatches? If so, how is it used?

    Should a MAPQ cutoff of 1 be used to distinguish reads that aligned exactly once from reads that aligned more than once?

    Look forward to your reply,

    Carol

  • #2
    --no-1mm-upfront

    Below is an excerpt from the Bowtie 2 manual:



    "By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed."


    I don't think you can tell Bowtie to find alignments with exactly 1 or 2 mismatches;
    I think you can only tell it the maximum number of mismatches to allow.
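
As a rough illustration of the option combination the manual excerpt describes, here is a minimal Python sketch that assembles a bowtie2 command line. The helper `bowtie2_command` and the file names `genome_index`, `reads.fq` and `out.sam` are hypothetical; only the flags themselves (`-x`, `-U`, `-S`, `-N`, `-L`, `--no-1mm-upfront`) are taken from Bowtie 2. The check on `-N` mirrors the error message reported later in this thread.

```python
from typing import List


def bowtie2_command(index: str, reads: str, out_sam: str,
                    seed_mismatches: int = 0, seed_length: int = 20,
                    no_1mm_upfront: bool = True) -> List[str]:
    """Build an argv list for bowtie2 (hypothetical helper, not part of Bowtie 2)."""
    # Bowtie 2 rejects -N values greater than 1 (see the error quoted in this thread).
    if seed_mismatches not in (0, 1):
        raise ValueError("-N must be 0 or 1")
    cmd = ["bowtie2", "-x", index, "-U", reads, "-S", out_sam,
           "-N", str(seed_mismatches), "-L", str(seed_length)]
    if no_1mm_upfront:
        # The option is a bare switch: it is not followed by a value.
        cmd.append("--no-1mm-upfront")
    return cmd


# Placeholder file names, for illustration only.
cmd = bowtie2_command("genome_index", "reads.fq", "out.sam")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where bowtie2 is installed.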

    • #3
      So are you confirming that --no-1mm-upfront should be used as "--no-1mm-upfront 1" or "--no-1mm-upfront 2"? Or should -N and -L be used?

      Once bowtie reports reads that aligned more than once, how can reads that aligned exactly once be separated from those that aligned more than once?

      Thanks

      • #4
        It's just "--no-1mm-upfront" (it doesn't take an argument).

        Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
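
A minimal sketch of the MAPQ filtering described above, assuming plain-text SAM input in which MAPQ is the fifth tab-separated column. The function name `filter_by_mapq` and the three example records are made up for illustration.

```python
from typing import Iterable, Iterator


def filter_by_mapq(sam_lines: Iterable[str], min_mapq: int = 10) -> Iterator[str]:
    """Yield header lines plus alignments whose MAPQ is at least min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep SAM header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:  # column 5 of a SAM record is MAPQ
            yield line


# Made-up records: one confident alignment, one ambiguous, one unmapped.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t500\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(filter_by_mapq(example, min_mapq=10))
print([line.split("\t")[0] for line in kept])
```

On real data the same filter is available as `samtools view -q 10`.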

        • #5
          But --no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?

          I meant reads mapping to repetitive regions, i.e. aligning more than once: in the stats report, more than 50% of my reads aligned > 1 times. So the MAPQ value is heuristic. Within a given interval, how do I choose the best threshold?

          • #6
            Originally posted by carolW:
            but --no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?
            No, --no-1mm-upfront disables bowtie2's default behaviour of first searching for exact or 1-mismatch end-to-end alignments before the multiseed heuristic.
            You can set -N 2 if you want to allow up to 2 mismatches in the seed region.

            • #7
              When I set -N 2, I get an error message:

              Error: -N was set to 2, but cannot be set greater than 1
              Error: Encountered internal Bowtie 2 exception (#1)

              Is there any other parameter that should be set, too?

              • #8
                Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
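
Since the seed setting does not cap mismatches over the whole read, one possible workaround is to filter after alignment. The sketch below assumes the SAM output carries Bowtie 2's optional `XM:i` field (the number of mismatches in the alignment); the helper names `mismatch_count` and `has_at_most` and the example record are hypothetical.

```python
from typing import Optional


def mismatch_count(sam_line: str) -> Optional[int]:
    """Return the value of the XM:i tag, or None if the record has no such tag."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("XM:i:"):
            return int(tag[len("XM:i:"):])
    return None


def has_at_most(sam_line: str, max_mismatches: int) -> bool:
    """True if the record is aligned and has no more than max_mismatches mismatches."""
    n = mismatch_count(sam_line)
    return n is not None and n <= max_mismatches


# Made-up aligned record with two mismatches over the whole read.
record = ("r1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\t"
          "AS:i:-12\tXN:i:0\tXM:i:2\tXO:i:0\tXG:i:0\tNM:i:2\tMD:Z:3A3C2\tYT:Z:UU")
print(mismatch_count(record), has_at_most(record, 1), has_at_most(record, 2))
```

Records without an `XM:i` tag (unaligned reads, for example) are treated as failing the filter here, which is a design choice of this sketch.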

                • #9
                  Originally posted by dpryan:
                  It's just "--no-1mm-upfront" (it doesn't take an argument).

                  Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
                  How can we judge whether a threshold is reasonable? Does it depend on the data? All info is welcome.

                  • #10
                    The MAPQ relates to the probability that the alignment is correct, so just pick a value that you're happy with depending on your downstream applications. For RNAseq, I usually use a threshold of 5, since there's enough coverage that a small amount of error won't have any considerable effect. For bisulfite sequencing data, on the other hand, I've found that a MAPQ threshold of 10 is usually the sweet spot, since there's less coverage per site, so one can't accept as much error. For variant calling, many of the callers utilize MAPQ and Phred scores in their call algorithms, so you may either not bother filtering or might just remove the highly unreliable alignments, which for bowtie2 are those with MAPQ of 0 or 1.

                    If you're looking for some objectively perfect filtering algorithm, there is none; it's just a question of how much error your requirements can accept.
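
To make the link between MAPQ and error concrete, here is a small sketch using the SAM specification's definition, MAPQ = -10 * log10(probability that the mapping position is wrong). The function name is made up, and bowtie2's MAPQ values are only approximations of this quantity, so the numbers are a guide and not exact error rates.

```python
def mapq_to_error_probability(mapq: int) -> float:
    """Invert MAPQ = -10 * log10(p_wrong), the definition in the SAM specification."""
    return 10 ** (-mapq / 10)


# Thresholds mentioned in this thread.
for q in (1, 5, 10, 42):
    print(q, f"{mapq_to_error_probability(q):.5f}")
```

Under this definition a threshold of 5 tolerates roughly a 1-in-3 chance of a wrong position per alignment, and a threshold of 10 roughly 1 in 10.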

                    • #11
                      So it seems to be easy with my data, as I only have 0, 1 and 42. 0 must correspond to reads that aligned 0 times, as there is a "u" in the strand column. 1 must be ambiguous (aligned > 1 time) and 42 unambiguous (aligned exactly once).

                      • #12
                        Yeah, life is easy when you have just 3 values. A value of 42 is given when there's a perfect match and there's no valid next-best alignment. If you played with --score-min then you'd eventually get a larger variety of MAPQ scores, though that'd just overcomplicate your life.

                        • #13
                          BTW, there are actually 5 ways in which bowtie2 will yield a MAPQ of 0, only one of which is due to a read not being mapped (it's an unreliable alignment in any case). It's actually possible to have a "unique" alignment with a MAPQ of 0, assuming the definition of "unique" is having only one valid alignment given the --score-min and penalty settings.
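
Because a MAPQ of 0 does not by itself mean "unmapped", a tally can test the SAM FLAG field instead: bit 0x4 marks an unmapped read. A minimal sketch, with made-up category names and example records:

```python
from collections import Counter

UNMAPPED_FLAG = 0x4  # SAM FLAG bit set when the read is unmapped


def classify(sam_line: str) -> str:
    """Separate unmapped reads from mapped reads that happen to have MAPQ 0."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & UNMAPPED_FLAG:
        return "unmapped"
    if mapq == 0:
        return "mapped_mapq0"
    return "mapped_mapq_positive"


# Made-up records: unmapped, mapped with MAPQ 0, mapped with MAPQ 42.
records = [
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t100\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t16\tchr2\t200\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
counts = Counter(classify(r) for r in records)
print(dict(counts))
```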

                          • #14
                            agree with you

                            • #15
                              Originally posted by dpryan:
                              Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
                              So, what is the right way to set the overall number of permitted mismatches when mapping to the reference genome index with bowtie2? Looking forward to your answer!
