
  • bowtie number of mismatches and multiple aligned reads

    Hi
    Should the --no-1mm-upfront parameter be used with bowtie2 to allow exactly 1 vs. 2 mismatches? If so, how is it used?

    Should a MAPQ cutoff of 1 be used to distinguish reads that aligned exactly once from reads that aligned more than once?

    Look forward to your reply,

    Carol

  • #2
    --no-1mm-upfront

    Below is an excerpt from the Bowtie 2 manual:



    "By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed."


    I don't think you can tell Bowtie to find exactly 1 or 2 mismatches,
    I think you can only tell it the maximum number of mismatches to allow.
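
    If reads with an exact number of mismatches are needed, one option is to filter the SAM output after alignment. The sketch below assumes the XM:i optional field that Bowtie 2 writes for aligned reads (mismatches in the alignment; gaps are not counted). The function names and example records are made up for illustration.

```python
def mismatch_count(sam_line):
    """Return the XM:i value of one SAM alignment line, or None if absent."""
    # Optional fields start at column 12 (index 11) of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("XM:i:"):
            return int(field[5:])
    return None


def with_exact_mismatches(sam_lines, k):
    """Keep alignment lines whose XM:i tag equals k (header lines are skipped)."""
    return [line for line in sam_lines
            if not line.startswith("@") and mismatch_count(line) == k]


# Hypothetical records: r1..r3 are aligned, r4 is unmapped and has no XM:i tag.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0\tXM:i:0",
    "r2\t0\tchr1\t200\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:-6\tXM:i:1",
    "r3\t0\tchr1\t300\t23\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:-12\tXM:i:2",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tYT:Z:UU",
]

one_mismatch = with_exact_mismatches(example, 1)
two_mismatches = with_exact_mismatches(example, 2)
```

    If gaps should count as well, the NM:i field (edit distance) would be the one to test instead.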



    • #3
      So are you confirming that --no-1mm-upfront should be used as --no-1mm-upfront 1 or --no-1mm-upfront 2? Or should -N and -L be used?

      Once bowtie reports reads that aligned more than once, how can reads that aligned exactly once be separated from those that aligned more than once?

      Thanks



      • #4
        It's just "--no-1mm-upfront" (it doesn't take an argument).
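
        As a rough sketch of what "doesn't take an argument" means in practice, the helper below only builds an argument list; it does not run bowtie2. The helper name, index name and file names are placeholders, and the -N check reflects the limit Bowtie 2 reports when a larger value is passed.

```python
import shlex


def build_bowtie2_command(index, reads, out_sam, seed_mismatches=0,
                          seed_length=20, no_1mm_upfront=True):
    """Build an argument list for a single-end bowtie2 run (nothing is executed)."""
    if seed_mismatches not in (0, 1):
        # Bowtie 2 rejects -N values greater than 1.
        raise ValueError("-N must be 0 or 1")
    cmd = ["bowtie2", "-x", index, "-U", reads, "-S", out_sam,
           "-N", str(seed_mismatches), "-L", str(seed_length)]
    if no_1mm_upfront:
        # A bare switch: no value follows it on the command line.
        cmd.append("--no-1mm-upfront")
    return cmd


# Placeholder index and file names, for illustration only.
cmd = build_bowtie2_command("genome_index", "reads.fq", "aligned.sam")
command_line = shlex.join(cmd)
print(command_line)
```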

        Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
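
        A minimal sketch of such a filter on plain SAM text, assuming only the standard SAM column layout (MAPQ is the fifth column). The function name, the default threshold and the example records are illustrative choices.

```python
def filter_by_mapq(sam_lines, min_mapq=10):
    """Yield header lines plus alignments whose MAPQ (column 5) is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:
            yield line


# Hypothetical records with MAPQ 42, 1 and 0 (the last one unmapped).
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

kept = list(filter_by_mapq(example, min_mapq=5))
```

        On the command line the same kind of filter is commonly done with `samtools view -q 5`, which skips alignments with MAPQ below the given value.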



        • #5
          But --no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?

          By reads aligned more than once I meant reads mapping to repetitive regions: in the stats report, more than 50% of my reads aligned > 1 times. So the MAPQ value is heuristic. Within a given interval, how do I choose the best value?



          • #6
            Originally posted by carolW:
            "but --no-1mm-upfront attempts to find 0 or 1 mismatch. How about 2 mismatches?"
            No, --no-1mm-upfront disables Bowtie 2's default behaviour, which is to look for an exact or 1-mismatch end-to-end alignment before trying the multiseed heuristic.
            You can set -N 2 if you want to allow up to 2 mismatches in the seed region.



            • #7
              When I set -N 2, I get error message:

              Error: -N was set to 2, but cannot be set greater than 1
              Error: Encountered internal Bowtie 2 exception (#1)

              Is there any other parameter that should be set, too?



              • #8
                Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
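
                To make the seed vs. whole-alignment distinction concrete: as far as the Bowtie 2 manual describes it, the whole alignment is bounded by a minimum alignment score rather than by a mismatch count. The sketch below assumes end-to-end mode with the documented defaults (`--score-min L,-0.6,-0.6` and a maximum mismatch penalty of 6 from `--mp 6,2`); the function names are invented here.

```python
import math


def min_score_linear(read_length, intercept=-0.6, slope=-0.6):
    """Minimum allowed alignment score for a linear --score-min function."""
    return intercept + slope * read_length


def max_high_quality_mismatches(read_length, max_penalty=6,
                                intercept=-0.6, slope=-0.6):
    """Mismatches that fit in the score budget if each costs max_penalty.

    Assumes no gaps and no Ns; low-quality mismatches cost less under the
    default --mp setting, so more of them may fit than this bound suggests.
    """
    # Round to suppress floating-point noise before taking the floor.
    budget = round(-min_score_linear(read_length, intercept, slope), 9)
    if budget < 0:
        return 0
    return math.floor(budget / max_penalty)


score_50 = min_score_linear(50)                  # -0.6 + -0.6 * 50
mismatches_50 = max_high_quality_mismatches(50)
mismatches_100 = max_high_quality_mismatches(100)
```

                Under these assumptions a 50 bp read has a minimum score of about -30.6, which leaves room for 5 high-quality mismatches; a stricter `--score-min` shrinks that budget.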



                • #9
                  Originally posted by dpryan:
                  It's just "--no-1mm-upfront" (it doesn't take an argument).

                  Your goal isn't to filter out "unique" vs. "non-unique" mappers, because there's no such thing (the terms are simply wrong and bowtie should just be changed to not use them, no reads are unique if you consider a large enough edit distance). Rather, your goal is to filter out alignments that are/aren't reliable. The normal way to do that is by MAPQ score, with reasonable thresholds being somewhere between 5 and 10.
                  How can we judge whether a threshold is reasonable? Does it depend on the data? All info is welcome.



                  • #10
                    The MAPQ relates to the probability that the alignment is correct, so just pick a value that you're happy with depending on your downstream applications. For RNAseq, I usually use a threshold of 5, since there's enough coverage that a small amount of error won't have any considerable effect. For bisulfite sequencing data, on the other hand, I've found that a MAPQ threshold of 10 is usually the sweet spot, since there's less coverage per site, so one can't accept as much error. For variant calling, many of the callers utilize MAPQ and Phred scores in their call algorithms, so you may either not bother filtering or might just remove the highly unreliable alignments, which for bowtie2 are those with MAPQ of 0 or 1.
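
                    For a feel of what those thresholds mean, the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong. The sketch below simply inverts that formula; how closely a given aligner's scores follow it is a separate question, so the numbers are best read as nominal.

```python
def mapq_to_error_probability(mapq):
    """Nominal probability that the mapping position is wrong, per the SAM definition."""
    return 10 ** (-mapq / 10)


# Nominal error probabilities for the thresholds mentioned above.
for q in (0, 1, 5, 10, 42):
    print(q, round(mapq_to_error_probability(q), 5))
```

                    Nominally, a threshold of 5 tolerates roughly a 1-in-3 chance of a wrong position and a threshold of 10 a 1-in-10 chance.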

                    If you're looking for some objectively perfect filtering algorithm, there is none; it's just a question of how much error your requirements can accept.



                    • #11
                      So it seems to be easy with my data, as I only have 0, 1 and 42. 0 must correspond to reads that aligned 0 times, as there is a u in the strand column. 1 must be ambiguous, or aligned > 1 times, and 42 unambiguous, or aligned exactly once.



                      • #12
                        Yeah, life is easy when you have just 3 values. A value of 42 is given when there's a perfect match and there's no valid next-best alignment. If you played with --score-min then you'd eventually get a larger variety of MAPQ scores, though that'd just overcomplicate your life.



                        • #13
                          BTW, there are actually 5 ways in which bowtie2 will yield a MAPQ of 0, only one of which is due to a read not being mapped (it's an unreliable alignment in any case). It's actually possible to have a "unique" alignment with a MAPQ of 0, assuming the definition of "unique" is having only one valid alignment given the --score-min and penalty settings.
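
                          One way to see this in your own output is to tally MAPQ values separately for mapped and unmapped records, using the SAM FLAG bit 0x4 ("segment unmapped") rather than inferring the mapping status from MAPQ. A minimal sketch; the function name and the example records are made up.

```python
from collections import Counter


def tally_mapq(sam_lines):
    """Count alignment records per (mapping status, MAPQ) pair."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        # FLAG bit 0x4 marks an unmapped read in the SAM specification.
        status = "unmapped" if flag & 0x4 else "mapped"
        counts[(status, mapq)] += 1
    return counts


# Hypothetical records: MAPQ 0 occurs for both a mapped and an unmapped read.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r4\t0\tchr2\t300\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

counts = tally_mapq(example)
```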



                          • #14
                            agree with you



                            • #15
                              Originally posted by dpryan:
                              Bowtie2 doesn't allow more than 1 mismatch in the seed. Note that the number of mismatches in the seed is not the same as the number allowed for the whole alignment (unless your reads are the same length as the seeds).
                              So, what is the right way to set the overall number of permitted mismatches when mapping to a reference genome index with bowtie2? Looking forward to your answer!
