  • bwa mem mapping quality on ambiguous mapping reads

    Hello,

    I used bwa mem to align 125 bp single-end reads to the human decoy reference genome. I know bwa assigns a mapping quality of zero when a read maps to two or more locations in the genome. However, I noticed some reads that map equally well to different genomic locations; e.g., one read maps equally well to an autosome (chr16) and to one of the additional contigs (GL000192). The CIGAR for both alignments is 125M. However, the mapping quality for the alignment on chr16 is 23, while the alignment to GL000192 has a mapping quality of zero. I thought both of them should have a mapping quality of zero. Is this right or not?

    Thanks!

  • #2
    It is also my understanding that mapping quality in that case would be zero for both.

    Is there an option in bwa mem to randomly keep one of the multiple mappings rather than discarding all of them?



    • #3
      Here is some more detail about this question:

      The FASTQ file used in the alignment does not come from a sequencer. I sliced the HYDIN2 sequence into pieces, each 125 bp long, and gave every base the quality character "I" (Phred 40 in the standard Phred+33 encoding), so all bases have a high base quality. When aligning, I asked bwa to also output secondary alignments (using the -a option). The records I mentioned are as follows:

      b38_1:146691684-146691808 16 16 71053369 23 125M * 0 0 AGCTGAAA.... IIIIIIIIIIII.... NM:i:1 MD:Z:88T36 AS:i:120 XS:i:110
      b38_1:146691684-146691808 272 GL000192.1 263206 0 125M * 0 0 * * NM:i:3 MD:Z:5G31G50T36 AS:i:110
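To see how the two records relate, it can help to decode the FLAG column and the optional tags. The following is a minimal Python sketch: the helper names `decode_flag` and `parse_tags` are hypothetical, only a few FLAG bits from the SAM specification are named, and the tag strings are copied from the records above. FLAG 16 is a reverse-strand alignment, and 272 = 256 + 16 is a secondary alignment on the reverse strand. Note also that the primary record's XS (BWA's suboptimal alignment score) equals the AS of the secondary hit.

```python
# Minimal sketch: decode SAM FLAG values and TAG:TYPE:VALUE optional fields.
# Only a handful of FLAG bits are named here (hypothetical helper, not a library API).
FLAG_BITS = {
    0x1: "paired",
    0x4: "unmapped",
    0x10: "reverse strand",
    0x100: "secondary alignment",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of the selected bits that are set in a SAM FLAG."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_tags(fields: list[str]) -> dict[str, object]:
    """Parse optional fields; integer-typed ('i') values become ints."""
    tags: dict[str, object] = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


# Tag strings taken verbatim from the two records above.
primary = parse_tags("NM:i:1 MD:Z:88T36 AS:i:120 XS:i:110".split())
secondary = parse_tags("NM:i:3 MD:Z:5G31G50T36 AS:i:110".split())

print(decode_flag(16))   # ['reverse strand']
print(decode_flag(272))  # ['reverse strand', 'secondary alignment']
print(primary["AS"] - primary["XS"])  # 10: gap between best and suboptimal score
```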



      • #4
        I can't find a formal definition anywhere of what a MAPQ of 0 means in BWA.
        There are only forum posts saying that a MAPQ of 0 means that a read has multiple hits.

        In your example, the second alignment has the NM tag set to 3, meaning the edit distance to the reference (number of nucleotide differences) is 3.
        The NM tag is set to 1 in the first alignment.

        One could surmise that the first alignment is considered unique because the second alignment is of such poor quality that it doesn't count.

        Admittedly, this is just wild speculation.
        There should be a formal definition of MAPQ set to 0 to which aligners should adhere, to make the interpretation of the mapping quality less arduous.

        The second alignment is certainly of lower quality than the first (AS:i:110 versus AS:i:120, and NM:i:3 versus NM:i:1), so it does make sense that its mapping quality is much lower.
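For reference, the SAM specification does define MAPQ in general as -10 log10 Pr{mapping position is wrong}, rounded to the nearest integer, with 255 meaning the value is unavailable. The sketch below (hypothetical helper `mapq_to_error_prob`) only inverts that general definition; it does not reproduce how BWA-MEM computes MAPQ, which depends on, among other things, the gap between the best alignment score (AS) and the suboptimal score (XS).

```python
from typing import Optional


def mapq_to_error_prob(mapq: int) -> Optional[float]:
    """Return the probability that the mapping position is wrong, as implied
    by MAPQ = -10 * log10(P(wrong)); 255 means 'not available' in SAM."""
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)


for q in (0, 23, 60):
    print(q, mapq_to_error_prob(q))  # 0 -> 1.0, 23 -> ~0.005, 60 -> ~1e-06
```

Under this definition, the chr16 hit's MAPQ of 23 corresponds to roughly a 0.5% chance that the reported position is wrong, while a MAPQ of 0 means the aligner considers the position essentially unreliable.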



        • #5
          Hi blancha,

          Thanks for the explanation! But both alignments show a 125-base match in the CIGAR (125M), so there should be no base differences. Does the SAM record give contradictory information, or am I misunderstanding something?



          • #6
            But both alignments says 125 base pair matching (CIGAR), so there is no base differences. It seems the SAM record gives different information? Or something I understand wrong?
            If you check the official SAM format specification, you'll see that M stands for alignment match, which "can be a sequence match or mismatch". All 125 bases are aligned, but there can still be mismatches: 1 in the first alignment and 3 in the second, as the NM and MD tags show.


            At least, that is my understanding of the convoluted SAM format.
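The MD tag makes this concrete: it lists runs of matching bases separated by the reference base at each mismatch (with `^` before deleted reference bases). A minimal sketch, using the hypothetical helper `md_counts`, that tallies matches and mismatches from the two MD values in this thread; in both cases they add up to the 125 bases covered by 125M, and because neither alignment has indels, the mismatch counts equal NM.

```python
import re


def md_counts(md: str) -> tuple[int, int]:
    """Return (matching bases, mismatched bases) encoded in an MD tag value.

    Digits are runs of matches, a single letter is a mismatched reference
    base, and '^' followed by letters marks deleted reference bases, which
    are neither matches nor mismatches and are skipped.
    """
    matches = mismatches = 0
    for token in re.findall(r"\d+|\^[A-Z]+|[A-Z]", md):
        if token.isdigit():
            matches += int(token)
        elif not token.startswith("^"):
            mismatches += 1
    return matches, mismatches


print(md_counts("88T36"))       # (124, 1) -> NM:i:1, 124 + 1 = 125
print(md_counts("5G31G50T36"))  # (122, 3) -> NM:i:3, 122 + 3 = 125
```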
            Last edited by blancha; 10-29-2015, 11:06 AM.



            • #7
              Thanks, this is clearer! It seems that 'M' and 'X'/'=' give somewhat redundant information.



              • #8
                Originally posted by blancha:
                If you check the official SAM format specification, you'll see that M is for alignment match, and "can be a sequence match or mismatch". 125 bases aligned, but there still can be mismatches, in this case 3.

                At least, that is my understanding of the convoluted SAM format.
                Yep, that's correct. But the most recent SAM specification also allows mismatches to be recorded in the CIGAR string itself, using the '=' (sequence match) and 'X' (sequence mismatch) operations. You can see this by mapping with BBMap, which uses the 'X' and '=' symbols.
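An 'M'-only CIGAR plus the MD tag carries enough information to derive the '='/'X' form. A minimal sketch, assuming a single M run with no insertions, deletions, or clipping (true for both records in this thread); `extended_cigar_from_md` is a hypothetical helper, and this is not how BBMap itself generates its CIGAR strings.

```python
import re


def extended_cigar_from_md(md: str) -> str:
    """Rewrite an all-'M' alignment as an '='/'X' CIGAR using its MD tag.

    Assumes the original CIGAR is a single M run (e.g. 125M) with no indels
    or clipping; MD values containing deletions ('^') are rejected.
    """
    ops: list[list] = []  # [length, op] pairs; adjacent identical ops are merged
    for token in re.findall(r"\d+|\^[A-Z]+|[A-Z]", md):
        if token.startswith("^"):
            raise ValueError("MD contains a deletion; this sketch handles only pure M runs")
        length, op = (int(token), "=") if token.isdigit() else (1, "X")
        if length == 0:
            continue  # MD writes '0' between adjacent mismatches
        if ops and ops[-1][1] == op:
            ops[-1][0] += length
        else:
            ops.append([length, op])
    return "".join(f"{n}{op}" for n, op in ops)


print(extended_cigar_from_md("88T36"))       # 88=1X36=
print(extended_cigar_from_md("5G31G50T36"))  # 5=1X31=1X50=1X36=
```

Merging adjacent operations keeps runs of consecutive mismatches compact (e.g. "10A0C5" becomes "10=2X5="), which is how a '='/'X' CIGAR would normally be written.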
