  • bwa mem mapping quality on ambiguous mapping reads

    Hello,

    I used bwa mem to align 125 bp single-end reads to the human decoy reference genome. I know bwa assigns a mapping quality of zero when a read maps to two or more locations in the genome. However, I noticed some reads that appear to map equally well to different genomic locations, e.g. one read maps both to an autosome (chr16) and to one of the patches (GL000192). The CIGAR for both alignments is 125M. However, the mapping quality for the alignment on chr16 is 23, while the alignment on GL000192 got a mapping quality of zero. I thought both of them should have a mapping quality of zero. Is this right or not?

    thanks!

  • #2
    It is also my understanding that mapping quality in that case would be zero for both.

    Is there an option to randomly keep one of the multiple mappings rather than discard all of them in bwa mem?


    • #3
      Here is some more detail about this question:

      The FASTQ file used in the alignment did not come from a sequencer. I sliced the HYDIN2 sequence into small pieces, each 125 bp long, and assigned the base quality "I" (Phred 40 in the Phred+33 encoding) to every base, so all bases have a high base quality. When I ran the alignment, I asked bwa to also output secondary alignments (using the -a option). The records I mentioned are as follows:

      b38_1:146691684-146691808 16 16 71053369 23 125M * 0 0 AGCTGAAA.... IIIIIIIIIIII.... NM:i:1 MD:Z:88T36 AS:i:120 XS:i:110
      b38_1:146691684-146691808 272 GL000192.1 263206 0 125M * 0 0 * * NM:i:3 MD:Z:5G31G50T36 AS:i:110
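
The FLAG column (16 and 272) in the two records above already hints at how the alignments differ. A minimal sketch for decoding it, assuming the bit meanings from the SAM specification's FLAG table (the dictionary and function names here are only illustrative):

```python
# Bit values follow the FLAG table in the SAM specification.
# The short names are illustrative labels, not official identifiers.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(16, decode_flag(16))    # chr16 record
print(272, decode_flag(272))  # GL000192.1 record
```

Flag 272 is 256 + 16, so the GL000192.1 record is marked as a secondary alignment on the reverse strand, while the chr16 record (flag 16) is the primary alignment. That may be why only the chr16 record carries a non-zero MAPQ.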


      • #4
        I can't find anywhere a formal definition for the meaning of MAPQ set to 0 by BWA.
        There are only forum posts saying that a MAPQ set to 0 means that a read has multiple hits.

        In your example, the second alignment has the NM tag set to 3, meaning the edit distance to the reference (number of nucleotide differences) is 3.
        The NM tag is set to 1 in the first alignment.

        One could surmise that the 1st alignment is unique in the sense that the second alignment is of such poor quality that it doesn't count.

        Admittedly, this is just wild speculation.
        There should be a formal definition of MAPQ set to 0 to which aligners should adhere, to make the interpretation of the mapping quality less arduous.

        It is certain that the second alignment is of far lesser quality than the first, so it does make sense that the mapping quality is much lower.
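
The AS tags in the two records can be reproduced from the mismatch counts. This is only a sketch: it assumes bwa mem's default scoring (match score 1, mismatch penalty 4) and an alignment without gaps or clipping, as in the 125M records here. The function name is made up for illustration.

```python
def alignment_score(aligned_bases, mismatches, match_score=1, mismatch_penalty=4):
    """Score a gapless alignment: reward matches, penalise mismatches.

    The defaults mirror bwa mem's default -A 1 and -B 4; other settings
    would give different scores.
    """
    matches = aligned_bases - mismatches
    return matches * match_score - mismatches * mismatch_penalty


print(alignment_score(125, 1))  # chr16 record, NM:i:1
print(alignment_score(125, 3))  # GL000192.1 record, NM:i:3
```

Under those assumptions the scores come out as 120 and 110, matching AS:i:120 and AS:i:110 (and the XS:i:110 on the first record). So the two alignments are not equally good: the second one scores 10 points lower.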


        • #5
          Hi blancha,

          Thanks for the explanation! But both alignments show 125 matching bases in the CIGAR, so there should be no base differences. Does the SAM record give conflicting information, or am I misunderstanding something?


          • #6
            But both alignments show 125 matching bases in the CIGAR, so there should be no base differences. Does the SAM record give conflicting information, or am I misunderstanding something?
            If you check the official SAM format specification, you'll see that M is for alignment match, and "can be a sequence match or mismatch". 125 bases aligned, but there still can be mismatches, in this case 3.


            At least, that is my understanding of the convoluted SAM format.
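
Since M does not distinguish matches from mismatches, the MD tag is where the mismatches show up. A minimal sketch for counting them, assuming the MD grammar from the SAM optional-fields specification (numbers are runs of matching bases, a single letter is a mismatched reference base, and `^` followed by letters is a deletion). The function name is illustrative.

```python
import re

# One token is either a run of matches, a deletion (^ACGT...), or a
# single mismatched reference base. The deletion alternative comes
# before the single-base one so deleted bases are not counted as mismatches.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def count_md_mismatches(md):
    """Count mismatched bases recorded in an MD tag value."""
    mismatches = 0
    for _run, _deletion, base in MD_TOKEN.findall(md):
        if base:
            mismatches += 1
    return mismatches


print(count_md_mismatches("88T36"))       # chr16 record
print(count_md_mismatches("5G31G50T36"))  # GL000192.1 record
```

For the two records in this thread the counts are 1 and 3, which agrees with the NM tags because neither alignment contains an indel.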
            Last edited by blancha; 10-29-2015, 11:06 AM.


            • #7
              Thanks, that is much clearer! It seems 'M' and 'X'/'=' give somewhat redundant information.


              • #8
                Originally posted by blancha
                If you check the official SAM format specification, you'll see that M is for alignment match, and "can be a sequence match or mismatch". 125 bases aligned, but there still can be mismatches, in this case 3.

                At least, that is my understanding of the convoluted SAM format.
                Yep, that's correct. However, more recent versions of the SAM specification also allow matches and mismatches to be distinguished in the CIGAR string itself. You can see this by mapping with BBMap, which uses the '=' and 'X' symbols.
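
To illustrate the relationship between the two notations, here is a rough sketch that rewrites an M-only CIGAR into the '='/'X' form using the MD tag. It is not how BBMap or any other aligner does it; the function name is made up, and the sketch deliberately refuses anything with indels or clipping.

```python
import re


def md_to_extended_cigar(cigar, md):
    """Rewrite a single-M CIGAR as =/X operations using the MD tag.

    Assumption: the alignment is gapless and unclipped (e.g. "125M"),
    so the MD tag contains only match runs and mismatched bases.
    """
    whole = re.fullmatch(r"(\d+)M", cigar)
    if whole is None or "^" in md:
        raise ValueError("this sketch only handles gapless, unclipped alignments")

    ops = []  # list of [length, operation] pairs
    for run, base in re.findall(r"(\d+)|([A-Za-z])", md):
        if run:
            if int(run) > 0:  # MD writes 0 between adjacent mismatches
                ops.append([int(run), "="])
        elif ops and ops[-1][1] == "X":
            ops[-1][0] += 1   # merge adjacent mismatches into one X run
        else:
            ops.append([1, "X"])

    if sum(length for length, _ in ops) != int(whole.group(1)):
        raise ValueError("MD tag length does not match CIGAR length")
    return "".join(f"{length}{op}" for length, op in ops)


print(md_to_extended_cigar("125M", "88T36"))
print(md_to_extended_cigar("125M", "5G31G50T36"))
```

For the two records in this thread this gives 88=1X36= and 5=1X31=1X50=1X36=, which describe the same alignments as 125M, only with the mismatch positions spelled out.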
