  • BWA : XM tag is sometimes wrong

    Hi,
    I've been using the XM tag to find reads with no mismatches, but this tag sometimes reports the wrong number of mismatches.
    Has anyone else had this problem? How did you fix it?

    Here are a few examples:

    HWI-ST0787:100:C02F9ACXX:7:2307:2404:186548 163 gi|83578099:1-1090946 95044 60 37M1D2M1D62M = 95049 108 GGGGTTTCGGAAAACAAACTCGCTCGATACAGTAATTGCGTTTTATTTACGGAAATTACCGTTCTCGGTTCCAAGAAGGTTAGAAAAATCGGTTGTCGCTC +1+4+0=D+<CFADB9E@@99:CG:BF)*9?DDDC@D?'-<;@=FHCHDB1?EEBCFEFDDCC;?B=8<35@C9?AA?A:?(:4<8ACBB<995>>158 XT:A:U NM:i:4 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:2 MD:Z:0C1T34^G2^A62

    HWI-ST0787:100:C02F9ACXX:7:2307:9817:186685 147 gi|83578099:1-1090946 95060 60 21M1D2M1D78M = 94975 -188 AACTCGCTCGATACAGTAATTGCGTTTTATTTACGGAAATTACCGTTCTCGGTTCCAAGAAGGTTAGAAAAATCGGTTGTCGCTCTTTCTTTCCCCCACTT @9B@DDBBDDEEDDDDDDDDBB<@DCDDDAB<@?:3EDC?<8DDDDBDCA??EBHHHHHHIIIIJIIGGJIHIGIIGJJJIIJIGEIHF;@?1FDDBB?B? XT:A:U NM:i:2 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:2 MD:Z:21^G2^A78

    HWI-ST0787:100:C02F9ACXX:7:2307:17522:186893 83 gi|83578099:1-1090946 999268 60 68M1I31M2D1M = 999191 -179 CCCTGTATAATGAAATTTCAAAAATATTTTCGTGAATAGTGATTTATTTAATTTAAGCACTAAATTATCCTTACGGACTTGGGCTACATTCATGTTTGCAC BCCCCDDCADCCCCCCCCEED?3HEEA;4EAHHEG>FDCB=CIHGIIGGIHFF<DBGIHEGEIGGFF<EGGBCFAFAB?B3HB<BFE9B>DDHD??DA@?1 XT:A:U NM:i:4 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:1 XG:i:1 MD:Z:3C95^AG1

    HWI-ST0787:100:C02F9ACXX:7:2307:12781:188676 83 gi|50593115:1-813178 389360 60 61M1D3M1D37M = 389271 -192 TTTAACTTATGAATGTACTTTACTGGCCAAGAATCCGTCTGGAACCATTCTACGGTGCTCTTGCTAGCGCTAAAGACAGCTATAGTGGATATTCAGACGGT >DDCCCCC@DDFDCCCBCCCCDBCCAECECDB8HHHIHDA@==)GCGDCFC8GEJIFJFIGHDJGIIGGHIIGGEBJJFHIJIGGHFGGGHHDFDDBF@C@ XT:A:U NM:i:3 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:1 XG:i:2 MD:Z:61^C1T1^G37

  • #2
    BWA : XM tag is sometimes wrong

    From the CIGAR strings for the reads in your examples, it looks like you have some deletions or insertions, but no mismatches.


    • #3
      More details

      HWI-ST0787:100:C02F9ACXX:7:2307:2404:186548 163 gi|83578099:1-1090946 95044 60 37M1D2M1D62M = 95049 108 GGGGTTTCGGAAAACAAACTCGCTCGATACAGTAATTGCGTTTTATTTACGGAAATTACCGTTCTCGGTTCCAAGAAGGTTAGAAAAATCGGTTGTCGCTC +1+4+0=D+<CFADB9E@@99:CG:BF)*9?DDDC@D?'-<;@=FHCHDB1?EEBCFEFDDCC;?B=8<35@C9?AA?A:?(:4<8ACBB<995>>158 XT:A:U NM:i:4 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:3 XO:i:1 XG:i:2 MD:Z:0C1T34^G2^A62

      For example, this line has 2 deletions and 2 mismatches (see the MD tag, 0C1T34^G2^A62), which sum to an edit distance of 4, in agreement with the NM tag.
      However, the XM tag is 3, whereas it should be 2.
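
The count described above can be reproduced with a short script. This is only a sketch, assuming the MD tag follows the layout in the SAM specification (match-run lengths, single mismatched reference bases, and `^` followed by deleted reference bases) and that NM equals mismatches plus inserted and deleted bases. The function names `md_counts`, `inserted_bases` and `summarize` are made up for illustration and are not part of BWA or samtools.

```python
import re


def md_counts(md):
    """Return (mismatches, deleted_bases) encoded in an MD tag value."""
    mismatches = 0
    deleted = 0
    # Tokens: a match-run length, a deletion (^ plus bases), or one mismatched base.
    for token in re.findall(r"\d+|\^[A-Za-z]+|[A-Za-z]", md):
        if token[0] == "^":
            deleted += len(token) - 1  # bases after the caret
        elif token.isalpha():
            mismatches += 1
    return mismatches, deleted


def inserted_bases(cigar):
    """Return the total number of inserted bases (I operations) in a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op == "I")


def summarize(cigar, md):
    """Combine CIGAR and MD into the counts that NM and XM are compared against."""
    mismatches, deleted = md_counts(md)
    inserted = inserted_bases(cigar)
    return {
        "mismatches": mismatches,
        "deleted": deleted,
        "inserted": inserted,
        "edit_distance": mismatches + deleted + inserted,
    }


if __name__ == "__main__":
    # CIGAR, MD and XM copied from the first and third reads quoted in this thread.
    for cigar, md, xm in [
        ("37M1D2M1D62M", "0C1T34^G2^A62", 3),
        ("68M1I31M2D1M", "3C95^AG1", 2),
    ]:
        print(cigar, md, summarize(cigar, md), "XM reported:", xm)
```

For the read discussed here, this gives 2 mismatches and an edit distance of 4, which matches NM:i:4 but not XM:i:3. Filtering on the mismatch count derived from MD, rather than on XM, may therefore be a safer way to select reads with no mismatches.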



      • #4
        BWA : XM tag is sometimes wrong

        Originally posted by mastal:
        "From the CIGAR strings for the reads in your examples, it looks like you have some deletions or insertions, but no mismatches."

        Sorry, my error: M in the CIGAR string means match or mismatch.
