SAM/BAM MD tag

  • SAM/BAM MD tag

    Dear all,

    I am writing a parser for the MD tag in SAM/BAM files because I couldn't find an existing one. I want to tally alignment mismatches, and the MD field contains the information I need.

    From the example in the SAM manual:

    The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string "10A5^AC6" means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.
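
A minimal Python sketch of tokenising an MD string according to the description quoted above. The function name `parse_md` and the token tuples are hypothetical, not taken from any library. A run of digits counts matching bases, a single letter is a mismatched reference base, and `^` introduces deleted reference bases. The sketch assumes upper-case bases and a string already extracted from the MD:Z field.

```python
import re
from typing import List, Tuple, Union

# Hypothetical token shapes: ("match", n), ("mismatch", ref_base), ("deletion", ref_bases)
Token = Tuple[str, Union[int, str]]

# One alternative per MD element: a digit run, a deletion (^ plus reference
# bases), or a single mismatched reference base.
_MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")


def parse_md(md: str) -> List[Token]:
    """Split an MD string into match, mismatch and deletion tokens."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(md):
        m = _MD_TOKEN.match(md, pos)
        if m is None:
            raise ValueError(f"unexpected character {md[pos]!r} at position {pos}")
        if m.group(1) is not None:
            n = int(m.group(1))
            if n > 0:  # a 0 only separates elements; it covers no bases
                tokens.append(("match", n))
        elif m.group(2) is not None:
            tokens.append(("deletion", m.group(2)))
        else:
            tokens.append(("mismatch", m.group(3)))
        pos = m.end()
    return tokens


print(parse_md("10A5^AC6"))
# [('match', 10), ('mismatch', 'A'), ('match', 5), ('deletion', 'AC'), ('match', 6)]
```

Because a 0 produces no token, the tokens for a string such as 27T2^ATGCATTT0G3T1 go straight from the deletion to the G mismatch.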

    I was wondering how the MD field would describe a 2bp deletion followed by a mismatch, e.g.:


    R: AAAAAAAAAAATTTTTACGTTTTT
    Q: AAAAAAAAAAGTTTTT--ATTTTT


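    since this would be written "10A5^ACG5", which could be read as a 3bp deletion of ACG rather than a 2bp deletion of AC followed by a G mismatch.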

    Perhaps I need to incorporate the CIGAR information to parse these cases properly, or do these cases never occur? Of course, if a parser is already available for this, I would much prefer to use it.

    Thank you in advance,

    Dave
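
The CIGAR can resolve the case in the question, because the D operation fixes how many reference bases were deleted. A rough sketch with hypothetical helper names compares the deletion lengths a greedy MD parser finds against the D lengths in the CIGAR. The CIGAR 16M2D6M is worked out by hand for the example above: 10 matches, a mismatch and 5 matches, then a 2bp deletion, then a mismatch and 5 matches.

```python
import re
from typing import List

# "^" followed by reference bases; [A-Z]+ is greedy, so it swallows every
# letter up to the next digit.
_MD_DELETION = re.compile(r"\^([A-Z]+)")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def md_deletions(md: str) -> List[str]:
    """Deleted reference sequences as a greedy MD parser sees them."""
    return _MD_DELETION.findall(md)


def cigar_deletion_lengths(cigar: str) -> List[int]:
    """Lengths of the D operations in a CIGAR string, in order."""
    return [int(n) for n, op in _CIGAR_OP.findall(cigar) if op == "D"]


def deletions_agree(md: str, cigar: str) -> bool:
    """True if every MD deletion has the length the CIGAR says it should."""
    return [len(d) for d in md_deletions(md)] == cigar_deletion_lengths(cigar)


# Without a separator, a greedy parser reads "ACG" as one 3bp deletion.
print(md_deletions("10A5^ACG5"))                 # ['ACG']
print(deletions_agree("10A5^ACG5", "16M2D6M"))   # False
# With a 0 after the deletion, the MD string is unambiguous on its own.
print(md_deletions("10A5^AC0G5"))                # ['AC']
print(deletions_agree("10A5^AC0G5", "16M2D6M"))  # True
```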

  • #2
    I should have looked before posting my question, but this post may still be useful to someone else.

    CIGAR: 30M8D6M    MD: 27T2^ATGCATTT0G3T1

    A 0 separates the 8 deleted bases (^ATGCATTT) from the single mismatch (G) that immediately follows them.
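
A quick consistency check on this example, as a Python sketch with hypothetical names. It assumes a well-formed MD string (`finditer` silently skips anything that does not match) and a CIGAR without N operations. The MD string should describe exactly as many reference bases as the M, =, X and D operations cover, and the same pass yields the mismatch tally the original question was after.

```python
import re
from typing import Dict

_MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def md_summary(md: str) -> Dict[str, int]:
    """Count reference bases covered, mismatches and deleted bases in an MD string."""
    ref_len = mismatches = deleted = 0
    for m in _MD_TOKEN.finditer(md):
        if m.group(1) is not None:      # run of matching bases (may be 0)
            ref_len += int(m.group(1))
        elif m.group(2) is not None:    # ^ followed by deleted reference bases
            ref_len += len(m.group(2))
            deleted += len(m.group(2))
        else:                           # single mismatched reference base
            ref_len += 1
            mismatches += 1
    return {"ref_len": ref_len, "mismatches": mismatches, "deleted": deleted}


def cigar_md_ref_len(cigar: str) -> int:
    """Reference bases covered by M, =, X and D operations.

    Spliced alignments (N operations) are not handled in this sketch.
    """
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "M=XD")


print(md_summary("27T2^ATGCATTT0G3T1"))  # {'ref_len': 44, 'mismatches': 3, 'deleted': 8}
print(cigar_md_ref_len("30M8D6M"))       # 44
```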


    • #3
      Could you please share the source code of the parser? It would be so nice of you.


      • #4
        Bio::DB::Sam has a parser that does this.


        • #5
          Originally posted by Mad_bess
          Could you please share the source code of the parser? It would be so nice of you.
          Hello,

          As nilshomer suggested, use the Perl module Bio::DB::Sam. I wrote some code that uses the module:

          http://davetang.org/muse/2011/01/28/perl-and-sam/

          I wrote the code myself, so caveat emptor.

          Cheers,

          Dave


          • #6
            What is the meaning of '0' in the MD tag?

            I have never understood the purpose of the 0 in the MD tag. Could you please help me understand it?
            Thanks,
            Adrian


            • #7
              I haven't found out why they are necessary, but they can help visually. They generally occur between adjacent SNPs, or between a deletion and a SNP that immediately follows it.

              For example, "5^AC0C5" with CIGAR "5M2D6M", or "5A0C5" with CIGAR "12M". In the former, the 0 makes it easy to see where the deletion ends and the next base (a C SNP) starts.

              The SAMtools code inserts them, so others follow its lead. You could ask on the samtools help list.
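
To see where the 0s come from, here is a sketch that writes an MD string from a gapped pairwise alignment. The function name and the example sequences are hypothetical; it assumes '-' marks a gap and raises on insertions to keep it short. This is not the SAMtools implementation. A number is emitted before every mismatch or deletion, so a mismatch that directly follows a deletion or another mismatch gets a 0 in front of it.

```python
def md_from_alignment(ref: str, read: str) -> str:
    """Build an MD string from two equal-length gapped alignment rows.

    Sketch assumptions: '-' marks a gap, clipped bases are already removed,
    and insertions (a '-' in the reference row) are not handled.
    """
    if len(ref) != len(read):
        raise ValueError("alignment rows must have equal length")
    parts = []
    run = 0              # matching bases since the last mismatch or deletion
    in_deletion = False
    for r, q in zip(ref.upper(), read.upper()):
        if r == "-":
            raise ValueError("insertions are not handled in this sketch")
        if q == "-":                       # reference base deleted from the read
            if not in_deletion:
                parts.append(f"{run}^")    # the preceding number may be 0
                run = 0
                in_deletion = True
            parts.append(r)
            continue
        in_deletion = False
        if r == q:
            run += 1
        else:
            parts.append(f"{run}{r}")      # a 0 here separates adjacent events
            run = 0
    parts.append(str(run))                 # MD always ends with a number
    return "".join(parts)


print(md_from_alignment("GGGGGACCTTTTT", "GGGGG--ATTTTT"))  # 5^AC0C5
print(md_from_alignment("GGGGGACTTTTT", "GGGGGTGTTTTT"))    # 5A0C5
```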


              • #8
                For clarification on the 0, see this blog post: https://lh3.github.io/2018/03/27/the...and-the-md-tag

                As nilshomer indicates, the 0 delimits a mismatch from an adjacent deletion: without it, a mismatched base that immediately follows a deletion would be read as part of the deleted sequence.
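
The rule is easiest to see in the grammar the SAM optional-fields specification gives for MD, `[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*`: every mismatch or deletion must be followed by a number, even if that number is 0. A small Python check (the function name `is_valid_md` is hypothetical):

```python
import re

# MD grammar from the SAM optional-fields specification.
MD_GRAMMAR = re.compile(r"[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*")


def is_valid_md(md: str) -> bool:
    """True if the whole string matches the MD grammar."""
    return MD_GRAMMAR.fullmatch(md) is not None


for md in ["5^AC0C5", "5A0C5", "5AC5", "5^ACC5"]:
    print(md, is_valid_md(md))
# 5^AC0C5 True
# 5A0C5 True
# 5AC5 False
# 5^ACC5 True
```

"5AC5" is rejected because two mismatches need a number between them. "5^ACC5" is grammatical, but it describes a 3bp deletion of ACC, not the 2bp deletion plus C mismatch written as "5^AC0C5".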
