  • davetang
    Member
    • Jul 2010
    • 11

    SAM/BAM MD tag

    Dear all,

    I am writing a parser for the MD tag in SAM/BAM files because I couldn't find one. I am interested in tallying the alignment mismatches and the MD field contains the information I need.

    In the example of the SAM manual:

    The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string "10A5^AC6" means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.

    I was wondering how the MD field would describe a 2bp deletion that is followed by a mismatch e.g.


    R: AAAAAAAAAAATTTTTACGTTTTT
    Q: AAAAAAAAAAGTTTTT--ATTTTT


    since this would be "10A5^ACG5".

    Perhaps I need to incorporate the CIGAR information to properly parse these cases or these cases never happen? Of course if a parser is already available for doing this, I would much prefer that.

    Thank you in advance,

    Dave
  • davetang
    Member
    • Jul 2010
    • 11

    #2
    I should have looked before posting my question, but this post could still be useful for someone else. Here is a CIGAR string and MD tag from a real alignment:

    30M8D6M 27T2^ATGCATTT0G3T1

    A 0 separates the 8 deleted bases from the single mismatch that follows the deletion.
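The observation above can be turned into a small tokenizer. This is only a sketch in Python (not the Perl code discussed in this thread); the function name `parse_md` and the tuple labels are made up for illustration. It reads digits as match runs, `^` plus letters as deleted reference bases, and a lone letter as a mismatched reference base, treating a run of 0 purely as a separator.

```python
import re

# A match-run length, a deletion (^ followed by bases), or one mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")

def parse_md(md):
    """Split an MD string into (kind, value) tuples (hypothetical helper)."""
    ops = []
    pos = 0
    for m in MD_TOKEN.finditer(md):
        if m.start() != pos:
            raise ValueError(f"unexpected character at offset {pos} in {md!r}")
        pos = m.end()
        run, deleted, mismatch = m.groups()
        if run is not None:
            if int(run) > 0:  # a run of 0 only separates neighbouring events
                ops.append(("match", int(run)))
        elif deleted is not None:
            ops.append(("deletion", deleted))
        else:
            ops.append(("mismatch", mismatch))
    if pos != len(md):
        raise ValueError(f"unexpected character at offset {pos} in {md!r}")
    return ops

ops = parse_md("27T2^ATGCATTT0G3T1")
mismatches = sum(1 for kind, _ in ops if kind == "mismatch")
deleted_bases = sum(len(v) for kind, v in ops if kind == "deletion")
print(mismatches, deleted_bases)  # 3 mismatches, 8 deleted bases
```

Without the 0, the G after the deletion would be swallowed into the deleted sequence, because the deletion token takes every letter that follows the `^`.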


    • Mad_bess
      Junior Member
      • Mar 2012
      • 1

      #3
      Could you please share the source code of the parser? It would be so nice of you.


      • nilshomer
        Nils Homer
        • Nov 2008
        • 1283

        #4
        Bio::DB::Sam has a parser that does this.


        • davetang
          Member
          • Jul 2010
          • 11

          #5
          Originally posted by Mad_bess:
          Could you please share the source code of the parser ? Would be sooo nice of you
          Hello,

          As nilshomer suggested, use the Perl module Bio::DB::Sam. I wrote some code which uses the module:

          Lincoln Stein has written a bunch of modules to deal with SAM/BAM files. Check out the CPAN module. If you are having trouble installing Bio::DB::Sam, you may have to recompile SAMTools with the following command: To install the Perl module on a machine where you don't have root access, follow these instructions. Using this module,...


          The code is written by me, so caveat emptor.

          Cheers,

          Dave


          • adrian
            Member
            • Oct 2009
            • 90

            #6
            What is the meaning of '0' in MD tag

            I have never understood the purpose of the 0 in the MD tag. Could you help me understand it, please?
            Thanks,
            Adrian


            • nilshomer
              Nils Homer
              • Nov 2008
              • 1283

              #7
              I haven't found out why they are necessary, but sometimes it helps to have them as a visual cue. They generally occur between two adjacent SNPs, or between a deletion and a SNP that immediately follows it.

              For example, "5^AC0C5" with a cigar "5M2D6M", or "5A0C5" with a cigar "12M". In the former it is easy to see where the deletion ends (the 0) and the next base (a C SNP) starts.
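One way to see what the 0 buys you is to compare the reference bases implied by the MD tag with those implied by the CIGAR. The following is a minimal Python sketch under that assumption; `cigar_ref_lengths` and `md_ref_lengths` are hypothetical helper names, not functions from SAMtools or Bio::DB::Sam.

```python
import re

def cigar_ref_lengths(cigar):
    """Return (aligned, deleted) reference base counts from a CIGAR string."""
    aligned = deleted = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in "M=X":        # operations that align read bases to the reference
            aligned += int(length)
        elif op == "D":        # reference bases missing from the read
            deleted += int(length)
    return aligned, deleted

def md_ref_lengths(md):
    """Return (aligned, deleted) reference base counts implied by an MD string."""
    aligned = deleted = 0
    for run, deletion, mismatch in re.findall(r"(\d+)|\^([A-Z]+)|([A-Z])", md):
        if run:
            aligned += int(run)       # a run of matches (possibly 0)
        elif deletion:
            deleted += len(deletion)  # bases listed after ^
        else:
            aligned += 1              # one mismatched reference base
    return aligned, deleted

print(cigar_ref_lengths("5M2D6M") == md_ref_lengths("5^AC0C5"))  # consistent
print(cigar_ref_lengths("5M2D6M") == md_ref_lengths("5^ACC5"))   # not consistent
print(cigar_ref_lengths("12M") == md_ref_lengths("5A0C5"))       # consistent
```

With the 0 dropped, "5^ACC5" reads as a 3-base deletion and 10 aligned bases, which no longer agrees with "5M2D6M" (11 aligned, 2 deleted).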

              The SAMtools code puts them in, so others follow the same lead. You could ask the samtools help list.


              • ihoskins
                Junior Member
                • Jun 2016
                • 4

                #8
                For clarification on the 0, see this blog: https://lh3.github.io/2018/03/27/the...and-the-md-tag

                As nilshomer indicates, the 0 delineates mismatches from the reference in the context of deletions.
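The SAM optional-fields specification describes MD values with the regular expression `[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*`, so a number has to sit between any two mismatch or deletion events. A small Python check of that grammar, as a sketch only (the name `is_valid_md` is made up for illustration):

```python
import re

# Pattern for MD values as given in the SAM optional-fields specification.
MD_SPEC = re.compile(r"[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*")

def is_valid_md(md):
    """True if the whole string matches the MD grammar (hypothetical helper)."""
    return MD_SPEC.fullmatch(md) is not None

for md in ["5A0C5", "5AC5", "5^AC0C5", "10A5^AC0G5", "10A5^ACG5"]:
    print(md, is_valid_md(md))
```

Note that "10A5^ACG5" still passes the grammar, but only because it reads as a 3-base deletion of ACG; "10A5^AC0G5" is the form that encodes a 2-base deletion followed by a mismatch. Two adjacent mismatches written without a 0, as in "5AC5", do not match at all.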
