  • Bio.X2Y
    Member
    • Apr 2010
    • 46

    SAM Format - Hard Clip

    Hiya,

    I was wondering if anyone can shed light on the following question about hard clipping in the SAM format (I'm not able to pinpoint an answer in the spec):

    If a query read aligns with H's on either end (i.e. hard-clipped), I understand the query sequence itself is truncated in the SEQ field.
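To make the truncation concrete, here is a minimal sketch (not taken from the spec or from any poster's code) of how the length of the stored SEQ relates to the CIGAR. It assumes the standard CIGAR operations, where M, I, S, = and X account for bases present in SEQ while H accounts for bases that were removed. The function names are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation letter.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_WHOLE = re.compile(r"(\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5H20M3H' into (length, op) tuples."""
    if not _CIGAR_WHOLE.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_ELEMENT.findall(cigar)]


def stored_seq_length(cigar):
    """Number of bases expected in the SEQ field (hard clips excluded)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


def original_read_length(cigar):
    """Length of the read before hard clipping (hard clips included)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=XH")


if __name__ == "__main__":
    print(stored_seq_length("5H20M3H"))     # 20
    print(original_read_length("5H20M3H"))  # 28
    print(stored_seq_length("5S20M3H"))     # 25: soft-clipped bases stay in SEQ
```

The contrast between the first and last call shows the difference between H and S: soft-clipped bases are still stored in SEQ, hard-clipped ones are not.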

    My question is: in these cases, should the POS reflect
    (a) the index of the first aligned base in the reference, or
    (b) the index of the first base in the reference corresponding to the first H?

    Thanks for your time.
    Bio.
  • mrawlins
    Member
    • Apr 2010
    • 63

    #2
    I don't recall having seen a SAM file with an H at the beginning of the CIGAR, only at the end. If the H is at the end (or somehow in the middle) then it shouldn't affect the position.

    I would guess the position reflects the first base corresponding to the first H, though that may not be true. My reasoning is: if you're not starting from the first H, why include the hard clipping in the first place? Since hard-clipped sequence isn't included in the reported alignment, the only real reason for including it is to indicate the offset from the stated position.

    That being said, I'm just guessing here. If somebody knows conclusively, go with whatever they tell you.

    Good Luck

    • Bio.X2Y
      Member
      • Apr 2010
      • 46

      #3
      Thanks mrawlins! I'm leaning towards that interpretation too.

      I'm trying to get a definitive stance since I'm in the process of writing some code that supports SAM, and I'd prefer not to add another non-compliant application to the mix!

      Anybody have something conclusive?
      Thanks for your time

      • nilshomer
        Nils Homer
        • Nov 2008
        • 1283

        #4
        Originally posted by Bio.X2Y:
        Thanks mrawlins! I'm leaning towards that interpretation too.

        I'm trying to get a definitive stance since I'm in the process of writing some code that supports sam, and I'd prefer not to add another non-compliant application to the mix!

        Anybody have something conclusive?
        Thanks for your time

        Since the spec. is ambiguous in this case, and it probably should not be, could you send your email to the samtools help list ([email protected])? I think the spec. could benefit from this question.

        • lh3
          Senior Member
          • Feb 2008
          • 686

          #5
          SAM spec: POS is "1-based leftmost POSition/coordinate of clipped sequence". Isn't it clear?

          • nilshomer
            Nils Homer
            • Nov 2008
            • 1283

            #6
            Originally posted by lh3:
            SAM spec: POS is "1-based leftmost POSition/coordinate of clipped sequence". Isn't it clear?

            I was reading 1.4.3, since this is where you would ask such a question. Very clear, sorry.

            • Bio.X2Y
              Member
              • Apr 2010
              • 46

              #7
              Thanks guys.

              lh3, I'm afraid I still find that a bit ambiguous:

              Does "clipped sequence" refer to the "sequence that is clipped" (implying the whole thing, including the clipped part) or to the "region of the sequence that remains after clipping"?

              Which way are you taking it?

              Thanks

              • Bio.X2Y
                Member
                • Apr 2010
                • 46

                #8
                As I think about it, the second one seems to make the most sense - "region of the sequence that remains after clipping".

                Thanks for your input everyone.
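For anyone writing a parser, the reading settled on above (POS is the first aligned base, not the first H) can be sketched roughly like this. It is only an illustration under that interpretation, assuming the standard CIGAR operations where M, D, N, = and X advance along the reference. The function names are hypothetical, and the "unclipped start" is a derived value, not something stored in the record.

```python
import re

_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_WHOLE = re.compile(r"(\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5H20M3H' into (length, op) tuples."""
    if not _CIGAR_WHOLE.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_ELEMENT.findall(cigar)]


def reference_span(pos, cigar):
    """1-based inclusive (start, end) on the reference.

    POS is taken as the first aligned base, so leading H/S operations
    do not shift the start.
    """
    ref_len = sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")
    return pos, pos + ref_len - 1


def unclipped_start(pos, cigar):
    """Where the read would start if the leading clipped bases had aligned."""
    leading_clip = 0
    for n, op in parse_cigar(cigar):
        if op not in "SH":
            break
        leading_clip += n
    return pos - leading_clip


if __name__ == "__main__":
    print(reference_span(100, "5H20M3H"))   # (100, 119)
    print(unclipped_start(100, "5H20M3H"))  # 95
```

Under interpretation (b) from the first post, POS itself would have been 95 for this record. With interpretation (a), that offset has to be recomputed from the leading H.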
