  • SAM Format - Hard Clip

    Hiya,

    I was wondering if anyone can shed light on the following question about hard clipping in the SAM format (I'm not able to pinpoint an answer in the spec):

    If a query read aligns with H's on either end (i.e. hard-clipped), I understand the query sequence itself is truncated in the SEQ field.

    My question is: in these cases, should the POS reflect
    (a) the index of the first aligned base in the reference, or
    (b) the index of the base in the reference corresponding to the first H?

    Thanks for your time.
    Bio.

  • #2
    I don't recall having seen a SAM file with an H at the beginning of the CIGAR, only at the end. If the H is at the end (or somehow in the middle) then it shouldn't affect the position.

    I would guess the position reflects the first base corresponding to the first H, though that may not be true. My reasoning is, if you're not starting from the first H, why even include the hard clipping in the first place? Since hard-clipped sequence isn't included in the reported alignment, the only real reason for including it is to indicate the offset from the stated position.

    That being said, I'm just guessing here. If somebody knows conclusively, go with whatever they tell you.

    Good Luck


    • #3
      Thanks mrawlins! I'm leaning towards that interpretation too.

      I'm trying to get a definitive stance since I'm in the process of writing some code that supports SAM, and I'd prefer not to add another non-compliant application to the mix!

      Anybody have something conclusive?
      Thanks for your time


      • #4
        Since the spec. is ambiguous in this case, and it probably should not be, could you send your email to the samtools help list ([email protected])? I think the spec. could benefit from this question.


        • #5
          SAM spec: POS is "1-based leftmost POSition/coordinate of clipped sequence". Isn't it clear?


          • #6
            I was reading 1.4.3, since this is where you would ask such a question. Very clear, sorry.


            • #7
              Thanks guys.

              lh3, I'm afraid I still find that a bit ambiguous:

              Does "clipped sequence" refer to the "sequence that is clipped" (implying the whole thing, including the clipped part) or to the "region of the sequence that remains after clipping"?

              Which way are you taking it?

              Thanks


              • #8
                As I think about it, the second one seems to make the most sense - "region of the sequence that remains after clipping".

                Thanks for your input everyone.
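
For anyone landing on this thread later: the reading settled on above (POS is the first aligned reference base, so a leading H or S does not shift it) can be sketched in a few lines of Python. This is only an illustration, not code from samtools. The helper names `parse_cigar`, `reference_span`, `leading_clip` and `expected_seq_length` are made up for this example, and it assumes the usual rule from the SAM spec that only M, D, N, = and X consume reference bases, while only M, I, S, = and X consume bases in SEQ.

```python
import re

# Assumption (from the SAM spec's CIGAR table): ops that advance along the reference.
REF_CONSUMING = set("MDN=X")
# Assumption (same table): ops whose bases are actually present in the SEQ field.
SEQ_CONSUMING = set("MIS=X")

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5H10M3S' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Rebuild the string to catch anything the regex silently skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) covered on the reference.

    POS is taken as the first aligned base, so clipping never moves the start.
    """
    ref_len = sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)
    return pos, pos + ref_len - 1


def leading_clip(cigar):
    """Count read bases clipped (H or S) before the first non-clip operation."""
    total = 0
    for n, op in parse_cigar(cigar):
        if op not in "HS":
            break
        total += n
    return total


def expected_seq_length(cigar):
    """Length SEQ should have: hard-clipped bases are absent, soft-clipped ones stay."""
    return sum(n for n, op in parse_cigar(cigar) if op in SEQ_CONSUMING)


if __name__ == "__main__":
    pos, cigar = 100, "5H10M"
    print(reference_span(pos, cigar))   # alignment starts at POS itself
    print(pos - leading_clip(cigar))    # where the read would start if unclipped
    print(expected_seq_length(cigar))   # bases actually stored in SEQ
```

With POS 100 and CIGAR 5H10M, the alignment covers reference positions 100 to 109. Interpretation (b) from the first post would correspond to `pos - leading_clip(cigar)`, i.e. 95, which is the "unclipped" start rather than what POS stores.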
