  • Coryza
    Member
    • Feb 2014
    • 29

    [Bowtie2] CIGAR string calculation.

    Hi All,

    The SAM output gives the 1-based leftmost mapping POSition of the first matching base of the reference. Is it possible to calculate the rightmost mapping position on the reference as well? If so, how? Which CIGAR operations should I add up, and which should I leave out?

    Op  BAM  Description
    M   0    alignment match (can be a sequence match or mismatch)
    I   1    insertion to the reference
    D   2    deletion from the reference
    N   3    skipped region from the reference
    S   4    soft clipping (clipped sequences present in SEQ)
    H   5    hard clipping (clipped sequences NOT present in SEQ)
    P   6    padding (silent deletion from padded reference)
    =   7    sequence match
    X   8    sequence mismatch
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Yes, you can do that. In fact, in the samtools C API there's a function (bam_calend) that does exactly that given a starting position and CIGAR string. The only CIGAR operations you have to worry about are 'M', '=', 'X', 'D', and 'N', since those are the ones that consume reference bases. For each of them, just increment the position by the length of the operation (so 30M would increment by 30). The remaining operations ('I', 'S', 'H', and 'P') don't move you along the reference, so skip them.

    Remember to decrement the value by 1 at some point, or else you'll end up being 1 base off. If you were dealing with a BAM, this wouldn't be needed: the stored coordinate is 0-based there, so the sum is already the correct end position in 1-based coordinates.
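
    The calculation described above can be sketched in a few lines of Python. This is only an illustration working on the text POS and CIGAR fields of a SAM record, not the samtools implementation; the function name sam_end_position is made up for this example.

```python
import re

# CIGAR operations that consume reference bases (M, =, X, D, N per the post).
REF_CONSUMING = set("M=XDN")

# One CIGAR element: a length followed by a single operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=XB])")


def sam_end_position(pos, cigar):
    """Return the 1-based rightmost reference position of an alignment.

    pos   -- 1-based leftmost position (the SAM POS field)
    cigar -- CIGAR string, e.g. "30M2D20M"

    Hypothetical helper for illustration; not part of any library.
    """
    if cigar == "*":
        raise ValueError("CIGAR is unavailable for this record")
    ref_len = sum(
        int(length)
        for length, op in CIGAR_RE.findall(cigar)
        if op in REF_CONSUMING
    )
    # Subtract 1 because POS is 1-based and the end is inclusive.
    return pos + ref_len - 1


print(sam_end_position(100, "30M"))              # 129
print(sam_end_position(100, "5S30M2I10M3D20M"))  # 162
```

    In the second call only 30M, 10M, 3D and 20M count (63 reference bases), so the alignment spans 100 to 162; the soft clip and the insertion are ignored.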


    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      BTW, there's also a 'B' operation (value 9, or BAM_CBACK), which I've never actually seen and which seems to have been intended for Complete Genomics data. You can likely ignore it, since it has never made its way into actual use.


      • Coryza
        Member
        • Feb 2014
        • 29

        #4
        Ok thanks! I'll do that. I do have another question that perhaps you can answer; otherwise I'll start a new thread.

        I've got paired-end Illumina data mapped against human hg19. When viewing the SAM output, how can I check whether a pair mapped to the forward strand or to the reverse strand of the hg19 genome sequence?


        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          #5
          Is this from strand-specific (or "directional") data? If not, you can't determine the strand of the original fragment. If this is stranded data, it ends up depending on the prep that you did. Most of the preps I've seen work such that the orientation of read #1 decides the strand. When in doubt, open things in IGV and just have a look at a couple of genes; that'll always clarify things.


          • Coryza
            Member
            • Feb 2014
            • 29

            #6
            Originally posted by dpryan:
            Is this from strand-specific (or "directional") data? If not, you can't determine the strand of the original fragment. If this is stranded data, it ends up depending on the prep that you did. Most of them that I've seen work such that the orientation of read #1 decides the strand. When in doubt, open things in IGV and just have a look at a couple genes, that'll always clarify things.
            As far as I know, these are all cDNA sequences, with forward and reverse reads. I was hoping I could see whether a read pair maps to the forward or the reverse strand of hg19. It matters because I want to look at a few + strand genes and a few - strand genes, and it would be handy if I could sort on that during my analysis.


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              You're best off just opening things in IGV and having a look at a couple genes. Then you'll know how the library prep was done and if you can use the 0x10 bit in the flag or not.
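
              For reference, here is a minimal Python sketch of reading the 0x10 bit (and the related pair bits) from the FLAG column of a SAM record. The bit values are the standard SAM flag bits; the function name describe_flag is made up for this example. Note that 0x10 only says the read aligned to the reverse strand of the reference. Whether that corresponds to the strand of the original transcript still depends on the library prep, as discussed above.

```python
def describe_flag(flag):
    """Decode the strand- and pair-related bits of a SAM FLAG value.

    Hypothetical helper for illustration; not part of any library.
    """
    return {
        "paired": bool(flag & 0x1),         # read is part of a pair
        "reverse": bool(flag & 0x10),       # read aligned to the reverse strand
        "mate_reverse": bool(flag & 0x20),  # mate aligned to the reverse strand
        "read1": bool(flag & 0x40),         # first read of the pair
        "read2": bool(flag & 0x80),         # second read of the pair
    }


# Typical flags for a properly paired fragment:
print(describe_flag(99))   # read 1 forward, mate reverse
print(describe_flag(147))  # read 2 reverse, mate forward
```

              If the IGV check shows that read #1 gives the strand in your prep, then pairs with flags 99/147 and pairs with flags 83/163 would fall on opposite strands, and you could sort on that.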
