  • charlie_sequencing
    Member
    • Jan 2011
    • 13

    samtools XS:A optional field

    Does anyone know what the XS:A field in samtools output means?

    XS:A (A is the SAM attribute type) can take the value "+" or "-". I thought "+" and "-" might indicate how the read is mapped to the reference (forward or reverse complement), but that turned out not to be the case, because both forward and reverse-complement reads can carry either value.

    Thanks!
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    The optional tags are specific to the aligner that generated them. What aligner did you use? Take a look at the manual from that aligner.


    • dagarfield
      Member
      • Aug 2010
      • 39

      #3
      There are two places in a SAM file that, on the surface, *seem* to provide equivalent information: the second column (the FLAG field, e.g. 0 vs 16) and the XS:A field.

      I have a vague memory that the first tells you about mapping relative to the reference sequence. The second (XS:A) tells you something about the direction/strand of the *read itself*.

      I wish I could find the post I read that told me that. I first came across the distinction while trying to understand the input fields required for TopHat. You might try searching for "TopHat" and "XS field" together. Please post back if you figure it out.
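The two fields mentioned above can be pulled out of a single SAM record and compared side by side. This is only a minimal sketch: the function name `strand_fields` and the example record are made up for illustration. It relies on two things from the SAM format: bit 0x10 of FLAG marks a read aligned as its reverse complement, and optional fields start at column 12 in TAG:TYPE:VALUE form. Since optional tags are aligner-specific, the sketch only accepts an XS tag of type A.

```python
def strand_fields(sam_line):
    """Return (alignment_strand, xs_tag) for one SAM alignment line.

    alignment_strand comes from FLAG bit 0x10; xs_tag is the value of an
    XS:A optional field, or None if the record has no such field.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    # Bit 0x10 is set when the read aligned as its reverse complement.
    alignment_strand = "-" if flag & 0x10 else "+"

    xs_tag = None
    # Optional TAG:TYPE:VALUE fields start at column 12 (index 11).
    for opt in fields[11:]:
        tag, typ, value = opt.split(":", 2)
        # Some aligners use XS with another type (e.g. an integer score),
        # so only a character-typed XS is treated as a strand here.
        if tag == "XS" and typ == "A":
            xs_tag = value
    return alignment_strand, xs_tag


# Hypothetical record: aligned to the reverse strand, but tagged XS:A:+
example = "\t".join([
    "read1", "16", "chr1", "100", "50", "20M100N16M", "*", "0", "0",
    "A" * 36, "I" * 36, "NM:i:0", "XS:A:+",
])
print(strand_fields(example))
```

The two values are reported independently, which is why a record can have FLAG 16 and XS:A:+ at the same time.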

      Best of luck,

      DG


      • charlie_sequencing
        Member
        • Jan 2011
        • 13

        #4
        Thanks so much for your replies, nilshomer and dagarfield. I found the Cufflinks manual, http://cufflinks.cbcb.umd.edu/manual.html, where the XS field is specified. When a read is mapped to the reference across a splice junction, the intron boundaries (GT-AG or CT-AC) give the value of XS: "+" for GT-AG and "-" for CT-AC. It tells you "which strand the RNA that produced this read came from".
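The rule above can be sketched in a few lines. This is an illustration of the rule as described in this post, not TopHat's actual code: the function name `xs_from_intron` and the toy sequences are hypothetical, and only the two motifs named above are handled (anything else returns None).

```python
def xs_from_intron(ref_seq, intron_start, intron_end):
    """Guess the XS strand from the intron's boundary dinucleotides.

    ref_seq is the reference (plus-strand) sequence; intron_start and
    intron_end are 0-based, end-exclusive coordinates of the intron.
    """
    intron = ref_seq[intron_start:intron_end].upper()
    if len(intron) < 4:
        return None  # too short to have two separate boundary dinucleotides
    motif = (intron[:2], intron[-2:])
    if motif == ("GT", "AG"):
        return "+"  # donor/acceptor read directly on the plus strand
    if motif == ("CT", "AC"):
        return "-"  # reverse complement of GT...AG, so the gene is on minus
    return None     # motif not covered by this sketch


plus_ref = "AAAA" + "GTCCCCAG" + "TTTT"   # toy intron at [4, 12)
minus_ref = "AAAA" + "CTGGGGAC" + "TTTT"  # toy intron at [4, 12)
print(xs_from_intron(plus_ref, 4, 12), xs_from_intron(minus_ref, 4, 12))
```

CT-AC is what a GT-AG intron on the minus strand looks like when read from the plus-strand reference, which is why the two motifs map to opposite XS values.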

        This led me to another question: is TopHat able to report the strand of a read that is completely contained within an exon, i.e., one without an N in its CIGAR string?
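Whether a given alignment has an N operation, and where the skipped region sits on the reference, can be read straight from POS and CIGAR. A minimal sketch, with the hypothetical helper name `intron_spans`; it assumes the standard SAM CIGAR operations, of which M, D, N, = and X advance the reference position.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_spans(pos, cigar):
    """Return 0-based, end-exclusive reference spans of each N operation.

    pos is the 1-based leftmost position from the SAM POS column.
    An empty list means the read has no N, i.e. it is unspliced.
    """
    ref = pos - 1  # convert to 0-based
    spans = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            spans.append((ref, ref + length))
        if op in "MDN=X":  # operations that consume reference bases
            ref += length
    return spans


print(intron_spans(100, "20M100N16M"))
print(intron_spans(100, "36M"))
```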


        • frozenlyse
          Senior Member
          • Sep 2008
          • 135

          #5
          Early RNA-seq protocols were unstranded, i.e. the RNA may have originated from the plus strand, but because the library was created from double-stranded cDNA you obtain reads from both the plus and minus strands. In this case the SAM FLAG that records which strand the read mapped to doesn't tell you which strand the RNA originated from. However, if the read spans a splice junction you can work it out (as charlie_sequencing posted), hence the TopHat XS tag. Reads from an unstranded library that do not cross a splice junction cannot be assigned a strand.

          However, lots of people are doing stranded RNA-seq these days, where the reads come from only one strand, which is a lot nicer.


          • flobpf
            Member
            • Apr 2010
            • 76

            #6
            Other posts that may be of help

            Please check out these posts for a discussion of directional sequencing and XS:A tags:

            Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)

            and
            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


            • Bioinyang
              Junior Member
              • May 2013
              • 1

              #7
              Hi frozenlyse. As you said, with an unstranded library we don't know which strand the reads come from. I now have a dUTP dataset and I ran TopHat with --library-type fr-firststrand. The TopHat manual says "it is assumed that only the strand generated during first strand synthesis is sequenced." I took that to mean the reads come from the minus strand, but I found that half of the reads were tagged "XS:A:+".
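One way to look at this: XS describes the strand of the originating RNA, not the strand the read aligned to, so a stranded library can yield both "+" and "-" when genes on both strands are expressed. Below is a minimal sketch of how the transcript strand could be inferred from FLAG alone. It assumes the usual dUTP (first-strand) convention, in which a single-end read or read 1 is antisense to the RNA and read 2 is sense. The function name is hypothetical and this is not TopHat's implementation.

```python
def transcript_strand_firststrand(flag):
    """Infer the RNA strand from a SAM FLAG, assuming a dUTP library.

    Assumption: read 1 (or a single-end read) is the reverse complement
    of the RNA, and read 2 has the same orientation as the RNA.
    """
    reverse = bool(flag & 0x10)   # read aligned as its reverse complement
    is_read2 = bool(flag & 0x80)  # second read of a pair
    if is_read2:
        # Read 2 is sense: the RNA strand equals the alignment strand.
        return "-" if reverse else "+"
    # Read 1 / single-end is antisense: the RNA strand is the opposite.
    return "+" if reverse else "-"


# Single-end reads: forward alignment (0) vs reverse alignment (16)
print(transcript_strand_firststrand(0), transcript_strand_firststrand(16))
```

Under this assumption, roughly half of the reads carrying XS:A:+ would simply reflect about half of the sequenced RNA coming from plus-strand genes.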
