  • zorph
    Member
    • May 2010
    • 40

    Manual XS:A:[-|+] assignment for Cufflinks

    Hi,
    I am using GSNAP and have assigned the strand to my reads. I have tried to format my BAM file the same way TopHat's accepted_hits.bam file is formatted (examples of both files are below). However, when I run my sorted BAM file through Cufflinks, I get the following error:

    "BAM record error: found spliced alignment without XS attribute"

    Does anyone have any idea as to what I could be doing wrong with my formatting? Thanks.

    *Altered GSNAP file: output.bam*
    D8FF8JN1:127:C066VACXX:8:1207:12974:19520       147     chr1    4610    40      83M140N16M      =       4311    0       CGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGTGCCTCATGAC     DDCADDC@CCDA>CC@@@DDDCADDDDDDDDCDCCCB?DCDDCFFHHBHHDGC=CJIIJGHCHEIIGEHE3JJJJIIHFEGCJIIIIFHHGHDDFDD?C     NM:i:1  XS:A:-  NH:i:1
    D8FF8JN1:127:C066VACXX:7:1103:12412:135532      163     chr1    15      40      75M     =       41      101     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC     144=BDHHHHHJJJJJJJJJJJJJJJJJIJJJJJJGIJJFHJJJJJJJIJIJJIJIGIHHHHFFFFFEEDEDCDB     NM:i:0  XS:A:-  NH:i:1
    D8FF8JN1:127:C066VACXX:7:1103:12412:135532      83      chr1    41      40      69M6S   =       15      -101    CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCACTCCAC     DABBCDEEDBDFFFFHAHHJIGDJIIHFFJJJIHFJJHHGBJJIHHDJJIGHHJJIHHGJJJIHHHHFCFEFFFC     NM:i:0  XS:A:-  NH:i:1
    *TOPHAT File: accepted_hits.sam*
    D8FF8JN1:127:C066VACXX:8:2206:17700:177847      113     chr1    5783    1       28M659N71M      =       5783    0       TCGACCACTTCCCTGGCAGCTCCCTGGACTGAAGGAGACGTGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGTT     DDDCDDCDDDBBCC<3(BDDBDDDDDDEDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFHHHHJJJJJJJJIIJIJJIJJJJJJJJJHHHHHFFDB=+@     NM:i:4  XS:A:-  NH:i:3  CC:Z:chrX       CP:i:154906249
    D8FF8JN1:127:C066VACXX:8:1307:2101:94989        129     chr1    6590    0       39M88N60M       =       6590    0       GGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGCCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAG     CCFFFFFHHHHGJ@BD<DDDCECEEDDDDDDDDDDDDDCCDDDDDDDDDDDDCDDDDBB@CDDCCC@@>@B(4>?9:?B?CB?>@DCDAA@AC+4:A@C     NM:i:1  XS:A:-  NH:i:7  CC:Z:chr15      CP:i:100331773
    D8FF8JN1:127:C066VACXX:8:1307:2101:94989        65      chr1    6590    0       39M88N60M       =       6590    0       GGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGCCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAG     CCFFFFFHHHHGJ@BD<DDDCECEEDDDDDDDDDDDDDCCDDDDDDDDDDDDCDDDDBB@CDDCCC@@>@B(4>?9:?B?CB?>@DCDAA@AC+4:A@C     NM:i:1  XS:A:-  NH:i:7
  • ParthavJailwala
    Member
    • Oct 2009
    • 27

    #2
    Hi zorph,

    I too am trying to get Cufflinks to read a GSNAP-generated SAM file. Just curious: were those XS:A:- or XS:A:+ tags inserted automatically by GSNAP, or did you insert them manually?
    If you inserted them manually, how did you figure out the strand information (+ or -)?
    Thanks


    • jnfass
      Member
      • Aug 2008
      • 88

      #3
      I'll third this complaint ... with SAM generated by GMAP. I've checked, and all of the records with N's in CIGAR strings do have XS:A:[+-] tags. I had wondered if there was a specific order that cufflinks is expecting the tags to be in, but the OP's example doesn't deviate from the example SAM lines given in the manual, so that seems unlikely.

      Please post if either of you crack this case.
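
The check described in this post (every record with an N in its CIGAR string should carry an XS:A:+ or XS:A:- tag) can be sketched in a few lines of Python. This is only an illustration that works on plain SAM text, for example the output of `samtools view`; the function name `xs_problem` is made up here and is not part of Cufflinks, GSNAP, or GMAP.

```python
import re

SPLICED = re.compile(r"\d+N")  # an 'N' operation in the CIGAR marks a splice


def xs_problem(sam_line):
    """Return None if the record looks fine, else a short reason string."""
    if sam_line.startswith("@"):
        return None  # header line: nothing to check
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # malformed line: not handled in this sketch
    cigar, tags = fields[5], fields[11:]
    if not SPLICED.search(cigar):
        return None  # unspliced alignment: XS is optional
    xs = [t for t in tags if t.startswith("XS:")]
    if not xs:
        return "spliced alignment without XS tag"
    if xs[0] not in ("XS:A:+", "XS:A:-"):
        return "unexpected XS value: " + xs[0]
    return None
```

Calling `xs_problem` on each line of the SAM file and printing the non-None results lists the records worth a closer look, including any whose XS value is something other than + or -.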


      • rnaseek
        Member
        • Nov 2011
        • 22

        #4
        The latest version of GSNAP (released on 2012-04-27) says that it adds the XS tags, so one does not have to do this manually.

        The XS tag is added to spliced reads and indicates which strand the read came from (not the strand it aligned to). The Cufflinks manual says:

        This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with a 'N' operation in the CIGAR string).
        Note that the strand a read aligned to is easy to get from the SAM flag. Getting the strand of the RNA it came from, however, is tricky in unstranded sequencing. TopHat uses splice junction information to infer it. One can try to add the XS tag manually, based on the sequence at the splice junction of the alignment. The TopHat manual says:

        With long (>=75bp) reads, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. With shorter reads, TopHat only reports alignments across "GT-AG" introns
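
As a rough illustration of inferring XS from the splice junction sequence, the sketch below looks up the first and last two intron bases on the reference and maps them to a strand. It assumes the chromosome sequence is already in memory as a string; the names `MOTIF_STRAND`, `introns`, and `infer_xs` are hypothetical, and this is not how TopHat or GSNAP implement it internally. The "-" motifs are simply the reverse complements of GT-AG, GC-AG, and AT-AC.

```python
import re

# Intron boundary dinucleotides as read on the reference (forward) strand.
# The '-' entries are the reverse complements of the '+' entries.
MOTIF_STRAND = {
    ("GT", "AG"): "+", ("CT", "AC"): "-",
    ("GC", "AG"): "+", ("CT", "GC"): "-",
    ("AT", "AC"): "+", ("GT", "AT"): "-",
}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns(pos, cigar):
    """Yield 0-based, half-open (start, end) reference intervals of N ops.

    `pos` is the 1-based POS field of the SAM record.
    """
    ref = pos - 1
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield ref, ref + length
        if op in "MDN=X":  # operations that consume reference bases
            ref += length


def infer_xs(pos, cigar, chrom_seq):
    """Return '+', '-', or None if motifs are unknown, absent, or disagree."""
    strands = set()
    for start, end in introns(pos, cigar):
        donor = chrom_seq[start:start + 2].upper()
        acceptor = chrom_seq[end - 2:end].upper()
        strands.add(MOTIF_STRAND.get((donor, acceptor)))
    if len(strands) == 1:
        return strands.pop()
    return None
```

Returning None for non-canonical or conflicting junctions leaves the decision to the caller, which is one way to avoid writing a guessed strand into the file.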


        • jnfass
          Member
          • Aug 2008
          • 88

          #5
          I should have mentioned that I found my issue. I was careless before in saying that all my spliced alignments had XS:A:[+-] tags. Some of them instead have XS:A:? tags (presumably where the transcript's strand couldn't be determined from the sequence at the edges of the splice?) - and once I removed these undetermined XS tags, Cufflinks no longer gave me that error. Hope this helps someone.
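
A minimal sketch of the cleanup described above, assuming the alignments are available as SAM text (for example via `samtools view -h`). The function name `strip_undetermined_xs` is made up for illustration. It only removes the XS:A:? field; whether Cufflinks then accepts the record is what the post reports, not something this snippet checks.

```python
def strip_undetermined_xs(sam_line):
    """Remove an 'XS:A:?' optional field from one SAM record, if present."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through untouched
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 fields are mandatory; only optional tags are filtered.
    kept = fields[:11] + [f for f in fields[11:] if f != "XS:A:?"]
    return "\t".join(kept) + newline
```

Applying the function to every line and converting the result back to BAM gives a file with no undetermined XS values, while records tagged XS:A:+ or XS:A:- are left unchanged.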
