  • manual XS:A[-|+] assignment for cufflinks

    Hi,
    I am using GSNAP and have assigned the strand to my reads. I have tried to format my BAM file the same way TopHat's accepted_hits.bam file is formatted (examples of both are below). However, when I run my sorted BAM file through Cufflinks, I get the following error:

    "BAM record error: found spliced alignment without XS attribute"

    Does anyone have any idea as to what I could be doing wrong with my formatting? Thanks.

    *Altered GSNAP file: output.bam*
    D8FF8JN1:127:C066VACXX:8:1207:12974:19520       147     chr1    4610    40      83M140N16M      =       4311    0       CGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGTGCCTCATGAC     DDCADDC@CCDA>CC@@@DDDCADDDDDDDDCDCCCB?DCDDCFFHHBHHDGC=CJIIJGHCHEIIGEHE3JJJJIIHFEGCJIIIIFHHGHDDFDD?C     NM:i:1  XS:A:-  NH:i:1
    D8FF8JN1:127:C066VACXX:7:1103:12412:135532      163     chr1    15      40      75M     =       41      101     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC     144=BDHHHHHJJJJJJJJJJJJJJJJJIJJJJJJGIJJFHJJJJJJJIJIJJIJIGIHHHHFFFFFEEDEDCDB     NM:i:0  XS:A:-  NH:i:1
    D8FF8JN1:127:C066VACXX:7:1103:12412:135532      83      chr1    41      40      69M6S   =       15      -101    CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCACTCCAC     DABBCDEEDBDFFFFHAHHJIGDJIIHFFJJJIHFJJHHGBJJIHHDJJIGHHJJIHHGJJJIHHHHFCFEFFFC     NM:i:0  XS:A:-  NH:i:1
    *TOPHAT File: accepted_hits.sam*
    D8FF8JN1:127:C066VACXX:8:2206:17700:177847      113     chr1    5783    1       28M659N71M      =       5783    0       TCGACCACTTCCCTGGCAGCTCCCTGGACTGAAGGAGACGTGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGTT     DDDCDDCDDDBBCC<3(BDDBDDDDDDEDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFHHHHJJJJJJJJIIJIJJIJJJJJJJJJHHHHHFFDB=+@     NM:i:4  XS:A:-  NH:i:3  CC:Z:chrX       CP:i:154906249
    D8FF8JN1:127:C066VACXX:8:1307:2101:94989        129     chr1    6590    0       39M88N60M       =       6590    0       GGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGCCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAG     CCFFFFFHHHHGJ@BD<DDDCECEEDDDDDDDDDDDDDCCDDDDDDDDDDDDCDDDDBB@CDDCCC@@>@B(4>?9:?B?CB?>@DCDAA@AC+4:A@C     NM:i:1  XS:A:-  NH:i:7  CC:Z:chr15      CP:i:100331773
    D8FF8JN1:127:C066VACXX:8:1307:2101:94989        65      chr1    6590    0       39M88N60M       =       6590    0       GGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGCCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAG     CCFFFFFHHHHGJ@BD<DDDCECEEDDDDDDDDDDDDDCCDDDDDDDDDDDDCDDDDBB@CDDCCC@@>@B(4>?9:?B?CB?>@DCDAA@AC+4:A@C     NM:i:1  XS:A:-  NH:i:7

  • #2
    Hi zorph,

    I too am trying to get Cufflinks to read a GSNAP-generated SAM file. Just curious: were those XS:A:- or XS:A:+ tags inserted automatically by GSNAP, or did you insert them manually?
    If you inserted them manually, how did you figure out the strand information (+ or -)?
    Thanks


    • #3
      I'll third this complaint ... with SAM generated by GMAP. I've checked, and all of the records with N's in CIGAR strings do have XS:A:[+-] tags. I had wondered if there was a specific order that cufflinks is expecting the tags to be in, but the OP's example doesn't deviate from the example SAM lines given in the manual, so that seems unlikely.

      Please post if either of you crack this case.


      • #4
        The latest version of GSNAP (released on 2012-04-27) says that it adds the XS tags, so one does not have to do this manually.

        The XS tag is added to spliced reads and indicates which strand the RNA that produced the read came from (not the strand the read aligned to). The Cufflinks manual says:

        This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with a 'N' operation in the CIGAR string).
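The requirement quoted above can be checked before running Cufflinks. The following is a minimal sketch in plain Python, assuming tab-separated SAM text (for example the output of `samtools view -h`); the function name `spliced_without_valid_xs` is hypothetical and is not part of Cufflinks, GSNAP, or TopHat.

```python
def spliced_without_valid_xs(sam_lines):
    """Yield (line_number, read_name) for spliced records lacking XS:A:+ or XS:A:-.

    A record counts as spliced when its CIGAR string contains an 'N' operation.
    """
    for lineno, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        if "N" not in fields[5]:          # column 6 is the CIGAR string
            continue
        tags = fields[11:]                # optional TAG:TYPE:VALUE fields
        if not any(tag in ("XS:A:+", "XS:A:-") for tag in tags):
            yield lineno, fields[0]
```

Any record this reports either has no XS tag at all or carries a value other than "+" or "-".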
        Note that the strand a read aligned to is easy to get from the SAM flag, but the strand of the RNA it came from is tricky to determine in unstranded sequencing. TopHat uses splice junction information to infer it. One can try to add the XS tag manually based on the sequence at the splice junction of the alignment. The TopHat manual says:

        With long (>=75bp) reads, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. With shorter reads, TopHat only reports alignments across "GT-AG" introns
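As a rough illustration of adding XS manually from the junction sequence, here is a sketch that looks up the two bases at each end of every intron in the reference. It is not TopHat's or GSNAP's actual code. It assumes the reference sequence of the chromosome is available as a Python string and that only the three intron types named in the quote are considered; the names `intron_spans` and `infer_xs` are hypothetical.

```python
import re

# Intron boundary dinucleotides as they appear on the forward reference strand.
PLUS_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
# The same three intron types when the transcript is on the minus strand
# (reverse complement of the whole intron, so donor and acceptor swap ends).
MINUS_MOTIFS = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}

def intron_spans(pos, cigar):
    """Return 0-based half-open (start, end) reference spans for each N operation.

    pos is the 1-based leftmost position from column 4 of the SAM record.
    """
    spans = []
    ref = pos - 1
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "N":
            spans.append((ref, ref + n))
        if op in "MDN=X":                 # operations that consume the reference
            ref += n
    return spans

def infer_xs(ref_seq, pos, cigar):
    """Return '+' or '-' if all introns agree on a known motif, else None."""
    votes = set()
    for start, end in intron_spans(pos, cigar):
        motif = (ref_seq[start:start + 2].upper(), ref_seq[end - 2:end].upper())
        if motif in PLUS_MOTIFS:
            votes.add("+")
        elif motif in MINUS_MOTIFS:
            votes.add("-")
        else:
            votes.add("?")
    if votes == {"+"}:
        return "+"
    if votes == {"-"}:
        return "-"
    return None                           # unspliced, ambiguous, or non-canonical
```

For the first GSNAP record in the original post (position 4610, CIGAR 83M140N16M), `intron_spans` gives a single intron covering 0-based reference positions 4692 to 4832, and the strand would then depend on the chr1 bases at both ends of that span.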


        • #5
          I should have mentioned that I found my issue. I was careless before in saying that all my spliced alignments had XS:A:[+-] tags. Some of them instead have XS:A:? tags (presumably where the transcript's strand couldn't be determined from the sequence at the edges of the splice). After I removed these undetermined XS tags, Cufflinks no longer gives me that error. Hope this helps someone.
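A minimal sketch of that cleanup, assuming tab-separated SAM text as input. The post does not say whether only the tag or the whole alignment was removed, so the hypothetical `clean_sam` function below strips just the tag by default and can drop the whole record instead.

```python
def clean_sam(lines, drop_record=False):
    """Yield SAM lines with undetermined XS:A:? tags handled.

    By default only the XS:A:? tag is stripped; with drop_record=True the
    whole alignment carrying it is skipped. Header lines pass through.
    """
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if "XS:A:?" not in fields[11:]:   # optional tags start at column 12
            yield line
            continue
        if drop_record:
            continue
        kept = fields[:11] + [tag for tag in fields[11:] if tag != "XS:A:?"]
        yield "\t".join(kept) + ("\n" if line.endswith("\n") else "")
```

One possible way to use it is to stream `samtools view -h` output through this function and convert the result back to BAM before running Cufflinks.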
