  • manual XS:A[-|+] assignment for cufflinks

    Hi,
    I am using GSNAP and have assigned a strand to my reads. I have tried to format my BAM file the same way TopHat's accepted_hits.bam file is formatted (examples of both are below). However, when I run my sorted BAM file through Cufflinks, I get the following error:

    "BAM record error: found spliced alignment without XS attribute"

    Does anyone have any idea as to what I could be doing wrong with my formatting? Thanks.

    *Altered GSNAP file: output.bam*
    D8FF8JN1:127:C066VACXX:8:1207:12974:19520       147     chr1    4610    40      83M140N16M      =       4311    0       CGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGTGCCTCATGAC     DDCADDC@CCDA>CC@@@DDDCADDDDDDDDCDCCCB?DCDDCFFHHBHHDGC=CJIIJGHCHEIIGEHE3JJJJIIHFEGCJIIIIFHHGHDDFDD?C     NM:i:1  XS:A:-  NH:i:1
    D8FF8JN1:127:C066VACXX:7:1103:12412:135532      163     chr1    15      40      75M     =       41      101     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC     144=BDHHHHHJJJJJJJJJJJJJJJJJIJJJJJJGIJJFHJJJJJJJIJIJJIJIGIHHHHFFFFFEEDEDCDB     NM:i:0  XS:A:-  NH:i:1
    D8FF8JN1:127:C066VACXX:7:1103:12412:135532      83      chr1    41      40      69M6S   =       15      -101    CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCACTCCAC     DABBCDEEDBDFFFFHAHHJIGDJIIHFFJJJIHFJJHHGBJJIHHDJJIGHHJJIHHGJJJIHHHHFCFEFFFC     NM:i:0  XS:A:-  NH:i:1
    *TOPHAT File: accepted_hits.sam*
    D8FF8JN1:127:C066VACXX:8:2206:17700:177847      113     chr1    5783    1       28M659N71M      =       5783    0       TCGACCACTTCCCTGGCAGCTCCCTGGACTGAAGGAGACGTGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGTT     DDDCDDCDDDBBCC<3(BDDBDDDDDDEDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFHHHHJJJJJJJJIIJIJJIJJJJJJJJJHHHHHFFDB=+@     NM:i:4  XS:A:-  NH:i:3  CC:Z:chrX       CP:i:154906249
    D8FF8JN1:127:C066VACXX:8:1307:2101:94989        129     chr1    6590    0       39M88N60M       =       6590    0       GGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGCCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAG     CCFFFFFHHHHGJ@BD<DDDCECEEDDDDDDDDDDDDDCCDDDDDDDDDDDDCDDDDBB@CDDCCC@@>@B(4>?9:?B?CB?>@DCDAA@AC+4:A@C     NM:i:1  XS:A:-  NH:i:7  CC:Z:chr15      CP:i:100331773
    D8FF8JN1:127:C066VACXX:8:1307:2101:94989        65      chr1    6590    0       39M88N60M       =       6590    0       GGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGCCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAG     CCFFFFFHHHHGJ@BD<DDDCECEEDDDDDDDDDDDDDCCDDDDDDDDDDDDCDDDDBB@CDDCCC@@>@B(4>?9:?B?CB?>@DCDAA@AC+4:A@C     NM:i:1  XS:A:-  NH:i:7

  • #2
    Hi zorph,

    I too am trying to get Cufflinks to read a GSNAP-generated SAM file. Just curious, were those XS:A:- or XS:A:+ tags inserted automatically by GSNAP, or did you insert them manually?
    If you inserted these tags manually, how did you figure out the strand information (+ or -)?
    Thanks



    • #3
      I'll third this complaint ... with SAM generated by GMAP. I've checked, and all of the records with N's in CIGAR strings do have XS:A:[+-] tags. I had wondered if there was a specific order that cufflinks is expecting the tags to be in, but the OP's example doesn't deviate from the example SAM lines given in the manual, so that seems unlikely.

      Please post if either of you crack this case.
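One way to double-check a claim like this is to scan the SAM text directly. Below is a minimal sketch in plain Python (no pysam). The function name `spliced_without_valid_xs` is made up for illustration. It reports every spliced record (an N in the CIGAR) whose XS tag is missing or holds anything other than + or -, which is the condition the Cufflinks manual describes:

```python
def spliced_without_valid_xs(sam_lines):
    """Yield (read_name, xs_value) for spliced records lacking XS:A:+ or XS:A:-.

    xs_value is None when the record has no XS:A tag at all.
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        if "N" not in fields[5]:          # column 6 is the CIGAR; N marks a splice
            continue
        xs = None
        for tag in fields[11:]:           # optional tags start at column 12
            if tag.startswith("XS:A:"):
                xs = tag[len("XS:A:"):]
        if xs not in ("+", "-"):
            yield fields[0], xs


if __name__ == "__main__":
    import sys
    for name, xs in spliced_without_valid_xs(sys.stdin):
        print(name, xs, sep="\t")
```

Saved as, say, check_xs.py (a hypothetical file name), it could be run as `samtools view -h output.bam | python check_xs.py`. Any output line points at a record Cufflinks is likely to reject.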



      • #4
        The latest version of GSNAP (released on 2012-04-27) says that it adds the XS tags, so one does not have to do this manually.

        The XS tag is added to spliced reads and indicates which strand the read's RNA came from (not the strand the read aligned to). The Cufflinks manual says:

        This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with a 'N' operation in the CIGAR string).
        Note that the strand a read aligned to is easy to get from the SAM flag, but the strand of the RNA it came from is tricky to determine in unstranded sequencing. TopHat uses splice junction information to infer it. One can try to add the XS tag manually, based on the sequence at the splice junction of the alignment. The TopHat manual says:

        With long (>=75bp) reads, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. With shorter reads, TopHat only reports alignments across "GT-AG" introns
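The manual approach described above can be sketched roughly as follows. This is an illustration, not how TopHat or GSNAP implement it. The function name `infer_xs` and the idea of passing the whole chromosome as a plain string `ref_seq` are assumptions made for the example. It walks the CIGAR, pulls the first and last two reference bases of each N gap, and compares them with the three intron motifs from the TopHat manual (or their reverse complements for the minus strand):

```python
import re

# Intron start/end dinucleotides as seen on the reference (forward) strand.
PLUS_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
# The same introns transcribed from the minus strand (reverse complements).
MINUS_MOTIFS = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def infer_xs(ref_seq, pos, cigar):
    """Return "+" or "-" if every intron in the alignment agrees, else None.

    ref_seq: sequence of the chromosome the read aligned to (assumed input)
    pos:     1-based leftmost alignment position (SAM column 4)
    cigar:   CIGAR string (SAM column 6)
    """
    ref_pos = pos - 1                      # switch to 0-based coordinates
    strands = set()
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":                      # reference gap, i.e. an intron
            intron = ref_seq[ref_pos:ref_pos + length].upper()
            motif = (intron[:2], intron[-2:])
            if motif in PLUS_MOTIFS:
                strands.add("+")
            elif motif in MINUS_MOTIFS:
                strands.add("-")
            else:
                strands.add("?")           # non-canonical junction
        if op in "MDN=X":                  # only these operations consume reference
            ref_pos += length
    if len(strands) == 1 and "?" not in strands:
        return strands.pop()
    return None
```

A return value of None covers unspliced reads, non-canonical junctions, and reads whose introns disagree. What to do with those records (drop them, or leave them without a tag) is a separate decision that this sketch does not make.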



        • #5
          I should have mentioned that I found my issue. I was careless before in saying that all my spliced alignments had XS:A:[+-] tags. Some of them instead have XS:A:? tags (presumably where the transcript's strand couldn't be determined from the sequence at the edges of the splice). Once I removed these undetermined XS tags, Cufflinks stopped giving me that error. Hope this helps someone.
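For anyone wanting to script this cleanup, here is a minimal sketch in plain Python. The function name `strip_undetermined_xs` and the `drop_record` switch are made up for illustration. The post does not say whether only the tag or the whole record was removed, so both are offered: by default only the XS:A:? tag is stripped, and with `drop_record=True` the whole alignment is discarded.

```python
def strip_undetermined_xs(sam_lines, drop_record=False):
    """Yield SAM lines with undetermined XS:A:? tags handled.

    drop_record=False: remove only the XS:A:? tag, keep the alignment.
    drop_record=True:  discard any alignment that carries XS:A:?.
    Header lines and all other records pass through unchanged.
    """
    for line in sam_lines:
        if line.startswith("@"):           # header line
            yield line
            continue
        body = line.rstrip("\n")
        ending = line[len(body):]          # preserve the original line ending
        fields = body.split("\t")
        tags = fields[11:]                 # optional tags start at column 12
        if "XS:A:?" not in tags:
            yield line
            continue
        if drop_record:
            continue
        kept = fields[:11] + [t for t in tags if t != "XS:A:?"]
        yield "\t".join(kept) + ending


if __name__ == "__main__":
    import sys
    sys.stdout.writelines(strip_undetermined_xs(sys.stdin))
```

Since the Cufflinks manual requires an XS tag on every spliced alignment, `drop_record=True` may be the safer choice if stripping the tag alone still triggers the error.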
