  • Does using STAR+Cufflinks for transcript assembly turn unstranded RNA-seq into stranded?

    I am trying to use STAR+Cufflinks to do a reference-based transcript assembly using unstranded RNA-seq data.

    As mentioned in the STAR manual: "If you have un-stranded RNA-seq data, and wish to run Cufflinks/Cuffdiff on STAR alignments, you will need to run STAR with --outSAMstrandField intronMotif option, which will generate the XS strand attribute for all alignments that contain splice junctions"

    Thus, in the generated SAM file the strand is derived from the intron motif. Reads from unstranded RNA-seq data get assigned a strand, and as a result many genes have both sense and antisense transcripts in the merged transcript assembly.
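As an illustration of what "derived from the intron motif" means, here is a minimal Python sketch. It is not STAR's code: the function name `intron_strand`, the 0-based half-open coordinates and the motif table (GT-AG, GC-AG and AT-AC introns plus their reverse complements as seen on the + strand of the genome) are assumptions chosen for the example.

```python
from typing import Optional

# Hypothetical sketch, not STAR's implementation.
# Donor/acceptor dinucleotides as they read on the + strand of the genome.
# GT..AG, GC..AG and AT..AC introns belong to + strand transcripts; their
# reverse complements (CT..AC, CT..GC, GT..AT) belong to - strand transcripts.
PLUS_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
MINUS_MOTIFS = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}


def intron_strand(genome: str, start: int, end: int) -> Optional[str]:
    """Return '+', '-' or None for the intron genome[start:end] (0-based, half-open)."""
    intron = genome[start:end].upper()
    if len(intron) < 4:
        return None  # too short to hold both a donor and an acceptor dinucleotide
    motif = (intron[:2], intron[-2:])
    if motif in PLUS_MOTIFS:
        return "+"
    if motif in MINUS_MOTIFS:
        return "-"
    return None  # non-canonical motif: no strand can be inferred
```

For example, `intron_strand("AAGTAAAAAGTT", 2, 10)` looks at `GTAAAAAG` and returns `"+"`. The key point for the question above is that the strand comes from the genome sequence at the splice site, not from the read itself.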

    My questions are:

    1) How reliable is the strand information derived from the intron motif?
    2) Are the assembled transcripts affected by this?

    Thank you very much!

    Runxuan

  • #2
    hi,
    Your unstranded data doesn't get 'converted to stranded'. Unstranded data contain reads from both strands, because PCR amplification during library prep amplifies both strands of the cDNA.

    The strand that STAR derives is based on the alignment of each individual read and, for the reason above, does not necessarily reflect the strand of the original transcript.

    As for whether the assembly is affected: Cufflinks won't run without the XS attribute in the SAM/BAM file.
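A quick way to check whether a SAM file is in the state described here is to list spliced alignments that lack an XS tag. This is a minimal sketch assuming plain-text SAM input; the function name `spliced_without_xs` is hypothetical. It relies only on the SAM layout (QNAME in column 1, CIGAR in column 6, optional tags from column 12, `N` marking a skipped intron in the CIGAR).

```python
from typing import Iterable, List


def spliced_without_xs(sam_lines: Iterable[str]) -> List[str]:
    """Return the QNAMEs of spliced alignments (CIGAR contains 'N') with no XS:A tag."""
    missing = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, cigar = fields[0], fields[5]
        if "N" not in cigar:
            continue  # unspliced alignment: no XS expected
        optional_tags = fields[11:]
        if not any(tag.startswith("XS:A:") for tag in optional_tags):
            missing.append(qname)
    return missing
```

If this returns a non-empty list for STAR output, STAR was most likely run without --outSAMstrandField intronMotif.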

    • #3
      Thanks a lot, amitm. If the strand attribute that STAR feeds into Cufflinks is not really the strand information, is it going to affect how Cufflinks uses it to assemble the transcripts? How should I deal with the sense and antisense assembled transcripts to reduce false positives?

      • #4
        hi,
        If you are worried about a scenario where a gene locus has little or no sense transcription but very high antisense transcription, and Cufflinks cannot tell the two apart, then you might need to prepare a stranded library before sequencing.

        If not, there is very little you can do at the data analysis step:
        1) Do you know the sequence of these antisense transcripts? Do they keep the exon-intron boundaries (introns spliced out), just on the complementary strand? Or do they read through introns? If they read through introns, you can set an arbitrary threshold (depending on your read length), for example:
        If a read extends beyond the exon boundary into the intron sequence by at least 'n' bases, it might come from an unspliced or antisense transcript, so discard the read. Then use only the filtered reads for transcript assembly.

        Doing this genome-wide would be very tricky, as there might be genuine transcripts with alternative exon starts and ends.
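The read-through filter suggested above can be sketched in a few lines. This is a minimal illustration under stated assumptions: each read is already reduced to its aligned blocks on the reference, intron intervals come from your annotation, all intervals are 0-based half-open, and the names `reads_into_intron` and `filter_reads` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def reads_into_intron(blocks: List[Interval], introns: List[Interval], n: int) -> bool:
    """True if any aligned block of the read covers at least n intronic bases."""
    return any(overlap(block, intron) >= n for block in blocks for intron in introns)


def filter_reads(reads: Dict[str, List[Interval]], introns: List[Interval], n: int) -> List[str]:
    """Keep the names of reads that do not run n or more bases into an annotated intron."""
    return [name for name, blocks in reads.items() if not reads_into_intron(blocks, introns, n)]


introns = [(100, 200)]
reads = {
    "spliced": [(50, 100), (200, 250)],  # skips the intron cleanly
    "readthrough": [(80, 130)],          # 30 bases inside the intron
    "edge": [(95, 102)],                 # only 2 intronic bases
}
print(filter_reads(reads, introns, n=5))  # ['spliced', 'edge']
```

A spliced read never overlaps the intron because the skipped region is not part of its aligned blocks, so only read-through alignments are dropped; the choice of `n` is the arbitrary, read-length-dependent threshold mentioned above.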

        I don't know which organism you work on, but if it is widely studied, there should be datasets and PCR validations available to cross-check your results against.
        Last edited by amitm; 07-21-2015, 08:45 AM. Reason: Corrected typo

        • #5
          Since it hasn't been mentioned yet, I'll add that Cufflinks determines the strand of the assembled isoforms from the value of the XS attribute in the alignments (generated by STAR when it is run with --outSAMstrandField intronMotif). The XS attribute is only populated with strand information for spliced reads. The 4-bp motif at the splice site (the 2-bp donor plus the 2-bp acceptor dinucleotide) tells STAR the strand if the motif is a known one; if it is an unknown motif, there is no strand information. In mammalian genomes, 90+% of splice junctions carry these known motifs. The only other way Cufflinks can determine strand is if you provide a reference GTF for assembly, in which case it will use the strand information from the GTF when matching isoforms assembled from the data.
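The rule described in this post (only spliced reads get an XS value, and only when the intron motif is known) can be sketched as follows. This is not STAR's code: `introns_from_cigar`, `xs_for_read` and the `intron_strands` lookup (a hypothetical precomputed map from intron coordinates to the strand implied by its motif) are assumptions, and treating a read whose introns disagree as strandless is a choice made for this sketch.

```python
import re
from typing import Dict, List, Optional, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def introns_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Intron intervals (1-based, inclusive) implied by the 'N' operations of a CIGAR."""
    introns = []
    ref = pos  # SAM POS: 1-based leftmost mapped reference position
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def xs_for_read(pos: int, cigar: str,
                intron_strands: Dict[Tuple[int, int], str]) -> Optional[str]:
    """Strand to report in XS, or None when no strand can be assigned."""
    introns = introns_from_cigar(pos, cigar)
    if not introns:
        return None  # unspliced read: no strand information
    strands = {intron_strands.get(intron) for intron in introns}
    if len(strands) == 1 and None not in strands:
        return strands.pop()
    return None  # unknown motif, or introns implying different strands
```

With `intron_strands = {(125, 224): "+"}`, a read at POS 100 with CIGAR `25M100N25M` gets `"+"`, while a `50M` read at the same position gets no XS at all, which is why unspliced reads carry no strand information in this setup.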
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */

          • #6
            Originally posted by sdriscoll:
            The only other way cufflinks can determine strand is if you provide a reference GTF for assembly in which case it will use the strand information from that for matching assembled isoforms from the data.
            But this is not necessarily the correct strand information if I use unstranded RNA-seq data, is it?