  • strand specific RNAseq not strand specific

    We are using a slightly modified version of the following protocol to generate strand-specific RNAseq libraries for Arabidopsis thaliana:
    The emergence of NextGen sequencing technology has generated much interest in the exploration of transcriptomes. Currently, Illumina Inc. (San Diego, CA) provides one of the most widely utilized sequencing platforms for gene expression analysis. While Illumina reagents and protocols perform adequately in RNA-sequencing (RNA-seq), alternative reagents and protocols promise a higher throughput at a much lower cost. We have developed a low-cost and robust protocol to produce Illumina-compatible (GAIIx and HiSeq2000 platforms) RNA-seq libraries by combining several recent improvements. First, we designed balanced adapter sequences for multiplexing of samples; second, dUTP incorporation in 2nd strand synthesis was used to enforce strand-specificity; third, we simplified RNA purification, fragmentation and library size-selection steps thus drastically reducing the time and increasing throughput of library construction; fourth, we included an RNA spike-in control for validation and normalization purposes. To streamline informatics analysis for the community, we established a pipeline within the iPlant Collaborative. These scripts are easily customized to meet specific research needs and improve on existing informatics and statistical treatments of RNA-seq data. In particular, we apply significance tests for determining differential gene expression and intron retention events. To demonstrate the potential of both the library-construction protocol and data-analysis pipeline, we characterized the transcriptome of the rice leaf. Our data supports novel gene models and can be used to improve current rice genome annotation. Additionally, using the rice transcriptome data, we compared different methods of calculating gene expression and discuss the advantages of a strand-specific approach to detect bona-fide anti-sense transcripts and to detect intron retention events. 
    Our results demonstrate the potential of this low-cost and robust method for RNA-seq library construction and data analysis.

    It is a dUTP method with a UDG digestion of the 2nd strand before PCR amplification.

    We ran a test lane of 24 multiplexed libraries, and everything looked OK (~90% of read pairs mapped with tophat2 using fr-firststrand). But RSeQC's infer_experiment.py script finds that only ~60% of reads are oriented the right way. Here's an example output:

    This is PairEnd Data
    Fraction of reads failed to determine: 0.0073
    Fraction of reads explained by "1++,1--,2+-,2-+": 0.3922
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.6004

    The statistics are similar for the other libraries.
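To compare libraries, it can help to turn that report into a single number. Below is a minimal Python sketch that parses the text shown above and reports the share of determinable reads that fall in the orientation this post treats as correct ("1+-,1-+,2++,2--"). The function names are made up for illustration, and the parser only assumes the line format shown in the example output.

```python
import re

def parse_infer_experiment(text):
    """Collect the fractions reported in infer_experiment.py text output."""
    fractions = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)', line)
        if m:
            fractions[m.group(1)] = float(m.group(2))
            continue
        m = re.match(r'Fraction of reads failed to determine:\s*([0-9.]+)', line)
        if m:
            fractions["failed"] = float(m.group(1))
    return fractions

def expected_orientation_share(fractions, expected="1+-,1-+,2++,2--"):
    """Share of reads with a determinable orientation that match `expected`."""
    determined = sum(v for k, v in fractions.items() if k != "failed")
    return fractions[expected] / determined

# The example output from the post, copied verbatim.
report = """This is PairEnd Data
Fraction of reads failed to determine: 0.0073
Fraction of reads explained by "1++,1--,2+-,2-+": 0.3922
Fraction of reads explained by "1+-,1-+,2++,2--": 0.6004
"""

fractions = parse_infer_experiment(report)
share = expected_orientation_share(fractions)
print(round(share, 4))
```

Excluding the undetermined reads barely changes the picture here: the share stays close to 60%, well short of what a working strand-specific library should give.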

    Is there a likely place where our protocol may have gone wrong and lost the strand specificity?

    Our protocol has a couple of modifications:
    -we are using Superscript III instead of Superscript II
    -we are using the inline barcodes from this protocol: Kumar et al. 2012

    Thanks.

  • #2
    Hi Druncie,

    You might re-run RSeQC's infer_experiment.py with more stringent settings:
    • Limit the reference gene model to genes without known antisense or overlapping elements (e.g. start with a couple of your favourite genes)
    • Increase the sample size
    • Increase the minimum mapping quality

    The genome-wide strandedness estimate is obscured by a lot of ambiguity and by the dynamic range of transcript abundances.
    If you need an accurate measure of strand specificity, I suggest using ERCC spike-in transcripts in further experiments. They have known lengths, known concentrations, and a defined strand orientation.
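The first suggestion above (restricting the gene model) could be sketched roughly as follows in Python. This is only an illustration: it assumes a tab-separated BED file with the strand in column 6, it treats "antisense/overlapping" as "overlaps any record on the opposite strand", and the function name and the toy gene names are hypothetical. The overlap check is a simple all-against-all comparison per chromosome, so it is slow on very large annotations.

```python
from collections import defaultdict

def non_antisense_records(bed_lines):
    """Keep BED records that overlap no record on the opposite strand."""
    records = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        raw = line.rstrip("\n")
        fields = raw.split("\t")
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
        records.append((chrom, start, end, strand, raw))

    by_chrom = defaultdict(list)
    for rec in records:
        by_chrom[rec[0]].append(rec)

    kept = []
    for chrom, start, end, strand, raw in records:
        # Half-open intervals overlap when each one starts before the other ends.
        clash = any(
            o_strand != strand and o_start < end and start < o_end
            for _, o_start, o_end, o_strand, _ in by_chrom[chrom]
        )
        if not clash:
            kept.append(raw)
    return kept

# Toy annotation: geneA and geneB overlap on opposite strands.
demo = [
    "Chr1\t100\t500\tgeneA\t0\t+",
    "Chr1\t400\t900\tgeneB\t0\t-",
    "Chr1\t1000\t1500\tgeneC\t0\t+",
    "Chr2\t100\t500\tgeneD\t0\t-",
]
kept = non_antisense_records(demo)
print([rec.split("\t")[3] for rec in kept])
```

The filtered records can be written to a new BED file and passed to infer_experiment.py as the reference gene model, together with a larger sample size and a higher minimum mapping quality (check `infer_experiment.py --help` for the option names in your RSeQC version).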



    • #3
      Hi Michael.Ante,

      Thank you for these suggestions. I actually noticed this first by looking at a few of my favorite genes with IGV. I've gone ahead and re-run RSeQC gene-by-gene and the results are pretty consistent for each gene. In the attached plot, I show the percentage of forward (1+-,1-+,2++,2--) read pairs for each transcript against the total number of reads on that transcript. As #reads goes up, the percentage of forward direction reads converges to ~60%. So, it doesn't seem to be a problem with specific genes with lots of antisense transcription. All of this is done only with uniquely-mapped reads.
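For anyone wanting to reproduce that kind of per-transcript check, here is a minimal Python sketch. It assumes you already have forward and reverse read-pair counts for each transcript (how you count them is up to you), and it attaches a binomial standard error to each fraction, which is one simple way to see why low-count transcripts scatter while high-count transcripts settle on a stable value. The function name and the counts are invented for illustration.

```python
import math

def forward_fraction_with_se(forward, reverse):
    """Return (fraction of forward read pairs, binomial standard error)."""
    n = forward + reverse
    if n == 0:
        raise ValueError("transcript has no assigned read pairs")
    p = forward / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical counts: three transcripts with the same underlying 60/40 split
# but very different coverage.
counts = {"low": (6, 4), "medium": (60, 40), "high": (6000, 4000)}
results = {name: forward_fraction_with_se(f, r) for name, (f, r) in counts.items()}
for name, (p, se) in results.items():
    print(f"{name}: fraction={p:.3f} se={se:.4f}")
```

With the same underlying split, the standard error shrinks with the square root of the read count, so a plot of fraction against read depth is expected to funnel toward the library-wide value.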

      Next time, we'll see about adding some ERCC spike-ins. But any suggestions for where in the protocol something may have gone wrong?

      Thanks
      Attached Files



      • #4
        Here are a couple of suggestions from outside this forum:

        -our protocol does not use Actinomycin D during 1st strand synthesis with Superscript III, possibly allowing self-priming and 2nd strand synthesis during the RT reaction.
        -our protocol uses UDG alone, rather than the USER mix (which adds the DNA glycosylase-lyase Endonuclease VIII to break the backbone of the uracil-containing strand)

        Does anyone have any experience that would suggest that either of these would be the cause of the strand-specificity failure? Is adding ActD going to significantly reduce the efficiency of the 1st strand synthesis as suggested in this forum?
