
  • strand-specific RNA-seq, tophat, RSeQC

    Hi guys,
    I know there are similar threads about this problem, yet it still seems quite confusing. Hell, it is still confusing to me, so I'd like to share my experience with it.

    I got two sets of paired-end (PE) RNA-seq data as BAM files; the reads were aligned with TopHat. After a while I was told it was strand-specific RNA-seq (ssRNA-seq), but not how it was stranded. I then found RSeQC and its 'infer_experiment.py', which reports how the reads are stranded. It seems there are two ways: 1++,1--,2+-,2-+ and 1+-,1-+,2++,2--. From the RSeQC site:

    1. 1++,1--,2+-,2-+

    read1 mapped to '+' strand indicates parental gene on '+' strand
    read1 mapped to '-' strand indicates parental gene on '-' strand
    read2 mapped to '+' strand indicates parental gene on '-' strand
    read2 mapped to '-' strand indicates parental gene on '+' strand

    2. 1+-,1-+,2++,2--

    read1 mapped to '+' strand indicates parental gene on '-' strand
    read1 mapped to '-' strand indicates parental gene on '+' strand
    read2 mapped to '+' strand indicates parental gene on '+' strand
    read2 mapped to '-' strand indicates parental gene on '-' strand
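
The notation above can be read as a lookup table: each three-character token gives the read number, the strand the read mapped to, and the strand of the parental gene. A minimal sketch in Python, assuming the rule strings are written with plain ASCII '+' and '-' (the function name `parse_rule` is hypothetical and not part of RSeQC):

```python
def parse_rule(rule):
    """Turn e.g. '1++,1--,2+-,2-+' into {(read, mapped_strand): gene_strand}."""
    table = {}
    for token in rule.split(","):
        # Each token has exactly three characters: read number,
        # strand the read mapped to, strand of the parental gene.
        read_number, mapped_strand, gene_strand = token
        table[(int(read_number), mapped_strand)] = gene_strand
    return table


case1 = parse_rule("1++,1--,2+-,2-+")
case2 = parse_rule("1+-,1-+,2++,2--")

# read2 mapped to '+' under each rule
print(case1[(2, "+")])  # -
print(case2[(2, "+")])  # +
```

The two tables are exact mirror images of each other, which is why a library matching one rule looks almost entirely "antisense" when interpreted with the other.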

    So far so good. One of my sets is case 1, the other case 2. So I figured out how to split the reads so I could analyze the data properly. But during analysis in IGV I saw weird things, so I kept reading on the web and found out that my reads had not been aligned with the 'library-type' option but with the default setting, i.e. fr-unstranded. So I thought: okay, maybe the weirdness comes from the mapping. I read the TopHat manual, which made it even harder to understand which 'library-type' is needed for each kind of strandedness reported by RSeQC. Anyhow, I worked out that case 1 should be 'fr-secondstrand' and case 2 should be 'fr-firststrand'. So I re-aligned the reads myself with the appropriate TopHat 'library-type' option and then reran 'infer_experiment.py' on the original BAM and on the BAM I created.
    Output from infer_experiment.py:
    --original bam--
    Fraction of reads failed to determine: 0.0010
    Fraction of reads explained by "1++,1--,2+-,2-+": 0.9123
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.0868

    --my bam--
    Fraction of reads explained by "1++,1--,2+-,2-+": 0.9124
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.0867
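
For reference, a report like the ones above can be turned into a 'library-type' guess with a few lines of Python. This is only a sketch under stated assumptions: the rule-to-library-type mapping is the one worked out in this post (case 1 = 'fr-secondstrand', case 2 = 'fr-firststrand'), the 0.8 cutoff is an arbitrary illustrative threshold, and the function names are hypothetical rather than part of RSeQC or TopHat.

```python
import re

# Mapping as worked out in this thread (paired-end rules only).
RULE_TO_LIBRARY_TYPE = {
    "1++,1--,2+-,2-+": "fr-secondstrand",
    "1+-,1-+,2++,2--": "fr-firststrand",
}


def parse_fractions(text):
    """Return {rule: fraction} from lines of the form
    Fraction of reads explained by "RULE": 0.9123
    """
    pattern = re.compile(r'explained by "([^"]+)":\s*([0-9.]+)')
    return {rule: float(value) for rule, value in pattern.findall(text)}


def guess_library_type(fractions, threshold=0.8):
    """Pick the dominant rule; fall back to fr-unstranded below the cutoff.

    The threshold is an assumption for illustration, not an RSeQC default.
    """
    rule, fraction = max(fractions.items(), key=lambda item: item[1])
    if fraction < threshold:
        return "fr-unstranded"
    return RULE_TO_LIBRARY_TYPE.get(rule, "fr-unstranded")


report = '''Fraction of reads failed to determine: 0.0010
Fraction of reads explained by "1++,1--,2+-,2-+": 0.9123
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0868
'''

fractions = parse_fractions(report)
print(guess_library_type(fractions))  # fr-secondstrand
```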

    It seems to me there is no significant improvement, so the weird stuff I see in IGV should be legit.

    However, this mapping experiment leads me to believe that there is no need for the 'library-type' option in TopHat and that one need not care much about it: just map, then check how the library was stranded with RSeQC, extract the reads accordingly, and proceed with the analysis.
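
The "extract the reads" step can be sketched with nothing more than the SAM FLAG field: bit 0x10 says the read mapped to the reverse strand, 0x40 marks read1 and 0x80 marks read2. The helper below is hypothetical and assumes properly flagged paired-end records; `read1_is_sense=True` corresponds to case 1 (1++,1--,2+-,2-+) and `False` to case 2.

```python
READ_REVERSE = 0x10   # SAM flag bit: read mapped to the reverse strand
FIRST_IN_PAIR = 0x40  # SAM flag bit: this record is read1


def transcript_strand(flag, read1_is_sense):
    """Infer the strand of the parental gene from a paired-end SAM flag.

    Records without the 0x40 bit are treated as read2, so this sketch
    is only meaningful for paired-end data.
    """
    mapped = "-" if flag & READ_REVERSE else "+"
    is_read1 = bool(flag & FIRST_IN_PAIR)
    # The read reports the gene strand directly when it is the "sense" mate;
    # otherwise the strand has to be flipped.
    if is_read1 == read1_is_sense:
        return mapped
    return "+" if mapped == "-" else "-"


# A typical proper pair: flag 99 (read1, forward) and flag 147 (read2, reverse).
for flag in (99, 147):
    print(flag, transcript_strand(flag, read1_is_sense=True),
          transcript_strand(flag, read1_is_sense=False))
```

Both mates of a pair come out on the same gene strand, so splitting a BAM by this value keeps pairs together.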

    Well that was my odyssey
    Cheers
    D.

    PS: If I got it all wrong, please correct me. It will be appreciated.

  • #2
    From what I understand, the reason you get such similar results using 'fr-unstranded' even though it should be 'fr-firststrand' is that there isn't much antisense transcription in your system. I would be careful about mapping first and extracting afterwards, because the exact set of alignments you get may depend on the parameters (e.g. how multi-mapped reads are handled).

    • #3
      According to the TopHat manual page:
      If either fr-firststrand or fr-secondstrand is specified, every read alignment will have an XS attribute tag as explained below.
      The alignments should be (more or less) the same, but the reads will be annotated with an XS tag. This tag is necessary for Cufflinks:
      ... This attribute, which must have a value of “+” or “-”, indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records ...
      Thus, if you want to proceed with Cufflinks, you need the correct library-type option in TopHat in order to get a good transcript estimation.
      It may also be the case that TopHat itself uses the library-type option when aligning spliced reads...

      From what I've seen so far, the strandedness you get depends on the method (e.g. TruSeq stranded: 90-97%) and on your RNA input (the better the RIN, the better the strandedness).

      Edit:
      RSeQC uses only a subset of reads (200,000 by default). Therefore, small fluctuations like those in your example are to be expected.
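
A back-of-the-envelope check of that statement, assuming the sampled reads behave like independent draws (a simple binomial model, which is an approximation) and that both runs used the default sample of 200,000 reads:

```python
import math


def sampling_standard_error(fraction, n_reads):
    """Standard error of a proportion estimated from n_reads sampled reads."""
    return math.sqrt(fraction * (1.0 - fraction) / n_reads)


# Numbers taken from the two reports in the first post.
se = sampling_standard_error(0.9123, 200_000)
difference = abs(0.9124 - 0.9123)

print(f"standard error: {se:.5f}")
print(f"observed difference: {difference:.4f}")
print("within sampling noise:", difference < se)
```

The standard error comes out at roughly 0.0006, so a difference of 0.0001 between the two BAM files is well inside the sampling noise.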
      Last edited by Michael.Ante; 08-24-2015, 02:33 AM.
