  • Mchicken
    Member
    • Jan 2014
    • 39

    strand-specific libraries / firststrand /secondstrand

    Hey,

    I'm still very uncertain when dealing with strand-specific RNA-Seq data, especially when using TopHat2 and Cufflinks, as these make use of the strand information via the library types.

    I found this table for the TopHat2 / Cufflinks library type options: http://www.nature.com/nprot/journal/...12.016_T1.html

    In my data I can clearly see that the R1 (forward) read maps on the sense/coding strand and the R2 (reverse) read maps on the antisense strand.

    Illustration:

    a) gene located on wat (+) strand

    ......................R1
    .....................----->
    --------------[############# Gene ##############]-------------------- wat (+)
    --------------------------------------------------------------------------------------------- cri (-)
    ..........................................................<------
    ............................................................R2


    b) gene located on cri (-) strand

    .......................R2
    ......................----->
    --------------------------------------------------------------------------------------------- wat (+)
    --------------[############# Gene ##############]-------------------- cri (-)
    ..........................................................<-----
    ............................................................R1


    This would mean (according to my link) that I have fr-secondstrand, as "the leftmost end of the fragment (in transcript coordinates) is the first sequenced".
    Am I correct with this assumption?
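
The rule in the linked table can be written as a small lookup. This is only a sketch, assuming the common reading of the TopHat2/Cufflinks library types: R1 aligning to the same strand as the annotated gene means fr-secondstrand, and R1 aligning to the opposite strand means fr-firststrand. The function name `infer_library_type` is hypothetical and not part of any tool.

```python
def infer_library_type(read1_strand: str, gene_strand: str) -> str:
    """Return the library type implied by a single R1 observation.

    read1_strand: genomic strand ('+' or '-') to which R1 aligns
    gene_strand:  genomic strand ('+' or '-') on which the gene is annotated
    """
    valid = {"+", "-"}
    if read1_strand not in valid or gene_strand not in valid:
        raise ValueError("strands must be '+' or '-'")
    # Only the orientation relative to the transcript matters,
    # not whether the gene sits on the Watson or the Crick strand.
    if read1_strand == gene_strand:
        return "fr-secondstrand"
    return "fr-firststrand"


# Case a) gene on wat (+), R1 on (+); case b) gene on cri (-), R1 on (-)
print(infer_library_type("+", "+"))
print(infer_library_type("-", "-"))
```

Both illustrated cases give the same answer, because R1 lies on the gene's own strand in each.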

    What I still do not get are the terms "firststrand" and "secondstrand" themselves.

    My understanding of the library prep is the following (leaving out fragmentation):

    1) Transcription

    5' [###########Gene############] 3' coding strand
    3' -------------------------------------------------- 5' template strand

    5' -------------------------------------------------- 3' mRNA


    2) Adapter Ligation (Let's assume the 5' adapter sequence is only AATT and the 3' adapter sequence only GGCC)

    5' AATT------------------------------------------------GGCC 3' mRNA+Adapters


    3) 1st strand synthesis

    5' AATT------------------------------------------------GGCC 3' mRNA+Adapters
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA


    4) 2nd strand synthesis

    5' AATT------------------------------------------------GGCC 3' 2nd cDNA <---- identical (U->T) to mRNA
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA


    Let's skip the PCR.


    5a) Sequencing 1st cDNA strand

    5'........SeqPrimer----->
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA

    As I see it, I now get a read that is identical to the part of the mRNA sequence located at the left (5') end.


    5b) Sequencing 2nd cDNA strand

    5' AATT------------------------------------------------GGCC 3' 2nd cDNA
    .....................................<-----remirPqeS..........5'

    Now I should get a read whose reverse complement is identical to the part of the mRNA sequence located at the right (3') end.
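
Steps 1) to 5b) above can be checked with a toy simulation. This is a sketch under simplifying assumptions: the mRNA is written in the DNA alphabet (U->T), the adapters are the AATT/GGCC placeholders from step 2, the insert sequence is made up, and a "read" is simply the first few bases synthesized against the template strand. The helper names are hypothetical.

```python
# Complement table for the DNA alphabet (mRNA written with T instead of U).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_reads(mrna: str, read_len: int) -> dict:
    """Follow steps 3) to 5b) for one adapter-ligated molecule."""
    # 3) first-strand cDNA is complementary to the mRNA
    first_cdna = reverse_complement(mrna)
    # 4) second-strand cDNA is complementary to the first strand
    second_cdna = reverse_complement(first_cdna)
    # 5a) first cDNA strand used as template: the synthesized read is
    #     complementary to it, starting opposite the template's 3' end
    read_5a = reverse_complement(first_cdna)[:read_len]
    # 5b) second cDNA strand used as template
    read_5b = reverse_complement(second_cdna)[:read_len]
    return {
        "first_cdna": first_cdna,
        "second_cdna": second_cdna,
        "read_5a": read_5a,
        "read_5b": read_5b,
    }


# Toy molecule: 5' adapter + made-up insert + 3' adapter
mrna = "AATT" + "CATGGTCAAGTCTTAGC" + "GGCC"
reads = simulate_reads(mrna, read_len=6)

print(reads["second_cdna"] == mrna)                     # second strand equals the mRNA
print(reads["read_5a"] == mrna[:6])                     # 5a: read matches the left end
print(reverse_complement(reads["read_5b"]) == mrna[-6:])  # 5b: revcomp matches the right end
```

All three comparisons hold by construction, which matches the statements under 5a) and 5b).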




    With this understanding of the library prep, I would say that if my R1 (forward) read is located on the sense/coding strand, I must have sequenced the 1st strand first, but according to my link it must have been "secondstrand".

    I hope someone can follow this and spot my misinterpretation of the first/secondstrand terms or of the library prep.

    Thanks in advance

    Mario
    Last edited by Mchicken; 07-15-2015, 06:06 AM.
  • EricHaugen
    Member
    • Sep 2009
    • 13

    #2
    Better to determine this setting empirically:
    Run the TopHat+Cufflinks pipeline separately with each of the firststrand and secondstrand options.
    Then, assuming your annotation file matches your library somewhat,
    the run with much larger alignment and FPKM numbers will be the correct option for your library prep method.
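
One way to make that comparison concrete is to sum the FPKM column of the Cufflinks output from each run. This is a rough sketch, assuming each run produced a tab-separated tracking table with an `FPKM` column (as in Cufflinks' `genes.fpkm_tracking`); the two tables below are made-up toy values, and `total_fpkm` and `pick_library_type` are hypothetical helper names.

```python
import csv
import io


def total_fpkm(handle) -> float:
    """Sum the FPKM column of a tab-separated tracking table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(float(row["FPKM"]) for row in reader)


def pick_library_type(totals: dict) -> str:
    """Return the library type whose run gave the largest total FPKM."""
    return max(totals, key=totals.get)


# Toy stand-ins for the output of the two runs (values are invented).
run_firststrand = "tracking_id\tgene_id\tFPKM\ng1\tg1\t0.5\ng2\tg2\t1.0\n"
run_secondstrand = "tracking_id\tgene_id\tFPKM\ng1\tg1\t120.0\ng2\tg2\t80.0\n"

totals = {
    "fr-firststrand": total_fpkm(io.StringIO(run_firststrand)),
    "fr-secondstrand": total_fpkm(io.StringIO(run_secondstrand)),
}
print(totals)
print(pick_library_type(totals))
```

With real data, the file handles would come from `open()` on each run's tracking file; the difference between the two totals should be large, not marginal, before trusting the result.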

    • Mchicken
      Member
      • Jan 2014
      • 39

      #3
      First of all, thanks for your advice. I have already read about this way of determining the library type somewhere.
      Nevertheless, there should be a logical explanation anyway. The company that sequenced our samples told us yesterday that the R1 read does indeed correspond to the sense/coding strand, as I observed when I mapped my paired-end data with TopHat2 (using library-type unstranded).

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Honestly... when I first wrote the code to handle firststrand/secondstrand, it took me a week of going back and forth and talking to different people who make libraries because the description in the Tuxedo package is so incredibly confusing. They should be named clearly, as in:

        READ1-PLUS protocol and READ1-MINUS protocol, or READ1-SENSE, or something like that.

        Every time I am asked questions about this I have to go back to the comments in my source code because the names are so vague and the official descriptions so opaque as to be meaningless.

        • HESmith
          Senior Member
          • Oct 2009
          • 512

          #5
          Brian's right; the terminology is confusing.

          Regarding your original questions, the orientation of the gene on the DNA (Watson or Crick strand) is irrelevant. The quoted statement ["the leftmost end of the fragment (in transcript coordinates) is the first sequenced"] indicates that read1 proceeds in the 5'->3' orientation of the mRNA.

          As for your second question, strandedness (for TopHat) refers to the sequence being generated. In diagram 5a, the first cDNA strand is the template, which means that the sequence is identical to the second cDNA strand.

          • Mchicken
            Member
            • Jan 2014
            • 39

            #6
            Okay now to summarize:

            In my case, indeed the library-type is fr-secondstrand as the R1 (forward) read maps in 5' to 3' direction of the mRNA.

            And the reason to call it fr-secondstrand is that the first cDNA strand only served as template for the generation of the R1 read, which is identical to the "second strand" (leading to the name fr-secondstrand).

            Up to now I have used fr-unstranded as the library type parameter, which also gave me good results. But in the future I will use the correct library type and hope that this will improve my results further.


            Thank you very much, guys. This issue has been a mystery to me for a long time, and now I finally get it.

            • kenietz
              Member
              • Nov 2011
              • 86

              #7
              Hi guys,
              I apologize for reviving the thread, but I am also a bit confused about stranded RNA-seq.
              I have some Illumina PE data which is stranded, but I don't know how the library was generated. I received BAM files aligned with TopHat, so I used RSeQC's 'infer_experiment.py' command to tell me how the libraries are stranded.
              For one of them I get 1++,1--,2+-,2-+ and for the other 1+-,1-+,2++,2--. Now my problem is to link this info to TopHat's fr-firststrand or fr-secondstrand. From what I have read so far on the web, it seems to me that:
              - fr-secondstrand corresponds to 1++,1--,2+-,2-+
              - fr-firststrand corresponds to 1+-,1-+,2++,2--

              Is that right?

              I am asking because I wonder if the alignment could be improved by using the appropriate library type. So far the default, unstranded, was used.

              Thank you for your help and time.
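
The mapping in this post agrees with the reading used earlier in the thread: in "1++,1--,2+-,2-+" read 1 lies on the gene's own strand (fr-secondstrand), and in "1+-,1-+,2++,2--" it lies on the opposite strand (fr-firststrand). A small sketch of turning an `infer_experiment.py` report into a library type follows. It assumes the report contains lines of the form `Fraction of reads explained by "<pattern>": <value>`; the sample report uses invented numbers, the 0.8 cutoff is an arbitrary assumption, and `library_type_from_report` is a hypothetical helper.

```python
import re

# Strand patterns for paired-end data and the matching TopHat library type.
LIBRARY_TYPE = {
    "1++,1--,2+-,2-+": "fr-secondstrand",  # read 1 on the gene's strand
    "1+-,1-+,2++,2--": "fr-firststrand",   # read 1 on the opposite strand
}

# Matches: Fraction of reads explained by "<pattern>": <fraction>
LINE = re.compile(r'explained by "([^"]+)":\s*([0-9.]+)')


def library_type_from_report(report: str, min_fraction: float = 0.8) -> str:
    """Pick a library type from the text of an infer_experiment.py report."""
    fractions = {pattern: float(value) for pattern, value in LINE.findall(report)}
    for pattern, library_type in LIBRARY_TYPE.items():
        if fractions.get(pattern, 0.0) >= min_fraction:
            return library_type
    # Neither pattern dominates: treat the library as unstranded.
    return "fr-unstranded"


# Invented example report, for illustration only.
sample_report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.02
Fraction of reads explained by "1++,1--,2+-,2-+": 0.95
Fraction of reads explained by "1+-,1-+,2++,2--": 0.03
'''
print(library_type_from_report(sample_report))
```

If both fractions are close to 0.5, the data behave as unstranded and neither stranded option should be forced.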
