  • strand-specific libraries / firststrand / secondstrand

    Hey,

    I'm still very uncertain when dealing with strand-specific RNA-Seq data, especially when using TopHat2 and Cufflinks, as these make use of the strand information via the library-type option.

    I found this table for the TopHat2 / Cufflinks library type options: http://www.nature.com/nprot/journal/...12.016_T1.html

    In my data I can clearly see that the R1 (forward) read maps on the sense/coding strand and the R2 (reverse) read maps on the antisense strand.

    Illustration:

    a) gene located on wat (+) strand

    ......................R1
    .....................----->
    --------------[############# Gene ##############]-------------------- wat (+)
    --------------------------------------------------------------------------------------------- cri (-)
    ..........................................................<------
    ............................................................R2


    b) gene located on cri (-) strand

    .......................R2
    ......................----->
    --------------------------------------------------------------------------------------------- wat (+)
    --------------[############# Gene ##############]-------------------- cri (-)
    ..........................................................<-----
    ............................................................R1


    This would mean (according to my link) that I have fr-secondstrand, as "the leftmost end of the fragment (in transcript coordinates) is the first sequenced".
    Am I correct with this assumption?

    What I still do not get are the terms "firststrand" and "secondstrand" themselves.

    My understanding of the library prep is the following (leaving out fragmentation):

    1) Transcription

    5' [###########Gene############] 3' coding strand
    3' -------------------------------------------------- 5' template strand

    5' -------------------------------------------------- 3' mRNA


    2) Adapter Ligation (let's assume the 5' adapter sequence is only AATT and the 3' adapter sequence only GGCC)

    5' AATT------------------------------------------------GGCC 3' mRNA+Adapters


    3) 1st strand synthesis

    5' AATT------------------------------------------------GGCC 3' mRNA+Adapters
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA


    4) 2nd strand synthesis

    5' AATT------------------------------------------------GGCC 3' 2nd cDNA <---- identical (U->T) to mRNA
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA


    Let's skip the PCR.


    5a) Sequencing 1st cDNA strand

    5'........SeqPrimer----->
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA

    As I see it, I now get a read that is identical to the part of the mRNA sequence located at the left (5') end.


    5b) Sequencing 2nd cDNA strand

    5' AATT------------------------------------------------GGCC 3' 2nd cDNA
    .....................................<-----remirPqeS..........5'

    Now I should get a read whose reverse complement is identical to the part of the mRNA sequence located at the right (3') end.
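Steps 3) to 5b) can be replayed on a toy sequence to check which read comes out of which template. The following is only a sketch: the sequence is made up, the adapters are the toy AATT/GGCC from step 2), and `synthesize_read` is a hypothetical helper that models sequencing as copying the template from its 3' end.

```python
# Toy model of the library prep drawn above (all sequences written 5'->3').
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def synthesize_read(template, length):
    """Hypothetical helper: the read is synthesized against the template,
    so it equals the first `length` bases of the template's reverse complement."""
    return reverse_complement(template)[:length]


# Made-up mRNA in the DNA alphabet (U->T), flanked by the toy adapters.
mrna = "AATT" + "ACGGATCCTTAGGCAA" + "GGCC"

first_cdna = reverse_complement(mrna)         # step 3: complementary to the mRNA
second_cdna = reverse_complement(first_cdna)  # step 4: identical to the mRNA

read_5a = synthesize_read(first_cdna, 6)   # 5a: first cDNA strand is the template
read_5b = synthesize_read(second_cdna, 6)  # 5b: second cDNA strand is the template

print(read_5a == mrna[:6])                        # read 5a matches the mRNA's 5' end
print(reverse_complement(read_5b) == mrna[-6:])   # read 5b is antisense, from the 3' end
```

In this model the read obtained from the first-strand template has the same sequence as the second cDNA strand, that is, the sense of the mRNA.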




    With this understanding of the library prep, I would say that if my R1 (forward) read is located on the sense/coding strand, I would have sequenced the 1st strand first, but according to my link it must have been "secondstrand".

    I hope someone can follow this and spot where I am misinterpreting either the firststrand/secondstrand terms or the library prep.

    Thanks in advance

    Mario
    Last edited by Mchicken; 07-15-2015, 06:06 AM.

  • #2
    It is better to determine this setting empirically:
    run the TopHat + Cufflinks pipeline twice, once with the firststrand option and once with the secondstrand option.
    Then, assuming your annotation file matches your library reasonably well,
    the run with the much larger alignment and FPKM numbers used the correct option for your library prep method.
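The comparison described above could be scripted roughly like this. It is a minimal sketch, not part of TopHat or Cufflinks: `pick_library_type`, the idea of reducing each run to one summary number (for example mapped reads or summed FPKM), the 2x margin, and the numbers in the usage lines are all assumptions made for illustration.

```python
def pick_library_type(totals, min_ratio=2.0):
    """Return the library type whose run gave clearly higher totals.

    totals:    dict mapping a library-type name to one summary number
               from that run (e.g. mapped reads or summed FPKM).
    min_ratio: hypothetical margin the best run must have over the
               runner-up; None is returned when no run wins clearly.
    """
    if len(totals) < 2:
        raise ValueError("need results from at least two runs")
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    (best_name, best_value), (_, second_value) = ranked[0], ranked[1]
    if second_value == 0:
        return best_name if best_value > 0 else None
    return best_name if best_value / second_value >= min_ratio else None


# Placeholder numbers, not real results.
runs = {"fr-firststrand": 1.2e4, "fr-secondstrand": 9.8e5}
print(pick_library_type(runs))  # fr-secondstrand
```

If neither run wins by a clear margin, the library may be unstranded or the annotation may not match, so the function returns None rather than guessing.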


    • #3
      First of all, thanks for your advice. I had already read about this way of determining the library type somewhere.
      Nevertheless, there should be a logical explanation. The company that sequenced our samples told us yesterday that the R1 read indeed corresponds to the sense/coding strand, as I observed when I mapped my paired-end data with TopHat2 (using library-type unstranded).


      • #4
        Honestly... when I first wrote the code to handle firststrand/secondstrand, it took me a week of going back and forth and talking to different people who make libraries because the description in the Tuxedo package is so incredibly confusing. They should be named clearly, as in:

        READ1-PLUS protocol and READ1-MINUS protocol, or READ1-SENSE, or something like that.

        Every time I am asked questions about this I have to go back to the comments in my source code because the names are so vague and the official descriptions so opaque as to be meaningless.


        • #5
          Brian's right; the terminology is confusing.

          Regarding your original questions, the orientation of the gene on the DNA (Watson or Crick strand) is irrelevant. The quoted statement ["the leftmost end of the fragment (in transcript coordinates) is the first sequenced"] indicates that read1 proceeds in the 5'->3' orientation of the mRNA.

          As for your second question, strandedness (for TopHat) refers to the sequence being generated. In diagram 5a, the first cDNA strand is the template, which means that the sequence is identical to the second cDNA strand.


          • #6
            Okay now to summarize:

            In my case, indeed the library-type is fr-secondstrand as the R1 (forward) read maps in 5' to 3' direction of the mRNA.

            And the reason to call it fr-secondstrand is that the first cDNA strand only served as template for the generation of the R1 read, which is identical to the "second strand" (leading to the name fr-secondstrand).
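The rule summarized above fits in a few lines. This is a sketch of the thread's conclusion only, not code from TopHat; `tophat_library_type` and its arguments are hypothetical names, and it assumes a stranded paired-end library where only the strand of read 1 relative to the gene matters.

```python
STRANDS = {"+", "-"}


def tophat_library_type(gene_strand, read1_strand):
    """Map the strand read 1 aligns to, relative to the gene, to a library type.

    Read 1 on the gene's own strand (sense, 5'->3' of the mRNA) corresponds
    to fr-secondstrand; read 1 on the opposite strand to fr-firststrand.
    """
    if gene_strand not in STRANDS or read1_strand not in STRANDS:
        raise ValueError("strands must be '+' or '-'")
    return "fr-secondstrand" if read1_strand == gene_strand else "fr-firststrand"


# The two cases from the first post: R1 always lies on the gene's strand.
print(tophat_library_type("+", "+"))  # a) gene on the Watson (+) strand
print(tophat_library_type("-", "-"))  # b) gene on the Crick (-) strand
```

Both cases give fr-secondstrand, which shows that the Watson/Crick location of the gene does not change the answer.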

            Up to now I have used fr-unstranded as the library-type parameter, which also gave me good results. In future I will use the correct library type and hope that this improves my results further.


            Thank you very much, guys. This issue has been a mystery to me for a long time, and now I finally get it.


            • #7
              Hi guys,
              I apologize for reviving the thread, but I am also a bit confused about stranded RNA-seq.
              I have some Illumina PE data which is stranded, but I don't know how the library was generated. I received BAM files aligned with TopHat, so I used RSeQC's 'infer_experiment.py' command to tell me how the libraries are stranded.
              For one of them I get 1++,1--,2+-,2-+ and for the other 1+-,1-+,2++,2--. My problem now is linking this information to TopHat's fr-firststrand or fr-secondstrand. From what I have read so far on the web, it seems to me that:
              - fr-secondstrand corresponds to 1++,1--,2+-,2-+
              - fr-firststrand corresponds to 1+-,1-+,2++,2--
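The correspondence in the two bullets above can be written as a small lookup. In the RSeQC notation, as I understand it, "1++" means read 1 maps to the + strand and the gene is on the + strand, so the first pattern is the "read 1 on the gene's strand" case that this thread settled on as fr-secondstrand. The sketch below rests on that reading; `library_type_from_fractions`, the 0.8 cutoff, and the fractions in the usage lines are assumptions made for illustration.

```python
# Pattern strings as reported by RSeQC for paired-end data.
RSEQC_TO_TOPHAT = {
    "1++,1--,2+-,2-+": "fr-secondstrand",
    "1+-,1-+,2++,2--": "fr-firststrand",
}


def library_type_from_fractions(fractions, min_fraction=0.8):
    """Pick a TopHat library type from per-pattern read fractions.

    fractions:    dict mapping a pattern string to the fraction of reads
                  it explains.
    min_fraction: hypothetical cutoff; if no pattern reaches it, the
                  library is treated as unstranded.
    """
    known = {p: f for p, f in fractions.items() if p in RSEQC_TO_TOPHAT}
    if not known:
        raise ValueError("no recognized paired-end pattern in input")
    best = max(known, key=known.get)
    if known[best] < min_fraction:
        return "fr-unstranded"
    return RSEQC_TO_TOPHAT[best]


# Placeholder fractions, not real measurements.
print(library_type_from_fractions({"1++,1--,2+-,2-+": 0.02,
                                   "1+-,1-+,2++,2--": 0.96}))  # fr-firststrand
```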

              Is that right?

              I am asking because I wonder whether the alignment could be improved by using the appropriate library type. So far the default (unstranded) has been used.

              Thank you for your help and time.
