  • TopHat and cufflinks of non-strand-specific reads

    Hi all,

    I have questions about the strand information reported by TopHat and Cufflinks in NON-strand-specific experiments. I looked for answers in the forum and on the web but could not find them, so I am posting here.

    I have Illumina RNA-seq reads from a library that was prepared with a NON-strand-specific protocol.

    1. I saw in the TopHat manual that TopHat will treat reads as strand specific. What option should I use when running TopHat if my reads are not strand specific?

    2. After TopHat, in the BAM files, will I get a strand for all reads, or only for junction reads?
    In the BAM file, will the strand follow the splice junction orientation or the actual strand the read was mapped to?

    3. Cufflinks gives “a guess” at the strand of the transcript. How is this guess made?

    Thanks a lot.

  • #2
    Hi gfmgfm,
    did you get an answer to your question yet? I'm especially interested in number one. I use Cufflinks for expression analysis and I'm not sure whether it will use reads from both strands to calculate the FPKM value or only the reads on the sense strand.

    Thanks

    • #3
      From the manual:
      --library-type: TopHat will treat the reads as strand specific. Every read alignment will have an XS attribute tag. Consider supplying library type options below to select the correct RNA-seq protocol.
      If you look below at the possible flags:

      Library Type: fr-unstranded
      Examples: Standard Illumina
      Description: Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.

      Library Type: fr-firststrand
      Examples: dUTP, NSR, NNSR
      Description: Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.

      Library Type: fr-secondstrand
      Examples: Ligation, Standard SOLiD
      Description: Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

      In other words, even though the manual says that TopHat will treat reads as strand specific, the first option (fr-unstranded) gives you a strand-unaware alignment.
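
As a rough illustration of question 1, the sketch below assembles a TopHat command line that passes `--library-type fr-unstranded`. The index prefix, read file names and output directory are hypothetical placeholders, and the helper `build_tophat_command` is made up for this example. It only builds the argument list and does not run TopHat.

```python
import shlex

# The three library types listed in the manual excerpt above.
LIBRARY_TYPES = {"fr-unstranded", "fr-firststrand", "fr-secondstrand"}


def build_tophat_command(index_prefix, reads, library_type="fr-unstranded",
                         out_dir="tophat_out"):
    """Return the argument list for a TopHat run (hypothetical helper)."""
    if library_type not in LIBRARY_TYPES:
        raise ValueError(f"unknown library type: {library_type}")
    # Layout assumed here: tophat [options] <bowtie_index> <reads1> [reads2]
    return ["tophat", "--library-type", library_type, "-o", out_dir,
            index_prefix, *reads]


# Hypothetical inputs: a Bowtie index prefix and two paired-end FASTQ files.
cmd = build_tophat_command("genome_index", ["reads_1.fq", "reads_2.fq"])
print(shlex.join(cmd))
```

The list could be handed to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real paths.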

      After TopHat, in the BAM files, will I get a strand for all reads, or only for junction reads?
      You will get a strand for all reads. Even if a read does not go over a splice junction, it can usually be uniquely mapped to the + or - strand of the DNA that the RNA came from. For example:

      DNA
      (+) ATGCCGAGAGAGAGTTCAGAGAGATTCG
      (-) TACGGCTCTCTCTCAAGTCTCTCTAAGC

      Read
      GCCGAGAGAG

      This read will map to the + strand only, and will be reported to you as +.
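
The toy example above can be checked with a few lines of Python. This is only a sketch using exact string matching on the sequences from this post; a real aligner such as TopHat also handles mismatches, splicing and multi-mapping. The function names are made up for illustration.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matching_strands(read, plus_strand):
    """List the strands ('+', '-') on which the read matches exactly."""
    strands = []
    if read in plus_strand:
        strands.append("+")
    # A read from the - strand appears reverse-complemented on the + strand.
    if reverse_complement(read) in plus_strand:
        strands.append("-")
    return strands


PLUS = "ATGCCGAGAGAGAGTTCAGAGAGATTCG"  # (+) strand from the example
READ = "GCCGAGAGAG"

print(matching_strands(READ, PLUS))  # ['+']
```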
      In the BAM file, will the strand follow the splice junction orientation or the actual strand the read was mapped to?
      Hopefully this will be the same, since the splice site should be in the same "direction" as your actual read.
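
Rather than relying on expectation, the two notions of strand can be compared directly in the alignments: the mapping strand is stored in the SAM FLAG field (bit 0x10 set means the read aligned to the reverse strand), while the XS attribute tag quoted from the manual above is a separate optional field. The sketch below parses plain SAM text lines, such as those printed by `samtools view`. The two records are invented for illustration and are not real TopHat output.

```python
def strand_summary(sam_lines):
    """Return (read name, alignment strand, spliced?, XS tag) per mapped read."""
    rows = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        alignment_strand = "-" if flag & 0x10 else "+"
        spliced = "N" in fields[5]  # an N in the CIGAR marks a skipped region (intron)
        xs = None
        for tag in fields[11:]:  # optional fields start at column 12
            if tag.startswith("XS:A:"):
                xs = tag[5:]
        rows.append((fields[0], alignment_strand, spliced, xs))
    return rows


# Made-up records: one unspliced read without XS, one spliced read with XS.
example = [
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tGCCGAGAGAG\tIIIIIIIIII\tNM:i:0",
    "r2\t16\tchr1\t200\t50\t5M100N5M\t*\t0\t0\tGCCGAGAGAG\tIIIIIIIIII\tNM:i:0\tXS:A:+",
]
for row in strand_summary(example):
    print(row)
```

Running this over your own alignments shows which reads carry an XS tag at all, and whether it agrees with the mapping strand.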

      3. Cufflinks gives “a guess” at the strand of the transcript. How is this guess made?
      Look at the Cufflinks manual and its "How it works" page. But, as mentioned many times on this forum, Cufflinks is far from perfect, and I would strongly recommend running it with a reference annotation.

      • #4
        Thanks dvanic for your reply.
        So is it right that the '--library-type' option is used only in TopHat for junction finding and has no effect on Cufflinks and Cuffdiff (even though they have the same option)?

        I analyzed my strand-specific SOLiD reads (2 conditions with 3 replicates each, around 100M paired-end reads in total, from a fungus) once using fr-secondstrand and once using fr-unstranded, and the only differences I got were minor variations in the junction positions.

        I would have expected that, with the "fr-secondstrand" option, only reads that align to the sense strand of the gene/transcript would be taken into account for the expression calculation. However, this doesn't seem to be the case, because there are no major differences between "fr-secondstrand" and "fr-unstranded". This is true even for certain genes that have only antisense reads aligned to them.

        It just seems that there is not much use in using stranded information, or am I wrong?
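
The expectation described in this post can be made concrete with a small sketch that infers the transcript strand implied by each read under a given library type and then counts sense and antisense reads for a gene. It follows the library-type descriptions quoted earlier in the thread and is an assumption about how one could count reads by hand, not a description of what Cufflinks does internally. The function names and the FLAG values are illustrative.

```python
def implied_transcript_strand(flag, library_type):
    """Transcript strand implied by a read's SAM FLAG, or None if unstranded."""
    if library_type == "fr-unstranded":
        return None
    if library_type not in ("fr-firststrand", "fr-secondstrand"):
        raise ValueError(f"unknown library type: {library_type}")
    read_strand = "-" if flag & 0x10 else "+"
    is_second_mate = bool(flag & 0x80)
    # fr-secondstrand: first mate (or single-end read) lies on the transcript strand.
    # fr-firststrand: first mate lies on the opposite strand.
    # The second mate always points the other way from the first.
    same_as_transcript = (library_type == "fr-secondstrand") != is_second_mate
    if same_as_transcript:
        return read_strand
    return "+" if read_strand == "-" else "-"


def count_sense_antisense(flags, gene_strand, library_type):
    """Count reads whose implied strand agrees or disagrees with the gene strand."""
    counts = {"sense": 0, "antisense": 0, "unknown": 0}
    for flag in flags:
        strand = implied_transcript_strand(flag, library_type)
        if strand is None:
            counts["unknown"] += 1
        elif strand == gene_strand:
            counts["sense"] += 1
        else:
            counts["antisense"] += 1
    return counts


# Illustrative FLAGs: single-end forward (0), single-end reverse (16),
# first mate forward (99), second mate reverse (147).
flags = [0, 16, 99, 147]
print(count_sense_antisense(flags, "+", "fr-secondstrand"))
print(count_sense_antisense(flags, "+", "fr-unstranded"))
```

Comparing such counts for a gene that has only antisense reads would be one way to see whether the stranded setting changes which reads are assigned to it.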
