  • Question about Cufflink strand calling

    Hi all,

    I have 2 questions about strand calling..

    1) From what I understand, with an unstranded library protocol there is no way to tell directly from the reads which strand a transcript comes from. So to decide on the strand of a transcript, Cufflinks looks at the splice junctions and calls the strand according to which strand has a valid GT-AG splice site pair.

    [This thread explains it better.. http://seqanswers.com/forums/showthread.php?t=4704]

    But in my Cufflinks results, I see a lot of single-exon transcripts to which Cufflinks has also assigned a strand. So my question is: how does Cufflinks decide on the strand of a transcript when there is no splice site in it?
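The splice-junction idea described above can be sketched in a few lines. This is only an illustration of the principle, not Cufflinks' or TopHat's actual implementation: the function names are hypothetical, only the canonical GT-AG motif is checked, and intron coordinates are assumed to be 0-based, half-open positions on the forward genome strand.

```python
# Sketch: guess a transcript's strand from its intron boundary dinucleotides.
# Hypothetical helper names; only the canonical GT-AG motif is considered.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (bases other than ACGT are left as-is)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def intron_strand(genome_seq, intron_start, intron_end):
    """Return '+', '-' or '.' for one intron given forward-strand coordinates."""
    left = genome_seq[intron_start:intron_start + 2].upper()
    right = genome_seq[intron_end - 2:intron_end].upper()
    if (left, right) == ("GT", "AG"):
        return "+"
    # A minus-strand GT-AG intron appears as CT...AC on the forward strand.
    if (revcomp(right), revcomp(left)) == ("GT", "AG"):
        return "-"
    return "."


def transcript_strand(genome_seq, introns):
    """Combine intron calls; no introns or conflicting calls give '.'."""
    votes = {intron_strand(genome_seq, s, e) for s, e in introns}
    votes.discard(".")
    if len(votes) == 1:
        return votes.pop()
    return "."
```

Note that a single-exon transcript has an empty intron list, so this approach alone returns "." for it, which is exactly why a strand call on such transcripts is puzzling.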


    2) Secondly, as a way of benchmarking the strand-calling accuracy of Cufflinks, for each Cufflinks transcript I looked for known genes that overlap the predicted transcript and compared the strand predicted by Cufflinks with the strand of the known gene. I consider a Cufflinks prediction wrong if all of the known overlapping genes are on the opposite strand to the transcript.
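The benchmarking rule just described could be written roughly as follows. This is a minimal sketch under stated assumptions: transcripts and genes are plain dicts with hypothetical keys `chrom`, `start`, `end` and `strand`, coordinates are closed intervals, and the category labels are made up for illustration.

```python
# Sketch: classify one predicted transcript against known gene annotations.
# The dict layout and the returned labels are assumptions for illustration.

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def classify_strand_call(transcript, genes):
    """Return 'unassigned', 'no_overlap', 'wrong' or 'consistent'."""
    if transcript["strand"] not in ("+", "-"):
        return "unassigned"
    hits = [
        g for g in genes
        if g["chrom"] == transcript["chrom"]
        and overlaps(transcript["start"], transcript["end"], g["start"], g["end"])
    ]
    if not hits:
        return "no_overlap"
    # "Wrong" only if every overlapping known gene is on the opposite strand.
    if all(g["strand"] != transcript["strand"] for g in hits):
        return "wrong"
    return "consistent"
```

Counting the 'wrong' label over all transcripts with an assigned strand would give a percentage comparable to the one quoted in this post; whether transcripts with no overlapping gene are included in the denominator changes that number.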

    In my analysis, almost 40% of the transcripts to which Cufflinks assigned a strand were on the wrong strand (by my definition above). That seems like a pretty high number. Has anyone else tried something like this, and what kind of results did you get?

    Also, do you think I might be doing something wrong that is causing the inaccurate strand calling? Any ideas on how I might improve it?

    I am working with Illumina unstranded RNA-seq reads.

    thanks..

  • #2
    Hi avi,

    I am finding similar results. Did you find any answers to your questions?

    Thanks.


    • #3
      Hi joro,

      No, I still haven't got any answers about how Cufflinks does its strand calling for single-exon transcripts.

      But when I looked at the transcripts assigned the "wrong" strand, I found that the vast majority of wrong strand assignments came from single-exon transcripts. So going forward, I decided to use the Cufflinks strand assignments only for multi-exon transcripts. I am treating all single-exon transcripts as strand unknown, even if Cufflinks assigns them a strand.
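The workaround described here (trust the strand only for multi-exon transcripts) could be applied to a GTF file along these lines. A minimal sketch, assuming a standard tab-separated, 9-column GTF with `exon` records and a `transcript_id "..."` attribute; the function name is hypothetical.

```python
# Sketch: set the strand column to "." for every record that belongs to a
# transcript with exactly one exon. Assumes standard 9-column GTF lines.
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def mask_single_exon_strand(gtf_lines):
    """Return GTF lines with the strand of single-exon transcripts masked."""
    lines = [ln.rstrip("\n") for ln in gtf_lines]

    # First pass: count exon records per transcript.
    exon_counts = Counter()
    for ln in lines:
        fields = ln.split("\t")
        if ln.startswith("#") or len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if fields[2] == "exon" and match:
            exon_counts[match.group(1)] += 1

    # Second pass: rewrite the strand column (index 6) where needed.
    masked = []
    for ln in lines:
        fields = ln.split("\t")
        if not ln.startswith("#") and len(fields) >= 9:
            match = TRANSCRIPT_ID.search(fields[8])
            if match and exon_counts[match.group(1)] == 1:
                fields[6] = "."
                ln = "\t".join(fields)
        masked.append(ln)
    return masked
```

It can be called with an open file handle, for example `mask_single_exon_strand(open("transcripts.gtf"))`, where the file name is just a placeholder.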

      On a related note, I recently realised that my data itself might not be very good and posted a question about that at http://seqanswers.com/forums/showthread.php?t=14416. I haven't got an answer yet, but you might want to check the same thing in your data.

      cheers..


      • #4
        Thanks for your quick reply avi.

        Out of interest, did you specify a library type when running Cufflinks?

        I'm using SOLiD stranded RNA-seq reads, so it's interesting that we experienced the same problem.


        • #5
          Originally posted by avi
          Hi all,

          I have 2 questions about strand calling..

          1) From what I understand, with an unstranded library protocol there is no way to tell directly from the reads which strand a transcript comes from. So to decide on the strand of a transcript, Cufflinks looks at the splice junctions and calls the strand according to which strand has a valid GT-AG splice site pair.

          [This thread explains it better.. http://seqanswers.com/forums/showthread.php?t=4704]

          But in my Cufflinks results, I see a lot of single-exon transcripts to which Cufflinks has also assigned a strand. So my question is: how does Cufflinks decide on the strand of a transcript when there is no splice site in it?

          I think the strand assigned to a transcript depends on which genome strand the reads map to. The strand should be + if the reads map to the sense strand, and - otherwise.

          Cheers,


          • #6
            I've looked at some examples of single exon transcripts which have been assigned a strand even though all the reads map to the opposite strand (e.g. assigned '+' even though the reads map to the antisense strand).

            I don't know if avi finds the same thing?


            • #7
              Originally posted by joro
              I've looked at some examples of single exon transcripts which have been assigned a strand even though all the reads map to the opposite strand (e.g. assigned '+' even though the reads map to the antisense strand).

              I don't know if avi finds the same thing?

              I don't know if I have caught your meaning.
              But I guess it depends on which strand is defined as "sense" or "antisense". I think it is reasonable to assign "+" to an "antisense" strand. You can check this by comparing the strands of multi-exon transcripts with those of their reads.

              Cheers,


              • #8
                Thanks. From my experience and from avi's posts, the wrong strand assignments tend to occur in single exon transcripts.


                • #9
                  Originally posted by joro
                  Thanks. From my experience and from avi's posts, the wrong strand assignments tend to occur in single exon transcripts.

                  I am wondering how you know that some assigned strands are wrong. Do you have reference transcripts for them?


                  • #10
                    Yes, I have looked at reference genes that overlap with the predicted transcripts. Also, all the reads map to the same strand as the reference gene so it seems that Cufflinks assigns the opposite strand.


                    • #11
                      Hi Hunny,

                      That's what I originally thought too, but apparently it doesn't work that way. From what I understand, during RNA-seq library preparation the RNA is converted into double-stranded cDNA, and this cDNA gets sequenced. So the reads could come from either of the cDNA strands. Therefore the strand to which the reads map doesn't tell us anything about which strand the original RNA came from.

                      This applies only to non-strand-specific protocols. There are protocols that preserve the strand information during RNA-seq library preparation, but I haven't read about them yet.
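One rough way to see whether a library behaves as unstranded at a given locus is to look at the balance of read orientations there: roughly half on each strand is what the reasoning above predicts for an unstranded protocol, while a strand-specific library should be heavily skewed. The sketch below assumes the read strands have already been extracted (in SAM/BAM, flag bit 0x10 marks a read aligned to the reverse strand); the function names and the 0.9 threshold are arbitrary choices for illustration.

```python
# Sketch: summarise read orientation at one locus.
# Input is an iterable of '+' / '-' strings, one per aligned read.
from collections import Counter


def plus_fraction(read_strands):
    """Fraction of reads on the '+' strand, or None if there are no reads."""
    counts = Counter(read_strands)
    total = counts["+"] + counts["-"]
    if total == 0:
        return None
    return counts["+"] / total


def looks_strand_specific(read_strands, threshold=0.9):
    """True if at least `threshold` of the reads share one orientation."""
    frac = plus_fraction(read_strands)
    if frac is None:
        return False
    return max(frac, 1.0 - frac) >= threshold
```

With few reads the fraction is noisy, so a result from a single short transcript should be read with caution.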

                      @joro: No, I didn't specify a library type. That's very strange if you are getting wrong strand predictions even though your data is strand specific. But your guess is as good as mine here. Hopefully someone with more experience will be able to clear this up for us.


                      • #12
                        Hi avi,

                        Originally posted by avi
                        Hi Hunny,

                        That's what I originally thought too, but apparently it doesn't work that way. From what I understand, during RNA-seq library preparation the RNA is converted into double-stranded cDNA, and this cDNA gets sequenced. So the reads could come from either of the cDNA strands. Therefore the strand to which the reads map doesn't tell us anything about which strand the original RNA came from.

                        This applies only to non-strand-specific protocols. There are protocols that preserve the strand information during RNA-seq library preparation, but I haven't read about them yet.

                        Yes, I see. I have just checked my predicted transcripts from Cufflinks, and I find that for some of my single-exon transcripts (I haven't checked all of them) Cufflinks does not assign a strand; it writes a dot in the strand field of the GTF file.

                        I am using Tophat-1.3.1 and Cufflinks-1.0.3 with default options.

                        Cheers,


                        • #13
                          This work was done a while ago. I used cufflinks-0.9.3 & tophat-1.1.1 so maybe if I run it again with updated versions of these programs I won't see such strange results.

                          @Hunny, what reads are you working with? Did you specify a library type?

                          Thanks.


                          • #14
                            Originally posted by joro
                            This work was done a while ago. I used cufflinks-0.9.3 & tophat-1.1.1 so maybe if I run it again with updated versions of these programs I won't see such strange results.

                            @Hunny, what reads are you working with? Did you specify a library type?

                            Thanks.

                            I am working with Illumina single-end reads.
                            No, I didn't specify a library type; I just used the default options.

                            Cheers,


                            • #15
                              I encountered the same problem: I found strand errors and overlaps between transcripts.
                              Details are in http://seqanswers.com/forums/showthread.php?t=26555
                              I am confused as well.
                              github:
                              https://github.com/Bioinformatics-and-Genomics
