  • Question about Cufflinks strand calling

    Hi all,

    I have two questions about strand calling.

    1) From what I understand, with an unstranded library protocol there is no way to tell directly from the reads which strand a transcript comes from. So to decide on the strand of a transcript, Cufflinks looks at the splice junctions and calls the strand according to which strand has a valid GT-AG splice site pair.

    [This thread explains it better: http://seqanswers.com/forums/showthread.php?t=4704]
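To make the splice-junction idea concrete, here is a minimal sketch, assuming canonical GT...AG introns only. It is not Cufflinks's actual code; the function names and the 0-based, end-exclusive intron coordinates are assumptions made for illustration. It also shows why a transcript with no intron gets no vote at all.

```python
# Hypothetical helpers, not taken from Cufflinks or TopHat.
# A canonical intron reads GT...AG on the transcribed strand, so an intron of a
# minus-strand transcript appears as CT...AC on the forward genome sequence.

def intron_strand(genome_seq, intron_start, intron_end):
    """Return '+', '-' or '.' for one intron (0-based, end-exclusive coordinates)."""
    donor = genome_seq[intron_start:intron_start + 2].upper()
    acceptor = genome_seq[intron_end - 2:intron_end].upper()
    if (donor, acceptor) == ("GT", "AG"):
        return "+"
    if (donor, acceptor) == ("CT", "AC"):
        return "-"
    return "."  # non-canonical boundary: no vote


def transcript_strand(genome_seq, introns):
    """Combine intron votes; return '.' when there are no votes or they conflict."""
    votes = {intron_strand(genome_seq, start, end) for start, end in introns}
    votes.discard(".")
    if len(votes) == 1:
        return votes.pop()
    return "."


if __name__ == "__main__":
    forward = "AAAGTCCCCAGTTT"  # toy sequence with a GT...AG intron at [3, 11)
    print(transcript_strand(forward, [(3, 11)]))  # multi-exon transcript
    print(transcript_strand(forward, []))         # single-exon transcript: no introns
```

With an empty intron list the function can only return '.', which is the situation the next question is about.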

    But in my Cufflinks results there are a lot of single-exon transcripts to which Cufflinks has also assigned a strand. So my question is: how does Cufflinks decide on the strand of a transcript that contains no splice site?


    2) Secondly, to benchmark the strand-calling accuracy of Cufflinks, I took each Cufflinks transcript, looked for known genes that overlap it, and compared the strand predicted by Cufflinks with the strand of the known gene. I consider a Cufflinks prediction wrong if all of the known overlapping genes are on the opposite strand to the transcript.
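The rule in question 2 can be written down roughly as follows. This is only a sketch of the definition given above, assuming 1-based inclusive coordinates as in GTF; the `Feature` tuple and the label strings are made up for illustration.

```python
from collections import namedtuple

# Hypothetical record type: 1-based, inclusive coordinates as in GTF.
Feature = namedtuple("Feature", "chrom start end strand")


def overlaps(a, b):
    """True if two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def classify_call(transcript, known_genes):
    """Label a predicted transcript according to the benchmark definition."""
    if transcript.strand not in ("+", "-"):
        return "unstranded"
    hits = [gene for gene in known_genes if overlaps(transcript, gene)]
    if not hits:
        return "no_overlap"
    # "Wrong" only if every overlapping known gene is on the opposite strand.
    if all(gene.strand != transcript.strand for gene in hits):
        return "wrong"
    return "consistent"


if __name__ == "__main__":
    genes = [Feature("chr1", 100, 500, "+"), Feature("chr1", 400, 900, "-")]
    print(classify_call(Feature("chr1", 150, 300, "-"), genes))
    print(classify_call(Feature("chr1", 450, 480, "-"), genes))
```

Transcripts with no overlapping known gene are kept in a separate category here, so they do not count towards either the correct or the wrong calls.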

    In my analysis, almost 40% of the transcripts to which Cufflinks assigned a strand were on the wrong strand (by my definition above). That seems a pretty high number. Has anyone else tried something like this, and what kind of results did you get?

    Also, do you think I might be doing something wrong that is causing the inaccurate strand calling? Any ideas on how I might improve it?

    I am working with Illumina unstranded RNA-seq reads.

    Thanks.

  • #2
    Hi avi,

    I am finding similar results. Did you find any answers to your questions?

    Thanks.

    • #3
      Hi joro,

      No, I still haven't found out how Cufflinks does its strand calling for single-exon transcripts.

      But when I looked at the transcripts assigned to the "wrong" strand, I found that the vast majority of them were single-exon transcripts. So going forward, I decided to use the Cufflinks strand assignments only for multi-exon transcripts. I treat all single-exon transcripts as strand unknown, even if Cufflinks assigns them a strand.
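For anyone who wants to apply the same workaround, a minimal sketch could look like this. It assumes a tab-separated GTF with `exon` rows and a `transcript_id "..."` attribute; the function name is hypothetical and this is not a Cufflinks feature.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def mask_single_exon_strand(gtf_lines):
    """Return GTF lines with the strand set to '.' for single-exon transcripts."""
    lines = [line.rstrip("\n") for line in gtf_lines]

    # First pass: count exon rows per transcript.
    exon_counts = Counter()
    for line in lines:
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exon_counts[match.group(1)] += 1

    # Second pass: overwrite column 7 (strand) on every row of those transcripts.
    masked = []
    for line in lines:
        fields = line.split("\t")
        if not line.startswith("#") and len(fields) >= 9:
            match = TRANSCRIPT_ID.search(fields[8])
            if match and exon_counts[match.group(1)] == 1:
                fields[6] = "."
                line = "\t".join(fields)
        masked.append(line)
    return masked
```

A possible use would be `mask_single_exon_strand(open("transcripts.gtf"))`, where the file name is just an example; comment lines and malformed rows are passed through unchanged.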

      On a related note, I recently realised that my data itself might not be very good and posted a question about that: http://seqanswers.com/forums/showthread.php?t=14416. I haven't had an answer yet, but you might want to check the same thing in your data.

      Cheers.

      • #4
        Thanks for your quick reply avi.

        Out of interest, did you specify a library type when running Cufflinks?

        I'm using SOLiD stranded RNA-seq reads, so it's interesting that we experienced the same problem.

        • #5
          Originally posted by avi
          Hi all,

          I have two questions about strand calling.

          1) From what I understand, with an unstranded library protocol there is no way to tell directly from the reads which strand a transcript comes from. So to decide on the strand of a transcript, Cufflinks looks at the splice junctions and calls the strand according to which strand has a valid GT-AG splice site pair.

          [This thread explains it better: http://seqanswers.com/forums/showthread.php?t=4704]

          But in my Cufflinks results there are a lot of single-exon transcripts to which Cufflinks has also assigned a strand. So my question is: how does Cufflinks decide on the strand of a transcript that contains no splice site?

          I think the strand assigned to a transcript depends on which genomic strand the reads map to. The strand should be + if the reads map to the sense strand, and vice versa.

          Cheers,

          • #6
            I've looked at some examples of single exon transcripts which have been assigned a strand even though all the reads map to the opposite strand (e.g. assigned '+' even though the reads map to the antisense strand).

            I don't know if avi finds the same thing?

            • #7
              Originally posted by joro
              I've looked at some examples of single exon transcripts which have been assigned a strand even though all the reads map to the opposite strand (e.g. assigned '+' even though the reads map to the antisense strand).

              I don't know if avi finds the same thing?

              I'm not sure I have caught your meaning.
              But I guess it depends on which strand is defined as "sense" or "antisense". I think it is reasonable to assign "+" to an "antisense" strand. You can check this by comparing the strands of multi-exon transcripts with the strands of their reads.

              Cheers,

              • #8
                Thanks. From my experience and from avi's posts, the wrong strand assignments tend to occur in single exon transcripts.

                • #9
                  Originally posted by joro
                  Thanks. From my experience and from avi's posts, the wrong strand assignments tend to occur in single exon transcripts.

                  I am wondering how you know that some of the assigned strands are wrong. Do you have reference transcripts for them?

                  • #10
                    Yes, I have looked at reference genes that overlap with the predicted transcripts. Also, all the reads map to the same strand as the reference gene so it seems that Cufflinks assigns the opposite strand.

                    • #11
                      Hi Hunny,

                      That's what I originally thought too, but apparently it doesn't work that way. From what I understand, during RNA-seq library preparation the RNA is converted into double-stranded cDNA, and this cDNA gets sequenced. So the reads could come from either one of the cDNA strands. Therefore the strand to which the reads map doesn't tell us anything about which strand the original RNA came from.

                      This applies only to non-strand-specific protocols. There are protocols that preserve the strand information during RNA-seq library preparation, but I haven't read about them yet.
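One rough way to see which situation a dataset is in is to look at how the read strands split over a region of interest. The sketch below assumes the mapping strands have already been extracted as '+' / '-' characters; the function names and the 0.9 cutoff are arbitrary choices for illustration, not an established test.

```python
from collections import Counter


def forward_fraction(read_strands):
    """Fraction of reads mapped to '+', or None if there are no stranded reads."""
    counts = Counter(s for s in read_strands if s in ("+", "-"))
    total = counts["+"] + counts["-"]
    if total == 0:
        return None
    return counts["+"] / total


def looks_strand_specific(read_strands, threshold=0.9):
    """Heuristic: True if nearly all reads in the region map to one strand."""
    fraction = forward_fraction(read_strands)
    if fraction is None:
        return False
    return fraction >= threshold or fraction <= 1 - threshold


if __name__ == "__main__":
    print(forward_fraction(["+", "-", "+", "-"]))
    print(looks_strand_specific(["+"] * 19 + ["-"]))
```

In an unstranded library the fraction would be expected to sit near 0.5 for a well-covered gene, in which case the read strand carries no information about the transcript strand.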

                      @joro: No, I didn't specify a library type. That's very strange if you are getting wrong strand predictions even though your data is strand-specific. But your guess is as good as mine here. Hopefully someone with more experience will be able to clear this up for us.

                      • #12
                        Hi avi,

                        Originally posted by avi
                        Hi Hunny,

                        That's what I originally thought too, but apparently it doesn't work that way. From what I understand, during RNA-seq library preparation the RNA is converted into double-stranded cDNA, and this cDNA gets sequenced. So the reads could come from either one of the cDNA strands. Therefore the strand to which the reads map doesn't tell us anything about which strand the original RNA came from.

                        This applies only to non-strand-specific protocols. There are protocols that preserve the strand information during RNA-seq library preparation, but I haven't read about them yet.

                        Yes, I see. I've just checked my predicted transcripts from Cufflinks. I find that Cufflinks does not assign a strand to some of my single-exon transcripts (I haven't checked all of them); instead it puts a dot in the strand field of the GTF file.
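To check all transcripts rather than a few, the strand column could be tallied separately for single-exon and multi-exon transcripts. A minimal sketch, assuming a tab-separated GTF with `exon` rows and a `transcript_id "..."` attribute (the function name and category labels are made up here):

```python
import re
from collections import Counter, defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def strand_tally(gtf_lines):
    """Count transcripts by (exon category, strand) using the exon rows of a GTF."""
    exon_counts = defaultdict(int)
    strand_of = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        transcript = match.group(1)
        exon_counts[transcript] += 1
        strand_of[transcript] = fields[6]  # column 7 holds '+', '-' or '.'

    tally = Counter()
    for transcript, n_exons in exon_counts.items():
        category = "single_exon" if n_exons == 1 else "multi_exon"
        tally[(category, strand_of[transcript])] += 1
    return tally
```

The resulting counter makes it easy to see, for example, how many single-exon transcripts carry '.' versus '+' or '-' in a given run.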

                        I am using Tophat-1.3.1 and Cufflinks-1.0.3 with default options.

                        Cheers,

                        • #13
                          This work was done a while ago. I used cufflinks-0.9.3 & tophat-1.1.1 so maybe if I run it again with updated versions of these programs I won't see such strange results.

                          @Hunny, what reads are you working with? Did you specify a library type?

                          Thanks.

                          • #14
                            Originally posted by joro
                            This work was done a while ago. I used cufflinks-0.9.3 & tophat-1.1.1 so maybe if I run it again with updated versions of these programs I won't see such strange results.

                            @Hunny, what reads are you working with? Did you specify a library type?

                            Thanks.

                            I am working with Illumina single-end reads.
                            No, I didn't specify a library type; I just used the default options.

                            Cheers,

                            • #15
                              I encountered the same problem: I found strand errors and overlaps between transcripts.
                              Details are in http://seqanswers.com/forums/showthread.php?t=26555
                              I am confused as well.
