
  • #16
    Originally posted by lpachter View Post
    While it is ok to use raw counts to compare gene expression between samples, as Cole explained, to test differential expression of _isoforms_ it's necessary to account for uncertainty in the assignment of reads to transcripts. Converting isoform FPKMs to counts and then applying DESeq is a bad idea because the uncertainty is then not incorporated into the DE calculation.
    But wouldn't the uncertainty in the isoform abundances be overcome by the addition of biological replicates? For instance, gene X has 2 isoforms, A and B. A is assigned 20% of the gene expression, while 80% is assigned to B. If the uncertainty of the abundance of A is +/- 10%, it would be difficult to assess the isoform abundance with confidence using this sample alone, as you suggest. However, if two additional replicates call the abundance of A 15% and 23%, then the uncertainty of the measurement should decrease, provided the combined measurements are that precise. Is this correct? I recognize that the uncertainty in isoform abundance is a concern, but at what point does improving the accuracy and precision of the measurement become only a statistical exercise? In other words, would we benefit from knowing an isoform changed from 20% to 30%? Wouldn't the biological replicates inherently provide more power?

    By the way, I am not a statistician, so I could be completely off base. But I am learning a lot in these discussions, and thanks to everyone for participating!
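
To make the replicate example above concrete, here is a minimal Python sketch that treats the three isoform A fractions (20%, 15%, 23%) as independent measurements and computes their mean and standard error. It covers only the between-replicate spread and ignores the per-sample assignment uncertainty (the +/- 10%) raised in the quoted post. The function name `replicate_summary` is hypothetical and not part of any tool discussed here.

```python
import math
import statistics

def replicate_summary(fractions):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(fractions)
    if n < 2:
        raise ValueError("need at least two replicates to estimate variability")
    mean = statistics.mean(fractions)
    sd = statistics.stdev(fractions)   # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(n)            # shrinks as more replicates are added
    return mean, sd, sem

# Isoform A fractions from the three replicates in the example above.
mean, sd, sem = replicate_summary([0.20, 0.15, 0.23])
print(f"mean={mean:.3f} sd={sd:.3f} sem={sem:.3f}")
```

With only three replicates the standard deviation itself is poorly estimated, which is one reason a method might borrow information from other transcripts.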

    Comment


    • #17
      You are absolutely correct that biological replicates will help with accurately estimating relative isoform-level abundance, and the replicated deconvolutions inform about the variability in isoform expression. With many replicates, one can directly estimate the variability in the MLE that way. But with few replicates, it is still necessary to estimate variability by leveraging variability in other transcripts, and in addition it is important to account for the uncertainty in isoform-level expression.

      Comment


      • #18
        Just curious - any updates on the ETA for biological replicates in cufflinks?

        Also, another question - if I have multiple RNA-seq runs and want to predict isoforms, is the best thing to do to combine all the data into one big file and then run cufflinks? There does not appear to be an option for including multiple SAM files.

        Chris

        Comment


        • #19
          Hi Chris,
          I am pretty new to RNA-seq data, but my first-hand experience with cufflinks tells me that combining files is not a very good idea. Cufflinks is a memory-intensive algorithm, and when looking to predict new isoforms it can run forever. However, if you are using a reference .gtf file for the analysis and are only concerned with the transcripts in the reference file, it may work.

          Arpit

          Comment


          • #20
            General wet lab comment:

            Isoform expression is not only a key developmental marker, but also a response mechanism to environmental conditions and changes. Both are very non-static exercises of a given genome.

            Comment


            • #21
              >Just curious - any updates on the ETA for biological replicates in cufflinks?

              We have been working this out and it will be released with the next update of Cufflinks on the website. We had planned to already have it out, but logistical issues due to summer travel have slowed us down a bit. We'll post here as soon as it's out (I hate to set a date that we don't meet, but we really are planning on wrapping this up imminently).

              >also another question - if i have multiple RNAseq runs and want to predict isoforms - is the best thing to do to combine all the data into one big file and then run cufflinks? THere does not appear to be an option for including multiple sam files?

              This is a very good question. It's not entirely clear that the best thing to do is to combine data; for one thing, the various replicates will be useful in identifying spurious hits. We've started thinking about this because some of our collaborators are working with large case-control studies and are asking exactly the same question. For now, the best advice I can give is to merge the data.
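
As a rough illustration of the "merge the data" advice, the sketch below concatenates several SAM streams and keeps the header lines of the first one only. It assumes every run was aligned against the same reference, so that the `@SQ` header lines agree. The function name `merge_sam_lines` and the toy records are hypothetical. The merged output would likely need re-sorting before being passed to tools that expect sorted input.

```python
def merge_sam_lines(sources):
    """Yield lines from several SAM line iterables, header from the first only.

    Assumption: all sources share the same reference, so dropping the
    later headers loses nothing that downstream tools need.
    """
    for index, lines in enumerate(sources):
        for line in lines:
            is_header = line.startswith("@")   # SAM header lines begin with '@'
            if is_header and index > 0:
                continue
            yield line

# Toy in-memory runs; real use would pass open file handles instead.
run_a = ["@SQ\tSN:chr1\tLN:1000\n",
         "r1\t0\tchr1\t10\t50\t4M\t*\t0\t0\tACGT\tIIII\n"]
run_b = ["@SQ\tSN:chr1\tLN:1000\n",
         "r2\t0\tchr1\t20\t50\t4M\t*\t0\t0\tACGT\tIIII\n"]
merged = list(merge_sam_lines([run_a, run_b]))
print(len(merged))
```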

              Comment


              • #22
                Originally posted by Cole Trapnell View Post
                We will be directly supporting biological replicates within the next few weeks in both cuffdiff and cufflinks itself. We've recently worked out the math for how to handle them well in our model and improve the robustness of our statistical testing. I need a few weeks to implement the enhancements and do the testing, etc.
                any updates on this? Can cuffdiff now handle biological and/or technical (library prep) replicates?

                Comment


                • #23
                  Originally posted by amackey View Post
                  any updates on this? Can cuffdiff now handle biological and/or technical (library prep) replicates?
                  Sorry, I just saw Lior's reply to the same question, posted just yesterday (I don't always notice the SEQanswers paging system...). I'll wait more patiently.

                  Thanks again,
                  -Aaron

                  Comment


                  • #24
                    Hi All..

                    It was really interesting to read this educating discussion.

                    Just wondering: for folks with single-read data, would you recommend using Tophat/Cufflinks? My impression, especially from the Cufflinks paper, is that it is basically built for PE data.

                    -Abhi

                    Comment


                    • #25
                      apratrap,

                      I guess the documentation says that it should also work with single-end data. But I too wonder whether anyone has any benchmarks/validation for single-end data, particularly with respect to isoform prediction and the detection of differential splicing?

                      Comment


                      • #26
                        Originally posted by lpachter View Post
                        >Just curious - any updates on the ETA for biological replicates in cufflinks?

                        We have been working this out and it will be released with the next update of Cufflinks on the website. We had planned to already have it out, but logistical issues due to summer travel have slowed us down a bit. We'll post here as soon as it's out (I hate to set a date that we don't meet, but we really are planning on wrapping this up imminently).

                        >also another question - if i have multiple RNAseq runs and want to predict isoforms - is the best thing to do to combine all the data into one big file and then run cufflinks? THere does not appear to be an option for including multiple sam files?

                        This is a very good question. It's not entirely clear that the best thing to do is to combine data; for one thing, the various replicates will be useful in identifying spurious hits. We've started thinking about this because some of our collaborators are working with large case-control studies and are asking exactly the same question. For now, the best advice I can give is to merge the data.
                        I agree - merge the replicates. Replicates are great for controlling gene expression (finding normal biological variation), but when comparing control/mutant isoforms I find it's best to merge the replicate read sets and then run them through the Tophat pipeline. The problem is that when you have only 20-30 million alignments, you'll still see splicing variations between biological replicates up to pretty good RPKM levels, which means that to make valid comparisons you're going to have to not trust a large portion of your data. While putting more reads into the mix doesn't necessarily alter the gene expression values, it does increase the robustness of those values, which should equate to more complete/robust isoforms being reported.

                        The more reads the better! It is only fair to then compare isoforms between control/mutant that have a similar number of alignments making up the data so you can rule out the possibility that if you DID have the same number of alignments some of the variation might go away.
                        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                        Salk Institute for Biological Studies, La Jolla, CA, USA */
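
One way to get a "similar number of alignments" in control and mutant before comparing isoforms is to downsample the deeper sample. The sketch below is an assumption about how that could be done, not something described in the post. It picks a random subset of fragment names and keeps every alignment line belonging to them, so mates that share a read name stay together. `downsample_fragments` is a hypothetical helper and it holds the whole file in memory.

```python
import random

def downsample_fragments(sam_lines, target, seed=0):
    """Keep header lines plus alignments for `target` randomly chosen read names."""
    header, records = [], []
    for line in sam_lines:
        (header if line.startswith("@") else records).append(line)
    # Column 1 of a SAM record is the read (fragment) name.
    names = sorted({rec.split("\t", 1)[0] for rec in records})
    if target >= len(names):
        return header + records            # nothing to remove
    keep = set(random.Random(seed).sample(names, target))
    return header + [rec for rec in records if rec.split("\t", 1)[0] in keep]

# Toy input: one header line and ten single-end records.
demo = ["@HD\tVN:1.0\n"] + [
    f"read{i}\t0\tchr1\t{i + 1}\t50\t4M\t*\t0\t0\tACGT\tIIII\n" for i in range(10)
]
subset = downsample_fragments(demo, target=4)
print(len(subset) - 1, "records kept")
```

Fixing `seed` makes the subset reproducible. Drawing several subsets with different seeds would show how much of the isoform variation is due to sampling depth alone.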

                        Comment


                        • #27
                          combining runs

                          I am a bit stumped at the moment.


                          What would be the recommended pipeline (anyone feel free to chime in!)

                          What I have done (I guess the standard pipeline): Map reads from each run independently with tophat > Run cufflinks for each run > Cuffcompare > Cuffdiff

                          What I would like to do:

                          Generate a 'master' mapping/sam file by combining all of my reads and mapping in tophat > Analyze those reads in Cufflinks w/ reference gtf to produce a master gtf (w/ annotation) > Then go back to quantitate each run independently relative to the new gtf?

                          I can't seem to think of how to do this. I suppose I could just go back and split the "master" sam file on the basis of the run identifier? Sorry ... bioinformaticist in training...
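
A minimal sketch of the "split the master sam file by run identifier" idea follows. It assumes the run identifier is the part of each read name before the first dot (for example `runA.17`). That naming scheme, the function name `split_by_run`, and the `run_of` parameter are all hypothetical and would need adapting to how the reads are actually named.

```python
from collections import defaultdict

def split_by_run(sam_lines, run_of=lambda qname: qname.split(".", 1)[0]):
    """Group SAM lines by a run identifier derived from the read name.

    Returns {run_id: [header lines..., alignment lines...]} so each group
    can be written out as its own SAM file.
    """
    header = []
    per_run = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)            # every output gets the full header
        else:
            qname = line.split("\t", 1)[0]
            per_run[run_of(qname)].append(line)
    return {run: header + recs for run, recs in per_run.items()}

# Toy "master" file with reads from two runs.
master = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "runA.1\t0\tchr1\t10\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "runB.1\t0\tchr1\t20\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "runA.2\t0\tchr1\t30\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
]
by_run = split_by_run(master)
print(sorted(by_run))
```

Each per-run list could then be written to its own file and quantified against the master gtf.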

                          Comment


                          • #28
                            total mapped fragments RNA-seq

                            I am interested in having read counts for RNA-seq differential expression analysis. Does anybody know how to count the total number of fragments mapped if my SAM file (from Tophat) has a mixture of proper and improper pairs? I am using this formula:

                            read counts = FPKM x transcript length (bp) x total fragments mapped / 10^9

                            Thanks.
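
The sketch below shows one possible way to do both steps. It counts a fragment once if at least one of its mates is mapped, whether the pair is proper or improper, which is an assumed definition rather than an established one. It then inverts the textbook FPKM definition. Cufflinks may use an effective transcript length and its own normalization, so the result is only an approximation, and, as noted earlier in the thread, converted counts do not carry the read-assignment uncertainty. Both function names are hypothetical.

```python
def count_mapped_fragments(sam_lines):
    """Count distinct read names that have at least one mapped alignment."""
    mapped = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue                       # skip header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:                     # 0x4: this segment is unmapped
            continue
        mapped.add(fields[0])              # mates share one read name
    return len(mapped)

def fpkm_to_count(fpkm, transcript_length_bp, total_mapped_fragments):
    """Invert FPKM = count * 1e9 / (length_bp * total_mapped_fragments)."""
    return fpkm * transcript_length_bp * total_mapped_fragments / 1e9

# Toy records: a proper pair, a pair with one mate unmapped, an unmapped pair.
sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "fragX\t99\tchr1\t10\t50\t4M\t=\t60\t54\tACGT\tIIII\n",
    "fragX\t147\tchr1\t60\t50\t4M\t=\t10\t-54\tACGT\tIIII\n",
    "fragY\t73\tchr1\t100\t50\t4M\t=\t100\t0\tACGT\tIIII\n",
    "fragY\t133\tchr1\t100\t0\t*\t=\t100\t0\tACGT\tIIII\n",
    "fragZ\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "fragZ\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
total = count_mapped_fragments(sam)
estimate = fpkm_to_count(fpkm=10.0, transcript_length_bp=2000,
                         total_mapped_fragments=20_000_000)
print(total, estimate)
```

Reads that map to several locations appear on multiple lines but are still counted once here, because the count is over distinct read names.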

                            Comment


                            • #29
                              Cuffdiff - differential expression analysis between groups of samples

                              We will be directly supporting biological replicates within the next few weeks in both cuffdiff and cufflinks itself. We've recently worked out the math for how to handle them well in our model and improve the robustness of our statistical testing. I need a few weeks to implement the enhancements and do the testing, etc.
                              Hello,

                              Cole mentioned on this thread in May that he is working on introducing the differential expression analysis functionality for groups of samples in Cuffdiff (e.g. control samples compared with treated samples). Is there any news about this new version of Cuffdiff? Will it be released soon or is it already available somewhere?

                              Given the currently available Cuffdiff version (v0.9.3), is there any viable workaround to analyze groups of samples?

                              Thank you,
                              Alexandra

                              Comment


                              • #30
                                Replicates/groups have been supported for some time now, but paired samples are not supported (I don't know of any RNA-seq software that handles paired samples).

                                Comment
