DEG for paired samples, biological replicates


  • DEG for paired samples, biological replicates

    Hi.
    I am new to the seq world, although I have experience with microarrays and the omics world.

    I want to know which programs are available to compare RNA-seq data when you have biological replicates and paired samples. I have samples from 3 patients, 2 samples each: control and lesion. I have been playing with Cufflinks, but I am not sure whether it handles this kind of experimental design.

  • #2
    Cuffdiff now supports replicates, so it should handle this sort of setup.


    • #3
      Cole, if Cufflinks can handle paired samples, I'd be really impressed, but I wonder whether you were simply too fast with your reply and overlooked the fact that the question was about paired samples. If not, please correct me.

      For readers unfamiliar with the issue, a quick reminder of text-book knowledge with an example for a paired t test:
      Imagine you have 5 samples, for which you measure some quantitative trait before and after a certain treatment. Let's say this trait varies according to a normal distribution and the data look like this:

      Code:
                          S1    S2    S3    S4    S5
      before treatment  4.00  7.30 11.13  8.50  9.50
      after treatment   4.72  8.44 12.04  9.66 10.65
      The mean is 8.09 before and 9.10 after treatment. The (pooled) standard deviation of the data is 2.7, and hence, the difference of the means (1.01) is not significant. ( t = (9.10-8.09)/2.7=.37, p=.36 )

      However, we did not use the sample pairing here. If we want to do this, we first take the differences and then the average, i.e., instead of subtracting the averages, we subtract each untreated value from the corresponding treated value.

      The differences are:
      Code:
      0.72 1.13 0.91 1.15 1.14
      The mean is 1.01 as before, but the standard deviation of the difference is only .19, and hence, the difference is clearly significant.
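
      As a rough illustration of the two calculations above, here is a minimal Python sketch using only the standard library. It recomputes both analyses from the rounded table values, so the last digit may differ slightly from the numbers quoted in the post. It also divides by the standard error (pooled sd times sqrt(2/n)) to obtain the usual two-sample t statistic, whereas the 0.37 quoted above is the mean difference divided by the pooled standard deviation itself. All variable names are illustrative only.

```python
import math
import statistics

# Values copied from the table above (rounded to two decimals there).
before = [4.00, 7.30, 11.13, 8.50, 9.50]
after = [4.72, 8.44, 12.04, 9.66, 10.65]
n = len(before)

# Unpaired view: pooled variance of the two equally sized groups.
pooled_var = (statistics.variance(before) + statistics.variance(after)) / 2
pooled_sd = math.sqrt(pooled_var)
mean_diff = statistics.mean(after) - statistics.mean(before)
# Two-sample t statistic: difference of means divided by its standard error.
t_unpaired = mean_diff / (pooled_sd * math.sqrt(2 / n))

# Paired view: per-sample differences (after minus before).
diffs = [a - b for a, b in zip(after, before)]
sd_diff = statistics.stdev(diffs)
# Paired t statistic: mean difference divided by its standard error.
t_paired = statistics.mean(diffs) / (sd_diff / math.sqrt(n))

print(f"pooled sd = {pooled_sd:.2f}, unpaired t = {t_unpaired:.2f}")
print(f"sd of differences = {sd_diff:.2f}, paired t = {t_paired:.2f}")
```

      Run as a script, this should report an unpaired t statistic well below 1 and a paired t statistic above 10 for the same mean difference, which is the gain in power that the pairing provides on these data.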

      To come back to DEGs: Whenever samples are paired (i.e., the same sample is measured twice under different conditions), and unless the difference between the samples is typically much smaller than the effect of the treatment, we dramatically lose statistical power if our method is unable to make use of the pairing information.

      To my knowledge, none of the currently released tools can do this, though.

      We have recently expanded our DESeq package to be able to fit generalized linear models (GLMs), and these can be used to model the pairing. Unfortunately, our method to estimate the dispersion (which, for count data, takes the role of the standard deviation of the differences in the example above) does not work for paired designs. We have some ideas for how to get around this and are testing them at the moment, but it does not yet work as well as we would like.
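
      To make "model the pairing" concrete: in a GLM, pairing is usually expressed by adding the patient as a blocking factor next to the condition. The sketch below builds such a design matrix in plain Python for the design from the first post (3 patients, one control and one lesion sample each). The sample labels, the column names, and the helper design_row are hypothetical, and this is not DESeq or edgeR code, only an illustration of the kind of matrix such a model is fitted with.

```python
# Hypothetical sample sheet: 3 patients, each with a control and a lesion sample.
samples = [
    ("P1", "control"), ("P1", "lesion"),
    ("P2", "control"), ("P2", "lesion"),
    ("P3", "control"), ("P3", "lesion"),
]

columns = ["intercept", "patientP2", "patientP3", "lesion"]


def design_row(patient, condition):
    """One design-matrix row for the additive model patient + condition.

    The baseline (intercept) is patient P1 under the control condition.
    """
    return [
        1,
        int(patient == "P2"),
        int(patient == "P3"),
        int(condition == "lesion"),
    ]


design = [design_row(p, c) for p, c in samples]

# Print the matrix with one row per sample.
print(f"{'sample':<11} " + " ".join(columns))
for (p, c), row in zip(samples, design):
    cells = " ".join(f"{v:>{len(name)}}" for v, name in zip(row, columns))
    print(f"{p} {c:<8} {cells}")

# Residual degrees of freedom: samples minus fitted coefficients.
residual_df = len(samples) - len(columns)
print("residual degrees of freedom:", residual_df)
```

      In this parameterisation the coefficient of the last column is the within-patient lesion effect, while the patient columns absorb baseline differences between patients. With 6 samples and 4 coefficients, only 2 residual degrees of freedom remain, which hints at why dispersion estimation is the hard part for such small paired designs.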

      As far as I know, the edgeR people seem to pursue similar ideas.

      DESeq offers a function for a "variance stabilizing transformation", which maps the count data onto a continuous scale on which they become approximately homoscedastic. This then makes it possible to use tools that worked well for microarrays, such as pairwise t-tests or Smyth's 'limma' package. However, the transformation costs power and introduces bias if the library sizes are too different. Still, it may be a good way to get started.
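
      For readers wondering what "variance stabilizing" means in practice, here is a small Python sketch of the textbook special case: negative binomial counts with variance mu + alpha * mu^2 and one fixed dispersion alpha. This is an assumption made for illustration. DESeq derives its transformation from dispersion estimates fitted to the data, so the closed form below is not DESeq's implementation, and the function names and the value of alpha are made up for the example.

```python
import math


def nb_variance(mu, alpha):
    """Variance of a negative binomial count with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


def vst(x, alpha):
    """Variance-stabilizing transform for a fixed dispersion alpha > 0.

    It is the antiderivative of 1 / sqrt(nb_variance(mu, alpha)).
    """
    return 2.0 / math.sqrt(alpha) * math.asinh(math.sqrt(alpha * x))


def approx_sd_after_vst(mu, alpha, h=1e-4):
    """Delta-method approximation: sd(vst(X)) is about vst'(mu) * sd(X)."""
    slope = (vst(mu + h, alpha) - vst(mu - h, alpha)) / (2 * h)
    return slope * math.sqrt(nb_variance(mu, alpha))


alpha = 0.1  # assumed dispersion, chosen only for illustration
for mu in (10, 100, 1000, 10000):
    raw_sd = math.sqrt(nb_variance(mu, alpha))
    new_sd = approx_sd_after_vst(mu, alpha)
    print(f"mean {mu:>6}: raw sd = {raw_sd:8.1f}, sd after transform = {new_sd:.3f}")
```

      On the raw scale the standard deviation grows with the mean, while on the transformed scale the delta-method approximation stays close to 1 across the whole range. That approximate homoscedasticity is what lets microarray-style tools be applied afterwards.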

      Simon


      • #4
        Cuffdiff doesn't explicitly support sample pairing, hypatia, but I suggest you try the newest version of Cuffdiff (0.9.1) if you haven't already, as it should get you started pretty quickly. You may see fewer differentially expressed genes due to a loss in power, but hopefully being able to find differentially spliced genes or those undergoing shifts in promoter preference will make up for it.


        • #5
          My first objective really is differential expression, especially in the lower range of expression. I already know that in my disease model the correlation of each patient's fold change is not high, around 0.4, so the paired model will make a huge difference here.


          • #6
            GLMs in edgeR

            Hi hypatia and others

            We have recently implemented GLM methods in edgeR, so the package can now deal with paired designs as well as other, more complicated designs. One of the PhD students in our division has been working on using Cox-Reid conditional inference to estimate the dispersion parameter for the negative binomial model. This approach does take into account the paired nature of the samples (or indeed works whatever the experimental design) and has been giving us very reasonable results in our testing.

            Following Simon's work with DESeq, the edgeR methods can now also add a gene abundance-related trend on the dispersion estimates.

            These new methods, the GLMs with CR estimation of the dispersion (plus a whole lot of other improvements), are all implemented in the current development version of edgeR in the Bioconductor repository. They will be rolled out into the release version with the release of Bioconductor 2.7 on 18 October.

            We'll be adding to the documentation over the next couple of weeks to get some examples in there of using these methods on paired designs and other more complicated experimental designs.

            The new edgeR methods have been developed with exactly this sort of application in mind, so I certainly encourage you to give them a try. We'd be really interested in how they work for you.

            Best regards
            Davis


            • #7
              Hi Hypatia,

              From my current reading of the literature, it seems to me that baySeq may be a good solution for you right now:
              http://www.biomedcentral.com/1471-2105/11/422/

              Regarding Cufflinks, I would like to correct Simon who is not a developer of the program and therefore may not understand it completely (Simon, please correct me if I am wrong). It is not accurate to say that it does not support sample pairing.
              What Cufflinks does is estimate expression values according to a generative model of the sequencing process, one that currently takes into account sequencing bias of various kinds. The value of paired samples is that the experimental bias should be similar in the pairs, and this will be implicitly "learned" by Cufflinks, so that it's not clear to me a priori that it will produce inferior results to methods that learn the distributions of counts.


              • #8
                Originally posted by lpachter
                  From my current reading of the literature, it seems to me that baySeq may be a good solution for you right now:
                  http://www.biomedcentral.com/1471-2105/11/422/

                I am afraid, no. BaySeq's paper abstract advertises its ability to deal with more complex designs, but I looked a bit closer at the paper and it seems that it focuses on nested one-way designs and cannot deal with crossed factors (two-way ANOVA) as one would need for paired samples. At least if I have understood its approach correctly.

                Originally posted by lpachter
                  Regarding Cufflinks, I would like to correct Simon who is not a developer of the program and therefore may not understand it completely (Simon, please correct me if I am wrong). It is not accurate to say that it does not support sample pairing.
                  What Cufflinks does is estimate expression values according to a generative model of the sequencing process, one that currently takes into account sequencing bias of various kinds. The value of paired samples is that the experimental bias should be similar in the pairs, and this will be implicitly "learned" by Cufflinks, so that it's not clear to me a priori that it will produce inferior results to methods that learn the distributions of counts.

                Actually, I did not say that it does not support paired sampling. Cole said so (and he is a developer ;-) ).

                I'm a bit puzzled what you might mean by "implicit learning", and by the fact that you talk about sequencing bias. (The value of paired designs, as I understand the term, is not to reduce bias but to reduce variance.) Anyway, I guess, this discussion has to wait until you have written up and published the method behind the new biological replicate functionality.

                Simon


                • #9
                  Regarding baySeq, I am not an author on that software so I cannot speak for the details of it. I just mentioned it because it seemed like they do a lot of things right on a lot of aspects of differential expression analysis. They actually say in their paper that they do not handle paired samples, but as you pointed out no program currently does, and many of the other details matter as well.

                  The relationship between bias and variance is that bias causes variance. For an explanation of this see
                  http://genomebiology.com/2010/11/5/R50


                  • #10
                    Hi Davis,

                    Do you have any updates on the documentation for using edgeR on datasets with a paired-sample design?

                    You also mentioned some "very reasonable results in our testing"... are these results publicly available now?

                    All the best!






                    • #11
                      Hi f1boston

                      The edgeR functions for analysing differential expression with a paired samples design are documented in the package. We are planning on updating the User's Guide substantially to include better examples of using the GLM methods with paired and other experimental designs, but unfortunately this has taken a back seat while we have been putting a lot of work into the development of the new methods.

                      We don't have any publicly available results as such - but all of the methods are available through Bioconductor so anyone could test them themselves. I certainly encourage you to do so! Finding a suitable yardstick for comparison with other/previous methods is difficult - hence why I said that the results look reasonable. They do look reasonable, but the actual 'truth' is not known in the datasets we have seen. The important point is that the GLM methods can properly analyse paired designs, whereas our older methods could not.

                      If you have more specific questions that I may be able to help you with please feel free to get in touch.

                      Best regards
                      Davis


                      • #12
                        How many pairs are needed to gain statistical power in a paired study using GLM methods?

                        Hi Davis,

                        When using the edgeR GLM method for a paired study, how many pairs would be required to reach a given statistical power?

                        Cheers,
                        Sheng




                        • #13
                          Hi Sheng

                          edgeR can indeed be used to analyse RNA-seq data from paired designs, but your question is far, far too vague for me to be able to give you any sensible answer about how many samples you need.

                          In general, the answer is "as many as you can afford".

                          Cheers
                          Davis


                          • #14
                            Simon,

                            I am sorry if this sounds stupid, but how does the experimental design of the poster differ from the pasillaGenes example in the DESeq documentation? I am probably not able to get a handle on what the poster might mean by paired samples, could you help me understand what we mean by a paired sample experimental design?

                            Thanks,
                            Praful
                            Last edited by aggp11; 01-11-2012, 11:06 AM.


                            • #15
                              Simon,

                              Never mind, I think I now understand the issue.

                              Thanks,
                              Praful
