  • Can edgeR/DESeq have more than one covariate?

    Besides diagnosis, can I include age, gender, etc. as covariates in the GLM? I could not find this information in their papers or manuals.

    Thanks a lot.
    Last edited by arrchi; 01-13-2012, 01:31 PM.

  • #2
    Sure, this was added to both packages mid 2010. Are you reading the current versions of the manuals?



    • #3
      Hi,

      I read Simon's answer from a few months ago, but could not find details about covariates in the DESeq manual. The covariates I am interested in accounting for are quantitative and are known to affect gene expression (e.g. RIN can have a huge influence on mRNA levels). Could you clarify whether DESeq allows for this type of covariate? I previously ran into this post, which seemed to imply that quantitative covariates cannot be easily incorporated.

      Thank you,
      Alexandra



      • #4
        edgeR provides complete support for any number of covariates and factors, provided of course you have enough libraries available to estimate the parameters, in particular more libraries than coefficients. See



        or the edgeR User's Guide.

        There are two case studies in the edgeR User's Guide which involve two experimental factors.

        Gordon
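The condition "more libraries than coefficients" can be illustrated with a small design-matrix check. This is only a sketch in Python with NumPy, not edgeR code: the helper names `build_design` and `is_estimable` and the sample annotation are hypothetical, and the check only covers the counting and rank conditions mentioned above.

```python
import numpy as np

def build_design(diagnosis, sex, rin):
    """Design matrix with an intercept, two 0/1 factors and one numeric covariate."""
    diagnosis = np.asarray(diagnosis, dtype=float)
    sex = np.asarray(sex, dtype=float)
    rin = np.asarray(rin, dtype=float)
    intercept = np.ones_like(rin)
    return np.column_stack([intercept, diagnosis, sex, rin])

def is_estimable(design):
    """True if there are more libraries (rows) than coefficients (columns)
    and the columns are linearly independent."""
    n_libraries, n_coefficients = design.shape
    full_rank = np.linalg.matrix_rank(design) == n_coefficients
    return bool(n_libraries > n_coefficients and full_rank)

# Hypothetical annotation for six libraries (values are made up).
diagnosis = [0, 0, 0, 1, 1, 1]   # 0 = control, 1 = case
sex       = [0, 1, 0, 1, 0, 1]   # 0 = female, 1 = male
rin       = [8.1, 7.9, 8.4, 6.2, 6.8, 7.0]

design = build_design(diagnosis, sex, rin)
print(design.shape)              # (libraries, coefficients)
print(is_estimable(design))      # six libraries, four coefficients
print(is_estimable(design[:4]))  # four libraries are not enough for four coefficients
```

The quantitative covariate enters the design as an ordinary numeric column and costs one coefficient, the same as a two-level factor.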



        • #5
          In principle, using quantitative covariates should be possible with both packages, although I have not seen this actually being done, because it is rarely useful in practice. As we are talking about (generalized) linear models, there should be some reason to assume that your covariate influences log expression in a linear manner. For your example, RIN, I do not think that this would be a good assumption.
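To make the linearity assumption concrete, a log-link GLM with one factor and one quantitative covariate can be written as below. The notation is chosen here for illustration and is not taken from either package's documentation: $\mu_{gi}$ is the expected count of gene $g$ in sample $i$, $s_i$ a library-size or size factor, $d_i$ a 0/1 diagnosis indicator, and $\mathrm{RIN}_i$ the covariate.

```latex
\log \mu_{gi} = \log s_i + \beta_{g0} + \beta_{g1}\, d_i + \beta_{g2}\, \mathrm{RIN}_i
```

Under this model, each one-unit increase in RIN multiplies the expected count by the same factor $e^{\beta_{g2}}$, whatever the starting RIN. That constant fold change per unit is the assumption being questioned.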



          • #6
            Hi Simon,

            I do not fully understand your statement regarding RIN. Why did you say that RIN most likely does not influence expression in a linear fashion? Do you have any literature evidence that would suggest this to be the case?

            The RNA-Seq samples we are looking at were previously analyzed in a larger Parkinson disease/control expression study that used microarrays to assess gene expression levels. In the microarray study, we included RIN in the linear model because of its high impact on gene expression. In many cases RIN was a stronger predictor of gene expression than the case/control status itself. From my point of view, it makes sense that samples with lower RIN will have more RNA degradation, so the apparent level of expression will be influenced by RIN. To a lesser extent, post-mortem interval and age are also predictors of gene expression.

            Additionally, even after multiple RNA extractions, some of the samples obtained from diseased tissue that we included in our recent RNA-Seq study tended to have lower RINs than the samples obtained from control tissue. Therefore, for our analyses, we need to compare a set of diseased samples and a set of control samples that differ in average RIN, a covariate that we know to be predictive of gene expression. In this case, it seems mandatory to include RIN in the analyses. What would your opinion be?

            Gordon, thank you for the article reference.

            Alexandra
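The confounding concern described above can be sketched with a noise-free toy example. All numbers are made up, and ordinary least squares on log expression stands in for the negative binomial GLM only to keep the sketch short. Log expression depends on RIN alone, yet cases have lower RIN than controls.

```python
import numpy as np

# Made-up data: the true case/control effect is exactly zero,
# but cases happen to have lower RIN than controls.
is_case  = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
rin      = np.array([8.0, 8.5, 9.0, 7.5, 6.0, 6.5, 7.0, 5.5])
log_expr = 2.0 + 0.5 * rin

def fit(columns, response):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones_like(response)] + columns)
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

without_rin = fit([is_case], log_expr)       # intercept, case effect
with_rin    = fit([is_case, rin], log_expr)  # intercept, case effect, RIN slope

print(without_rin[1])  # apparent case effect when RIN is ignored
print(with_rin[1])     # case effect after adjusting for RIN
```

When RIN is left out, the difference in mean RIN between the groups (2 units here) appears as a spurious case effect of about -1 on the log scale. With RIN in the model, the case effect is essentially zero.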



            • #7
              Interesting. If you say that you saw a linear dependence on RIN in earlier studies, it certainly makes sense to try to add it in a quantitative manner. It did not occur to me that sample integrity can be so hard to control that one needs to account for its variation, but then, I never had to work with post-mortem samples.

              I might not have thought things through in my previous post because, in principle, the GLM approaches of edgeR and DESeq should not care whether covariates are categorical or quantitative. Just proceed according to the vignettes, use a numerical vector instead of only factors in the model frame, and ask again if this throws an error.



              • #8
                Hello,

                I am a complete newbie in statistics, so please forgive me if this question sounds illogical. I tried to do the same thing as the OP wanted, but I got stuck at the estimateDispersions function. I have a relatively large sample size (32) and I want to include two covariates: one binary, e.g. "sex", and one quantitative, e.g. "RIN", that is partially replicated (e.g. 2, 2, 2.5, 3, 3.5, 3.5, ...).
                When I run the estimateDispersions stage, the internal function modelMatrixToConditionFactor(modelFrame) creates a condition vector with 22 levels (each having 1 to 3 replicates) and then estimates dispersions.

                My question is: can I just leave the quantitative variable out of the dispersion estimation stage and only use it in fitNbinomGLMs(), or is that mathematically incorrect and hence wrong?
                What I am trying to do is show that the quantitative variable has a significant effect on expression.

                Any advice would be greatly appreciated.
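The grouping behaviour described in this post, where every distinct combination of covariate values becomes its own condition level, can be mimicked with a short sketch. This is not DESeq's implementation of modelMatrixToConditionFactor: the function name `condition_levels` and the model frame are hypothetical and only illustrate why a partially replicated quantitative covariate yields many levels with few replicates each.

```python
from collections import Counter

def condition_levels(model_frame):
    """Treat each distinct row of covariate values as one condition
    and count how many samples (replicates) fall into it."""
    return Counter(tuple(row) for row in model_frame)

# Made-up model frame of (sex, RIN) for eight samples.
model_frame = [
    ("F", 2.0), ("F", 2.0), ("M", 2.5), ("F", 3.0),
    ("M", 3.5), ("M", 3.5), ("F", 3.5), ("M", 2.0),
]

levels = condition_levels(model_frame)
print(len(levels))              # number of distinct conditions
print(sorted(levels.values()))  # replicates per condition
```

Eight samples collapse into six conditions here, most of them unreplicated, which mirrors the 32 samples and 22 levels reported above.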



                • #9
                  Mathematically incorrect.

                  Try the edgeR package. It has fast, reliable glm features and has no trouble with this sort of scenario.
