**Simon Anders** · 01-14-2012, 12:24 AM

Sure, this was added to both packages mid 2010. Are you reading the current versions of the manuals?

**adumitri** · 07-03-2012, 12:21 PM

Hi,

I read Simon's answer from a few months ago, but could not find details about covariates in the DESeq manual. The covariates I am interested in accounting for are quantitative and are known to affect gene expression (e.g. RIN can have a huge influence on mRNA levels). Could you clarify whether or not DESeq allows for this type of covariate? I previously ran into this post, which seemed to imply that quantitative covariates cannot be easily incorporated.

Thank you,
Alexandra

**Gordon Smyth** · 07-07-2012, 05:08 PM

edgeR provides complete support for any number of covariates and factors, provided of course you have enough libraries available to estimate the parameters, in particular more libraries than coefficients. See

http://nar.oxfordjournals.org/content/40/10/4288

or the edgeR User's Guide.

There are two case studies in the edgeR User's Guide which involve two experimental factors.

Gordon
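
To illustrate the "more libraries than coefficients" condition, here is a minimal Python sketch (not edgeR code). The sample annotation, the `design_matrix` helper and the column layout are all made up for illustration; the point is only that a quantitative covariate such as RIN enters the design matrix as one numeric column, and that the residual degrees of freedom are the number of libraries minus the rank of that matrix.

```python
import numpy as np

def design_matrix(condition, rin):
    """Build columns: intercept, 0/1 indicator for 'case', RIN as a number."""
    condition = np.asarray(condition)
    rin = np.asarray(rin, dtype=float)
    is_case = (condition == "case").astype(float)
    return np.column_stack([np.ones(len(rin)), is_case, rin])

# Hypothetical annotation: six libraries, one factor, one quantitative covariate.
condition = ["control", "control", "control", "case", "case", "case"]
rin = [8.1, 7.4, 6.9, 6.2, 5.8, 7.0]

X = design_matrix(condition, rin)
n_libraries, n_coefficients = X.shape

# Residual degrees of freedom left over for estimating dispersion.
residual_df = n_libraries - np.linalg.matrix_rank(X)
print(n_libraries, n_coefficients, residual_df)
```

If `residual_df` came out as zero or less, there would be nothing left to estimate the dispersion from, which is the situation the condition above rules out.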

**Simon Anders** · 07-09-2012, 12:43 PM

In principle, using quantitative covariates should be possible with both packages, even though I have not seen this actually being done, because it is rarely useful in practice: as we are talking about (generalized) linear models, there should be some reason to assume that your covariate influences log expression in a linear manner. For your example, RIN, I do not think that this would be a good assumption.
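
To spell out what "linear in log expression" would mean here, a sketch in notation that is assumed for illustration (it is not taken from either package's manual): for gene $i$ and sample $j$, with expected count $\mu_{ij}$, library size factor $s_j$, a 0/1 condition indicator $x_j$ and the sample's RIN value $r_j$, the model would be

$$
\log \mu_{ij} = \log s_j + \beta_{i0} + \beta_{i1}\, x_j + \beta_{i2}\, r_j .
$$

Under this model, raising RIN by one unit multiplies the expected count by the same factor $e^{\beta_{i2}}$ whether RIN goes from 3 to 4 or from 8 to 9. That constant per-unit fold change is the assumption in question.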

**adumitri** · 07-10-2012, 07:30 AM

Hi Simon,

I do not fully understand your statement regarding RIN. Why did you say that RIN most likely does not influence expression in a linear fashion? Do you have any literature evidence that would suggest this to be the case?

The RNA-Seq samples we are looking at were previously analyzed in a larger Parkinson disease/control expression study that used microarrays to assess gene expression levels. In the microarray study, we included RIN in the linear model because of its high impact on gene expression. In fact, in many cases RIN was a stronger predictor of gene expression than the case/control status itself. From my point of view, it makes sense that samples with low(er) RIN will have more RNA degradation, and that the apparent level of expression will be influenced by RIN. To a lesser extent, post-mortem interval and age are also predictors of gene expression.

Additionally, even after multiple RNA extractions, some of the samples obtained from diseased tissue that we included in our recent RNA-Seq study tended to have lower RINs than the samples obtained from control tissue. Therefore, for our analyses, we need to compare a set of diseased samples and a set of control samples that differ in average RIN, a covariate that we know to be predictive of gene expression. In this case, it seems mandatory to include RIN in the analyses. What would your opinion be?

Gordon, thank you for the article reference.

Alexandra

**Simon Anders** · 07-10-2012, 08:07 AM

Interesting. If you say that you saw a linear dependence on RIN in earlier studies, it certainly makes sense to try to add it in a quantitative manner. It did not occur to me that sample integrity can be so hard to control that one needs to account for its variation, but then, I never had to work with post-mortem samples.

I might not have thought things through in my previous post, because, in principle, the GLM approaches of edgeR and DESeq should not care whether covariates are categorical or quantitative. Just proceed according to the vignettes and use a numerical vector instead of only factors in the model frame, and ask again if this throws an error.

**koduu** · 10-28-2013, 10:55 AM

Hello,

I am a complete newbie in statistics, so please forgive me if this question sounds really illogical. I basically tried to do the same thing as the OP wanted, but I got stuck at the estimateDispersions function. I have a relatively large sample size (32) and I want to have 2 covariates: one binary, e.g. "sex", and one quantitative, e.g. "RIN", that is partially replicated (e.g. 2, 2, 2.5, 3, 3.5, 3.5,......).
If I do the estimateDispersions stage, then the internal function modelMatrixToConditionFactor(modelFrame) creates a condition vector with 22 levels (each having 1 to 3 replicates) and then estimates dispersions.

My question is: can I just leave out the quantitative variable from the estimate dispersions stage and only use it in fitNbinomGLMs(), or is that mathematically incorrect and hence wrong?
What I am trying to do is show that the quantitative variable has a significant effect on expression.

Any advice would be greatly appreciated.
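
The effect described above, where each distinct combination of covariate values becomes its own condition level, can be reproduced with a small Python sketch. This is only a rough illustration of the grouping idea, not DESeq's actual implementation; the model frame and the `condition_levels` helper are hypothetical.

```python
from collections import Counter

# Hypothetical model frame: (sex, RIN) for each of 12 libraries.
model_frame = [
    ("F", 2.0), ("F", 2.0), ("M", 2.0), ("M", 2.5),
    ("F", 3.0), ("M", 3.0), ("F", 3.5), ("F", 3.5),
    ("M", 3.5), ("M", 4.0), ("F", 4.5), ("M", 4.5),
]

def condition_levels(rows):
    """Treat every distinct row as one level and count its replicates."""
    return Counter(rows)

levels = condition_levels(model_frame)

# Levels that occur only once contribute no replicate information.
unreplicated = [row for row, n in levels.items() if n == 1]
print(len(levels), "levels;", len(unreplicated), "without replicates")
```

With a quantitative covariate, most rows are unique, so the number of levels grows with the number of distinct values and few levels have any replicates.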

**Gordon Smyth** · 10-28-2013, 02:37 PM

Mathematically incorrect.

Try the edgeR package. It has fast, reliable glm features and has no trouble with this sort of scenario.

Can edgeR/DESeq have more than one covariate?