Unconfigured Ad

**dpryan** · 12-22-2014, 07:01 AM

DESeq2 is doing some additional things, such as looking for samples with too much leverage and then excluding them/those genes. If you look at some of the genes called DE by edgeR but not by DESeq2, I suspect that some of them will be flagged due to cook's cutoff distance.

**Wolfgang Huber** · 12-23-2014, 01:56 AM

Dear Keysoon

I suggest investigating the results and the input data for those genes that (i) only edgeR finds, (ii) only DESeq2 finds, (iii) both find.

Devon's hypothesis is pertinent. In addition, it could be a threshold effect. I tend to find it useful to plot statistics (such as estimated fold changes, p-values) from the two methods against each other to see whether the difference is large or is just a numerical wiggle.

In addition, DESeq2 offers banded hypothesis testing, i.e you can test directly against the null hypothesis |beta|<=1, rather than (the more traditional) beta=0, and then you don't need to apply the post hoc logFC filter. This is statistically more efficient. See DESeq2's vignette.

Kind regards
Wolfgang

**tirohia** · 01-11-2015, 10:34 PM

Hi.

I have a similar situation where I am getting large (2 orders of magnitude) difference between DESeq D.E. estimates and EdgeR D.E. estimates. Not even going to start talking about the huge numbers limma/voom are giving me despite being told by multiple people that I should be using that as well.

I've got 3 samples from each two positions on an inoculated plant stem, one close to the inoculation site, one distant. Repeated for treatment and control, so 12 samples all up.

So the DESeq analysis, I filtered on the log2FC. When I run it with banded hypothesis testing, I'm getting different results - I would have thought they would have been similar/the same? As an example, when I'm comparing two of the samples, I'm getting 2 vs 6 for up regulated genes, but 162 vs 348 down, banded testing vs filtering on the log2FC respectively. How do/can I figure out which is better? I assuming I'm doing the banded hypothesis correctly :

Code:

res <- results(dds, lfcThreshold=2, altHypothesis="greaterAbs")
res <- na.omit(res)
res=res[res$padj < 0.1,]
plotMA(res, ylim=yl)
res

As for comparing DESeq and EdgeR, there's the suggestion to plot the various statistics from each method. I'm doing this on the overlap - the genes identified by both DESeq and EdgeR.
The adjusted p values from each method for all of them are low. Very low. The log2FC though differ quite a bit though. I'm assuming this would due to differences in the way the two methods model and adjust the read counts. What sort of magnitude of difference would constitute something to worry about? I'm looking at DESeq averaging values around -2 and EdgeR averaging values around -4/-5. Which seems quite big to me.

I had a quick look at the base means - a number of them are quite low - between 100 -500. Which seems low, but from what I gather, isn't to horrendously low.

I think what I'm trying to ask (apologies for the rambling statement-like question) is, given the different ways that DESeq and EdgeR work, what constitutes a numerical wiggle and what constitutes a large and therefore possibly due to something else, difference.

Is it worth looking through dds at the cooks value of all the genes that EdgeR picks up but DESeq discards?

Cheers
Ben.

**Michael Love** · 01-12-2015, 05:11 AM

keysoon,

You are not comparing the same test for edgeR and DESeq2.

For edgeR, you built a group variable which combines breed and diet information, and are then testing the difference between two groups of samples. This is testing, is there a difference between the group of samples in FIN.FLU and in TEX.FLU. As far as I can tell, this test is not involving the diet effect, because it does not involve the CON samples.

For DESeq2, you are testing interaction terms, which test whether the breed effect is *different* in the flu group.

Maybe you can specify the exact hypothesis you want to test.

**Michael Love** · 01-12-2015, 05:17 AM

hi tirohia,

We recommend plotting the unfiltered res object with plotMA, so you can see the genes with small padj relative to the rest of genes.

We recommend banded testing over filtering on the log2FC.

You can use the summary(res) function to get an idea how many outliers are being detected, assuming you are using an up-to-date version of DESeq2.

**Michael Love** · 01-12-2015, 05:18 AM

hi tirohia,

Maybe you can start a new post, as the two questions here concern different issues. This will make the thread easier to follow.

**keysoon** · 01-12-2015, 06:05 AM

Originally posted by Michael Love View Post

keysoon,

You are not comparing the same test for edgeR and DESeq2.

For edgeR, you built a group variable which combines breed and diet information, and are then testing the difference between two groups of samples. This is testing, is there a difference between the group of samples in FIN.FLU and in TEX.FLU. As far as I can tell, this test is not involving the diet effect, because it does not involve the CON samples.

For DESeq2, you are testing interaction terms, which test whether the breed effect is *different* in the flu group.

Maybe you can specify the exact hypothesis you want to test.

Hi Michael,

Thank you for noticing the actual problem. I was not quite confident with the EdgeR design. In the experiment, I am interested to see the effect of diet (FLU and CON) within and between breeds (FIN, TEX, FXT).

Best Regards,
Keysoon

**Michael Love** · 01-12-2015, 07:26 AM

For simplicity, let's use the group variable for both softwares. In edgeR you are correct in including ~0+group, but in DESeq2 you should just use ~group, which allows moderation of the fold changes.

In DESeq2, the within-breed effect of diet FLU vs CON will be, for example for the FIN breed:

Code:

results(dds, contrast=c("group","FIN.FLU","FIN.CON"))

To test if the diet effect is different between two breeds, you are testing if the following contrast is zero:

(FIN.FLU - FIN.CON) - (TEX.FLU - TEX.CON)

By rearranging you get

(FIN.FLU + TEX.CON) - (FIN.CON + TEX.FLU)

This contrast can be tested in DESeq2 with:

Code:

results(dds, contrast=list(c("groupFIN.FLU","groupTEX.CON"),c("groupFIN.CON","groupTEX.FLU")))

There is another way to build contrasts with an interaction term, but to keep things simple and comparable, I'd recommend the above. You should be able to see how to form the edgeR contrasts this way.

**keysoon** · 01-13-2015, 05:04 AM

Hi Michael,

I modified the design according to your suggestion. I hope I followed it well. I have attached a part of the code below:

Code:

group<-factor(paste(breed, diet, sep="."))
sampletable<-cbind(sampletable, group=group)

#DESeq2
ddsMat<-DESeqDataSetFromMatrix(countData=countdata, colData = sampletable, design = ~group )
dds<-DESeq(ddsMat)
dTEX_Diet<-results(dds, contrast=c("group","TEX.FLU", "TEX.CON"))
dTEX_Diet_sig<-subset(dTEX_Diet, padj < 0.1)
dTEX_Diet_sig<-dTEX_Diet_sig[abs(dTEX_Diet_sig$log2FoldChange) >=1,]
nrow(dTEX_Diet_sig) 
148

dTEX_FXT_Diet<-results(dds, contrast=list(c("groupTEX.FLU","groupFXT.CON"), c("groupTEX.CON","groupFXT.FLU")))
dTEX_FXT_Diet_sig<-subset(dTEX_FXT_Diet, padj < 0.1)
dTEX_FXT_Diet_sig<-dTEX_FXT_Diet_sig[abs(dTEX_FXT_Diet_sig$log2FoldChange) >=1,]
nrow(dTEX_FXT_Diet_sig)
159

#EdgeR
cpm<-cpm(countdata)
keep<-rowSums(cpm>1)>=5
ecountdata<-countdata[keep,]
ED<-DGEList(counts=ecountdata, group=group)
Y<-calcNormFactors(ED)
design<-model.matrix(~0+group)
Y<-estimateGLMCommonDisp(Y, verbose=TRUE)
Y<-estimateGLMTrendedDisp(Y, design)
Y<-estimateGLMTagwiseDisp(Y, design)
fit<-glmFit(Y, design)

mycontrast<-makeContrasts(FIN_Diet=FIN.FLU-FIN.CON, TEX_Diet=TEX.FLU-TEX.CON, FXT_Diet=FXT.FLU-FXT.CON, FIN_TEX_Diet=(FIN.FLU-FIN.CCON)-(TEX.FLU-TEX.CON), 
       FIN_FXT_Diet=(FIN.FLU-FIN.CON)-(FXT.FLU-FXT.CON),TEX_FXT_Diet=(TEX.FLU-TEX.CON)-(FXT.FLU-FXT.CON), levels=design)

eTEX_Diet<-glmLRT(fit, contrast=mycontrast[,"TEX_Diet"])
summary(Y<-decideTestsDGE(eTEX_Diet, p=0.1))
eTEX_Diet_tt<-topTags(eTEX_Diet, n=sum(46+37))
eTEX_Diet_sig<-eTEX_Diet_tt[abs(eTEX_Diet_tt$table$logFC) >=1.0,]
nrow(eTEX_Diet_sig) 
76

eTEX_FXT_Diet<-glmLRT(fit, contrast=mycontrast[,"TEX_FXT_Diet"])
summary(Y<-decideTestsDGE(eTEX_FXT_Diet, p=0.1))
eTEX_FXT_Diet_tt<-topTags(eTEX_FXT_Diet, n=sum(52+191))
eTEX_FXT_Diet_sig<-eTEX_FXT_Diet_tt[abs(eTEX_FXT_Diet_tt$table$logFC) >=1.0,]
nrow(eTEX_FXT_Diet_sig)
243

Between two programs, 64 genes are common in TEX_Diet and 97 genes are common in TEX_FXT_Diet groups. Do you think it's normal to get that much of differences?

Thank you very much.

Best Regards,
Keysoon

**Michael Love** · 01-13-2015, 06:04 AM

Yes that's normal.

Topics	Statistics	Last Post
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, Today, 08:59 AM	0 responses 8 views 0 reactions	Last Post by SEQadmin2 Today, 08:59 AM
Long-Read RNA Sequencing Uncovers a Hidden Layer of Immune Cell Regulation by SEQadmin2 Started by SEQadmin2, 06-02-2026, 12:03 PM	0 responses 21 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 12:03 PM
DNA Methylation Study Reveals How Epigenetic Changes Pass Between Generations by SEQadmin2 Started by SEQadmin2, 06-02-2026, 11:40 AM	0 responses 15 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 11:40 AM
MetaBeeAI Helps Scientists Process Research Literature Faster by SEQadmin2 Started by SEQadmin2, 05-28-2026, 11:40 AM	0 responses 29 views 0 reactions	Last Post by SEQadmin2 05-28-2026, 11:40 AM

Unconfigured Ad

DESeq2 vs EdgeR for multi-factor designs

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News