Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • keysoon
    Junior Member
    • Nov 2012
    • 7

    DESeq2 vs EdgeR for multi-factor designs

    Hi,

    I have 31 samples from three different breed groups where I am studying the effect of diet on breeds. I constructed multi-factor designs for DESeq2 and EdgeR as follows:
    countdata<-read.csv("htseq-merged_table.csv", row.names=1)
    countdata<-countdata[1: (nrow(countdata)-5),]
    sampleinfo<-read.csv("SampleTable_mod.csv")
    sample<-sampleinfo$Sample
    breed<-sampleinfo$Breed
    diet<-sampleinfo$Diet
    sampletable<-data.frame(Sample=sample, breed=breed, diet=diet)
    DESeq2:
    colnames(countdata)<-sampletable$Sample
    ddsMat<-DESeqDataSetFromMatrix(countData=countdata, colData = sampletable, design = ~breed+diet+breed:diet )
    dds<-DESeq(ddsMat)
    resultsNames(dds)
    #[1] "Intercept" "breedFIN" "breedTEX" "breedFXT" "dietFLU"
    #[6] "dietCON" "breedFIN.dietFLU" "breedTEX.dietFLU" "breedFXT.dietFLU" "breedFIN.dietCON"
    #[11] "breedTEX.dietCON" "breedFXT.dietCON"

    FINTEXFlu<-results(dds, contrast=list("breedFIN.dietFLU","breedTEX.dietFLU"))
    FINTEXFluSig<-subset(FINTEXFlu, padj < 0.1) #(36)
    FINTEXFluSig<-FINTEXFluSig[abs(FINTEXFluSig$log2FoldChange) >=1,]
    nrow(FINTEXFluSig)
    25

    EdgeR:
    group<-factor(paste(breed, diet, sep="."))
    ED<-DGEList(counts=countdata, group=group)
    Y<-calcNormFactors(ED)
    design<-model.matrix(~0+group)
    colnames(design)<-levels(group)
    Y<-estimateGLMCommonDisp(Y, verbose=TRUE)
    Y<-estimateGLMTrendedDisp(Y, design)
    Y<-estimateGLMTagwiseDisp(Y, design)
    fit<-glmFit(Y, design)
    mycontrast<-makeContrasts(FINF.TEXF=FIN.FLU-TEX.FLU, FINF.FXTF=FIN.FLU-FXT.FLU, TEXF.FXTF=TEX.FLU-FXT.FLU, levels=design)
    FINTEXF<-(glmLRT(fit, contrast=mycontrast[,"FINF.TEXF"]))
    summary(Y<-decideTestsDGE(FINTEXF, p=0.1)) # (-98,+480)
    ttFINTEXF<-topTags(FINTEXF, n=sum(98+480))
    sigFINTEXF<-ttFINTEXF[abs(ttFINTEXF$table$logFC) >=1.0,]
    nrow(sigFINTEXF)
    519
    Could anyone please tell me what made such a big difference in results from two different programs? I would appreciate your help.

    Thank you very much!
    Last edited by keysoon; 12-22-2014, 08:06 AM.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    DESeq2 is doing some additional things, such as looking for samples with too much leverage and then excluding them/those genes. If you look at some of the genes called DE by edgeR but not by DESeq2, I suspect that some of them will be flagged due to cook's cutoff distance.

    Comment

    • Wolfgang Huber
      Senior Member
      • Aug 2009
      • 109

      #3
      Dear Keysoon

      I suggest investigating the results and the input data for those genes that (i) only edgeR finds, (ii) only DESeq2 finds, (iii) both find.

      Devon's hypothesis is pertinent. In addition, it could be a threshold effect. I tend to find it useful to plot statistics (such as estimated fold changes, p-values) from the two methods against each other to see whether the difference is large or is just a numerical wiggle.

      In addition, DESeq2 offers banded hypothesis testing, i.e you can test directly against the null hypothesis |beta|<=1, rather than (the more traditional) beta=0, and then you don't need to apply the post hoc logFC filter. This is statistically more efficient. See DESeq2's vignette.

      Kind regards
      Wolfgang
      Last edited by Wolfgang Huber; 12-23-2014, 02:00 AM.
      Wolfgang Huber
      EMBL

      Comment

      • tirohia
        Member
        • Nov 2011
        • 47

        #4
        Hi.

        I have a similar situation where I am getting large (2 orders of magnitude) difference between DESeq D.E. estimates and EdgeR D.E. estimates. Not even going to start talking about the huge numbers limma/voom are giving me despite being told by multiple people that I should be using that as well.

        I've got 3 samples from each two positions on an inoculated plant stem, one close to the inoculation site, one distant. Repeated for treatment and control, so 12 samples all up.

        So the DESeq analysis, I filtered on the log2FC. When I run it with banded hypothesis testing, I'm getting different results - I would have thought they would have been similar/the same? As an example, when I'm comparing two of the samples, I'm getting 2 vs 6 for up regulated genes, but 162 vs 348 down, banded testing vs filtering on the log2FC respectively. How do/can I figure out which is better? I assuming I'm doing the banded hypothesis correctly :

        Code:
        res <- results(dds, lfcThreshold=2, altHypothesis="greaterAbs")
        res <- na.omit(res)
        res=res[res$padj < 0.1,]
        plotMA(res, ylim=yl)
        res
        As for comparing DESeq and EdgeR, there's the suggestion to plot the various statistics from each method. I'm doing this on the overlap - the genes identified by both DESeq and EdgeR.
        The adjusted p values from each method for all of them are low. Very low. The log2FC though differ quite a bit though. I'm assuming this would due to differences in the way the two methods model and adjust the read counts. What sort of magnitude of difference would constitute something to worry about? I'm looking at DESeq averaging values around -2 and EdgeR averaging values around -4/-5. Which seems quite big to me.

        I had a quick look at the base means - a number of them are quite low - between 100 -500. Which seems low, but from what I gather, isn't to horrendously low.

        I think what I'm trying to ask (apologies for the rambling statement-like question) is, given the different ways that DESeq and EdgeR work, what constitutes a numerical wiggle and what constitutes a large and therefore possibly due to something else, difference.

        Is it worth looking through dds at the cooks value of all the genes that EdgeR picks up but DESeq discards?

        Cheers
        Ben.

        Comment

        • Michael Love
          Senior Member
          • Jul 2013
          • 333

          #5
          keysoon,

          You are not comparing the same test for edgeR and DESeq2.

          For edgeR, you built a group variable which combines breed and diet information, and are then testing the difference between two groups of samples. This is testing, is there a difference between the group of samples in FIN.FLU and in TEX.FLU. As far as I can tell, this test is not involving the diet effect, because it does not involve the CON samples.

          For DESeq2, you are testing interaction terms, which test whether the breed effect is *different* in the flu group.

          Maybe you can specify the exact hypothesis you want to test.

          Comment

          • Michael Love
            Senior Member
            • Jul 2013
            • 333

            #6
            hi tirohia,

            We recommend plotting the unfiltered res object with plotMA, so you can see the genes with small padj relative to the rest of genes.

            We recommend banded testing over filtering on the log2FC.

            You can use the summary(res) function to get an idea how many outliers are being detected, assuming you are using an up-to-date version of DESeq2.

            Comment

            • Michael Love
              Senior Member
              • Jul 2013
              • 333

              #7
              hi tirohia,

              Maybe you can start a new post, as the two questions here concern different issues. This will make the thread easier to follow.

              Comment

              • keysoon
                Junior Member
                • Nov 2012
                • 7

                #8
                Originally posted by Michael Love View Post
                keysoon,

                You are not comparing the same test for edgeR and DESeq2.

                For edgeR, you built a group variable which combines breed and diet information, and are then testing the difference between two groups of samples. This is testing, is there a difference between the group of samples in FIN.FLU and in TEX.FLU. As far as I can tell, this test is not involving the diet effect, because it does not involve the CON samples.

                For DESeq2, you are testing interaction terms, which test whether the breed effect is *different* in the flu group.

                Maybe you can specify the exact hypothesis you want to test.
                Hi Michael,

                Thank you for noticing the actual problem. I was not quite confident with the EdgeR design. In the experiment, I am interested to see the effect of diet (FLU and CON) within and between breeds (FIN, TEX, FXT).

                Best Regards,
                Keysoon
                Last edited by keysoon; 01-12-2015, 06:17 AM.

                Comment

                • Michael Love
                  Senior Member
                  • Jul 2013
                  • 333

                  #9
                  For simplicity, let's use the group variable for both softwares. In edgeR you are correct in including ~0+group, but in DESeq2 you should just use ~group, which allows moderation of the fold changes.

                  In DESeq2, the within-breed effect of diet FLU vs CON will be, for example for the FIN breed:

                  Code:
                  results(dds, contrast=c("group","FIN.FLU","FIN.CON"))
                  To test if the diet effect is different between two breeds, you are testing if the following contrast is zero:

                  (FIN.FLU - FIN.CON) - (TEX.FLU - TEX.CON)

                  By rearranging you get

                  (FIN.FLU + TEX.CON) - (FIN.CON + TEX.FLU)

                  This contrast can be tested in DESeq2 with:

                  Code:
                  results(dds, contrast=list(c("groupFIN.FLU","groupTEX.CON"),c("groupFIN.CON","groupTEX.FLU")))
                  There is another way to build contrasts with an interaction term, but to keep things simple and comparable, I'd recommend the above. You should be able to see how to form the edgeR contrasts this way.

                  Comment

                  • keysoon
                    Junior Member
                    • Nov 2012
                    • 7

                    #10
                    Hi Michael,

                    I modified the design according to your suggestion. I hope I followed it well. I have attached a part of the code below:
                    Code:
                    group<-factor(paste(breed, diet, sep="."))
                    sampletable<-cbind(sampletable, group=group)
                    
                    #DESeq2
                    ddsMat<-DESeqDataSetFromMatrix(countData=countdata, colData = sampletable, design = ~group )
                    dds<-DESeq(ddsMat)
                    dTEX_Diet<-results(dds, contrast=c("group","TEX.FLU", "TEX.CON"))
                    dTEX_Diet_sig<-subset(dTEX_Diet, padj < 0.1)
                    dTEX_Diet_sig<-dTEX_Diet_sig[abs(dTEX_Diet_sig$log2FoldChange) >=1,]
                    nrow(dTEX_Diet_sig) 
                    148
                    
                    dTEX_FXT_Diet<-results(dds, contrast=list(c("groupTEX.FLU","groupFXT.CON"), c("groupTEX.CON","groupFXT.FLU")))
                    dTEX_FXT_Diet_sig<-subset(dTEX_FXT_Diet, padj < 0.1)
                    dTEX_FXT_Diet_sig<-dTEX_FXT_Diet_sig[abs(dTEX_FXT_Diet_sig$log2FoldChange) >=1,]
                    nrow(dTEX_FXT_Diet_sig)
                    159
                    
                    #EdgeR
                    cpm<-cpm(countdata)
                    keep<-rowSums(cpm>1)>=5
                    ecountdata<-countdata[keep,]
                    ED<-DGEList(counts=ecountdata, group=group)
                    Y<-calcNormFactors(ED)
                    design<-model.matrix(~0+group)
                    Y<-estimateGLMCommonDisp(Y, verbose=TRUE)
                    Y<-estimateGLMTrendedDisp(Y, design)
                    Y<-estimateGLMTagwiseDisp(Y, design)
                    fit<-glmFit(Y, design)
                    
                    mycontrast<-makeContrasts(FIN_Diet=FIN.FLU-FIN.CON, TEX_Diet=TEX.FLU-TEX.CON, FXT_Diet=FXT.FLU-FXT.CON, FIN_TEX_Diet=(FIN.FLU-FIN.CCON)-(TEX.FLU-TEX.CON), 
                           FIN_FXT_Diet=(FIN.FLU-FIN.CON)-(FXT.FLU-FXT.CON),TEX_FXT_Diet=(TEX.FLU-TEX.CON)-(FXT.FLU-FXT.CON), levels=design)
                    
                    eTEX_Diet<-glmLRT(fit, contrast=mycontrast[,"TEX_Diet"])
                    summary(Y<-decideTestsDGE(eTEX_Diet, p=0.1))
                    eTEX_Diet_tt<-topTags(eTEX_Diet, n=sum(46+37))
                    eTEX_Diet_sig<-eTEX_Diet_tt[abs(eTEX_Diet_tt$table$logFC) >=1.0,]
                    nrow(eTEX_Diet_sig) 
                    76
                    
                    eTEX_FXT_Diet<-glmLRT(fit, contrast=mycontrast[,"TEX_FXT_Diet"])
                    summary(Y<-decideTestsDGE(eTEX_FXT_Diet, p=0.1))
                    eTEX_FXT_Diet_tt<-topTags(eTEX_FXT_Diet, n=sum(52+191))
                    eTEX_FXT_Diet_sig<-eTEX_FXT_Diet_tt[abs(eTEX_FXT_Diet_tt$table$logFC) >=1.0,]
                    nrow(eTEX_FXT_Diet_sig)
                    243
                    Between two programs, 64 genes are common in TEX_Diet and 97 genes are common in TEX_FXT_Diet groups. Do you think it's normal to get that much of differences?

                    Thank you very much.

                    Best Regards,
                    Keysoon
                    Last edited by keysoon; 01-13-2015, 05:15 AM.

                    Comment

                    • Michael Love
                      Senior Member
                      • Jul 2013
                      • 333

                      #11
                      Yes that's normal.

                      Comment

                      Latest Articles

                      Collapse

                      • SEQadmin2
                        From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                        by SEQadmin2


                        Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                        The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                        ...
                        06-02-2026, 10:05 AM
                      • SEQadmin2
                        Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                        by SEQadmin2


                        With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                        Introduction

                        Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                        05-22-2026, 06:42 AM
                      • SEQadmin2
                        Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                        by SEQadmin2

                        Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                        Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                        05-06-2026, 09:04 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by SEQadmin2, Today, 08:59 AM
                      0 responses
                      8 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-02-2026, 12:03 PM
                      0 responses
                      21 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-02-2026, 11:40 AM
                      0 responses
                      15 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 05-28-2026, 11:40 AM
                      0 responses
                      29 views
                      0 reactions
                      Last Post SEQadmin2  
                      Working...