The differences in this dataset are small ([-0.003, 0.003] for p-value, and [-0.002, 0.001] for FC). But in some other datasets, the difference of log2FC can be up to 0.8 (and the baseMean for that gene is 300), which is concerning.
I tried setting `betaPrior = FALSE` but still got the difference.
Column `Genotype` in colData is the only tested variable in the model. It contains `WT` and `Mutant` as factors, with `WT` as the 1st level.
Actually if I change column `Genotype` to character type and rerun DESeq2 (so DESeq2 will convert them to factors with the warning message), I get the same results independent of the col/row orders in the input. So it seems the difference is somehow related to the factorization of the columns in colData, but I don't know why, and which results I should trust.
Code:
data <- readRDS('data.rds') dds1 <- DESeqDataSetFromMatrix(countData = data[['counts']], colData = data[['colData']], design = ~ Genotype) dds1 <- dds1[rowSums(counts(dds1)) > 1, ] dds1 <- DESeq(dds1, betaPrior = TRUE) res1 <- as.data.frame(results(dds1, contrast = list("GenotypeMutant","GenotypeWT"))) # only reorder the inputs, but the columns of counts still match the rows of colData: dds2 <- DESeqDataSetFromMatrix(countData = data[['counts']][,c(4:6,1:3)], colData = data[['colData']][c(4:6,1:3),], design = ~ Genotype) dds2 <- dds2[rowSums(counts(dds2)) > 1, ] dds2 <- DESeq(dds2, betaPrior = TRUE) res2 <- as.data.frame(results(dds2, contrast = list("GenotypeMutant","GenotypeWT"))) res1['ENSMUSG00000069515',] res2['ENSMUSG00000069515',]
Code:
baseMean log2FoldChange lfcSE stat pvalue ENSMUSG00000069515 3043.921 1.142537 0.09271227 12.32347 6.771949e-35 padj ENSMUSG00000069515 4.605602e-31 baseMean log2FoldChange lfcSE stat pvalue ENSMUSG00000069515 3043.921 1.142512 0.09272373 12.32168 6.923947e-35 padj ENSMUSG00000069515 4.708976e-31