I'm getting different p-values and FCs from DESeq2 by simply changing the order of the columns in the count table. I can't understand why. I made sure the cols of count table match the rows of colData in both.
The differences in this dataset are small ([-0.003, 0.003] for p-value, and [-0.002, 0.001] for FC). But in some other datasets, the difference of log2FC can be up to 0.8 (and the baseMean for that gene is 300), which is concerning.
I tried setting `betaPrior = FALSE` but still got the difference.
Column `Genotype` in colData is the only tested variable in the model. It contains `WT` and `Mutant` as factors, with `WT` as the 1st level.
Actually if I change column `Genotype` to character type and rerun DESeq2 (so DESeq2 will convert them to factors with the warning message), I get the same results independent of the col/row orders in the input. So it seems the difference is somehow related to the factorization of the columns in colData, but I don't know why, and which results I should trust.
The differences in this dataset are small ([-0.003, 0.003] for p-value, and [-0.002, 0.001] for FC). But in some other datasets, the difference of log2FC can be up to 0.8 (and the baseMean for that gene is 300), which is concerning.
I tried setting `betaPrior = FALSE` but still got the difference.
Column `Genotype` in colData is the only tested variable in the model. It contains `WT` and `Mutant` as factors, with `WT` as the 1st level.
Actually if I change column `Genotype` to character type and rerun DESeq2 (so DESeq2 will convert them to factors with the warning message), I get the same results independent of the col/row orders in the input. So it seems the difference is somehow related to the factorization of the columns in colData, but I don't know why, and which results I should trust.
Code:
data <- readRDS('data.rds') dds1 <- DESeqDataSetFromMatrix(countData = data[['counts']], colData = data[['colData']], design = ~ Genotype) dds1 <- dds1[rowSums(counts(dds1)) > 1, ] dds1 <- DESeq(dds1, betaPrior = TRUE) res1 <- as.data.frame(results(dds1, contrast = list("GenotypeMutant","GenotypeWT"))) # only reorder the inputs, but the columns of counts still match the rows of colData: dds2 <- DESeqDataSetFromMatrix(countData = data[['counts']][,c(4:6,1:3)], colData = data[['colData']][c(4:6,1:3),], design = ~ Genotype) dds2 <- dds2[rowSums(counts(dds2)) > 1, ] dds2 <- DESeq(dds2, betaPrior = TRUE) res2 <- as.data.frame(results(dds2, contrast = list("GenotypeMutant","GenotypeWT"))) res1['ENSMUSG00000069515',] res2['ENSMUSG00000069515',] sessionInfo( )
Code:
baseMean log2FoldChange lfcSE stat pvalue ENSMUSG00000069515 3043.921 1.142537 0.09271227 12.32347 6.771949e-35 padj ENSMUSG00000069515 4.605602e-31 baseMean log2FoldChange lfcSE stat pvalue ENSMUSG00000069515 3043.921 1.142512 0.09272373 12.32168 6.923947e-35 padj ENSMUSG00000069515 4.708976e-31
Code:
R version 4.1.3 (2022-03-10) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Debian GNU/Linux 10 (buster) Matrix products: default BLAS/LAPACK: /opt/conda/envs/WAT_diffex/lib/libopenblasp-r0.3.20.so locale: [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] forcats_0.5.1 stringr_1.4.0 [3] dplyr_1.0.9 purrr_0.3.4 [5] readr_2.1.2 tidyr_1.2.0 [7] tibble_3.1.7 ggplot2_3.3.6 [9] tidyverse_1.3.1 DESeq2_1.34.0 [11] SummarizedExperiment_1.24.0 Biobase_2.54.0 [13] MatrixGenerics_1.6.0 matrixStats_0.62.0 [15] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 [17] IRanges_2.28.0 S4Vectors_0.32.3 [19] BiocGenerics_0.40.0 loaded via a namespace (and not attached): [1] bitops_1.0-7 fs_1.5.2 lubridate_1.8.0 [4] bit64_4.0.5 RColorBrewer_1.1-3 httr_1.4.3 [7] numDeriv_2016.8-1.1 tools_4.1.3 backports_1.4.1 [10] utf8_1.2.2 R6_2.5.1 DBI_1.1.2 [13] colorspace_2.0-3 apeglm_1.16.0 withr_2.5.0 [16] tidyselect_1.1.2 bit_4.0.4 compiler_4.1.3 [19] cli_3.3.0 rvest_1.0.2 xml2_1.3.3 [22] DelayedArray_0.20.0 scales_1.2.0 mvtnorm_1.1-3 [25] genefilter_1.76.0 XVector_0.34.0 pkgconfig_2.0.3 [28] bbmle_1.0.25 dbplyr_2.2.0 fastmap_1.1.0 [31] rlang_1.0.2 readxl_1.4.0 rstudioapi_0.13 [34] RSQLite_2.2.8 generics_0.1.2 jsonlite_1.8.0 [37] BiocParallel_1.28.3 RCurl_1.98-1.7 magrittr_2.0.3 [40] GenomeInfoDbData_1.2.7 Matrix_1.4-1 Rcpp_1.0.8.3 [43] munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.1 [46] stringi_1.7.6 MASS_7.3-57 zlibbioc_1.40.0 [49] plyr_1.8.7 grid_4.1.3 blob_1.2.3 [52] parallel_4.1.3 bdsmatrix_1.3-6 crayon_1.5.1 [55] lattice_0.20-45 Biostrings_2.62.0 haven_2.5.0 [58] splines_4.1.3 annotate_1.72.0 hms_1.1.1 [61] KEGGREST_1.34.0 locfit_1.5-9.5 pillar_1.7.0 [64] geneplotter_1.72.0 reprex_2.0.1 XML_3.99-0.9 [67] glue_1.6.2 modelr_0.1.8 png_0.1-7 [70] vctrs_0.4.1 tzdb_0.3.0 cellranger_1.1.0 [73] gtable_0.3.0 assertthat_0.2.1 cachem_1.0.6 [76] emdbook_1.3.12 xtable_1.8-4 broom_0.8.0 [79] coda_0.19-4 survival_3.3-1 AnnotationDbi_1.56.1 [82] memoise_2.0.1 ellipsis_0.3.2