Announcement

Collapse
No announcement yet.

Different DESeq2 results (both FC and pvalue) by changing col/row orders in input

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Different DESeq2 results (both FC and pvalue) by changing col/row orders in input

    I'm getting different p-values and FCs from DESeq2 by simply changing the order of the columns in the count table. I can't understand why. I made sure the cols of count table match the rows of colData in both.

    The differences in this dataset are small ([-0.003, 0.003] for p-value, and [-0.002, 0.001] for FC). But in some other datasets, the difference of log2FC can be up to 0.8 (and the baseMean for that gene is 300), which is concerning.

    I tried setting `betaPrior = FALSE` but still got the difference.

    Column `Genotype` in colData is the only tested variable in the model. It contains `WT` and `Mutant` as factors, with `WT` as the 1st level.

    Actually if I change column `Genotype` to character type and rerun DESeq2 (so DESeq2 will convert them to factors with the warning message), I get the same results independent of the col/row orders in the input. So it seems the difference is somehow related to the factorization of the columns in colData, but I don't know why, and which results I should trust.


    Code:
    data <- readRDS('data.rds')
    
    dds1 <- DESeqDataSetFromMatrix(countData = data[['counts']], colData = data[['colData']], design = ~ Genotype)
    dds1 <- dds1[rowSums(counts(dds1)) > 1, ]
    dds1 <- DESeq(dds1, betaPrior = TRUE)
    res1 <- as.data.frame(results(dds1, contrast = list("GenotypeMutant","GenotypeWT")))
    
    # only reorder the inputs, but the columns of counts still match the rows of colData:
    dds2 <- DESeqDataSetFromMatrix(countData = data[['counts']][,c(4:6,1:3)], colData = data[['colData']][c(4:6,1:3),], design = ~ Genotype)
    dds2 <- dds2[rowSums(counts(dds2)) > 1, ]
    dds2 <- DESeq(dds2, betaPrior = TRUE)
    res2 <- as.data.frame(results(dds2, contrast = list("GenotypeMutant","GenotypeWT")))
    
    res1['ENSMUSG00000069515',]
    res2['ENSMUSG00000069515',]
    
    sessionInfo( )



    Code:
    baseMean log2FoldChange lfcSE stat pvalue
    ENSMUSG00000069515 3043.921 1.142537 0.09271227 12.32347 6.771949e-35
    padj
    ENSMUSG00000069515 4.605602e-31
    
    
    baseMean log2FoldChange lfcSE stat pvalue
    ENSMUSG00000069515 3043.921 1.142512 0.09272373 12.32168 6.923947e-35
    padj
    ENSMUSG00000069515 4.708976e-31


    Code:
    R version 4.1.3 (2022-03-10)
    Platform: x86_64-conda-linux-gnu (64-bit)
    Running under: Debian GNU/Linux 10 (buster)
    
    Matrix products: default
    BLAS/LAPACK: /opt/conda/envs/WAT_diffex/lib/libopenblasp-r0.3.20.so
    
    locale:
    [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
    [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
    [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
    [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
    
    attached base packages:
    [1] stats4 stats graphics grDevices utils datasets methods
    [8] base
    
    other attached packages:
    [1] forcats_0.5.1 stringr_1.4.0
    [3] dplyr_1.0.9 purrr_0.3.4
    [5] readr_2.1.2 tidyr_1.2.0
    [7] tibble_3.1.7 ggplot2_3.3.6
    [9] tidyverse_1.3.1 DESeq2_1.34.0
    [11] SummarizedExperiment_1.24.0 Biobase_2.54.0
    [13] MatrixGenerics_1.6.0 matrixStats_0.62.0
    [15] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0
    [17] IRanges_2.28.0 S4Vectors_0.32.3
    [19] BiocGenerics_0.40.0
    
    loaded via a namespace (and not attached):
    [1] bitops_1.0-7 fs_1.5.2 lubridate_1.8.0
    [4] bit64_4.0.5 RColorBrewer_1.1-3 httr_1.4.3
    [7] numDeriv_2016.8-1.1 tools_4.1.3 backports_1.4.1
    [10] utf8_1.2.2 R6_2.5.1 DBI_1.1.2
    [13] colorspace_2.0-3 apeglm_1.16.0 withr_2.5.0
    [16] tidyselect_1.1.2 bit_4.0.4 compiler_4.1.3
    [19] cli_3.3.0 rvest_1.0.2 xml2_1.3.3
    [22] DelayedArray_0.20.0 scales_1.2.0 mvtnorm_1.1-3
    [25] genefilter_1.76.0 XVector_0.34.0 pkgconfig_2.0.3
    [28] bbmle_1.0.25 dbplyr_2.2.0 fastmap_1.1.0
    [31] rlang_1.0.2 readxl_1.4.0 rstudioapi_0.13
    [34] RSQLite_2.2.8 generics_0.1.2 jsonlite_1.8.0
    [37] BiocParallel_1.28.3 RCurl_1.98-1.7 magrittr_2.0.3
    [40] GenomeInfoDbData_1.2.7 Matrix_1.4-1 Rcpp_1.0.8.3
    [43] munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.1
    [46] stringi_1.7.6 MASS_7.3-57 zlibbioc_1.40.0
    [49] plyr_1.8.7 grid_4.1.3 blob_1.2.3
    [52] parallel_4.1.3 bdsmatrix_1.3-6 crayon_1.5.1
    [55] lattice_0.20-45 Biostrings_2.62.0 haven_2.5.0
    [58] splines_4.1.3 annotate_1.72.0 hms_1.1.1
    [61] KEGGREST_1.34.0 locfit_1.5-9.5 pillar_1.7.0
    [64] geneplotter_1.72.0 reprex_2.0.1 XML_3.99-0.9
    [67] glue_1.6.2 modelr_0.1.8 png_0.1-7
    [70] vctrs_0.4.1 tzdb_0.3.0 cellranger_1.1.0
    [73] gtable_0.3.0 assertthat_0.2.1 cachem_1.0.6
    [76] emdbook_1.3.12 xtable_1.8-4 broom_0.8.0
    [79] coda_0.19-4 survival_3.3-1 AnnotationDbi_1.56.1
    [82] memoise_2.0.1 ellipsis_0.3.2
    Last edited by metheuse; 06-20-2022, 08:23 AM.
Working...
X