I construct a toy dataset with
Then, I get the gene-wise dispersion
Now, I want to check i the dispersion estimates are just fitting a negative binomial on each gene's count (pooled conditions).
Now the dispersion is the inverse of size, so it should be
But this is different from
Code:
> m Control Case GeneA 3891 4591 GeneB 69543 72122 > colData cc Case Case Control Control > d <- DESeqDataSetFromMatrix(m, colData=colData, design =~cc)
Code:
> d <- estimateSizeFactors(d) > d <- estimateDispersionsGeneEst(d) > mcols(d) DataFrame with 2 rows and 5 columns baseMean baseVar allZero dispGeneEst dispGeneEstConv <numeric> <numeric> <logical> <numeric> <logical> 1 4228.732 37181.63 FALSE 0.001842478 TRUE 2 70857.604 10439534.06 FALSE 0.002065126 TRUE
Code:
> fitdistr(as.vector(round(counts(d, normalize=T)[1,])),"negative binomial") size mu 1207.3898 4228.5000 (1543.6506) ( 97.5636) > fitdistr(as.vector(round(counts(d, normalize=T)[2,])),"negative binomial") size mu 970.5831 70857.5000 ( 982.6550) ( 1616.2617)
Now the dispersion is the inverse of size, so it should be
Code:
> c(1/1207.3898, 1/970.5831) [1] 0.0008282329 0.0010303085
Code:
> mcols(d)$dispGeneEst [1] 0.001842478 0.002065126
Comment