Hi,
I am doing a DESeq2 analysis using a data set of nine different conditions/time-points.
I first ran the analysis with the complete data set in the count table and than have used the results function to extract the pair-wise results of the DE analysis.
When doing so, I get only 3 genes with an adjusted p-value below 0.05
But if i ran the same analysis with the subset of the data which contains only the columns from the count table for "CTRL0h" and "HP0h" I get many more significant genes.
first I see a difference in the baseMean values. I guess this might be because for some of the conditions/TP I have more samples than the others, so that the baseMean is calculated differently (Am I correct in this assumption?)
But I can't figure out why I get such a big different in the number of DE genes between the two approaches.
Is it because of the differences in the independent filtering step, DESeq2 is automatically doing?
I have tried to deactivate the independent filtering in the results function, but it didn't make the results better.
Any other suggestions?
thanks,
Assa
I am doing a DESeq2 analysis using a data set of nine different conditions/time-points.
I first ran the analysis with the complete data set in the count table and than have used the results function to extract the pair-wise results of the DE analysis.
Code:
>count_table <- read.delim2("count_table_complete.txt",row.names=1) >colData <- read.delim2("metaData.txt") >levels(colData$condition) [1] "CR4W0h" "CR4W24h" "CR4W4h" "CTRL0h" "CTRL24h" "CTRL4h" "HP0h" "HP24h" "HP4h" >cds <- DESeqDataSetFromMatrix ( countData= count_table[,-1], colData = colData, design = ~condition ) >fit = DESeq(cds) >res = results(fit, contrast=c("condition", "CTRL0h" , "HP0h") ) > res log2 fold change (MAP): condition CTRL0h vs HP0h Wald test p-value: condition CTRL0h vs HP0h DataFrame with 39179 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> ENSMUSG00000000001 [B][COLOR="Red"]3093.215856[/COLOR][/B] 0.002814819 0.08192671 0.03435777 0.97259186 0.998355
Code:
>table(res$padj<=0.05) FALSE TRUE 1956 3
Code:
> count_table <- read.delim2("count_table_complete.txt",row.names=1) > count_subset <- count_table[,c(2:5,23:26)] > head(count_subset) C20 C22 C23 C24 HP10 HP11 HP12 HP14 ENSMUSG00000000001 2811 2360 3053 3334 2636 2736 2505 3282 > colData_subset <- colData[c(1:4, 23:26),] >cds_subset <- DESeqDataSetFromMatrix ( countData= count_subset, colData = colData_subset, design = ~condition ) >fit_subset = DESeq(cds_subset) >res_subset = results(fit_subset, contrast=c("condition", "CTRL0h" , "HP0h") ) > res_subset log2 fold change (MAP): condition CTRL0h vs HP0h Wald test p-value: condition CTRL0h vs HP0h DataFrame with 39179 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> ENSMUSG00000000001 [B][COLOR="Red"]2810.6322989 [/COLOR][/B] -0.003504429 0.05504337 -0.0636667 0.949235621 0.9938743 > table(res$padj<= 0.05) FALSE TRUE 13122 321
But I can't figure out why I get such a big different in the number of DE genes between the two approaches.
Is it because of the differences in the independent filtering step, DESeq2 is automatically doing?
I have tried to deactivate the independent filtering in the results function, but it didn't make the results better.
Any other suggestions?
thanks,
Assa
Comment