Hello,
I am analyzing a fairly large RNA-Seq data set (20 controls/20 diseased, Illumina HiSeq 2000, 101 bp, paired-end) from tissue that is affected only in the later stages of our disease of interest. Because of this, we expected to see fairly subtle expression differences in our DE comparisons.
Even so, when I run DESeq v1.12.0 on the protein-coding genes that pass a low-count threshold (e.g. at least 2 counts per million in more than half of the samples, ~14,300 genes), only around 50 genes are differentially expressed after FDR correction. The problem is that pathway and functional analyses would benefit from a larger gene list.
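To make the low-count filter concrete, here is a minimal sketch of what I mean, in plain Python. The function name `cpm_filter` and its parameters are only illustrative (they are not part of DESeq or edgeR), and it assumes a genes x samples table of raw counts with non-zero library sizes.

```python
def cpm_filter(counts, min_cpm=2.0, min_fraction=0.5):
    """Flag genes with at least `min_cpm` counts per million
    in more than `min_fraction` of the samples.

    counts: list of rows (one per gene), each a list of raw counts
            (one per sample). Library sizes are assumed non-zero.
    Returns a list of booleans, one per gene.
    """
    n_samples = len(counts[0])
    # Library size = total counts per sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    keep = []
    for row in counts:
        # Number of samples in which this gene reaches the CPM threshold.
        n_pass = sum(
            1 for j in range(n_samples)
            if row[j] / lib_sizes[j] * 1e6 >= min_cpm
        )
        # "More than half" is a strict inequality.
        keep.append(n_pass > min_fraction * n_samples)
    return keep


# Toy example: 3 genes x 4 samples, each library sums to 1,000,000.
toy_counts = [
    [999985, 999985, 999990, 999990],  # very highly expressed gene
    [10, 10, 10, 10],                  # 10 CPM in every sample
    [5, 5, 0, 0],                      # 5 CPM in exactly half the samples
]
mask = cpm_filter(toy_counts)
```

In this toy table the third gene is dropped, because it reaches the threshold in exactly half of the samples rather than in more than half.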
While I am aware that the recent version of DESeq is more conservative than other DE programs (e.g. edgeR), I think it would be problematic to relax the FDR threshold for this reason alone. Consequently, I was wondering whether adding one or more further filters on the genes included in the DE analysis might reduce the number of tests and so increase the number of significant results. For example, are there any accepted variance filters for RNA-Seq DE analyses? Something like removing genes with very high variance across biological replicates (I've noticed that edgeR tends to incorrectly label genes in this category as differentially expressed), or removing genes with low variance across all samples? Any suggestions on specific thresholds?
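The arithmetic behind my question can be sketched as follows, assuming the FDR correction is Benjamini-Hochberg. The helper `bh_adjust` is my own illustrative function, not a DESeq or edgeR call, and the p-values are made up. It only shows how the adjusted value of one raw p-value depends on the number of tests; it says nothing about whether a particular filter is statistically valid.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up example: one small p-value among unremarkable ones.
all_genes = [0.001] + [0.5] * 9       # 10 tests
after_filter = [0.001] + [0.5] * 4    # 5 tests, same gene of interest

adj_all = bh_adjust(all_genes)
adj_filtered = bh_adjust(after_filter)
```

Here the same raw p-value of 0.001 gets a smaller adjusted value when only 5 genes are tested than when 10 are, which is the effect I am hoping a filter could provide.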
Thank you for your help!
Alexandra