Background: I want to take my RNA-seq data (coming from Tophat2 --> featureCounts --> DESeq2, with associated p-values) and look at differentially expressed genes in a specified pathway, for example all the genes assigned to the MAPK/ERK pathway by KEGG.
So, the multiple testing problem... I was told that, if I were only interested in a subset of all the genes reported by DESeq2, I would need to subset the data according to my gene list and perform multiple testing correction on that subset only, rather than on the whole genome. Firstly, is this correct? Assuming it is, I performed standard Benjamini-Hochberg (BH) FDR correction with the p.adjust function in R on the p-values of my subset of pathway genes.
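To make the two options concrete, here is a minimal sketch in Python of subsetting first versus correcting genome-wide. The gene names and p-values are made up for illustration, and `bh_adjust` is a hypothetical helper written to mirror what `p.adjust(..., method = "BH")` computes; it is not the actual analysis code.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down so a running minimum keeps
    # the adjusted values monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for a tiny "genome" (illustration only).
genome_p = {
    "MAPK1": 0.001, "MAPK3": 0.01, "RAF1": 0.03, "GENE_A": 0.2,
    "GENE_B": 0.5, "GENE_C": 0.8, "GENE_D": 0.04, "GENE_E": 0.6,
}
# Hypothetical pathway gene list.
pathway = ["MAPK1", "MAPK3", "RAF1"]

# Option 1: subset to the pathway first, then correct only those tests.
subset_adj = dict(zip(pathway, bh_adjust([genome_p[g] for g in pathway])))

# Option 2: correct across all genes, then look up the pathway genes.
genes = list(genome_p)
all_adj = dict(zip(genes, bh_adjust([genome_p[g] for g in genes])))
genome_adj = {g: all_adj[g] for g in pathway}

for g in pathway:
    print(g, subset_adj[g], genome_adj[g])
```

In this toy example the subset-first adjusted values are smaller, simply because fewer tests are being corrected for.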
I have since read up a bit more on the various FDR correction methods, and as far as I can tell they all (or at least the BH method) rely on the assumption that the tests are independent. My particular problem is then how I could use any of those methods when I'm quite certain that the genes in my subset are, in fact, dependent, seeing as they're all part of the same pathway.
How do you guys perform multiple testing corrections with data that you expect to be dependent?