I have a question about filtering prior to DE analysis using DESeq. Is it common practice to just use all raw count values for the conditions (i.e. disease/control), or would it be better to ensure that every gene being tested for differential expression has at least, say, 5 or 10 reads in each of the replicates within the condition groups? I understand that DESeq can account for low read counts, and I've found in my data that I get a larger number of significant genes when I filter, I guess because fewer tests are being conducted and so the multiple-testing correction is less severe. However, I'm unsure which approach I should go with.
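To make the filter and the multiple-testing effect concrete, here is a minimal sketch in plain Python. It is not DESeq itself: the gene names, counts, p-values, and the `MIN_READS` threshold are all invented for illustration, and `passes_filter` and `bh_adjust` are hypothetical helper names. It assumes the adjustment is Benjamini-Hochberg, which may differ from what a given pipeline actually applies.

```python
# Toy raw counts per gene across six samples (invented numbers, illustration only).
counts = {
    "geneA": [120, 98, 110, 300, 280, 310],
    "geneB": [0, 2, 1, 0, 3, 1],
    "geneC": [15, 22, 9, 40, 35, 51],
    "geneD": [4, 12, 30, 25, 18, 22],
}
MIN_READS = 5  # hypothetical threshold; the question mentions 5 or 10


def passes_filter(sample_counts, min_reads=MIN_READS):
    """Keep a gene only if every replicate has at least min_reads reads."""
    return all(c >= min_reads for c in sample_counts)


kept = [gene for gene, c in counts.items() if passes_filter(c)]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values: ten tests without filtering, three tests after
# dropping seven low-count genes that had large p-values anyway.
p_unfiltered = [0.004, 0.03, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
p_filtered = p_unfiltered[:3]

adj_unfiltered = bh_adjust(p_unfiltered)
adj_filtered = bh_adjust(p_filtered)

print(kept)
print(adj_unfiltered[1], adj_filtered[1])
```

In this toy case the second gene's raw p-value is unchanged, but its adjusted value drops below 0.05 once fewer tests are corrected for. This only illustrates the arithmetic of the correction; it says nothing about which approach is statistically preferable.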
I know this kind of question will depend on the data itself and there may be no right or wrong answer, but I know very few people doing this kind of analysis with whom I can discuss it. In my data I've also found that not all of the genes that are significant with the unfiltered data are still significant when I filter (despite the increase in the number of genes reaching significance), so I'm just wondering what people consider best practice.
Thanks for any advice.