So I've been given transcriptome data for quite a lot of samples (>500), all from tumours of one specific tumour type. Each tumour has a clinical grading for how serious it is.
My concern is that, given it's cancer, there's probably a fair amount of variability within the groups formed by the clinical grading. That means that if we attempt differential expression analysis, it's possible (likely?) that the differences within each clinical grading could obscure the differences between the groups.
What I'm after is several different sets of genes for each clinical grading group, so as to be able to identify the groups. My initial thoughts involve some form of clustering to identify subgroups within each clinical grading, but without knowing which genes to look at beforehand, I'd end up attempting to cluster 20,000-odd genes, which, if I recall correctly, would be prohibitive time-wise.
What's the best approach for this sort of thing?
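To make the clustering idea concrete, here is a minimal sketch of what I have in mind, not something I'm claiming is the right method. It assumes a normalised, log-scale samples-by-genes matrix and one grade label per sample. The pre-filter to the most variable genes within each grade is my own assumption for keeping the problem small, and the function names, `n_genes` and `k` are all hypothetical. The clustering is plain k-means with a deterministic farthest-point start, written in NumPy so that it is self-contained.

```python
import numpy as np

def top_variable_genes(expr, n_genes):
    """Column indices of the n_genes highest-variance genes (samples x genes)."""
    variances = expr.var(axis=0)
    return np.argsort(variances)[::-1][:n_genes]

def farthest_point_init(data, k):
    """Deterministic start: first sample, then repeatedly the farthest sample."""
    centres = [data[0]]
    for _ in range(1, k):
        # squared distance from each sample to its nearest chosen centre
        d = np.min([((data - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(data[d.argmax()])
    return np.array(centres, dtype=float)

def kmeans(data, k, n_iter=50):
    """Lloyd's k-means; returns one cluster label per sample (row)."""
    centres = farthest_point_init(data, k)
    labels = np.full(len(data), -1)
    for _ in range(n_iter):
        # squared Euclidean distance from every sample to every centre
        dists = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = data[labels == j]
            if len(members):  # an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return labels

def subgroups_within_grades(expr, grades, n_genes=2000, k=2):
    """Cluster samples separately inside each clinical grade.

    Assumption: genes are pre-filtered to the n_genes most variable ones
    within the grade, rather than clustering on all ~20,000.
    """
    expr = np.asarray(expr, dtype=float)
    grades = np.asarray(grades)
    result = {}
    for grade in np.unique(grades):
        block = expr[grades == grade]
        genes = top_variable_genes(block, n_genes)
        result[grade.item()] = kmeans(block[:, genes], k)
    return result
```

Calling `subgroups_within_grades(expr, grades)` would return a dict mapping each grade to an array of subgroup labels for that grade's samples. The number of subgroups `k` and the size of the gene filter are guesses on my part, and choosing them sensibly is part of what I'm asking about.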