I want to cluster 417 RNA-seq samples based on the expression levels of a group of selected genes, using the NMF method. I have a matrix (mat) of expression levels with sample names as columns and gene names as rows, and I tried the following command from the NMF package in R (listed under "Code:" below).
From the results I concluded that rank = 4 is the optimum.
First, do you think this is a reasonable way to cluster samples?
If so, should I normalize or transform the expression levels before clustering (e.g. TMM normalization, or a log or asinh transformation)?
And finally, I need the names of the samples in each calculated cluster. The command "basisnames" returned NULL. What command should I use to get the sample membership of each cluster?
Thanks for the help.
Code:
library(NMF)
# Fit ranks 2 to 10, with 200 runs per rank and a fixed seed for reproducibility
res <- nmf(mat, 2:10, nrun = 200, seed = 123456)
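To make the last question concrete, here is a minimal sketch of the kind of output being asked for. It assumes the common NMF clustering convention of assigning each sample to the component with the largest coefficient. The matrix `H` and the sample names `S1` to `S4` are made up for illustration and do not come from the real data. It uses base R only, not the NMF package.

```r
# Toy coefficient matrix H (components x samples); values are invented for illustration.
H <- matrix(
  c(0.9, 0.1, 0.2, 0.7,
    0.1, 0.8, 0.7, 0.2,
    0.0, 0.1, 0.1, 0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("S1", "S2", "S3", "S4"))
)

# Assign each sample (column) to the component with the largest coefficient.
cluster <- apply(H, 2, which.max)

# List the sample names that fall into each cluster.
members <- split(names(cluster), cluster)
print(members)
```

With a fitted model, the coefficient matrix would presumably come from `coef()` on the rank-4 fit instead of a hand-written `H`. As far as I can tell, `predict()` in the NMF package performs this kind of assignment directly, but that should be checked against the package documentation.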