I'm clearly out of my comfort zone here, but it needs to be done. I've taken a cursory look into clustering some RNA-seq data and I am more confused than when I started. Does anyone know of any good reviews that could help someone with a limited mathematical background understand some of the complexities of hierarchical clustering, especially with respect to RNA-seq data?
I have taken a look at a couple of things I have found:
1. DESeq has some clustering abilities but seems pretty limited.
2. MBCluster.Seq ( http://cran.r-project.org/web/packag...luster.Seq.pdf ) looks interesting but seems to lack human-friendly documentation.
3. hopach ( http://bioc.ism.ac.jp/2.6/bioc/html/hopach.html ) looks interesting but there is no documentation on how to feed it RNA-seq data (or any data for that matter).
Data preprocessing steps I've considered, which may or may not be correct:
1. Seems best to select genes with high variance.
2. Perhaps values should be log2 transformed??
3. Filtering genes with low read counts seems like a good idea.
4. Depending on what you want to see, perhaps divide by the lowest-expressed condition for each gene so that you are looking at a fold change.
5. Are there tools that take into account variance within a condition???
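To make steps 1-4 concrete, here is a rough sketch of how I imagine the preprocessing might look in plain Python. Everything specific in it is an assumption on my part: the toy counts, the cutoff of 10 reads, the pseudocount of 1, and keeping the top 2 genes are all made up for illustration, and it assumes the counts have already been normalized for library size. I'm also not sure this is the right order of operations.

```python
import math
import statistics

# Toy read counts: gene -> one value per condition (made-up numbers,
# assumed to be already normalized for library size).
counts = {
    "geneA": [0, 1, 2, 0],
    "geneB": [100, 400, 1600, 100],
    "geneC": [500, 510, 495, 505],
    "geneD": [20, 80, 20, 320],
}

MIN_TOTAL_READS = 10  # assumed cutoff for "low read count"
PSEUDOCOUNT = 1       # assumed offset so that log2(0) never occurs
TOP_N = 2             # assumed number of high-variance genes to keep

# Step 3: drop genes with too few reads across all conditions.
filtered = {g: c for g, c in counts.items() if sum(c) >= MIN_TOTAL_READS}

# Step 2: log2-transform the remaining counts.
logged = {
    g: [math.log2(x + PSEUDOCOUNT) for x in c] for g, c in filtered.items()
}

# Step 1: rank genes by variance across conditions and keep the top N.
variances = {g: statistics.variance(v) for g, v in logged.items()}
top_genes = sorted(variances, key=variances.get, reverse=True)[:TOP_N]

# Step 4: express each gene relative to its lowest-expressed condition.
# On the log2 scale, dividing becomes subtracting, so this is a log2 fold change.
log_fold = {g: [x - min(logged[g]) for x in logged[g]] for g in top_genes}

for gene, values in log_fold.items():
    print(gene, [round(v, 2) for v in values])
```

The resulting `log_fold` table (genes by conditions) is what I would then hand to a hierarchical clustering routine. Note that this sketch does nothing about step 5, since it has no replicates within a condition.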
It all looks pretty complicated so any insights would be greatly appreciated.
Thanks!!