I'm using the WGCNA package in R to analyse RNA sequencing data. While following the tutorial and adapting it to my data, I've hit a step whose rationale I don't understand, which suggests I may be missing something fundamental about the process. I've read the main publication associated with the package and haven't found the answer to my question. If anyone could help me figure out what I'm missing, I would be very grateful.
In the tutorial, the similarity between module eigengenes is calculated as the correlation coefficient between pairs of eigengenes. Similarly, the function mergeCloseModules also uses the correlation coefficient, and as far as I can tell this cannot be changed. It seems to me that the absolute value of the correlation coefficient should be used instead. The eigengene of a module is simply the first principal component of that module's expression matrix. As I understand PCA, the direction of a PC is arbitrary, which means that the sign of the correlation coefficient between eigengenes should also be arbitrary. Strongly negatively correlated eigengenes are therefore just as similar as strongly positively correlated ones. If the signed correlation coefficient is used, however, strongly negatively correlated eigengenes will be considered highly dissimilar.
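To make the concern concrete, here is a minimal sketch in plain Python (it does not call WGCNA). The eigengene values are made up, the pearson helper is my own, and I'm assuming the dissimilarity is defined as 1 minus the correlation. It only illustrates the premise: if the sign of an eigengene really is arbitrary, flipping it reverses the signed correlation but leaves the absolute correlation unchanged.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Two hypothetical module eigengenes across six samples (made-up values).
eigengene_a = [0.5, 0.3, 0.1, -0.1, -0.3, -0.5]
eigengene_b = [0.45, 0.35, 0.05, -0.05, -0.35, -0.45]

# The same eigengene with its direction reversed, as PCA could equally return.
flipped_b = [-v for v in eigengene_b]

r = pearson(eigengene_a, eigengene_b)
r_flipped = pearson(eigengene_a, flipped_b)

# Assumed dissimilarity definitions: signed versus absolute correlation.
signed_dissim = 1 - r
signed_dissim_flipped = 1 - r_flipped
abs_dissim = 1 - abs(r)
abs_dissim_flipped = 1 - abs(r_flipped)

print("signed dissimilarity:  ", signed_dissim, "vs flipped:", signed_dissim_flipped)
print("absolute dissimilarity:", abs_dissim, "vs flipped:", abs_dissim_flipped)
```

With the signed version, the same pair of modules goes from nearly identical to maximally dissimilar depending only on the direction of one eigengene, whereas the absolute version gives the same answer either way.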
Any help filling in my understanding would be greatly appreciated,
Eric