Hi everyone,
I never took a serious statistics course; I did not like the subject, and now I regret it.
Here is my problem:
Thanks to RNA-seq (Illumina), I have expression values (RPKM) for ~43,000 genes at 8 time points: day 1, day 2, day 3, ... day 8.
Among them, one gene is important (let's call it GOI, Gene of Interest).
I would like to screen the other ~42,999 genes by their expression pattern over time and select the ones whose pattern is closest to the GOI pattern.
I heard about covariance but I don't really get it. I found this formula:
=> Cov(X,Y) = (1/N) * Sum_i (X_i - X_average) * (Y_i - Y_average)
From what I have read, I could compute one such number for each gene (its covariance with GOI) and simply sort the genes by it: the higher, the better.
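To make that idea concrete, here is a minimal sketch of it, assuming the expression values sit in a NumPy array with one row per gene and one column per day. The function name `covariance_with_goi` and the four-gene toy table are made up for illustration; they are not real data.

```python
import numpy as np

def covariance_with_goi(expr, goi_index):
    """Covariance (1/N form, as in the formula above) of every gene row
    with the GOI row. `expr` has shape (n_genes, n_timepoints)."""
    expr = np.asarray(expr, dtype=float)
    # Subtract each gene's own mean over time (the Xi - Xaverage step).
    centered = expr - expr.mean(axis=1, keepdims=True)
    goi = centered[goi_index]
    # Sum of products over the time points, divided by N.
    return centered @ goi / expr.shape[1]

# Hypothetical toy table: 4 genes x 8 days.
expr = np.array([
    [1, 2, 3, 4, 5, 6, 7, 8],        # row 0: GOI
    [2, 4, 6, 8, 10, 12, 14, 16],    # row 1: same shape, twice the amplitude
    [8, 7, 6, 5, 4, 3, 2, 1],        # row 2: opposite trend
    [5, 5, 5, 5, 5, 5, 5, 5],        # row 3: flat
])

cov = covariance_with_goi(expr, goi_index=0)
ranking = np.argsort(-cov)  # gene indices, highest covariance first

print(cov)      # covariance of each gene with GOI
print(ranking)  # order of genes from highest to lowest covariance
```

In this toy table, row 1 gets a higher covariance with GOI than GOI gets with itself, only because its values are larger. So covariance depends on the scale of expression as well as on the shape of the pattern, which may be why correlation comes up alongside it.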
But I also see that a covariance calculation can produce a covariance matrix! And then one has to calculate correlations, etc. (A matrix is what NumPy gives me in my script when I use np.cov().)
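A small sketch of how the two seem to relate, using two made-up 8-point profiles. As far as I understand, `np.cov(x, y)` returns a 2x2 matrix: the variances of `x` and `y` are on the diagonal, and the single covariance value from the formula is in the off-diagonal entries. Note that `np.cov` divides by N-1 by default; passing `bias=True` matches the 1/N formula.

```python
import numpy as np

# Two hypothetical expression profiles over 8 days.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2, 4, 6, 8, 10, 12, 14, 16], dtype=float)

# The "simple" covariance, written exactly as the 1/N formula.
single = np.mean((x - x.mean()) * (y - y.mean()))

# The covariance matrix: [[Var(x), Cov(x,y)], [Cov(y,x), Var(y)]].
# bias=True makes np.cov divide by N instead of the default N-1.
matrix = np.cov(x, y, bias=True)

# Correlation is covariance rescaled by both standard deviations,
# so it always lies between -1 and 1 regardless of expression level.
corr = np.corrcoef(x, y)

print(single)        # the single number from the formula
print(matrix[0, 1])  # the same number, read out of the matrix
print(corr[0, 1])    # correlation between the two profiles
```

So the matrix is not a different calculation: it is the same pairwise covariance computed for every pair of rows at once.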
So, what should I use? What is the difference between a covariance matrix and a simple covariance calculation? I have also heard that I should use read counts rather than RPKM values to screen the genes. Why?
Thanks a lot for any kind of help.
M.