Hello all,
I've been using edgeR to analyze an RNA-seq dataset generated by our collaborators. The experimental design is simple: 3 wt (control) and 3 mutant (treatment) individuals.
Unfortunately, I have noticed a problem with the output. The magnitude of the logFC value reported by topTags(et)$table (after running the exact test) does not match the value I get when I use Excel to independently calculate the log2 ratio of the average CPMs for each group, i.e., log2(avg.cpm.mut / avg.cpm.wt). For example:
GENE   AVG.CPM.WT   AVG.CPM.MUT   CALC.CPM.OUT.LOGFC  EDGER.OUT.LOGFC  PAIRWISE.DIFF
Gene1  112.104446   87.70609579   -0.354094474        -0.348296103     0.005798
Gene2  112.5008464  123.8749047    0.138948091         0.178525207     0.039577
and so on...
Note that I used the cpm(y) function to extract the cpm values for each individual entry (as calculated by edgeR) before calculating their average values.
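In case it helps to see exactly what I am computing by hand, here is a minimal sketch of the spreadsheet calculation, written in Python rather than Excel. The function name manual_log_fc and the dictionary layout are just illustrative; the numbers are the averaged CPMs from the two example rows above. This only reproduces my manual calculation and says nothing about how edgeR arrives at its own logFC.

```python
import math

def manual_log_fc(avg_cpm_wt, avg_cpm_mut):
    """Log2 ratio of the group-average CPMs (mutant over wild type)."""
    return math.log2(avg_cpm_mut / avg_cpm_wt)

# Averaged CPMs and the edgeR-reported logFC, copied from the example rows.
# Tuple layout (hypothetical): (avg CPM wt, avg CPM mut, edgeR logFC)
genes = {
    "Gene1": (112.104446, 87.70609579, -0.348296103),
    "Gene2": (112.5008464, 123.8749047, 0.178525207),
}

for name, (wt, mut, edger_log_fc) in genes.items():
    calc = manual_log_fc(wt, mut)
    # Absolute difference between the manual value and the edgeR value.
    diff = abs(calc - edger_log_fc)
    print(f"{name}: manual={calc:.9f} edgeR={edger_log_fc:.9f} diff={diff:.6f}")
```

The manual values from this sketch agree with the CALC.CPM.OUT.LOGFC column, so the discrepancy is not coming from an arithmetic slip in the spreadsheet.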
The pairwise difference between the two methods for calculating the logFC ranges from very small (<0.000001) to large (>0.2). Regardless of how the logFC is calculated, however, the sign of the logFC value is the same (that is, the directionality of differential gene expression doesn't change). I thought this might just be rounding error, because the two logFC values are highly correlated and the scatter is very consistent throughout the range of values... or am I missing something obvious here?
Has anyone else run into this problem? If so, how did you resolve this issue?
Hopefully my explanation is not too confusing - please let me know if I am not being clear enough about what is going on.
Thanks