Hi,
I hope someone can help me. I am confused about why the FPKM of a gene changes depending on which samples are included in the comparison.
I have four samples (say A, B, C, and D) and I run two cuffdiff processes:
cuffdiff I: cuffdiff annotation.gtf -p 8 sampleA sampleB sampleC
cuffdiff II: cuffdiff annotation.gtf -p 8 sampleA sampleB sampleC sampleD
(1) The FPKM of gene X in sampleA (and likewise in sampleB and sampleC) should be the same in cuffdiff I and cuffdiff II. However, the values differ.
(2) For gene Y, cuffdiff I and cuffdiff II give similar values of log2(sampleA(Y)/sampleB(Y)), yet the t-statistics are quite different: cuffdiff II seems to report a smaller t-statistic than cuffdiff I for the same sampleA/sampleB comparison.
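To make the comparison in (2) concrete, here is a minimal sketch of how the log ratio is computed from the FPKM values of the two runs. The numbers are hypothetical placeholders, not my actual cuffdiff output, and the names `run_I`, `run_II`, and `log2_ratio` are made up for illustration. It only shows that the FPKM values can differ between runs while the sampleA/sampleB log ratio stays the same.

```python
import math


def log2_ratio(fpkm_a, fpkm_b):
    """Return log2(FPKM_A / FPKM_B) for a single gene."""
    return math.log2(fpkm_a / fpkm_b)


# Hypothetical placeholder FPKM values for gene Y (NOT real cuffdiff output).
run_I = {"sampleA": 10.0, "sampleB": 40.0}   # cuffdiff I: samples A, B, C
run_II = {"sampleA": 12.0, "sampleB": 48.0}  # cuffdiff II: samples A, B, C, D

ratio_I = log2_ratio(run_I["sampleA"], run_I["sampleB"])
ratio_II = log2_ratio(run_II["sampleA"], run_II["sampleB"])

# The per-sample FPKM values differ between the runs, but the ratio does not.
print(ratio_I, ratio_II)
```

In this toy case both runs scale sampleA and sampleB by the same factor, so the log ratio is unchanged even though every individual FPKM is different.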
The weird thing is that all the parameters are the same in cuffdiff I and cuffdiff II. The only difference is that I added one more sample (sampleD).
Does this mean that the detection of differential expression depends largely on the number of samples provided? It seems that both the FPKM and the way the t-statistic between two samples is calculated are influenced by the number of samples, even by samples that are not part of the comparison (in my case, sampleC and sampleD). This would be a big issue, especially when dealing with time course data.
Thanks for your kind help.