Dear ALL,
Many of you will be aware of the Golub et al. data, one of the first
high-throughput gene expression datasets ever, used for teaching
in many places. I recently noticed that for one of the genes,
the expression pattern in their famous figure does not seem
to match the underlying data (as they are available via the
R package, and described in Wim Krijnen's wonderful book
"Applied Statistics for Bioinformatics using R".) But maybe
I'm just missing or misunderstanding something.
From the figure in the original publication, it is clear that
expression values of Cyclin D3 for AML are mostly below 0, cf. the fourth row in
However, inspecting the data, and concordant with e.g. Figure 2.4 of the book,
"Boxplot of ALL and AML expression values of gene CCND3 Cyclin D3",
it is clear that for AML, expression values of Cyclin D3 are above 0.
I checked that there is only one Cyclin D3 in the dataset. The data
may have been normalized differently from what the figure shows, but I
checked some other genes and found no such discrepancy for any of them,
so I doubt that normalization alone explains it.
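
The check described above can be reproduced roughly as follows. This is only a minimal sketch: it assumes that "the R package" is Bioconductor's multtest (which provides the objects golub, golub.gnames and golub.cl used in Krijnen's book), and the variable names gol.fac and idx are just illustrative choices.

```r
# Assumption: the data come from the Bioconductor package 'multtest'.
library(multtest)
data(golub)

# golub.cl codes the tumor class: 0 = ALL, 1 = AML
gol.fac <- factor(golub.cl, levels = 0:1, labels = c("ALL", "AML"))

# Find every row whose gene description mentions Cyclin D3
# (column 2 of golub.gnames holds the gene descriptions)
idx <- grep("Cyclin D3", golub.gnames[, 2])
print(golub.gnames[idx, , drop = FALSE])

# Per-class summary of the expression values for each matching row
for (i in idx) {
  print(tapply(golub[i, ], gol.fac, summary))
}

# Boxplot of the first match, split by class
boxplot(golub[idx[1], ] ~ gol.fac,
        ylab = "expression value",
        main = golub.gnames[idx[1], 2])
```

If grep returns more than one row, the summaries make it easy to see whether the rows disagree with each other; if it returns exactly one, the boxplot can be compared directly with the sign of the AML values in the published figure.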
Can anyone help and shed some light on this issue?
Thanks!!
georg