Examine the power of data integration in real-world clinical settings. Many approaches work well on some datasets but not on others. Here we challenge you to demonstrate a single, unified approach to data integration that matches or outperforms the current state of the art on two different diseases: breast cancer and neuroblastoma.
Breast cancer affects about 3 million women every year (McGuire et al., Cancers 7), and this number is growing fast, especially in developed countries. Can you improve on the large METABRIC study [http://molonc.bccrc.ca/aparicio-lab/research/metabric/] (Curtis et al., Nature 486; DREAM Challenge, Margolin et al., Sci Transl Med 5)? The cohort is biologically heterogeneous, with all five PAM50 breast cancer subtypes represented. Matched gene expression microarray and copy number profiles, together with clinical information (survival times, multiple prognostic markers, therapy data), are available for about 2,000 patients.
Neuroblastoma is the most common extracranial solid tumor in children. The base study compared RNA-seq and Agilent microarray gene expression profiles for clinical endpoint prediction in 498 pediatric patients (FDA SEQC; Zhang et al., Genome Biology 16). The published summary data are complemented by raw signal-level data for the gene expression arrays, RNA-seq expression profiles, and extended clinical metadata. In addition, we provide matched aCGH data for 145 of these patients for copy number analysis (Fischer lab, Köln; Stigliani et al., Neoplasia 14; Coco et al., IJC 131; Kocak et al., Cell Death Dis 4; Theissen et al., Genes Chromosomes Cancer 53).
Analysis suggestions:
Technical:
- Efficient horizontal data integration (inter-type), combining gene expression, CNV/CNA, clinical markers …
- Efficient vertical data integration (intra-type), e.g., combining the expression profiles from complementary high-throughput technologies (RNA-seq and microarrays), combining information across patients, …
- Characterization of differences in algorithm performance between the two diseases, identification of possible causes, and development of mitigation strategies.
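The two integration directions above can be illustrated with a minimal sketch. It assumes that each data type is already a patients x features matrix whose rows are aligned to the same patients. Horizontal integration is shown as simple early fusion. Each block is z-scored and scaled by 1/sqrt(p), where p is the block's feature count, so every data type contributes equal total variance before the blocks are concatenated. Vertical integration is shown as per-platform standardization of matched genes followed by averaging. The function names and the weighting scheme are hypothetical, illustrative choices and not a method prescribed by the challenge.

```python
import math
from typing import Dict, List

# Hypothetical representation: rows = patients, columns = features.
Matrix = List[List[float]]


def zscore_columns(block: Matrix) -> Matrix:
    """Standardize each feature (column) to mean 0 and SD 1 across patients."""
    n, p = len(block), len(block[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in block]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            # Constant features are set to 0 instead of dividing by zero.
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out


def horizontal_fusion(blocks: Dict[str, Matrix]) -> Matrix:
    """Early fusion across data types (e.g. expression, CNA, clinical markers).

    Each block is z-scored and multiplied by 1/sqrt(p), so a block with many
    features does not dominate one with few; the blocks are then concatenated.
    """
    n = len(next(iter(blocks.values())))
    fused: Matrix = [[] for _ in range(n)]
    for name, block in blocks.items():
        if len(block) != n:
            raise ValueError(f"block {name!r} has {len(block)} rows, expected {n}")
        weight = 1.0 / math.sqrt(len(block[0]))
        for i, row in enumerate(zscore_columns(block)):
            fused[i].extend(weight * x for x in row)
    return fused


def vertical_fusion(platforms: List[Matrix]) -> Matrix:
    """Combine matched profiles of the same genes from different technologies
    (e.g. RNA-seq and microarray): z-score each gene within its platform,
    then average the standardized values across platforms."""
    standardized = [zscore_columns(m) for m in platforms]
    n, p = len(standardized[0]), len(standardized[0][0])
    if any(len(s) != n or len(s[0]) != p for s in standardized):
        raise ValueError("all platforms must share the same patients and genes")
    return [[sum(s[i][j] for s in standardized) / len(standardized) for j in range(p)]
            for i in range(n)]


# Toy usage: 3 patients, 3 expression features and 1 copy number feature.
expr = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 6.0, 2.0]]
cna = [[0.0], [1.0], [-1.0]]
fused = horizontal_fusion({"expr": expr, "cna": cna})
```

Per-platform z-scoring removes platform-specific location and scale differences, such as log-counts versus log-intensities. It does not correct nonlinear differences between technologies. Dedicated cross-platform normalization or model-based integration would likely be needed on the real data.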
Biological:
- Better survival time prediction by effective data integration or improved models.
- Advancing our understanding of the mechanisms behind cancer progression or therapy response by effective data integration or novel functional (network/pathway) analysis.
- Improved cancer subgrouping.
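Survival prediction models are commonly compared using Harrell's concordance index (C-index). The helper below is a minimal, hypothetical sketch and not the challenge's official evaluation metric. It assumes per-patient survival times, event indicators (1 = event observed, 0 = censored), and a predicted risk score in which higher means worse prognosis. For simplicity it skips pairs with tied survival times. Established survival-analysis libraries such as `lifelines` handle ties and scoring conventions more carefully.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable patient pairs in which the patient
    with the higher predicted risk experiences the event first.

    A pair (i, j) is comparable when times[i] < times[j] and patient i had an
    observed event; censored patients can only be the longer-surviving member
    of a pair. Ties in predicted risk count as 0.5. Pairs with tied times are
    skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored: we do not know when (or if) the event occurred
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy usage: higher risk for patients who die earlier gives a perfect ranking.
c = concordance_index([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1])
```

A value of 0.5 corresponds to a random ordering and 1.0 to a perfect ranking. This makes it possible, for example, to compare a clinical-markers-only model against an integrated model on the same held-out patients.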
Join us for a stimulating scientific meeting and lively discussions in Chicago 07-08 July 2018!
Follow us on Twitter @CAMDA_conf
camda.info