Introducing the ICGC-TCGA DREAM Somatic Mutation Calling Challenge
We are very excited to announce an international effort to benchmark methods for identifying somatic mutations in cancer genomes from whole-genome sequencing. This message outlines the competition. We encourage all groups to download a set of standard datasets and submit their results. This will be useful both to algorithm developers, who can benchmark their newest techniques, and to data analysts, who can verify that their current pipelines are internationally competitive! Details are below, and registration is at:
What problem are we trying to solve?
Cancer is a family of diseases caused by somatic genetic mutations. Fundamental questions remain about the causes of these mutations and their roles in shaping cellular phenotypes. The particular variations in tumour genomes can influence which treatments best suit patients. The genomics revolution is now systematically characterizing every somatic variation in every tumour for large cohorts (>300 patients). The bottleneck has now become the informatics analysis of these data. Accurately identifying these variants remains an open problem in the field. The major factor influencing the poor performance of today’s mutation callers is the heterogeneity of tumour biopsies. Cancer samples are a complex mixture of normal cells of different types and multiple tumour sub-clones, mixed together in ways that vary spatially within individual tumours. These sources of noise have profound effects on mutation callers. Benchmark studies conducted by TCGA and ICGC have discovered that different mutation calling software run on the same data have limited intersection between the resulting lists of mutations (overlaps of only ~20% are typical). Thus, a great debate has ensued about which software should be run to yield a unified set of calls for major cancer genomics efforts.
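To make the overlap figure concrete, here is a minimal sketch of how agreement between two callers could be measured. The post does not say how the benchmark studies computed overlap, so the Jaccard index used here, the `overlap_fraction` helper, and the toy call lists are all assumptions for illustration only.

```python
from typing import Set, Tuple

# Assumption: a call is identified by (chromosome, position, ref allele, alt allele).
Call = Tuple[str, int, str, str]


def overlap_fraction(calls_a: Set[Call], calls_b: Set[Call]) -> float:
    """Shared calls as a fraction of all distinct calls (Jaccard index)."""
    union = calls_a | calls_b
    if not union:
        return 0.0
    return len(calls_a & calls_b) / len(union)


# Made-up call lists from two hypothetical callers run on the same tumour.
caller_a = {("chr1", 1000, "A", "T"), ("chr1", 2000, "G", "C"), ("chr2", 500, "C", "T")}
caller_b = {("chr1", 1000, "A", "T"), ("chr3", 42, "T", "G"), ("chr7", 900, "G", "A")}

print(f"overlap: {overlap_fraction(caller_a, caller_b):.0%}")  # prints "overlap: 20%"
```

In this toy case the two callers share one call out of five distinct calls, which gives the kind of ~20% agreement described above.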
How are we trying to solve it?
In response, we have launched the ICGC-TCGA DREAM Somatic Mutation Calling (SMC) Challenge, a community-based collaborative competition of researchers from across the world, to find the most accurate SNV calling and break-point detection algorithms. This Challenge will create a “living benchmark” for mutation-detection pipelines, continually evaluating the best methods and accelerating the adoption of standards. It will create a general platform that can be extended to other key problems in cancer genome analysis, such as reconstructing tumour phylogeny, detecting fusion transcripts from RNA sequencing data, and distinguishing driver from passenger mutations, among others.
How will the Challenge be run?
The Challenge will include two components. First, to help bring in researchers from other fields, a series of synthetic tumours of increasing difficulty will be simulated and made available to any team in the world, with a live leaderboard showing top results. Second, a set of 10 tumour-normal pairs from actual patients will be made available to any team, after approval of data access by the ICGC Data Access Compliance Office. Importantly, methods will be evaluated in the real tumours by experimental verification on the same patient DNA used for the original sequencing. Validation will be conducted for thousands (i.e., 5,000-10,000) of predictions via deep sequencing using an independent technology, with the entire Challenge completed in about a year. Both somatic single-nucleotide and structural variation prediction accuracy will be benchmarked on both synthetic and patient-derived data, providing a global picture of mutation-detection accuracy.
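As a rough illustration of what benchmarking prediction accuracy against validated calls can look like, the sketch below scores a predicted call set against a truth set. The post does not state which metrics the Challenge uses, so precision, recall, and F1 are assumed here as a common choice, and `score_calls` and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: a call is identified by (chromosome, position, ref allele, alt allele).
Call = Tuple[str, int, str, str]


def score_calls(predicted: Set[Call], truth: Set[Call]) -> Dict[str, float]:
    """Score predicted calls against a validated truth set."""
    tp = len(predicted & truth)   # predicted and validated
    fp = len(predicted - truth)   # predicted but not validated
    fn = len(truth - predicted)   # validated but missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up truth set (e.g. known mutations in a synthetic tumour) and predictions.
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
         ("chr2", 300, "C", "T"), ("chr3", 400, "T", "G")}
predicted = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
             ("chr2", 300, "C", "T"), ("chr9", 999, "G", "A")}

scores = score_calls(predicted, truth)
print(scores)
```

Here three of the four predictions are correct and three of the four true mutations are found, so precision, recall, and F1 all come out to 0.75.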
The best-performing methods will be applied retrospectively to over ten thousand cancer genomes, and the results distributed publicly to the research community via CGHub. Moreover, the top-scoring methods will be made available as open-source tools, allowing users around the world to process their own data using the same pipelines validated and used by the ICGC and TCGA. Challenge-assisted peer review and early editorial feedback will help identify publishable themes that cut across multiple approaches. The involvement of major journals introduces the possibility of reaching a broad audience and raises the impact and exposure of contestant contributions, which in turn increases incentives and overall morale. Nature Publishing Group has stepped up to coordinate publication models stemming from the SMC Challenge.
What resources are available to Challenge participants?
The Challenge is run on the Synapse (https://www.synapse.org/) open computational platform. Synapse serves not just as a data repository but also as a set of tools for conducting collaborative analysis and sharing and documenting data, models and analysis methods. Synapse enables researchers to seamlessly and transparently conduct, track and share their ongoing work – building up living research projects in real-time.
The GeneTorrent client, open-source software developed by Annai Systems, is available for local data download. A comprehensive description of GeneTorrent features and operation is available on the CGHub website: https://cghub.ucsc.edu/docs/user/index.html
Google is offering Google Cloud Platform credits of $2,000 to approved DREAM contest participants, including free access to contest data in Google Cloud Storage. These credits can be used for Compute Engine VMs and other Cloud Platform services. Access to Challenge data is provided via a Google Cloud Storage bucket, so all computation and submissions can be performed on the Google Cloud Platform.
Who is running the Challenge?
- Paul C. Boutros, Ontario Institute for Cancer Research
- Lincoln D. Stein, Ontario Institute for Cancer Research
- Josh Stuart, University of California, Santa Cruz
- Gustavo Stolovitzky, IBM, DREAM
- Stephen Friend, Sage Bionetworks
- Adam Margolin, Sage Bionetworks
- Thea Norman, Sage Bionetworks
The organizers include leaders of prominent national and international initiatives related to cancer-genome science. Leaders of the ICGC (Stein, Boutros) and TCGA (Stuart) cancer genomics projects will ensure broad exposure in the cancer genomics community and will endorse the results as the standard for sequence analysis performed by the ICGC and TCGA. Challenge organizers also include leaders of DREAM Challenges (Stolovitzky, Friend, Margolin and Norman).
Where can I ask more questions?
We encourage you to post all questions on the ICGC-TCGA DREAM Mutation Calling Challenge Forum: http://support.sagebase.org/sagebase...ling_challenge