The Somatic Mutation Working Group of the FDA-led MAQC/SEQC2 Consortium has released multi-center, cross-platform whole-genome and whole-exome reference sequencing data sets for a matched tumor-normal pair of breast cancer cell lines (HCC1395 and HCC1395BL). These resources can be used to develop machine learning models and bioinformatic methods for somatic mutation detection. The data descriptor is published in Scientific Data. All raw sequencing data are available on SRA (SRP162370). Some of the BWA-MEM aligned BAM files are also available on NCBI's FTP server.
The genomic DNA was produced in a single batch by ATCC to ensure sample homogeneity, i.e., the same DNA material with the same genome content was sequenced in every replicate. The following are some of the working group's papers that have used these data sets:
- Established the high-confidence somatic mutation call set that may be used as the "ground truth" for benchmarking analyses or machine learning modeling. Fang L.T. et al. Nat Biotechnol (2021) / PMID:34504347 / SharedIt
- Used the high-confidence somatic mutation call set as the "ground truth" to investigate how different sample preparations, sequencing library kits, and bioinformatic algorithms affect the accuracy of somatic mutation pipelines, and to develop best practices. Xiao W. et al. Nat Biotechnol (2021) / PMID:34504346 / SharedIt
- Used the high-confidence somatic mutation call set as the labeled training data to build more accurate machine learning models for somatic mutation detection. Sahraeian S.M.E. et al. bioRxiv
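
To illustrate what using a call set as "ground truth" for benchmarking can look like, here is a minimal Python sketch. It is not the working group's pipeline: the `benchmark_calls` function, the `Variant` tuple layout, and the toy variants are all hypothetical, and it assumes variants have already been normalized so that identical mutations compare equal as (chromosome, position, ref, alt) tuples.

```python
from typing import Iterable, NamedTuple, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class BenchmarkResult(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float
    f1: float


def benchmark_calls(truth: Iterable[Variant], calls: Iterable[Variant]) -> BenchmarkResult:
    """Compare a caller's output against a truth set by exact variant match."""
    truth_set = set(truth)
    call_set = set(calls)

    tp = len(truth_set & call_set)   # called and in the truth set
    fp = len(call_set - truth_set)   # called but not in the truth set
    fn = len(truth_set - call_set)   # in the truth set but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BenchmarkResult(tp, fp, fn, precision, recall, f1)


# Toy data for illustration only; these are NOT real HCC1395 mutations.
truth = [
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "C", "T"),
    ("chr3", 75, "T", "G"),
]
calls = [
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 60, "C", "T"),
]

result = benchmark_calls(truth, calls)
print(result)
```

In practice, a comparison like this would read both call sets from VCF files, and exact tuple matching is only meaningful once both sets use the same reference genome and variant representation.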
Find all of SEQC2's publications:
The MAQC website