Exploring Human Diversity Through Large-Scale Omics



    In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive human genome sequence to date [1]. Although the genome was not fully completed until nearly 20 years later [2], numerous large-scale projects, such as the International HapMap Project and the 1000 Genomes Project, continued the HGP's work by capturing extensive variation and genomic diversity within humans. More recent initiatives have grown significantly in scale and expanded beyond genomics, offering a more detailed understanding of human biology and disease.

    Current Efforts
    Among the most recent initiatives advancing the field are the UK Biobank and the All of Us Research Program. The UK Biobank is an extensive biomedical database and resource that houses anonymized genetic, lifestyle, and health information, as well as biological samples, from half a million participants in the UK. Similarly, the All of Us Research Program aims to collect data from at least one million diverse participants across the US, with the goal of advancing personalized health care.

    Both efforts are yielding new biological insights, as reflected in the steady publication of studies based on their data. For instance, whole-genome sequencing data from 490,640 UK Biobank participants allowed scientists to identify over 1.5 billion variants, 18.8 times more than genotyping arrays and 40 times more than whole-exome sequencing [3]. These variant data are expected to help characterize disease mechanisms, aid drug discovery, and improve understanding of non-coding variants. In a recent publication, the All of Us program released 245,388 clinical-grade genome sequences, many of them from participants in underrepresented communities [4]. The researchers also identified over one billion genetic variants, including more than 275 million that were previously unreported, of which over 3.9 million have coding consequences.
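
    The fold differences quoted for the UK Biobank data can be turned into rough absolute counts. The short Python sketch below is a back-of-envelope illustration only: it treats "over 1.5 billion" as exactly 1.5 billion and reads "N times more" as a simple ratio, so the derived array and exome counts are approximations implied by the figures above, not numbers reported by the study. The variable and function names are hypothetical.

```python
# Back-of-envelope arithmetic on the figures quoted in the article.
# Assumptions (not from the study): "over 1.5 billion" is taken as exactly
# 1.5e9, and "N times more" is read as a plain ratio of variant counts.
WGS_VARIANTS = 1.5e9      # whole-genome sequencing variants (article figure)
FOLD_VS_ARRAY = 18.8      # WGS vs. genotyping arrays (article figure)
FOLD_VS_WES = 40.0        # WGS vs. whole-exome sequencing (article figure)


def implied_count(total: float, fold: float) -> float:
    """Variant count implied for a platform that WGS exceeds `fold` times."""
    return total / fold


array_variants = implied_count(WGS_VARIANTS, FOLD_VS_ARRAY)
wes_variants = implied_count(WGS_VARIANTS, FOLD_VS_WES)

print(f"Implied array variants: ~{array_variants / 1e6:.0f} million")
print(f"Implied exome variants: ~{wes_variants / 1e6:.1f} million")
```

    Under these assumptions, the arrays would capture roughly 80 million variants and exome sequencing roughly 37.5 million, which gives a sense of how much of the variation lies outside the regions those platforms cover.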

    Other Large-Scale Projects
    In addition to the UK Biobank and the All of Us Research Program, numerous other global initiatives are making substantial contributions. These include GenomeAsia 100K, H3Africa, the Mexico City Prospective Study, the Estonian Biobank, and FinnGen, among others. Data from several of these projects have been instrumental in revealing extensive variation across underrepresented groups. For example, the Mexico City Prospective Study allowed researchers to identify high relatedness and genetic diversity among 140,000 adults in Mexico City [5]. Genomic segments of Indigenous ancestry showed more homozygous loss-of-function variants, and the researchers developed an improved imputation reference panel for Indigenous Mexican ancestry, which should improve the accuracy of genetic studies in these populations.

    Expanding into Proteomics
    Outside of genomics, several of these projects are advancing large-scale proteomics. Recent work includes a study that explored the associations between rare genetic variants and plasma protein levels in nearly 50,000 UK Biobank participants [6]. Researchers identified over 5,400 rare variant-protein associations and 1,962 gene-protein associations. By integrating genomics and proteomics, the study demonstrated the significant role of rare variants in protein abundance and pointed to potential biomarkers and drug targets.

    Another significant study involving data from the UK Biobank identified 618 associations between plasma proteins and cancer risk, with 107 persisting beyond seven years post-blood draw [7]. Four proteins (CD74, TNFRSF1B, ADAM8, SFTPA2) had strong evidence across observational and genetic analyses, suggesting their potential role in cancer development and as early biomarkers.

    Additionally, a study employing machine learning with UK Biobank proteomics data improved cardiovascular disease risk prediction [8]. The new machine learning model outperformed traditional clinical models and was consistent across sexes and ethnicities, offering enhanced prediction accuracy and interpretability.

    A recent review of population proteomics in health and disease discussed how advances in proteomic technologies, such as enhanced coverage and throughput of proteomic assays, improvements in multiplexed protein assays, and the combination of proteomic data with other omics, have enabled this type of large-scale research [9]. These studies have further improved the quantification of thousands of proteins, helping researchers understand genomic and non-genomic correlates of the soluble proteome, construct biomarker panels for disease prediction, and compare mass spectrometry with affinity-based platforms.

    Challenges and the Road Ahead
    Advances in numerous technologies, such as next-generation sequencing, proteomic assays, and machine learning, have enabled significant progress in large-scale omics projects. However, these developments also present several challenges that must be overcome to sustain this progress.

    One of the primary challenges for population-level research is the storage capacity and computational power needed to process these large-scale datasets [10]. As omics data grow in volume and complexity, traditional computational infrastructures struggle to keep up. The difficulty increases further when integrating diverse data types, such as genomics, proteomics, transcriptomics, and metabolomics. Combining these data types requires sophisticated computational tools and algorithms capable of handling and analyzing multi-dimensional data.

    Recent studies illustrate progress on such tools. One optimized the SAIGE algorithm to run on GPUs, achieving a 20-fold acceleration in genome-wide association studies (GWAS) across 2,068 traits from 635,969 participants in the Million Veteran Program [11]. This optimization enabled efficient analysis of vast datasets, significantly reducing computational time and costs. The optimized method is also adaptable to various computing environments, including cloud platforms, making it more accessible for large-scale research projects.

    A fuller understanding of human diversity also strengthens efforts such as pangenomics. As the Human Pangenome Reference Consortium has stated, “No single genome can represent the genetic diversity of our species” [12]. Pangenomes instead provide a comprehensive representation of genetic diversity across different populations. By incorporating the full spectrum of human genomic variation, they reduce the biases present in linear reference genomes, allowing for more accurate gene-disease association studies, better variant discovery, and improved understanding of complex traits.

    These initiatives have been highly successful and are beginning to address the critical need for diversity in omics data. While there is still much progress to be made, the research community is clearly on the right track.

    1. Powledge TM. Human genome project completed. Genome Biology. 2003;4(1).
    2. Nurk S, Koren S, Rhie A, et al. The complete sequence of a human genome. Science. 2022;376(6588):44-53.
    3. Li S, Carss KJ, Halldorsson BV, Cortes A. Whole-genome sequencing of half a million UK Biobank participants. medRxiv. Published online January 1, 2023:2023.12.06.23299426.
    4. Bick AG, Metcalf GA, Mayo KR, et al. Genomic data in the All of Us Research Program. Nature. 2024;627(8003):340-346.
    5. Ziyatdinov A, Torres J, Alegre-Díaz J, et al. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature. 2023;622(7984):784-793.
    6. Dhindsa RS, Burren OS, Sun BB, et al. Rare variant associations with plasma protein levels in the UK Biobank. Nature. 2023;622(7982):339-347.
    7. Papier K, Atkins JR, Tong TYN, et al. Identifying proteomic risk factors for cancer using prospective and exome analyses of 1463 circulating proteins and risk of 19 cancers in the UK Biobank. Nature Communications. 2024;15(1):4010.
    8. Climente-González H, Oh M, Chajewska U, et al. Interpretable Machine Learning Leverages Proteomics to Improve Cardiovascular Disease Risk Prediction and Biomarker Identification. medRxiv. Published online January 1, 2024:2024.01.12.24301213.
    9. Sun BB, Suhre K, Gibson BW. Promises and Challenges of Populational Proteomics in Health and Disease. Molecular & Cellular Proteomics. 2024;23(7).
    10. Rivas MA, Chang C. Efficient storage and regression computation for population-scale genome sequencing studies. bioRxiv. Published online January 1, 2024:2024.04.11.589062.
    11. Rodriguez A, Kim Y, Nandi TN, et al. Accelerating Genome- and Phenome-Wide Association Studies using GPUs – A case study using data from the Million Veteran Program. bioRxiv. Published online January 1, 2024:2024.05.17.594583.
    12. Liao W, Asri M, Ebler J, et al. A draft human pangenome reference. Nature. 2023;617(7960):312-324.


    About the Author


    Benjamin Atha (seqadmin) holds a B.A. in biology from Hood College and an M.S. in biological sciences from Towson University. With over 9 years of hands-on laboratory experience, he's well-versed in next-generation sequencing systems. Ben is currently the editor for SEQanswers.
