  • Largest Catalog of Human Protein-Coding Variation Unveiled

    RGC-ME Dataset Offers Valuable Resource for Precision Medicine Efforts

    Researchers have unveiled the largest catalog of human protein-coding variation to date, a resource for studying rare coding variants and their implications for disease biology. Derived from exome sequencing of 985,830 individuals of diverse ancestry, the catalog contains approximately 10.5 million missense variants and 1.1 million predicted loss-of-function (pLOF) variants. A significant proportion of the individuals come from populations of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry, which makes the dataset more globally representative. The resource, known as the RGC-ME dataset, aims to accelerate precision medicine efforts and advance our understanding of rare coding variation.

    Rare coding variants that significantly impact gene function can provide crucial insights into the biology of specific genes. However, because they are rare, detecting them and estimating their frequencies requires very large sample sizes. With nearly one million exomes, the RGC-ME dataset lets researchers characterize these variants and estimate their frequencies with far greater precision than smaller cohorts allow.
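
    To illustrate why sample size matters here, the sketch below computes the chance of seeing a rare allele at least once in a cohort. It is a simplified model and not the study's method: it assumes each of the 2N chromosomes carries the allele independently with the same frequency, and the function name and the example frequency of one in a million are hypothetical choices made for illustration.

```python
def prob_seen_at_least_once(allele_freq: float, n_individuals: int) -> float:
    """Probability that at least one copy of an allele appears in the sample.

    Assumes independent sampling of 2 * n_individuals chromosomes, each
    carrying the allele with probability allele_freq (a simplification).
    """
    n_chromosomes = 2 * n_individuals  # diploid: two copies per individual
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


if __name__ == "__main__":
    # Hypothetical allele frequency of 1 in a million, at three cohort sizes.
    for n in (10_000, 100_000, 985_830):
        print(f"N = {n:>7,}: P(observed) = {prob_seen_at_least_once(1e-6, n):.3f}")
```

    Under these assumptions, a variant that would almost always be missed in a cohort of ten thousand is more likely than not to be observed in a cohort the size of RGC-ME.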

    Of particular significance is the inclusion of individuals with rare homozygous pLOF variants in over 4,800 genes. For 1,838 of these genes, this work is the first to document at least one pLOF homozygote, extending the catalog of human knockouts. Studying naturally occurring “knockouts” offers a unique opportunity to identify potential drug targets with improved safety profiles, because deeper phenotyping of these individuals can reveal the consequences of losing a gene's function.
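
    A rough calculation shows why homozygous knockouts are so hard to find. The sketch below assumes Hardy-Weinberg proportions, under which the expected number of homozygotes is N times the square of the allele frequency. It ignores consanguinity and population structure, both of which raise the number of homozygotes in practice, and the function name and the example frequency are hypothetical.

```python
def expected_homozygotes(cumulative_plof_freq: float, n_individuals: int) -> float:
    """Expected number of pLOF homozygotes for one gene.

    Assumes Hardy-Weinberg proportions: P(homozygote) = q ** 2, where q is
    the combined frequency of all pLOF alleles in the gene (an assumption).
    """
    return n_individuals * cumulative_plof_freq ** 2


if __name__ == "__main__":
    # Hypothetical gene whose pLOF alleles have a combined frequency of 0.1%.
    q = 1e-3
    for n in (10_000, 100_000, 985_830):
        print(f"N = {n:>7,}: expected homozygotes = {expected_homozygotes(q, n):.2f}")
```

    Because the expectation scales with the square of the allele frequency, even a gene with a combined pLOF frequency of 0.1% yields about one expected homozygote per million individuals in this simplified model.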

    The dataset also clarifies how well genes tolerate loss-of-function mutations and provides refined estimates of selection against heterozygous loss of function. Using the larger sample size, researchers identified 3,459 genes intolerant to loss of function, 83 of which were previously assessed as tolerant and 1,241 of which lack disease annotations, opening new avenues for exploring potential disease associations.

    The RGC-ME dataset not only offers insights into loss-of-function variants but also enriches our understanding of missense variation. Researchers have identified 457 genes tolerant to loss-of-function but depleted in missense variation. These findings provide crucial clues about functional regions within genes and aid in distinguishing between pathogenic and benign missense variants.

    In addition to enhancing our understanding of known variants, the dataset also tackles the challenge of variants with unknown or conflicting significance. By employing splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME, researchers have been able to identify and interpret cryptic splice sites in approximately 10,708 variants reported in ClinVar.

    Approximately 3% of sequenced individuals carry a clinically actionable genetic variant in a gene on the ACMG SF 3.1 list. This finding underscores the potential clinical relevance of incorporating genetic information into medical decision-making.

    The researchers behind the RGC-ME dataset have made the resource publicly accessible through a variant allele frequency browser. By doing so, they aim to foster collaboration and give researchers the tools they need to advance studies of rare coding variation. The catalog is intended to serve as a reference for precision medicine initiatives and to aid in unraveling the interplay between genetics and disease. Further details are available in the official preprint.

