Advanced Integration of GWAS and Gene Expression Data
Researchers from the University of Chicago have developed a statistical tool that improves the identification of disease-causing genes and genetic variants. The tool, described in Nature Genetics, integrates genome-wide association study (GWAS) data with predictions of gene expression, which reduces false positives and pinpoints causal genes with greater accuracy.
GWAS, a staple method in genetics, compares the genomes of individuals who have a specific disease with those of healthy individuals to identify genetic variants that increase disease risk. However, GWAS identifies associations, not causes. Most human diseases are influenced by many genes and environmental factors, and an association signal alone rarely shows which variant drives the risk. A major reason is linkage disequilibrium: DNA is inherited in blocks, so nearby genetic variants are highly correlated. "You may have many genetic variants in a block that are all correlated with disease risk, but you don't know which one is actually the causal variant," explains Xin He, Ph.D., Associate Professor of Human Genetics at the University of Chicago and the senior author of the study.
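The effect of linkage disequilibrium can be illustrated with a small simulation. The sketch below is a toy example, not part of the study: it assumes five variants whose pairwise correlation is about 0.9, uses continuous standardized values as a stand-in for real genotypes, and lets only one variant affect the trait. The function names, effect size, and sample size are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_ld_block(n_individuals=5000, n_variants=5, ld=0.9, seed=0):
    """Simulate variants in one block with pairwise correlation close to `ld`.

    Assumption: genotypes are modeled as continuous standardized values,
    a simplification of real 0/1/2 allele counts.
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal(n_individuals)            # shared block signal
    noise = rng.standard_normal((n_individuals, n_variants))
    return np.sqrt(ld) * shared[:, None] + np.sqrt(1.0 - ld) * noise


def marginal_correlations(genotypes, trait):
    """Correlation of each variant with the trait, tested one variant at a time."""
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    t = (trait - trait.mean()) / trait.std()
    return g.T @ t / len(t)


genotypes = simulate_ld_block()
causal_index = 2                                           # only this variant is causal
rng = np.random.default_rng(1)
trait = 0.5 * genotypes[:, causal_index] + rng.standard_normal(genotypes.shape[0])

correlations = marginal_correlations(genotypes, trait)
for index, value in enumerate(correlations):
    label = "causal" if index == causal_index else "non-causal"
    print(f"variant {index} ({label}): correlation with trait = {value:.2f}")
```

In this setup every variant in the block shows a clear correlation with the trait, even though four of the five have no effect on it, which is the ambiguity described in the quote above.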
Tackling the Challenge of Non-Coding Variants
A significant challenge in this field is interpreting genetic variants located in non-coding regions of the genome. Researchers address this with expression quantitative trait loci (eQTLs), genetic variants associated with the expression level of a gene. However, because eQTLs are often correlated with nearby variants, a gene can appear linked to a disease even when it plays no causal role. Current methods that use eQTL data suffer from a high rate of false-positive risk genes, sometimes exceeding 50%.
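A minimal sketch of how such a false positive can arise, under assumptions chosen purely for illustration: one eQTL controls the predicted expression of a "bystander" gene, a separate nearby variant affects the trait directly, and the two variants have a correlation of about 0.9. The variable names, effect sizes, and the simple correlation-based z-score are hypothetical and do not reproduce any specific published method.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Two variants in strong linkage disequilibrium (correlation about 0.9).
# Assumption: continuous standardized values stand in for real genotypes.
shared = rng.standard_normal(n)
eqtl_variant = np.sqrt(0.9) * shared + np.sqrt(0.1) * rng.standard_normal(n)
causal_variant = np.sqrt(0.9) * shared + np.sqrt(0.1) * rng.standard_normal(n)

# Predicted expression of the bystander gene depends only on its eQTL.
predicted_expression = 0.8 * eqtl_variant

# The trait depends only on the other variant, not on the gene.
trait = 0.3 * causal_variant + rng.standard_normal(n)


def association_z(x, y):
    """Approximate z-score for the correlation between x and y."""
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt(len(y) - 2) / np.sqrt(1.0 - r**2)


z_gene = association_z(predicted_expression, trait)
print(f"z-score for the bystander gene: {z_gene:.1f}")
```

Tested on its own, the bystander gene looks strongly associated with the trait although the simulation gives it no effect, because its eQTL is correlated with the variant that does.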
Introducing the cTWAS Model
To overcome these challenges, Xin He and Matthew Stephens, Ph.D., Ralph W. Gerard Professor in the Departments of Statistics and Human Genetics, have introduced a new method called causal-Transcriptome-Wide Association Studies (cTWAS). Instead of testing one gene at a time, the method uses a Bayesian multiple regression model to account for multiple genes and variants simultaneously, which reduces the false-positive rate. "If you look at one at a time, you'll have false positives, but if you look at all the nearby genes and variants together, you are much more likely to find the causal gene," states He.
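The difference between testing one gene at a time and modeling nearby genes together can be sketched with ordinary least squares. This is a deliberately simplified stand-in: cTWAS uses a Bayesian multiple regression model that also includes variants, and its priors and inference are not reproduced here. The two simulated genes, their correlation of about 0.9, and the effect size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000

# Predicted expression of two nearby genes, correlated (about 0.9) because
# their eQTLs sit in the same linkage disequilibrium block.
shared = rng.standard_normal(n)
expr_a = np.sqrt(0.9) * shared + np.sqrt(0.1) * rng.standard_normal(n)
expr_b = np.sqrt(0.9) * shared + np.sqrt(0.1) * rng.standard_normal(n)

# Assumption for this toy example: only gene A affects the trait.
trait = 0.3 * expr_a + rng.standard_normal(n)


def marginal_slope(x, y):
    """Regression slope of y on x alone (one gene at a time)."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    return float(x_c @ y_c / (x_c @ x_c))


marginal_a = marginal_slope(expr_a, trait)
marginal_b = marginal_slope(expr_b, trait)

# Joint model: intercept plus both genes fitted together.
design = np.column_stack([np.ones(n), expr_a, expr_b])
coefficients, *_ = np.linalg.lstsq(design, trait, rcond=None)
joint_a, joint_b = float(coefficients[1]), float(coefficients[2])

print(f"one at a time: gene A = {marginal_a:.2f}, gene B = {marginal_b:.2f}")
print(f"joint model:   gene A = {joint_a:.2f}, gene B = {joint_b:.2f}")
```

Fitted alone, gene B picks up most of gene A's effect through their correlation. Fitted jointly, the estimate for gene B falls toward zero while gene A keeps its effect, which is the intuition behind accounting for nearby genes and variants at the same time.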
Practical Applications and Future Developments
The utility of cTWAS was demonstrated in a study of the genetics of LDL cholesterol levels. Whereas existing eQTL-based methods misidentified genes, cTWAS identified 35 putative causal genes for LDL, many of which were previously unreported. These findings point to new biological pathways and potential treatment targets.
The cTWAS software is now available for download from He’s lab website. Looking forward, He aims to extend the tool to other omics data types, such as splicing and epigenetics, and to use eQTLs from multiple tissue types. “The software will allow people to do analyses that connect genetic variations to phenotypes. That’s really the key challenge facing the entire field,” He remarks. The development promises more precise genetic research and a better understanding and treatment of genetic diseases.
Original Publication
Zhao, S., Crouse, W., Qian, S. et al. Adjusting for genetic confounders in transcriptome-wide association studies improves discovery of risk genes of complex traits. Nat Genet (2024). https://doi.org/10.1038/s41588-023-01648-9