A team of researchers led by Jianxin Wang at the School of Computer Science and Engineering, Central South University, has conducted an extensive review of bioinformatics foundation models (FMs), categorizing them into four primary types: language FMs, vision FMs, graph FMs, and multimodal FMs. The study examines the capabilities of these models across biological domains including genomics, transcriptomics, proteomics, drug discovery, and single-cell analysis.
Bioinformatics FMs utilize supervised and unsupervised machine learning techniques to tackle fundamental and integrative biological challenges. These models are increasingly used to process high-throughput biological data and provide a computational framework for molecular biology research.
"Taking advantage of the latest bioinformatics FM, one can achieve unprecedented accuracy, realize an integrated AI model, and perform richer downstream analysis," stated Wang. The study provides an in-depth discussion of various prediction and generation models and highlights their applications to complex biological problems, covering considerations such as biological database integration, training strategies, hyperparameter tuning, and model interpretability.
The study also emphasizes the importance of pre-training strategies, benchmarking methods, and approaches for improving model interpretability. To illustrate the practical progress of bioinformatics FMs, Wang pointed to DeepMind’s artificial intelligence systems for protein structure prediction. "Taking the classic biological problem ‘protein three-dimensional structure reconstruction’ as a representative demonstration, DeepMind has developed three iterations of an artificial intelligence system over the past five years," Wang noted.
Overall, this study provides a guide for scientists seeking to integrate FMs into their bioinformatics research.
Publication Details
Fei Guo, Renchu Guan, Yaohang Li, Qi Liu, Xiaowo Wang, Can Yang, Jianxin Wang, Foundation models in bioinformatics, National Science Review, 2025; nwaf028, https://doi.org/10.1093/nsr/nwaf028