A Faster, AI-Powered Approach to Chromatin Structure Prediction
The three-dimensional organization of DNA within the nucleus plays a fundamental role in regulating gene expression, dictating which genes are accessible and active in different cell types. While experimental methods have provided insights into chromatin architecture, these techniques are labor-intensive and slow. Now, MIT researchers have developed an AI-based approach that can rapidly predict chromatin structures, offering a computational alternative to traditional experimental methods.
The study, published in Science Advances, describes a generative AI model called ChromoGen, which can generate thousands of chromatin structures in minutes. This capability could accelerate research on genome organization and its impact on gene regulation.
Predicting Genome Structure from Sequence
Inside the nucleus, DNA is packaged into chromatin, a dynamic structure that determines which genes are accessible for transcription. The folding of chromatin is influenced by both the DNA sequence itself and epigenetic modifications, which vary between cell types. These structural differences contribute to cell identity and function.
“Deep learning is really good at pattern recognition,” said Bin Zhang, an associate professor of chemistry at MIT and the senior author of the study. “It allows us to analyze very long DNA segments, thousands of base pairs, and figure out what is the important information encoded in those DNA base pairs.”
Traditional methods such as Hi-C and its variants identify chromatin structures by chemically linking nearby DNA segments, breaking them into fragments, and sequencing them. While effective, these techniques are resource-intensive, requiring days to process a single sample.
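To make the output of such an experiment concrete, here is a minimal sketch of how sequenced ligation pairs are commonly summarized as a contact matrix. The read pairs, the bin count, and the function name `contact_matrix` are hypothetical and chosen only for illustration; real Hi-C pipelines add mapping, filtering, and normalization steps that are omitted here.

```python
import numpy as np

def contact_matrix(pairs, n_bins):
    """Count ligation events between genomic bins into a symmetric matrix."""
    m = np.zeros((n_bins, n_bins), dtype=int)
    for i, j in pairs:
        m[i, j] += 1
        if i != j:
            # Contacts are unordered, so mirror off-diagonal counts.
            m[j, i] += 1
    return m

# Hypothetical read pairs: each tuple holds the bins of the two linked fragments.
pairs = [(0, 1), (1, 0), (0, 3), (2, 2), (1, 2)]
hic = contact_matrix(pairs, n_bins=4)
print(hic)
```

Each entry counts how often two genomic bins were found linked, which is why a matrix like this reflects spatial proximity averaged over the sample rather than a single three-dimensional structure.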
ChromoGen, in contrast, predicts chromatin structures computationally. The model consists of two key components: a deep learning module that analyzes DNA sequence and chromatin accessibility data, and a generative AI model trained on experimental chromatin conformations. This dual approach enables the model to predict chromatin structures that align closely with those observed in experiments.
Generative AI Captures Structural Variability
One of the key challenges in predicting chromatin structure is that DNA does not adopt a single conformation. Instead, it exists as an ensemble of possible structures that fluctuate within the nucleus.
“A major complicating factor of predicting the structure of the genome is that there isn’t a single solution that we’re aiming for. There’s a distribution of structures, no matter what portion of the genome you’re looking at,” explained Greg Schuette, an MIT graduate student and co-lead author of the study. “Predicting that very complicated, high-dimensional statistical distribution is something that is incredibly challenging to do.”
To address this, ChromoGen was trained on more than 11 million chromatin conformations obtained from Dip-C experiments on human B lymphocyte cells. By learning from this vast dataset, the model generates multiple possible structures for each DNA sequence, capturing the variability seen in experimental data.
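The idea of an ensemble can be illustrated with a toy example. The sketch below does not use ChromoGen or Dip-C data: it assumes simple random-walk polymers as stand-ins for conformations and computes per-structure distance maps, then their mean and spread. The function names and sizes are hypothetical.

```python
import numpy as np

def random_walk_ensemble(n_structures, n_beads, seed=0):
    """Toy stand-in for an ensemble: independent 3D random-walk chains."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(size=(n_structures, n_beads, 3))
    return np.cumsum(steps, axis=1)  # shape: (structures, beads, xyz)

def distance_maps(coords):
    """Pairwise bead-to-bead distances for every structure in the ensemble."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)

ensemble = random_walk_ensemble(n_structures=200, n_beads=32)
dmaps = distance_maps(ensemble)   # one distance map per structure
mean_map = dmaps.mean(axis=0)     # average distance for each bead pair
std_map = dmaps.std(axis=0)       # structure-to-structure variability
```

A nonzero `std_map` is the point of the exercise: the same region yields different distances in different structures, so a single predicted conformation cannot summarize it.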
Once trained, ChromoGen can generate predictions far more efficiently than experimental techniques.
“Whereas you might spend six months running experiments to get a few dozen structures in a given cell type, you can generate a thousand structures in a particular region with our model in 20 minutes on just one GPU,” Schuette stated.
Validating Model Accuracy
To test its predictive power, the researchers used ChromoGen to generate structures for more than 2,000 DNA sequences and compared them to experimentally derived structures. The AI-generated conformations closely matched those obtained through Hi-C and Dip-C techniques, supporting the model’s accuracy.
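One common way to compare two structural ensembles, sketched below, is to convert each into a contact-probability map and correlate the maps. This is an illustrative assumption, not the evaluation procedure reported in the study; the distance cutoff, the toy random-walk data, and the function names are all hypothetical.

```python
import numpy as np

def contact_probability(coords, cutoff):
    """Fraction of structures in which each bead pair is closer than cutoff."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).mean(axis=0)

def upper_triangle_correlation(a, b, k=1):
    """Pearson correlation over the upper triangle, skipping the diagonal."""
    iu = np.triu_indices_from(a, k=k)
    return np.corrcoef(a[iu], b[iu])[0, 1]

# Toy stand-ins for "experimental" and "generated" ensembles (random walks).
rng = np.random.default_rng(1)
reference = np.cumsum(rng.normal(size=(300, 24, 3)), axis=1)
generated = np.cumsum(rng.normal(size=(300, 24, 3)), axis=1)

p_ref = contact_probability(reference, cutoff=2.0)
p_gen = contact_probability(generated, cutoff=2.0)
r = upper_triangle_correlation(p_ref, p_gen)
print(f"Pearson r between contact maps: {r:.3f}")
```

Because both toy ensembles are drawn from the same process, the correlation here is high by construction. With real data, the same comparison would indicate how well generated structures reproduce the experimentally observed contact pattern.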
Beyond its initial training dataset, the model also performed well when applied to different cell types, suggesting broad applicability. This flexibility could be useful for studying chromatin organization across diverse cellular contexts.
“We typically look at hundreds or thousands of conformations for each sequence, and that gives you a reasonable representation of the diversity of the structures that a particular region can have,” Zhang said. “If you repeat your experiment multiple times, in different cells, you will very likely end up with a very different conformation. That’s what our model is trying to predict.”
Potential Applications
By providing a fast and scalable way to model chromatin architecture, ChromoGen could facilitate research on gene regulation, cellular differentiation, and genome organization. The model could be used to explore how chromatin structure varies across cell types, how changes in chromatin state influence gene expression, and how DNA mutations impact genome folding.
“There are a lot of interesting questions that I think we can address with this type of model,” Zhang said.
While experimental methods will remain essential for validating findings, AI-driven approaches like ChromoGen offer a powerful tool for studying genome organization at scale.
Publication Details
Greg Schuette et al., ChromoGen: Diffusion model predicts single-cell chromatin conformations. Sci. Adv. 11, eadr8265 (2025). DOI:10.1126/sciadv.adr8265