
Artificial intelligence firm SandboxAQ has released a dataset designed to help researchers predict drug-protein binding, a key step in early drug development.
The company generated approximately 5.2 million synthetic 3D molecules computationally using Nvidia chips, rather than through traditional laboratory experiments.
The dataset, released on Wednesday, is intended to support the development of faster drug screening methods.
Drug-protein binding refers to the process by which a drug molecule attaches to a specific protein in the body.
It is a crucial early step in determining whether a new treatment might be effective.
Conventional methods for assessing this interaction typically involve time-consuming lab-based testing.
SandboxAQ, which spun out of Google parent company Alphabet, produced the synthetic molecules by modelling them from existing experimental data.
While the molecules themselves have not been observed in real-world settings, they were generated using scientific equations grounded in actual lab results.
The dataset can be used to train AI models that rapidly predict whether a pharmaceutical compound will bind to a target protein.
This could accelerate the early stages of drug discovery while reducing reliance on manual laboratory processes.
The company’s approach combines traditional scientific computing with artificial intelligence.
Although the equations governing atomic interactions are well understood, applying them to three-dimensional pharmaceutical molecules quickly becomes too computationally intensive for standard computing systems.
Nadia Harhen, general manager of AI simulation at SandboxAQ, said: “This is a long-standing problem in biology that we’ve all, as an industry, been trying to solve.
“All of these computationally generated structures are tagged to a ground-truth experimental data, and so when you pick this data set and you train models, you can actually use the synthetic data in a way that’s never been done before.”
The dataset is publicly available and can be used to develop AI tools that perform faster than manual calculations while retaining accuracy comparable to lab results.
SandboxAQ plans to commercialise its own AI models trained using the data, aiming to replicate laboratory-quality performance in virtual environments.
The company has raised nearly US$1bn in venture capital to fund its work applying AI to scientific computing challenges across industries, including pharmaceuticals.