A prescription for AI use in medical research
Because proteins are the molecular machines responsible for most life processes, studying them reveals how diseases develop at the molecular level. That knowledge aids a number of efforts, including drug development.
Folding@home, winner of VentureBeat’s 2021 AI for Good award, simulates protein behavior with massive distributed computing power. It uses AI to strategically map each protein it evaluates, to allocate computing resources, and to identify structural protein anomalies that might signal brewing disease. Based at the School of Medicine at Washington University in St. Louis, Folding@home launched in 2000 and works with other labs around the world, including ones at Memorial Sloan Kettering Cancer Center and Temple University.
The case for protein study
Linear chains of amino acids fold into specific three-dimensional shapes to form proteins. When this folding goes awry, it can lead to disease: Alzheimer’s and Huntington’s are both linked to such “misfolding” events.
Conventional methods, such as X-ray crystallography, have helped scientists understand protein structures, but understanding folding mechanisms, or how proteins perform their functions over time, requires more sophisticated techniques. Computer simulations based on physical models help bridge the gap. There’s a problem here too: scale. “Some of the more complex simulations could easily take hundreds of years for a desktop computer to work through,” said Greg Bowman, PhD, director of Folding@home. “We need supercomputers to run these simulations. Folding@home solves this challenge through a distributed mechanism, using the computing power of volunteers’ machines to conduct the required simulations.”
Each volunteer “citizen scientist” is assigned a simulation that matches their hardware. “We will send you a starting point, an initial structure of a protein, and the parameters of the model,” Bowman said. The volunteer’s machine runs the simulation and sends back snapshots of the protein’s structure at regular intervals. Another volunteer in the chain then picks up right where the previous simulation ended. “In this way we have created a whole map, with the snapshots being GPS coordinates,” Bowman said.
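The relay Bowman describes, where each volunteer resumes from the last snapshot of the previous one, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Folding@home’s actual client: a one-dimensional random walk stands in for real molecular dynamics, and `WorkUnit`, `run_segment`, and `run_chain` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkUnit:
    """Hypothetical work unit: a starting structure plus simulation parameters."""
    start: float         # toy 1-D "structure" standing in for atomic coordinates
    steps: int           # number of simulation steps to run
    snapshot_every: int  # record a snapshot every this many steps
    seed: int            # seed so the toy dynamics are reproducible


def run_segment(wu: WorkUnit) -> list[float]:
    """Run a toy random-walk 'simulation' and return the recorded snapshots."""
    if wu.steps % wu.snapshot_every != 0:
        raise ValueError("steps must be a multiple of snapshot_every")
    rng = random.Random(wu.seed)
    x = wu.start
    snapshots = []
    for step in range(1, wu.steps + 1):
        x += rng.gauss(0.0, 0.1)  # stand-in for one molecular dynamics step
        if step % wu.snapshot_every == 0:
            snapshots.append(x)
    return snapshots  # the last snapshot is the segment's final state


def run_chain(start: float, n_volunteers: int,
              steps: int = 100, snapshot_every: int = 10) -> list[float]:
    """Each volunteer resumes from the previous volunteer's final snapshot."""
    trajectory: list[float] = []
    current = start
    for i in range(n_volunteers):
        snaps = run_segment(WorkUnit(current, steps, snapshot_every, seed=i))
        trajectory.extend(snaps)
        current = snaps[-1]  # hand-off point for the next volunteer
    return trajectory


traj = run_chain(start=0.0, n_volunteers=3)
print(len(traj))  # 30: three volunteers x (100 steps / 10)
```

Because each segment is seeded and starts from the previous segment’s final snapshot, the stitched trajectory is identical to what one long run with the same hand-offs would produce, which is what lets many short volunteer jobs add up to one continuous map.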
How AI helps
Given the scale of the project, Folding@home has to be smart about how it maps proteins. A blind approach that relentlessly simulates everything is probably unnecessary. Instead, Folding@home alternates between running simulations and building maps, and each map tells it where to look next. AI helps with this decision-making by sifting through results and determining which regions of a protein are more of the same and which are likely to yield more interesting results. After all, some regions of proteins are featureless, like plains, while others have more going on, like New York City, Bowman explained.
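One simple way to decide “where to look next” is least-counts adaptive sampling: group snapshots into discrete states and restart new simulations from the states visited least often. The Python sketch below illustrates that idea only; it is a deliberately simple stand-in and not necessarily the strategy Folding@home uses. The one-dimensional snapshots, the fixed `bin_width`, and the function names are hypothetical.

```python
from collections import Counter


def assign_states(snapshots: list[float], bin_width: float) -> list[int]:
    """Discretize toy 1-D snapshots into integer state labels (a crude map)."""
    return [int(x // bin_width) for x in snapshots]


def pick_next_starts(snapshots: list[float], bin_width: float,
                     n_starts: int) -> list[float]:
    """Least-counts adaptive sampling: restart from the least-visited states."""
    states = assign_states(snapshots, bin_width)
    counts = Counter(states)
    # Rarest states first; ties broken by state label for determinism.
    rarest = sorted(counts, key=lambda s: (counts[s], s))[:n_starts]
    # Use the first snapshot seen in each chosen state as a new starting structure.
    first_seen: dict[int, float] = {}
    for x, s in zip(snapshots, states):
        first_seen.setdefault(s, x)
    return [first_seen[s] for s in rarest]


snapshots = [0.1, 0.2, 0.15, 1.3, 0.05, 2.7, 1.1]
print(pick_next_starts(snapshots, bin_width=1.0, n_starts=2))  # [2.7, 1.3]
```

Here state 0 has been visited four times, so it plays the role of the well-explored “plains” and receives no new simulations, while the rarely visited states 2 and 1 are chosen as starting points for the next round.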
Another challenge that AI addresses is the heterogeneity of volunteer computing resources. More powerful computers, for example, should be assigned more complex simulations. Folding@home uses unsupervised learning models to match simulations to the available hardware and recommend assignments accordingly.
Finally, unsupervised AI is also helping researchers at Folding@home find differences in proteins that can be tied more definitively to disease. “We have developed some deep learning tools where we can take different data sets and learn what distinguishes them,” Bowman said. In such cases, AI can parse multiple sets of “normal” proteins and learn what “abnormal” looks like.
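The matching problem itself can be illustrated without machine learning. The Python sketch below swaps in a classic greedy heuristic (largest job first, placed on whichever machine would finish it soonest, in the style of longest-processing-time scheduling) rather than the unsupervised models the article describes. The job costs, machine speeds, and the `assign_jobs` name are hypothetical.

```python
def assign_jobs(job_costs: dict[str, float],
                machine_speeds: dict[str, float]) -> dict[str, list[str]]:
    """Greedy LPT-style assignment of simulations to machines of different speeds."""
    finish = {m: 0.0 for m in machine_speeds}          # projected busy time per machine
    plan: dict[str, list[str]] = {m: [] for m in machine_speeds}
    # Place the most expensive simulations first.
    for job, cost in sorted(job_costs.items(), key=lambda kv: -kv[1]):
        # Choose the machine that would complete this job soonest given its load.
        best = min(machine_speeds,
                   key=lambda m: (finish[m] + cost / machine_speeds[m], m))
        finish[best] += cost / machine_speeds[best]
        plan[best].append(job)
    return plan


jobs = {"big": 100.0, "medium": 40.0, "small": 10.0}  # hypothetical work units
machines = {"gpu": 10.0, "laptop": 1.0}               # hypothetical relative speeds
print(assign_jobs(jobs, machines))  # {'gpu': ['big', 'medium'], 'laptop': ['small']}
```

The fast GPU machine absorbs both large simulations, and the small one goes to the laptop because the laptop can finish it (10 time units) sooner than the already-loaded GPU machine (15).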
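Bowman describes deep learning tools for this; as a much simpler, hedged illustration of “learning what distinguishes” two data sets, the Python sketch below ranks measured features by their standardized mean difference (a Cohen’s d-style effect size) between a “normal” set and a variant set. The feature names and values are hypothetical, and a plain statistic like this is not the method Folding@home uses.

```python
import math
import statistics


def rank_distinguishing_features(set_a: list[list[float]],
                                 set_b: list[list[float]],
                                 names: list[str]) -> list[tuple[str, float]]:
    """Rank features by |mean difference| / pooled population std dev."""
    scores: dict[str, float] = {}
    for j, name in enumerate(names):
        a = [row[j] for row in set_a]
        b = [row[j] for row in set_b]
        pooled = math.sqrt((statistics.pvariance(a) + statistics.pvariance(b)) / 2)
        diff = statistics.mean(b) - statistics.mean(a)
        # Guard against zero spread: any difference is then maximally distinct.
        scores[name] = abs(diff) / pooled if pooled > 0 else (math.inf if diff else 0.0)
    return sorted(scores.items(), key=lambda kv: -kv[1])


names = ["pocket_width", "helix_fraction"]         # hypothetical features
normal = [[1.0, 0.8], [1.2, 0.7], [0.9, 0.75]]      # hypothetical "normal" snapshots
variant = [[1.1, 0.4], [1.0, 0.35], [1.2, 0.45]]    # hypothetical variant snapshots
ranked = rank_distinguishing_features(normal, variant, names)
print(ranked[0][0])  # helix_fraction
```

A learned model can capture combinations of features that no single statistic reveals, which is one reason deep learning is attractive for this task; the sketch only shows the shape of the question being asked.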
More recently, Folding@home shifted its attention to SARS-CoV-2, the virus that causes COVID-19. Simulations of the virus’s spike protein and its behavior over time have helped scientists with vaccine and drug development, including through the COVID Moonshot collaboration, which crowdsources the search for COVID-19 treatments.
Folding@home has moved beyond a focus on protein folding mechanisms, Bowman said. He likens the shift to going from studying how a car is built to studying how its many moving parts work together. “What would I need to do to change my car design to make it go faster, carry more cargo, or take on more difficult terrain?” Bowman asked. Folding@home is asking similar questions of proteins.
Folding@home has reached exascale performance, a billion billion operations per second, yet the project is just getting started, having studied only a few proteins. Given that the human body is estimated to contain 80,000 to 400,000 proteins, there is still plenty of unexplored territory. “It feels very much like being an explorer. Only we’re studying intellectual space instead of new continents,” Bowman said.