Though they underpin virtually every process that occurs in living organisms, the proper folding and transport of proteins are notoriously difficult and time-consuming to study experimentally.
Building on the models of DeepMind’s AlphaFold 2, a machine learning tool able to predict the detailed three-dimensional structures of individual proteins, AF2Complex (short for AlphaFold 2 Complex) is a deep learning tool designed to predict the physical interactions of multiple proteins. From these predictions, AF2Complex can calculate, in unprecedented detail, which proteins are likely to interact with each other to form functional complexes.
“We essentially conduct computational experiments that try to figure out the atomic details of supercomplexes (large interacting groups of proteins) important to biological functions,” explained Jeffrey Skolnick, Regents’ Professor and Mary and Maisie Gibson Chair in the School of Biological Sciences, and one of the corresponding authors of the study. With AF2Complex, which was developed last year by the same research team, it’s “like using a computational microscope powered by deep learning and supercomputing.”
In their latest study, the researchers used this ‘computational microscope’ to examine a complicated protein synthesis and transport pathway, hoping to clarify how proteins in the pathway interact to ultimately transport a newly synthesized protein from the interior to the outer membrane of the bacteria — and identify players that experiments might have missed. Insights into this pathway may identify new targets for antibiotic and therapeutic design while providing a foundation for using AF2Complex to computationally expedite this type of biology research as a whole.
Created by London-based artificial intelligence lab DeepMind, AlphaFold 2 is a deep learning tool able to generate accurate predictions of the three-dimensional structure of single proteins from just their amino acid sequences. Taking things a step further, AF2Complex uses these structures to predict how likely proteins are to interact to form a functional complex, which parts of each structure are the likely interaction sites, and even which protein complexes are likely to pair up to create larger functional groups called supercomplexes.
“The successful development of AF2Complex earlier this year makes us believe that this approach has tremendous potential in identifying and characterizing the set of protein-protein interactions important to life,” shared Mu Gao, a senior research scientist at Georgia Tech. “To further convince the broad molecular biology community, we [had to] demonstrate it with a more convincing, high impact application.”
The researchers chose to apply AF2Complex to a pathway in Escherichia coli (E. coli), a model organism in life sciences research commonly used for experimental DNA manipulation and protein production due to its relative simplicity and fast growth.
To demonstrate the tool’s power, the team examined the synthesis and transport of proteins that are essential for exchanging nutrients and responding to environmental stressors: outer membrane proteins, or OMPs for short. These proteins reside on the outermost membrane of gram-negative bacteria, a large group of bacteria, including E. coli, characterized by the presence of both inner and outer membranes. However, the proteins are created inside the cell and must be transported to their final destinations.
“After more than two decades of experimental studies, researchers have identified some of the protein complexes of key players, but certainly not all of them,” Gao explained. AF2Complex “could enable us to discover some novel and interesting features of the OMP biogenesis pathway that were missed in previous experimental studies.”
Using the Summit supercomputer at the Oak Ridge National Laboratory, the team, which included computer science undergraduate Davi Nakajima An, put AF2Complex to the test. They compared a few proteins known to be important in the synthesis and transport of OMPs to roughly 1,500 other proteins — all of the known proteins in E. coli’s cell envelope — to see which pairs the tool computed as most likely to interact, and which of those pairs were likely to form supercomplexes.
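The screening step described above can be pictured as a simple ranking loop. The sketch below is only an illustration under stated assumptions: `screen_partners` and its `score_pair` argument are hypothetical names introduced here, not part of AF2Complex, and the scoring function stands in for whatever interaction score the actual tool computes.

```python
from __future__ import annotations

from typing import Callable, Iterable


def screen_partners(
    baits: Iterable[str],
    candidates: Iterable[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 10,
) -> list[tuple[str, str, float]]:
    """Score every bait/candidate pair and return the top_n highest-scoring pairs.

    `score_pair` is a hypothetical placeholder for a predicted interaction
    score; higher is assumed to mean "more likely to interact".
    """
    candidates = list(candidates)  # allow reuse across several baits
    scored = [
        (bait, cand, score_pair(bait, cand))
        for bait in baits
        for cand in candidates
        if bait != cand  # skip pairing a protein with itself
    ]
    # Rank all pairs from most to least likely to interact.
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:top_n]
```

In a screen like the one described, `baits` would be the few proteins already known to matter for OMP biogenesis and `candidates` the roughly 1,500 cell envelope proteins; the top-ranked pairs are the ones worth checking against experimental data.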
To determine if AF2Complex’s predictions were correct, the researchers compared the tool’s predictions to known experimental data. “Encouragingly,” said Skolnick, “among the top hits from computational screening, we found previously known interacting partners.” Even within those protein pairs known to interact, AF2Complex was able to highlight structural details of those interactions that explain data from previous experiments, lending additional confidence to the tool’s accuracy.
In addition to known interactions, AF2Complex predicted several unknown pairs. Digging further into these unexpected partners revealed details on what aspects of the pairs might interact to form larger groups of functional proteins, likely active configurations of complexes that have previously eluded experimentalists, and new potential mechanisms for how OMPs are synthesized and transported.
“Since the outer membrane pathway is both vital and unique to gram-negative bacteria, the key proteins involved in this pathway could be novel targets for new antibiotics,” said Skolnick. “As such, our work that provides molecular insights about these new drug targets might be valuable to new therapeutic design.”
Beyond this pathway, the researchers are hopeful that AF2Complex could mean big things for biology research.
“Unlike predicting structures of a single protein sequence, predicting the structural model of a supercomplex can be very complicated, especially when the components or stoichiometry of the complex is unknown,” Gao noted. “In this regard, AF2Complex could be a new computational tool for biologists to conduct trial experiments of different combinations of proteins,” potentially expediting and increasing the efficiency of this type of biology research as a whole.
AF2Complex is an open-source tool that is publicly available for download.
This work was supported in part by the DOE Office of Science, Office of Biological and Environmental Research (DOE DE-SC0021303) and the Division of General Medical Sciences of the National Institutes of Health (NIH R35GM118039). DOI: https://doi.org/10.7554
Writer: Audra Davidson
College of Sciences at Georgia Tech
Editor: Jess Hunt-Ralston
Director of Communications
College of Sciences at Georgia Tech