This project aims to accelerate protein engineering by using deep learning to guide directed evolution. Our framework, Proteus, predicts and optimizes protein function in silico, shortening research and development cycles in the biotechnology and pharmaceutical industries.
Protein optimization through in silico directed evolution guided by deep learning. The process starts from a wild-type protein sequence obtained from a natural source (environmental protein), such as a marine metagenomic sample collected within a given marine protected area (MPA). The sequence is replicated into a population of sequences, and a masking procedure selects specific amino acid positions for mutation. The ESM-2 protein language model then proposes new amino acids for the masked positions (mask filling), producing a population of mutated sequences. These sequences are folded into 3D structures and evaluated with a suite of fitness models that score affinity to a ligand, stability, and other desired properties. The process is repeated over multiple rounds, each time selecting the sequences with improved fitness scores (larger numerical values indicate higher fitness). The optimized protein is then subjected to experimental validation.
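The loop described above (replicate, mask, fill, score, select) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the Proteus implementation: `fill_masks` draws random residues where the real workflow would query ESM-2, `toy_fitness` replaces structure prediction and the fitness models with a simple hydrophobic-fraction score, and every function name, parameter, and the single-parent selection scheme are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
HYDROPHOBIC = set("AILMFVW")


def mask_positions(seq, n_masks, rng):
    """Choose n_masks distinct positions to mutate (random here, by assumption)."""
    return rng.sample(range(len(seq)), n_masks)


def fill_masks(seq, positions, rng):
    """Stand-in for ESM-2 mask filling: put a random, different residue at each position."""
    residues = list(seq)
    for pos in positions:
        choices = [aa for aa in AMINO_ACIDS if aa != seq[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def toy_fitness(seq):
    """Stand-in for folding plus fitness models: fraction of hydrophobic residues.

    Larger values mean higher fitness, matching the convention in the text.
    """
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def evolve(wild_type, n_rounds=5, population_size=20, n_masks=2, seed=0):
    """Run a simplified directed-evolution loop and return (best_sequence, fitness_history)."""
    rng = random.Random(seed)
    best = wild_type
    history = [toy_fitness(best)]
    for _ in range(n_rounds):
        # Replicate the current best sequence and mutate each copy independently.
        population = [
            fill_masks(best, mask_positions(best, n_masks, rng), rng)
            for _ in range(population_size)
        ]
        # Keep the top-scoring mutant only if it improves on the current best.
        candidate = max(population, key=toy_fitness)
        if toy_fitness(candidate) > toy_fitness(best):
            best = candidate
        history.append(toy_fitness(best))
    return best, history


if __name__ == "__main__":
    # Arbitrary toy sequence, not a real protein.
    best, history = evolve("MKTLLVAGSDEQRNHPKW")
    print(best, history)
```

Because a mutant is accepted only when it scores higher than the current best, the fitness history never decreases. Replacing the two stand-in functions with a real mask-filling model and real fitness predictors would leave the structure of the loop unchanged.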
- High-throughput Screening Simulation: Simulate the process of directed evolution with our deep learning models to predict the most promising protein variants.
- Protein Function Prediction: Utilize state-of-the-art algorithms to predict the function of protein sequences.
- Optimization Algorithms: Implement various optimization algorithms to find the optimal protein sequence for a desired function.
A directed evolution process guided by deep learning to optimize bacterial Lipase A. The figure shows selected variants generated in the last optimization step, together with the affinity of the best-performing variant (the one with the lowest, i.e. most negative, affinity value) at each step; the wild-type affinity is included for comparison (colored in red). Over the course of optimization, the affinity value decreased from -3.06 kcal/mol in the wild type to -4.46 kcal/mol in the best-performing variant at step 14, corresponding to an affinity improvement of 45.6%. The total ΔΔG of each protein variant, computed with ThermoMPNN, is displayed as bar plots. During optimization, total ΔΔG was constrained to be less than 1 kcal/mol.
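The selection rule and the improvement figure in this caption can be made concrete with a short sketch. It assumes that the best variant is the one with the lowest affinity value among those whose total ΔΔG is below the 1 kcal/mol threshold, and that the improvement is measured relative to the magnitude of the wild-type value. The `Variant` class, the helper functions, and the ΔΔG values in the example are hypothetical; only the two affinity values come from the caption.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    affinity: float   # predicted binding affinity in kcal/mol (lower is better)
    total_ddg: float  # total ΔΔG in kcal/mol


def select_best(variants, max_total_ddg=1.0):
    """Return the lowest-affinity variant with total ΔΔG below the threshold, or None."""
    feasible = [v for v in variants if v.total_ddg < max_total_ddg]
    return min(feasible, key=lambda v: v.affinity, default=None)


def relative_improvement(wild_type_affinity, variant_affinity):
    """Percentage drop in affinity value relative to the wild-type magnitude."""
    return 100.0 * (wild_type_affinity - variant_affinity) / abs(wild_type_affinity)


# Affinities -3.06 and -4.46 are from the caption; all ΔΔG values and
# "variant_b" are made up for illustration.
variants = [
    Variant("wild_type", -3.06, 0.0),
    Variant("variant_a", -4.46, 0.4),
    Variant("variant_b", -4.90, 1.7),  # better affinity, but violates the ΔΔG constraint
]

best = select_best(variants)
improvement = relative_improvement(-3.06, best.affinity)
print(best.name, round(improvement, 1))
```

With the rounded affinities quoted in the caption, this ratio comes to roughly 45.8%; the reported 45.6% was presumably computed from unrounded values.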
To get started with Proteus, follow these steps:
- Clone the repository:
git clone https://github.com/Robaina/Proteus.git
cd Proteus
- Install the required dependencies:
pip install -r requirements.txt
- Build and install Proteus:
poetry build
pip install dist/proteus-*-py3-none-any.whl
- Explore the Jupyter notebooks in the notebooks directory to see examples and tutorials on how to use the framework.
Proteus provides a command-line interface (CLI) for configuring and running optimizations. To see the available options, run:
proteus --help
We welcome contributions! Please read our CONTRIBUTING.md for details on how to submit pull requests, report issues, and contribute to the code.
This project is licensed under the GNU General Public License v3 (GPLv3). See the LICENSE file for details.
This project, Proteus, started as a fork of the DirectedEvolution project. We have since made substantial modifications to adapt it to our goals. We extend our sincere gratitude to the creators and contributors of DirectedEvolution for laying the groundwork that inspired our project. Their innovative approach and dedication to advancing the field have been instrumental in shaping our development path.