Robaina/Proteus

In silico protein optimization through AI-guided directed evolution

License: GPL v3 | Code style: black | Project status: Active (the project has reached a stable, usable state and is being actively developed) | Contributor Covenant

💡 Overview

This project aims to accelerate protein engineering by using deep learning to guide directed evolution. Our framework, Proteus, is designed to predict and optimize protein function in a virtual environment, which can substantially shorten research and development cycles in the biotechnology and pharmaceutical industries.

Protein optimization through in silico directed evolution guided by deep learning. The process starts from a wild-type protein sequence from a natural source (env. protein), such as a marine metagenomic sample collected within a given MPA. This sequence is replicated into a population of sequences, which then undergo a masking procedure that targets specific amino acids for mutation. The ESM-2 protein language model suggests new amino acids for the masked positions (mask filling), producing a population of mutated sequences. These sequences are folded into 3D structures and evaluated with a suite of fitness models, including affinity to a ligand and additional metrics for stability and other desired properties. The process is repeated over multiple rounds, each time selecting the sequences with improved fitness scores (higher values indicate better fitness). The resulting optimized protein, with its enhanced properties, is then subject to experimental validation.
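The iterative loop described above (replicate, mask, fill, score, select) can be sketched in plain Python. This is a minimal, self-contained sketch under stated assumptions, not Proteus's implementation: ESM-2 mask filling is replaced by a hypothetical `fill_masks` placeholder that draws random amino acids, the folding step is skipped, and `fitness` is any user-supplied function of the sequence where higher is better. All names (`mask_positions`, `fill_masks`, `evolve`) and parameters are hypothetical.

```python
import random
from typing import Callable, List

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mask_positions(seq: str, n_mask: int, rng: random.Random) -> List[int]:
    """Pick n_mask distinct positions to mutate (uniformly at random in this sketch)."""
    return rng.sample(range(len(seq)), n_mask)


def fill_masks(seq: str, positions: List[int], rng: random.Random) -> str:
    """Hypothetical stand-in for ESM-2 mask filling: draws a random amino acid per position."""
    residues = list(seq)
    for pos in positions:
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def evolve(
    wild_type: str,
    fitness: Callable[[str], float],
    pop_size: int = 20,
    n_keep: int = 5,
    n_mask: int = 2,
    n_steps: int = 10,
    seed: int = 0,
) -> str:
    """Run a simple mask-and-fill evolutionary loop and return the fittest sequence."""
    rng = random.Random(seed)
    parents = [wild_type]
    for _ in range(n_steps):
        # Replicate the current parents into a population of mutated children.
        children = []
        for _ in range(pop_size):
            parent = rng.choice(parents)
            positions = mask_positions(parent, n_mask, rng)
            children.append(fill_masks(parent, positions, rng))
        # Deduplicate while preserving order, then rank by fitness (higher is better).
        # Parents stay in the pool, so the best score never decreases (elitism).
        pool = sorted(dict.fromkeys(parents + children), key=fitness, reverse=True)
        parents = pool[:n_keep]
    return parents[0]
```

For example, `evolve("MKTAYIAKQR", lambda s: s.count("W"))` runs the loop with a toy fitness that rewards tryptophan residues. Keeping parents in the selection pool guarantees the best score is monotonically non-decreasing across rounds; whether Proteus uses the same selection scheme is not stated here.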

🔅 Features

High-throughput Screening Simulation: Simulate the process of directed evolution with our deep learning models to predict the most promising protein variants.

Protein Function Prediction: Use state-of-the-art models to predict the function of protein sequences.

Optimization Algorithms: Apply various optimization algorithms to search for the protein sequence that best achieves a desired function.

Directed evolution guided by deep learning to optimize bacterial Lipase A. Shown are selected variants generated in the last optimization step and the affinity of the best-performing variant (i.e., the one with the lowest affinity value) at each step, with the wild-type affinity included for comparison (in red). Over the course of the optimization, affinity decreased from -3.06 kcal/mol in the wild type to -4.46 kcal/mol in the best-performing variant at step 14, an affinity improvement of 45.6%. Bar plots show the total ΔΔG of each protein variant, computed with ThermoMPNN. Throughout the optimization, total ΔΔG was constrained to be less than 1 kcal/mol.
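The selection rule in this caption (lowest affinity, subject to total ΔΔG below 1 kcal/mol) and the percent-improvement arithmetic can be sketched as follows. This is an illustrative sketch, not Proteus's code: the function names and dictionary-based inputs are hypothetical, and affinities are assumed to be in kcal/mol, with more negative values meaning stronger binding. With the rounded values quoted above, `percent_improvement(-3.06, -4.46)` gives about 45.75%; the reported 45.6% presumably comes from unrounded affinities.

```python
from typing import Dict, Optional


def percent_improvement(wt_affinity: float, variant_affinity: float) -> float:
    """Relative affinity improvement (%) over the wild type; lower affinity is better."""
    return 100.0 * (wt_affinity - variant_affinity) / abs(wt_affinity)


def select_best(
    affinities: Dict[str, float],
    total_ddg: Dict[str, float],
    ddg_max: float = 1.0,
) -> Optional[str]:
    """Return the variant with the lowest affinity whose total ΔΔG is below ddg_max.

    Returns None if no variant satisfies the ΔΔG constraint.
    """
    allowed = [name for name in affinities if total_ddg[name] < ddg_max]
    if not allowed:
        return None
    return min(allowed, key=affinities.get)
```

Treating ΔΔG as a hard constraint rather than folding it into a single weighted score keeps the two objectives separate, which matches how the caption describes the optimization.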

🔧 Getting Started

To get started with Proteus, follow these steps:

  1. Clone the repository:
     git clone https://github.com/Robaina/Proteus.git
     cd Proteus
  2. Install the required dependencies:
     pip install -r requirements.txt
  3. Build and install Proteus (the build step requires Poetry):
     poetry build
     pip install dist/proteus-*-py3-none-any.whl
  4. Explore the Jupyter notebooks in the notebooks directory for examples and tutorials on how to use the framework.

🚀 Usage

Proteus provides a command-line interface (CLI) for starting the optimization process and running simulations. To list the available options, run:

proteus --help
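To drive the CLI from a Python script or notebook, one option is to call it through the standard library's `subprocess` module. This sketch assumes only that an executable named `proteus` is on `PATH` after installation; the helper name `run_cli` is hypothetical, and `--help` is the only option taken from this README.

```python
import shutil
import subprocess
from typing import List, Optional


def run_cli(
    args: List[str], executable: str = "proteus"
) -> Optional[subprocess.CompletedProcess]:
    """Run a command-line tool and capture its output.

    Returns None if the executable cannot be found on PATH.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: a non-zero exit code is reported via returncode, not an exception.
    return subprocess.run([path, *args], capture_output=True, text=True, check=False)
```

For example, `result = run_cli(["--help"])` returns a `CompletedProcess` whose `stdout` and `stderr` hold the command's output, or `None` if the `proteus` executable is not found.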

Contributing

We welcome contributions! Please read our CONTRIBUTING.md for details on how to submit pull requests, report issues, and contribute to the code.

🔓 License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0); see the LICENSE file for details.

💜 Acknowledgments

This project, Proteus, started as a fork of the DirectedEvolution project. We have since made substantial modifications to adapt it to our goals. We extend our sincere gratitude to the creators and contributors of DirectedEvolution for laying the groundwork that inspired our project. Their innovative approach and dedication to advancing the field have been instrumental in shaping our development path.

About

A Python tool to optimize protein properties through AI-guided directed evolution
