
Commit f802cce

Adds documentation on RaSP
1 parent 2714dc7 commit f802cce


4 files changed

+190
-20
lines changed


docs/protein-optimization/index.md

Lines changed: 2 additions & 2 deletions
@@ -85,10 +85,10 @@ Stability of mutations of a wildtype using `foldx`
8585
Solvent accessibility of mutations of a wildtype using `foldx`
8686
:::
8787

88-
:::{grid-item-card} Protein Stability (using `RaSP`) [WIP]
88+
:::{grid-item-card} Protein Stability (using `RaSP`)
8989
:link: ./using_poli/objective_repository/RaSP.html
9090
:columns: 6
91-
Rapid Stability Predictions of single mutations from a wildtype. [WIP]
91+
Rapid Stability Predictions of single mutations from a wildtype.
9292
:::
9393

9494
:::{grid-item-card} RFP Fluorescence Protein Stability (using `lambo`)

docs/protein-optimization/references.bib

Lines changed: 28 additions & 1 deletion
@@ -28,4 +28,31 @@ @inproceedings{huang:TDC:2021
2828
booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
2929
year = {2021},
3030
url = {https://openreview.net/forum?id=8nvgnORnoWr}
31-
}
31+
}
32+
33+
@article{Blaabjerg:RaSP:2023,
34+
title = {Rapid protein stability prediction using deep learning representations},
35+
volume = {12},
36+
issn = {2050-084X},
37+
doi = {10.7554/eLife.82593},
38+
abstractnote = {Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering, and when elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on-par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ∼ 230 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available—including via a Web interface—and enables large-scale analyses of stability in experimental and predicted protein structures.},
39+
journal = {eLife},
40+
publisher = {eLife Sciences Publications, Ltd},
41+
author = {Blaabjerg, Lasse M and Kassem, Maher M and Good, Lydia L and Jonsson, Nicolas and Cagiada, Matteo and Johansson, Kristoffer E and Boomsma, Wouter and Stein, Amelie and Lindorff-Larsen, Kresten},
42+
editor = {Faraldo-Gómez, José D and Weigel, Detlef and Ben-Tal, Nir and Echave, Julian},
43+
year = {2023},
44+
month = may,
45+
pages = {e82593}
46+
}
47+
48+
@inproceedings{Boomsma:spherical_conv:2017,
49+
title = {Spherical convolutions and their application in molecular modelling},
50+
volume = {30},
51+
url = {https://proceedings.neurips.cc/paper_files/paper/2017/file/1113d7a76ffceca1bb350bfe145467c6-Paper.pdf},
52+
booktitle = {Advances in Neural Information Processing Systems},
53+
publisher = {Curran Associates, Inc.},
54+
author = {Boomsma, Wouter and Frellsen, Jes},
55+
editor = {Guyon, I. and Luxburg, U. Von and Bengio, S. and Wallach, H. and Fergus, R. and Vishwanathan, S. and Garnett, R.},
56+
year = {2017}
57+
}
58+
Lines changed: 144 additions & 1 deletion
@@ -1,3 +1,146 @@
11
# Rapid Stability Predictions
22

3-
[TODO: write]
3+
![Type of objective function: discrete](https://img.shields.io/badge/Type-discrete_inputs-blue)
4+
![Environment to run this objective function: poli rasp](https://img.shields.io/badge/Environment-poli____rasp-teal)
5+
6+
*Rapid Stability Predictions* (RaSP) {cite:p}`Blaabjerg:RaSP:2023` predicts changes in protein stability using a supervised learning approach. Starting from structural features learned through self-supervision {cite:p}`Boomsma:spherical_conv:2017`, RaSP uses neural networks to predict Rosetta scores. A drawback is that several mutations can only be scored together by assuming that their effects are additive. We therefore limit the edit distance to 1 (i.e., we only consider sequences that are one mutation away from the wildtype).
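The following minimal sketch makes the edit-distance restriction concrete. It enumerates every sequence that is exactly one substitution away from a wildtype and then checks that distance. The helpers `single_mutants` and `hamming_distance`, the 20-letter canonical alphabet, and the toy sequence are all assumptions made for illustration. None of them are part of `poli` or RaSP.

```python
# Hypothetical helpers (not part of poli or RaSP) illustrating the
# edit-distance-1 restriction on the inputs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must have the same length.")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def single_mutants(wildtype: str) -> list[str]:
    """Return every sequence exactly one substitution away from `wildtype`."""
    mutants = []
    for position, original in enumerate(wildtype):
        for residue in AMINO_ACIDS:
            if residue != original:
                mutants.append(wildtype[:position] + residue + wildtype[position + 1 :])
    return mutants


wildtype = "MKV"  # toy sequence, not taken from 3ned
mutants = single_mutants(wildtype)
print(len(mutants))  # 57, i.e. 3 positions x 19 alternative residues
assert all(hamming_distance(wildtype, mutant) == 1 for mutant in mutants)
```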
7+
8+
This objective function is quite similar to [`foldx_stability`](./foldx_stability.md), and can be considered a drop-in replacement for single mutations. Be aware that the scales are different, though.
9+
10+
## Prerequisites
11+
12+
- A collection of `pdb` files you're interested in mutating.
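If your wildtype structures all live in one directory, you could build the list of paths used in the examples with `pathlib`. This is a minimal sketch. `collect_pdb_paths` is a hypothetical helper, and the placeholder files below exist only for the demonstration.

```python
import tempfile
from pathlib import Path


def collect_pdb_paths(directory: Path) -> list[Path]:
    """Return the .pdb files in `directory`, sorted by name (hypothetical helper)."""
    return sorted(Path(directory).glob("*.pdb"))


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("3ned.pdb", "1abc.pdb", "notes.txt"):
        (Path(tmp) / name).touch()
    found_names = [path.name for path in collect_pdb_paths(Path(tmp))]

print(found_names)  # ['1abc.pdb', '3ned.pdb']
```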
13+
14+
Running this black box objective function is much easier inside the `poli__rasp` environment; see below.
15+
16+
## How to run
17+
18+
You can either run this objective function in your current environment (assuming that you have the correct dependencies installed), or you can run it in an isolated environment.
19+
20+
::::{tab-set}
21+
22+
:::{tab-item} In the `poli__rasp` environment
23+
24+
To run this black box function directly (which is useful for debugging, setting breakpoints, and inspecting objects), we recommend running it from inside the `poli__rasp` environment, or making sure you satisfy all of its requirements.
25+
26+
To create this environment, run
27+
28+
```bash
29+
# From the root of the `poli` repository
30+
conda env create --file src/poli/objective_repository/rasp/environment.yml
31+
```
32+
33+
Follow that with
34+
35+
```bash
36+
conda activate poli__rasp
37+
```
38+
39+
Supposing you have [`3ned.pdb`](https://www.rcsb.org/structure/3ned) in the same directory as this script:
40+
41+
```python
42+
from pathlib import Path
43+
44+
import numpy as np
45+
46+
from poli import objective_factory
47+
48+
THIS_DIR = Path(__file__).parent.resolve()
49+
50+
if __name__ == "__main__":
51+
    wildtype_pdb_paths_for_rasp = [
52+
        THIS_DIR / "3ned.pdb",
53+
        # You could have more if you want.
54+
    ]
55+
56+
    problem_info, f_rasp, x0, y0, _ = objective_factory.create(
57+
        name="rasp",
58+
        wildtype_pdb_path=wildtype_pdb_paths_for_rasp,
59+
    )
60+
61+
    # Getting the wildtype sequence as a string
62+
    wildtype_sequence = "".join(x0[0])
63+
64+
    # Mutating the first position three times:
65+
    three_mutations = [
66+
        "A" + wildtype_sequence[1:],
67+
        "R" + wildtype_sequence[1:],
68+
        "N" + wildtype_sequence[1:],
69+
    ]
70+
71+
    # Computing the ddG for these three mutations:
72+
    x = np.array([list(mutation) for mutation in three_mutations])
73+
74+
    # y is approx [[0.03, -0.07, -0.28]]
75+
    y = f_rasp(x)
76+
```
77+
78+
:::
79+
80+
:::{tab-item} In isolation
81+
82+
Supposing you have [`3ned.pdb`](https://www.rcsb.org/structure/3ned) in the same directory as this script:
83+
84+
```python
85+
from pathlib import Path
86+
87+
import numpy as np
88+
89+
from poli import objective_factory
90+
91+
THIS_DIR = Path(__file__).parent.resolve()
92+
93+
if __name__ == "__main__":
94+
    wildtype_pdb_paths_for_rasp = [
95+
        THIS_DIR / "3ned.pdb",
96+
        # You could have more if you want.
97+
    ]
98+
99+
    problem_info, f_rasp, x0, y0, _ = objective_factory.create(
100+
        name="rasp",
101+
        wildtype_pdb_path=wildtype_pdb_paths_for_rasp,
102+
    )
103+
104+
    # Getting the wildtype sequence as a string
105+
    wildtype_sequence = "".join(x0[0])
106+
107+
    # Mutating the first position three times:
108+
    three_mutations = [
109+
        "A" + wildtype_sequence[1:],
110+
        "R" + wildtype_sequence[1:],
111+
        "N" + wildtype_sequence[1:],
112+
    ]
113+
114+
    # Computing the ddG for these three mutations:
115+
    x = np.array([list(mutation) for mutation in three_mutations])
116+
117+
    # y is approx [[0.03, -0.07, -0.28]]
118+
    y = f_rasp(x)
119+
```
120+
121+
```{warning}
122+
Registering the objective function in this way will create a `conda` environment called `poli__rasp` with the relevant dependencies. You can find the exact environment description in the following file: `src/poli/objective_repository/rasp/environment.yml`
123+
124+
```
125+
126+
:::
127+
128+
::::
129+
130+
## Warnings
131+
132+
:::{warning}
133+
134+
This objective function requires `clang` and `cmake`, which will be included in the `poli__rasp` conda environment. Make sure you are okay with this.
135+
136+
When the objective function is created, we clone the [`reduce`](https://github.com/rlabduke/reduce) GitHub repository into `~/.poli_objectives/rasp` and compile it. The version is pinned to the commit hash `bd23a0bf627ae9b08842102a5c2e9404b4a81924`.
137+
138+
This objective function also downloads several models (as `.pt` files) from [the RaSP repository](https://github.com/KULL-Centre/papers/tree/main/2022/ML-ddG-Blaabjerg-et-al/output).
139+
140+
Namely, we download `cavity_model_15.pt` and all `ds_models` and store them in `~/.poli_objectives/rasp`. These models are taken from the commit hash `3ccebe87e017b6bd737f88e1943557d128c85616`, and the files are checked against a pre-computed `md5` checksum.
141+
142+
:::
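The checksum step described above can be sketched with the standard library's `hashlib`. This illustrates the idea only and is not `poli`'s actual code. The function names and the placeholder file are assumptions. The digest used in the demonstration is simply the md5 of the bytes `b"hello"`.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_md5: str) -> None:
    """Raise a ValueError if the file's md5 digest differs from `expected_md5`."""
    actual = md5_of_file(path)
    if actual != expected_md5:
        raise ValueError(f"Checksum mismatch for {path}: got {actual}, expected {expected_md5}")


# Demonstration on a throwaway file standing in for a downloaded model.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "placeholder_model.pt"
    model_path.write_bytes(b"hello")
    example_digest = md5_of_file(model_path)
    verify_checksum(model_path, "5d41402abc4b2a76b9719d911017c592")  # passes silently
```

Reading in chunks keeps memory use flat for large model files.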
143+
144+
## Further reading
145+
146+
The `poli` examples show how to compute saturation mutagenesis for a given protein at a given position.
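As a rough illustration of saturation mutagenesis at one position, the sketch below builds every single substitution at a chosen position. It arranges the results in the same `np.array([list(...)])` layout used in the examples above. The helper name and toy sequence are hypothetical, and this is not the example shipped with `poli`. With a real problem, you would pass the resulting array to `f_rasp`.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def saturation_mutagenesis(wildtype: str, position: int) -> np.ndarray:
    """Return all single substitutions at `position`, one sequence per row.

    Each row is a sequence split into characters, matching the
    `np.array([list(mutation) for mutation in ...])` layout in the examples.
    """
    if not 0 <= position < len(wildtype):
        raise IndexError(f"Position {position} is outside the sequence.")
    mutants = [
        wildtype[:position] + residue + wildtype[position + 1 :]
        for residue in AMINO_ACIDS
        if residue != wildtype[position]
    ]
    return np.array([list(mutant) for mutant in mutants])


x = saturation_mutagenesis("MKVL", position=0)  # toy sequence
print(x.shape)  # (19, 4)
# With a RaSP problem created as in the examples, y = f_rasp(x) would score all 19 mutants.
```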

docs/protein-optimization/using_poli/objective_repository/all_objectives.md

Lines changed: 16 additions & 16 deletions
@@ -9,13 +9,13 @@ These are all the objective functions available on `poli`. Click on one of them
99
:gutter: 3
1010

1111
:::{grid-item-card} White noise
12-
:link: ./using_poli/objective_repository/white_noise.html
12+
:link: ./white_noise.html
1313
:columns: 6
1414
White noise drawn from a unit Gaussian
1515
:::
1616

1717
:::{grid-item-card} Aloha
18-
:link: ./using_poli/objective_repository/aloha.html
18+
:link: ./aloha.html
1919
:columns: 6
2020
A toy example about optimizing 5-letter words to spell "ALOHA"
2121
:::
@@ -30,43 +30,43 @@ Most of our problems are based on the [PMO]() benchmark, by Gao et al [TODO: add
3030
:gutter: 3
3131

3232
:::{grid-item-card} Quantitative Estimate of Druglikeness (QED)
33-
:link: ./using_poli/objective_repository/rdkit_qed.html
33+
:link: ./rdkit_qed.html
3434
:columns: 6
3535
Computing the QED using `RDKit`.
3636
:::
3737

3838
:::{grid-item-card} Log-solubility (LogP)
39-
:link: ./using_poli/objective_repository/rdkit_logp.html
39+
:link: ./rdkit_logp.html
4040
:columns: 6
4141
Computing the log-quotient of solubilities using `RDKit`.
4242
:::
4343

4444
:::{grid-item-card} Penalized Log-solubility (LogP, using `lambo`)
45-
:link: ./using_poli/objective_repository/penalized_logp_lambo.html
45+
:link: ./penalized_logp_lambo.html
4646
:columns: 6
4747
Computing the penalized log-quotient of solubilities using `lambo`'s implementation.
4848
:::
4949

5050
:::{grid-item-card} DDR3 (or 3pbl) docking (using `tdc`)
51-
:link: ./using_poli/objective_repository/ddr3_docking.html
51+
:link: ./ddr3_docking.html
5252
:columns: 6
5353
A wrapper around the Therapeutics Data Commons implementation of 3pbl docking.
5454
:::
5555

5656
:::{grid-item-card} Synthetic Accessibility (SA, using `tdc`)
57-
:link: ./using_poli/objective_repository/sa_tdc.html
57+
:link: ./sa_tdc.html
5858
:columns: 6
5959
A wrapper around the Therapeutics Data Commons implementation of the synthetic accessibility oracle.
6060
:::
6161

6262
:::{grid-item-card} TDC oracles [WIP]
63-
:link: ./using_poli/objective_repository/tdc_oracles.html
63+
:link: ./tdc_oracles.html
6464
:columns: 6
6565
Some of the oracles provided by the Therapeutics Data Commons. [WIP]
6666
:::
6767

6868
:::{grid-item-card} GuacaMol [WIP]
69-
:link: ./using_poli/objective_repository/tdc_oracles.html
69+
:link: ./tdc_oracles.html
7070
:columns: 6
7171
Some of the oracles provided by GuacaMol. [WIP]
7272
:::
@@ -79,25 +79,25 @@ Some of the oracles provided by GuacaMol. [WIP]
7979
:gutter: 3
8080

8181
:::{grid-item-card} Protein Stability (using `foldx`)
82-
:link: ./using_poli/objective_repository/foldx_stability.html
82+
:link: ./foldx_stability.html
8383
:columns: 6
8484
Stability of mutations of a wildtype using `foldx`
8585
:::
8686

8787
:::{grid-item-card} Protein SASA score (using `foldx`)
88-
:link: ./using_poli/objective_repository/foldx_sasa.html
88+
:link: ./foldx_sasa.html
8989
:columns: 6
9090
Solvent accessibility of mutations of a wildtype using `foldx`
9191
:::
9292

93-
:::{grid-item-card} Protein Stability (using `RaSP`) [WIP]
94-
:link: ./using_poli/objective_repository/RaSP.html
93+
:::{grid-item-card} Protein Stability (using `RaSP`)
94+
:link: ./RaSP.html
9595
:columns: 6
96-
Rapid Stability Predictions of single mutations from a wildtype. [WIP]
96+
Rapid Stability Predictions of single mutations from a wildtype.
9797
:::
9898

9999
:::{grid-item-card} RFP Fluorescence Protein Stability (using `lambo`)
100-
:link: ./using_poli/objective_repository/foldx_rfp_lambo.html
100+
:link: ./foldx_rfp_lambo.html
101101
:columns: 6
102102
LaMBO Fluorescence (RFP) by stability and solvent-accessible surface area.
103103
:::
@@ -110,7 +110,7 @@ LaMBO Fluorescence (RFP) by stability and solvent-accessible surface area.
110110
:gutter: 3
111111

112112
:::{grid-item-card} Mario jumps
113-
:link: ./using_poli/objective_repository/super_mario_bros.html
113+
:link: ./super_mario_bros.html
114114
:columns: 6
115115
Optimizing the number of jumps on a Super Mario Bros level
116116
:::
