|
1 | 1 | # Rapid Stability Predictions
|
2 | 2 |
|
3 |
| -[TODO: write] |
| 3 | + |
| 4 | + |
| 5 | + |
| 6 | +*Rapid Stability Predictions* (RaSP) {cite:p}`Blaabjerg:RaSP:2023` predicts the stability of a protein using a supervised learning approach. Starting from features learned using self-supervision {cite:p}`Boomsma:spherical_conv:2017`, RaSP trains neural networks to predict Rosetta scores. One drawback is that only additive mutations can be computed simultaneously. We limit the edit distance to 1 (i.e. we only consider variants that are one mutation away from the wildtype). |
| 7 | + |
| 8 | +This objective function is quite similar to [`foldx_stability`](./foldx_stability.md), and can be considered a drop-in replacement for single mutations. Be aware that the scales are different, though. |
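
The edit-distance restriction can be checked before calling the objective. The following is a minimal sketch and not part of `poli`: the helper names `hamming_distance` and `is_single_mutant` are hypothetical, the sequences are toy strings, and it assumes variants are substitutions only (same length as the wildtype).

```python
def hamming_distance(a: str, b: str) -> int:
    """Counts the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Sequences must have the same length.")
    return sum(char_a != char_b for char_a, char_b in zip(a, b))


def is_single_mutant(wildtype: str, variant: str) -> bool:
    """Returns True if the variant differs from the wildtype at exactly one position."""
    if len(wildtype) != len(variant):
        # Insertions and deletions are not substitutions of the wildtype.
        return False
    return hamming_distance(wildtype, variant) == 1


# Toy sequences, for illustration only.
toy_wildtype = "MKTAYIAK"
candidates = ["AKTAYIAK", "MKTAYIAK", "AATAYIAK"]
valid = [c for c in candidates if is_single_mutant(toy_wildtype, c)]
```

Here only the first candidate passes the check: the second is the wildtype itself and the third carries two substitutions.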
| 9 | + |
| 10 | +## Prerequisites |
| 11 | + |
| 12 | +- A collection of `pdb` files you're interested in mutating. |
| 13 | + |
| 14 | +Running this black box objective function is easier from inside the `poli__rasp` environment. See below. |
| 15 | + |
| 16 | +## How to run |
| 17 | + |
| 18 | +You can either run this objective function in your current environment (assuming that you have the correct dependencies installed), or you can run it in an isolated environment. |
| 19 | + |
| 20 | +::::{tab-set} |
| 21 | + |
| 22 | +:::{tab-item} In the `poli__rasp` environment |
| 23 | + |
| 24 | +To run this black box function directly (which is useful for debugging, or when you are interested in setting breakpoints and inspecting the objects directly), we recommend you run it from inside the `poli__rasp` environment, or make sure you satisfy all its requirements. |
| 25 | + |
| 26 | +To create this environment, run |
| 27 | + |
| 28 | +```bash |
| 29 | +# From the root of the `poli` repository |
| 30 | +conda env create --file src/poli/objective_repository/rasp/environment.yml |
| 31 | +``` |
| 32 | + |
| 33 | +Follow that with |
| 34 | + |
| 35 | +```bash |
| 36 | +conda activate poli__rasp |
| 37 | +``` |
| 38 | + |
| 39 | +Supposing you have [`3ned.pdb`](https://www.rcsb.org/structure/3ned) in the same directory as this script: |
| 40 | + |
| 41 | +```python |
| 42 | +from pathlib import Path |
| 43 | + |
| 44 | +import numpy as np |
| 45 | + |
| 46 | +from poli import objective_factory |
| 47 | + |
| 48 | +THIS_DIR = Path(__file__).parent.resolve() |
| 49 | + |
| 50 | +if __name__ == "__main__": |
| 51 | +    wildtype_pdb_paths_for_rasp = [ |
| 52 | +        THIS_DIR / "3ned.pdb", |
| 53 | +        # You could have more if you want. |
| 54 | +    ] |
| 55 | + |
| 56 | +    problem_info, f_rasp, x0, y0, _ = objective_factory.create( |
| 57 | +        name="rasp", |
| 58 | +        wildtype_pdb_path=wildtype_pdb_paths_for_rasp, |
| 59 | +    ) |
| 60 | + |
| 61 | +    # Getting the wildtype sequence as a string |
| 62 | +    wildtype_sequence = "".join(x0[0]) |
| 63 | + |
| 64 | +    # Mutating the first position three times: |
| 65 | +    three_mutations = [ |
| 66 | +        "A" + wildtype_sequence[1:], |
| 67 | +        "R" + wildtype_sequence[1:], |
| 68 | +        "N" + wildtype_sequence[1:], |
| 69 | +    ] |
| 70 | + |
| 71 | +    # Computing the ddG for these three mutations: |
| 72 | +    x = np.array([list(mutation) for mutation in three_mutations]) |
| 73 | + |
| 74 | +    # y is approx [[0.03, -0.07, -0.28]] |
| 75 | +    y = f_rasp(x) |
| 76 | +``` |
| 77 | + |
| 78 | +::: |
| 79 | + |
| 80 | +:::{tab-item} In isolation |
| 81 | + |
| 82 | +Supposing you have [`3ned.pdb`](https://www.rcsb.org/structure/3ned) in the same directory as this script: |
| 83 | + |
| 84 | +```python |
| 85 | +from pathlib import Path |
| 86 | + |
| 87 | +import numpy as np |
| 88 | + |
| 89 | +from poli import objective_factory |
| 90 | + |
| 91 | +THIS_DIR = Path(__file__).parent.resolve() |
| 92 | + |
| 93 | +if __name__ == "__main__": |
| 94 | +    wildtype_pdb_paths_for_rasp = [ |
| 95 | +        THIS_DIR / "3ned.pdb", |
| 96 | +        # You could have more if you want. |
| 97 | +    ] |
| 98 | + |
| 99 | +    problem_info, f_rasp, x0, y0, _ = objective_factory.create( |
| 100 | +        name="rasp", |
| 101 | +        wildtype_pdb_path=wildtype_pdb_paths_for_rasp, |
| 102 | +    ) |
| 103 | + |
| 104 | +    # Getting the wildtype sequence as a string |
| 105 | +    wildtype_sequence = "".join(x0[0]) |
| 106 | + |
| 107 | +    # Mutating the first position three times: |
| 108 | +    three_mutations = [ |
| 109 | +        "A" + wildtype_sequence[1:], |
| 110 | +        "R" + wildtype_sequence[1:], |
| 111 | +        "N" + wildtype_sequence[1:], |
| 112 | +    ] |
| 113 | + |
| 114 | +    # Computing the ddG for these three mutations: |
| 115 | +    x = np.array([list(mutation) for mutation in three_mutations]) |
| 116 | + |
| 117 | +    # y is approx [[0.03, -0.07, -0.28]] |
| 118 | +    y = f_rasp(x) |
| 119 | +``` |
| 120 | + |
| 121 | +```{warning} |
| 122 | +Registering the objective function in this way will create a `conda` environment called `poli__rasp` with the relevant dependencies. You can find the exact environment description in the following file: `src/poli/objective_repository/rasp/environment.yml` |
| 123 | + |
| 124 | +``` |
| 125 | + |
| 126 | +::: |
| 127 | + |
| 128 | +:::: |
| 129 | + |
| 130 | +## Warnings |
| 131 | + |
| 132 | +:::{warning} |
| 133 | + |
| 134 | +This objective function requires `clang` and `cmake`, which will be included in the `poli__rasp` conda environment. Make sure you are okay with this. |
| 135 | + |
| 136 | +When the objective function is created, we will clone the [`reduce`](https://github.com/rlabduke/reduce) GitHub repository inside `~/.poli_objectives/rasp`, and we will compile it. We pin the version to the commit hash `bd23a0bf627ae9b08842102a5c2e9404b4a81924`. |
| 137 | + |
| 138 | +This objective function also downloads several models (as `.pt` files) from [the RaSP repository](https://github.com/KULL-Centre/papers/tree/main/2022/ML-ddG-Blaabjerg-et-al/output). |
| 139 | + |
| 140 | +Namely, we download `cavity_model_15.pt` and all `ds_models` and store them in `~/.poli_objectives/rasp`. These models are taken from the commit hash `3ccebe87e017b6bd737f88e1943557d128c85616`, and the files are checked against a pre-computed `md5` checksum. |
| 141 | + |
| 142 | +::: |
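
The checksum verification mentioned above can be reproduced by hand with the standard library. This is a minimal sketch and not the code `poli` itself runs: the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the expected checksum has to be supplied by the caller.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Computes the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        # Reading in chunks avoids loading large model files into memory at once.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_md5: str) -> bool:
    """Returns True if the file's md5 digest equals the expected one."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_checksum(Path.home() / ".poli_objectives" / "rasp" / "cavity_model_15.pt", expected)` would compare the downloaded cavity model against a checksum `expected` that you provide.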
| 143 | + |
| 144 | +## Further reading |
| 145 | + |
| 146 | +The `poli` examples show how to compute a saturation mutagenesis for a given protein at a given position. |
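
As a rough illustration of what saturation mutagenesis at one position involves, the sketch below enumerates every single-residue substitution at a chosen index. It is not the example shipped with `poli`: the helper `saturation_variants` is a hypothetical name, the wildtype is a toy string, and the alphabet is assumed to be the 20 standard amino acids.

```python
from typing import List

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_variants(wildtype: str, position: int) -> List[str]:
    """Returns all variants that substitute the residue at `position` (0-indexed)."""
    if not 0 <= position < len(wildtype):
        raise IndexError("Position is outside the sequence.")
    original = wildtype[position]
    return [
        wildtype[:position] + amino_acid + wildtype[position + 1 :]
        for amino_acid in AMINO_ACIDS
        if amino_acid != original  # Skip the wildtype residue itself.
    ]


# Toy wildtype, for illustration only.
toy_wildtype = "MKTAYIAK"
variants = saturation_variants(toy_wildtype, position=0)

# One list of characters per variant, the same layout as `x` in the example above.
x_as_lists = [list(variant) for variant in variants]
```

Each variant is one mutation away from the wildtype, so the resulting list can be wrapped in `np.array(...)` and evaluated in a single call, as in the example above.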