Commit ad2051f

Adds dockstring to the documentation
1 parent 5f1af00 commit ad2051f

6 files changed, with 144 additions and 10 deletions

docs/protein-optimization/_toc.yml

Lines changed: 14 additions & 2 deletions
@@ -18,19 +18,31 @@ parts:
 - caption: Examples
 chapters:
 - file: using_poli/optimization_examples/protein-stability-foldx/optimizing_protein_stability.ipynb
-- caption: Some objective functions available
+- caption: Objective functions
 chapters:
 - file: using_poli/objective_repository/all_objectives.md
 - file: using_poli/objective_repository/white_noise.md
 - file: using_poli/objective_repository/aloha.md
 - file: using_poli/objective_repository/rdkit_qed.md
 - file: using_poli/objective_repository/rdkit_logp.md
-- file: using_poli/objective_repository/ddr3_docking.md
+- file: using_poli/objective_repository/dockstring.md
+- file: using_poli/objective_repository/drd3_docking.md
 - file: using_poli/objective_repository/penalized_logp_lambo.md
 - file: using_poli/objective_repository/sa_tdc.md
 - file: using_poli/objective_repository/foldx_stability.md
 - file: using_poli/objective_repository/foldx_sasa.md
+- file: using_poli/objective_repository/RaSP.md
 - file: using_poli/objective_repository/super_mario_bros.md
+- caption: "Optimizers"
+chapters:
+- file: using_poli_baselines/cma_es.md
+- file: using_poli_baselines/graph_ga.md
+- file: using_poli_baselines/latent_space_bo.md
+- file: using_poli_baselines/nsga_2.md
+- file: using_poli_baselines/random_mutations.md
+- caption: "Observers"
+chapters:
+- file: observers/registering_an_observer.md
 - caption: "Contributing"
 chapters:
 - file: contributing/a_new_problem.md

docs/protein-optimization/references.bib

Lines changed: 14 additions & 0 deletions
@@ -56,3 +56,17 @@ @inproceedings{Boomsma:spherical_conv:2017
 year = {2017}
 }
 
+@article{GarciaOrtegon:dockstring:2022,
+title = {DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design},
+volume = {62},
+issn = {1549-9596, 1549-960X},
+doi = {10.1021/acs.jcim.1c01334},
+number = {15},
+journal = {Journal of Chemical Information and Modeling},
+author = {García-Ortegón, Miguel and Simm, Gregor N. C. and Tripp, Austin J. and Hernández-Lobato, José Miguel and Bender, Andreas and Bacallado, Sergio},
+year = {2022},
+month = aug,
+pages = {3486–3502},
+language = {en}
+}
+

docs/protein-optimization/using_poli/objective_repository/RaSP.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 ![Type of objective function: discrete](https://img.shields.io/badge/Type-discrete_inputs-blue)
 ![Environment to run this objective function: poli rasp](https://img.shields.io/badge/Environment-poli____rasp-teal)
 
-*Rapid Stability Predictions* (RaSP) {cite:p}`Blaabjerg:RaSP:2023` predicts the stability of a protein using a supervised learning approach. Starting from features learned using self supervision {cite:p}`Boomsma:spherical_conv:2017`, RaSP learns to predict Rosetta scores using neural networks. The drawback being that only additive mutations could be computed simultaneously. We limit the edit distance to 1 (i.e. we only consider mutations that are one-away from the wildtype).
+[*Rapid Stability Predictions* (RaSP)](https://github.com/KULL-Centre/_2022_ML-ddG-Blaabjerg) {cite:p}`Blaabjerg:RaSP:2023` predicts the stability of a protein using a supervised learning approach. Starting from features learned using self supervision {cite:p}`Boomsma:spherical_conv:2017`, RaSP learns to predict Rosetta scores using neural networks. The drawback being that only additive mutations could be computed simultaneously. We limit the edit distance to 1 (i.e. we only consider mutations that are one-away from the wildtype).
 
 This objective function is quite similar to [`foldx_stability`](./foldx_stability.md), and can be considered a drop-in replacement for single mutations. Be aware that the scales are different, though.
 
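The RaSP page restricts candidates to edit distance 1 from the wildtype. As a rough illustration of what that search space contains, here is a minimal sketch assuming substitutions only, over the 20 canonical amino acids (for equal-length sequences, edit distance 1 then coincides with Hamming distance 1). `single_mutants` and `hamming_distance` are hypothetical helpers, not part of `poli`:

```python
from typing import List

# Assumption: the 20 canonical amino acids; poli's RaSP wrapper may use its own alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(wildtype: str, alphabet: str = AMINO_ACIDS) -> List[str]:
    """Hypothetical helper: every sequence one substitution away from `wildtype`."""
    mutants = []
    for position, original in enumerate(wildtype):
        for residue in alphabet:
            if residue == original:
                continue  # skip the identity "mutation"
            mutants.append(wildtype[:position] + residue + wildtype[position + 1:])
    return mutants


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    wildtype = "MKTAYIAK"
    mutants = single_mutants(wildtype)
    # 19 alternatives per position, so 19 * len(wildtype) candidates.
    print(len(mutants))  # 152
    assert all(hamming_distance(wildtype, m) == 1 for m in mutants)
```

Under this restriction the candidate set grows linearly with sequence length (19 alternatives per position).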

docs/protein-optimization/using_poli/objective_repository/all_objectives.md

Lines changed: 12 additions & 6 deletions
@@ -3,7 +3,7 @@
 These are all the objective functions available on `poli`. Click on one of them to get a minimal working example of how to use it.
 
 
-### Toy problems
+## Toy problems
 
 ::::{grid}
 :gutter: 3
@@ -22,7 +22,7 @@ A toy example about optimizing 5-letter words to spell "ALOHA"
 
 ::::
 
-### Small molecules
+## Small molecules
 
 Most of our problems are based on the [PMO]() benchmark, by Gao et al [TODO: add cite].
 
@@ -47,8 +47,14 @@ Computing the log-quotient of solubilities using `RDKit`.
 Computing the penalized log-quotient of solubilities using `lambo`'s implementation.
 :::
 
-:::{grid-item-card} DDR3 (or 3pbl) docking (using `tdc`)
-:link: ./ddr3_docking.html
+:::{grid-item-card} `dockstring` for ligand design
+:link: ./dockstring.html
+:columns: 6
+Using `dockstring` to assess the docking score of a small molecule.
+:::
+
+:::{grid-item-card} DRD3 (or 3pbl) docking (using `tdc`)
+:link: ./drd3_docking.html
 :columns: 6
 A wrapper around the Therapeutics Data Commons implmenetation of 3pbl docking.
 :::
@@ -73,7 +79,7 @@ Some of the oracles provided by GuacaMol. [WIP]
 
 ::::
 
-### Proteins
+## Proteins
 
 ::::{grid}
 :gutter: 3
@@ -104,7 +110,7 @@ LaMBO Fluorescence (RFP) by stability and solvent-accessible surface area.
 
 ::::
 
-### Other
+## Other
 
 ::::{grid}
 :gutter: 3

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+# dockstring
+
+![Type of objective function: discrete inputs](https://img.shields.io/badge/Type-discrete_inputs-blue)
+![Environment to run this objective function: poli dockstring](https://img.shields.io/badge/Environment-poli____dockstring-teal
+)
+
+## About
+
+[dockstring](https://github.com/dockstring/dockstring) is an objective function for ligand design (i.e. for finding a small molecule that docks well against a given protein) {cite:p}`GarciaOrtegon:dockstring:2022`. It uses OpenBabel and RDKit to prepare the ligand, and AutoDock Vina for docking.
+
+Since Vina's docking score is meant to be minimized, our implementation returns its negative.
+
+## Prerequisites
+
+None! The implementation of [dockstring](https://github.com/dockstring/dockstring) comes with batteries included; you don't even have to set up AutoDock Vina. Unfortunately, it only works on macOS and Linux.
+
+If you want to run the black box function directly, we recommend installing the `poli__dockstring` environment (or making sure you satisfy all of its prerequisites). See below.
+
+## How to run
+
+You can either run this objective function in your current environment (assuming that you have the correct dependencies installed), or you can run it in an isolated environment.
+
+::::{tab-set}
+
+:::{tab-item} In the `poli__dockstring` environment
+
+To run this black box function directly (which is useful for debugging, or when you want to set breakpoints and inspect the objects directly), we recommend running it from inside the `poli__dockstring` environment, or making sure you satisfy all its requirements.
+
+To create this environment, run
+
+```bash
+# From the root of the `poli` repository
+conda env create --file src/poli/objective_repository/dockstring/environment.yml
+```
+
+Follow that with
+
+```bash
+conda activate poli__dockstring
+```
+
+Once you are in this environment (or in an environment that satisfies all its requirements), you can run the canonical example of risperidone and DRD2 docking:
+
+```python
+import numpy as np
+
+from poli import objective_factory
+
+if __name__ == "__main__":
+    problem_info, f, x0, y0, _ = objective_factory.create(
+        name="dockstring",
+        target_name="DRD2",
+        string_representation="SMILES",
+    )
+
+    # The SMILES string for risperidone
+    risperidone_smiles = "CC1=C(C(=O)N2CCCCC2=N1)CCN3CCC(CC3)C4=NOC5=C4C=CC(=C5)F"
+
+    x = np.array([list(risperidone_smiles)])
+
+    # y is 11.9, the negative of what dockstring itself would report
+    y = f(x)
+```
+
+:::
+
+:::{tab-item} In isolation
+
+If you want `poli` to handle dependencies and environments under the hood, you can simply run
+
+```python
+import numpy as np
+
+from poli import objective_factory
+
+if __name__ == "__main__":
+    problem_info, f, x0, y0, _ = objective_factory.create(
+        name="dockstring",
+        target_name="DRD2",
+        string_representation="SMILES",
+    )
+
+    # The SMILES string for risperidone
+    risperidone_smiles = "CC1=C(C(=O)N2CCCCC2=N1)CCN3CCC(CC3)C4=NOC5=C4C=CC(=C5)F"
+
+    x = np.array([list(risperidone_smiles)])
+
+    # y is 11.9, the negative of what dockstring itself would report
+    y = f(x)
+
+    f.terminate()
+```
+
+```{warning}
+Registering the objective function in this way will create a `conda` environment called `poli__dockstring` with the relevant dependencies. You can find the exact environment description in `src/poli/objective_repository/dockstring/environment.yml`.
+
+```
+
+:::
+
+::::
+
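The dockstring examples pass the molecule as a 2D array of characters (`np.array([list(risperidone_smiles)])`) and state that the returned value is the negative of Vina's score. A minimal sketch of that wrapping logic follows, with a toy scorer standing in for the real docking call so it runs without dockstring installed. `negated_docking_objective` and `toy_score` are hypothetical names, not part of `poli` or `dockstring`, and the `(n, 1)` output shape is an assumption made for illustration:

```python
from typing import Callable

import numpy as np


def negated_docking_objective(
    x: np.ndarray, dock: Callable[[str], float]
) -> np.ndarray:
    """Hypothetical wrapper: score each row of `x`, returning -(docking score).

    Assumption: each row of `x` holds the characters of one SMILES string,
    as in `np.array([list(smiles)])`. `dock` stands in for a Vina-style
    scorer where lower (more negative) is better.
    """
    scores = []
    for row in x:
        smiles = "".join(row)  # rebuild the SMILES string from its characters
        scores.append(-dock(smiles))  # flip the sign so that higher is better
    # Assumed output layout: one row per input molecule, one column.
    return np.array(scores, dtype=float).reshape(-1, 1)


def toy_score(smiles: str) -> float:
    """Hypothetical stand-in for a docking call; NOT a real docking score."""
    return -0.1 * len(smiles)


if __name__ == "__main__":
    risperidone_smiles = "CC1=C(C(=O)N2CCCCC2=N1)CCN3CCC(CC3)C4=NOC5=C4C=CC(=C5)F"
    x = np.array([list(risperidone_smiles)])
    y = negated_docking_objective(x, toy_score)
    print(y.shape)  # (1, 1)
```

Passing the scorer in as a callable keeps the sign convention separate from the docking backend, so the toy scorer can later be swapped for a real one.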

docs/protein-optimization/using_poli/the_basics/defining_a_problem_solver.md

Lines changed: 1 addition & 1 deletion
@@ -114,7 +114,7 @@ class RandomMutation(AbstractSolver):
 
 Pretty lean! Notice how **the `next_candidate` method could perform all sorts of complicated logic** like latent space Bayesian Optimization, evolutionary algorithms... Moreover, the conda environment where you do the optimization has nothing to do with the enviroment where the objective function was defined: `poli` is set up in such a way that you can query the objective functions without having to worry!
 
-:::{admonition}
+:::{note}
 Our implementation of `RandomMutation` is slightly different, since we allow users to query e.g. integer indices instead of strings.
 
 [Take a look at the exact implementation on `poli_baselines/solvers/simple/random_mutation.py`](https://github.com/MachineLearningLifeScience/poli-baselines/blob/main/src/poli_baselines/solvers/simple/random_mutation.py).
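For illustration only, here is a minimal sketch of a single random-mutation step on a string-array input. It is not the `poli_baselines` implementation linked above, which also accepts integer indices. `mutate_once` is a hypothetical helper, and the uppercase-letter alphabet is an assumption chosen for the example:

```python
from typing import Sequence

import numpy as np


def mutate_once(
    x: np.ndarray, alphabet: Sequence[str], rng: np.random.Generator
) -> np.ndarray:
    """Hypothetical helper: copy `x` (shape [1, L]) and replace one position with a different token."""
    mutant = x.copy()  # leave the caller's array untouched
    position = rng.integers(mutant.shape[1])
    current = mutant[0, position]
    # Only choose among tokens that differ from the current one, so the step always changes x.
    choices = [token for token in alphabet if token != current]
    mutant[0, position] = rng.choice(choices)
    return mutant


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    x0 = np.array([list("ALOHA")])
    # Assumed alphabet: uppercase letters, for illustration only.
    x1 = mutate_once(x0, alphabet=list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), rng=rng)
    print("".join(x1[0]))
```

A solver's `next_candidate` could call a step like this on the best sequence found so far; the sketch leaves out that bookkeeping.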
