Adds dockstring to the documentation

miguelgondu · miguelgondu · commit ad2051fbf0d0 · 2023-11-09T14:32:57.000+01:00
diff --git a/docs/protein-optimization/_toc.yml b/docs/protein-optimization/_toc.yml
@@ -18,19 +18,31 @@ parts:
   - caption: Examples
     chapters:
       - file: using_poli/optimization_examples/protein-stability-foldx/optimizing_protein_stability.ipynb
-  - caption: Some objective functions available
+  - caption: Objective functions
     chapters:
       - file: using_poli/objective_repository/all_objectives.md
       - file: using_poli/objective_repository/white_noise.md
       - file: using_poli/objective_repository/aloha.md
       - file: using_poli/objective_repository/rdkit_qed.md
       - file: using_poli/objective_repository/rdkit_logp.md
-      - file: using_poli/objective_repository/ddr3_docking.md
+      - file: using_poli/objective_repository/dockstring.md
+      - file: using_poli/objective_repository/drd3_docking.md
       - file: using_poli/objective_repository/penalized_logp_lambo.md
       - file: using_poli/objective_repository/sa_tdc.md
       - file: using_poli/objective_repository/foldx_stability.md
       - file: using_poli/objective_repository/foldx_sasa.md
+      - file: using_poli/objective_repository/RaSP.md
       - file: using_poli/objective_repository/super_mario_bros.md
+  - caption: "Optimizers"
+    chapters:
+    - file: using_poli_baselines/cma_es.md
+    - file: using_poli_baselines/graph_ga.md
+    - file: using_poli_baselines/latent_space_bo.md
+    - file: using_poli_baselines/nsga_2.md
+    - file: using_poli_baselines/random_mutations.md
+  - caption: "Observers"
+    chapters:
+    - file: observers/registering_an_observer.md
   - caption: "Contributing"
     chapters:
     - file: contributing/a_new_problem.md
diff --git a/docs/protein-optimization/references.bib b/docs/protein-optimization/references.bib
@@ -56,3 +56,17 @@ @inproceedings{Boomsma:spherical_conv:2017
   year      = {2017}
 }
 
+ @article{GarciaOrtegon:dockstring:2022,
+  title    = {DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design},
+  volume   = {62},
+  issn     = {1549-9596, 1549-960X},
+  doi      = {10.1021/acs.jcim.1c01334},
+  number   = {15},
+  journal  = {Journal of Chemical Information and Modeling},
+  author   = {García-Ortegón, Miguel and Simm, Gregor N. C. and Tripp, Austin J. and Hernández-Lobato, José Miguel and Bender, Andreas and Bacallado, Sergio},
+  year     = {2022},
+  month    = aug,
+  pages    = {3486–3502},
+  language = {en}
+}
+
diff --git a/docs/protein-optimization/using_poli/objective_repository/RaSP.md b/docs/protein-optimization/using_poli/objective_repository/RaSP.md
@@ -3,7 +3,7 @@
 ![Type of objective function: discrete](https://img.shields.io/badge/Type-discrete_inputs-blue)
 ![Environment to run this objective function: poli rasp](https://img.shields.io/badge/Environment-poli____rasp-teal)
 
-*Rapid Stability Predictions* (RaSP) {cite:p}`Blaabjerg:RaSP:2023` predicts the stability of a protein using a supervised learning approach. Starting from features learned using self supervision {cite:p}`Boomsma:spherical_conv:2017`, RaSP learns to predict Rosetta scores using neural networks. The drawback being that only additive mutations could be computed simultaneously. We limit the edit distance to 1 (i.e. we only consider mutations that are one-away from the wildtype).
+[*Rapid Stability Predictions* (RaSP)](https://github.com/KULL-Centre/_2022_ML-ddG-Blaabjerg) {cite:p}`Blaabjerg:RaSP:2023` predicts the stability of a protein using a supervised learning approach. Starting from features learned using self-supervision {cite:p}`Boomsma:spherical_conv:2017`, RaSP learns to predict Rosetta scores using neural networks. The drawback is that only additive mutations can be computed simultaneously. We limit the edit distance to 1 (i.e. we only consider mutations that are one edit away from the wildtype).
 
 This objective function is quite similar to [`foldx_stability`](./foldx_stability.md), and can be considered a drop-in replacement for single mutations. Be aware that the scales are different, though. 
 
diff --git a/docs/protein-optimization/using_poli/objective_repository/all_objectives.md b/docs/protein-optimization/using_poli/objective_repository/all_objectives.md
@@ -3,7 +3,7 @@
 These are all the objective functions available on `poli`. Click on one of them to get a minimal working example of how to use it.
 
 
-### Toy problems
+## Toy problems
 
 ::::{grid}
 :gutter: 3
@@ -22,7 +22,7 @@ A toy example about optimizing 5-letter words to spell "ALOHA"
 
 ::::
 
-### Small molecules
+## Small molecules
 
 Most of our problems are based on the [PMO]() benchmark, by Gao et al [TODO: add cite].
 
@@ -47,8 +47,14 @@ Computing the log-quotient of solubilities using `RDKit`.
 Computing the penalized log-quotient of solubilities using `lambo`'s implementation.
 :::
 
-:::{grid-item-card} DDR3 (or 3pbl) docking (using `tdc`)
-:link: ./ddr3_docking.html
+:::{grid-item-card} `dockstring` for ligand design
+:link: ./dockstring.html
+:columns: 6
+Using `dockstring` to assess the docking score of a small molecule.
+:::
+
+:::{grid-item-card} DRD3 (or 3pbl) docking (using `tdc`)
+:link: ./drd3_docking.html
 :columns: 6
 A wrapper around the Therapeutics Data Commons implmenetation of 3pbl docking.
 :::
@@ -73,7 +79,7 @@ Some of the oracles provided by GuacaMol. [WIP]
 
 ::::
 
-### Proteins
+## Proteins
 
 ::::{grid}
 :gutter: 3
@@ -104,7 +110,7 @@ LaMBO Fluorescence (RFP) by stability and solvent-accessible surface area.
 
 ::::
 
-### Other
+## Other
 
 ::::{grid}
 :gutter: 3
diff --git a/docs/protein-optimization/using_poli/objective_repository/dockstring.md b/docs/protein-optimization/using_poli/objective_repository/dockstring.md
@@ -0,0 +1,102 @@
+# dockstring
+
+![Type of objective function: discrete inputs](https://img.shields.io/badge/Type-discrete_inputs-blue)
+![Environment to run this objective function: poli dockstring](https://img.shields.io/badge/Environment-poli____dockstring-teal
+)
+
+## About
+
+[dockstring](https://github.com/dockstring/dockstring) is an objective function for ligand design (i.e. for finding a small molecule that docks well against a certain protein) {cite:p}`GarciaOrtegon:dockstring:2022`. It uses Open Babel and RDKit to prepare the ligand, and AutoDock Vina for docking.
+
+Since the score given by Vina is meant to be minimized (lower is better), our implementation returns its negative, so that higher values are better.
+
+## Prerequisites
+
+None! The implementation of [dockstring](https://github.com/dockstring/dockstring) comes with batteries included. You don't even have to set up AutoDock Vina. Unfortunately, it only works on macOS and Linux.
+
+If you want to run the black box function directly, we recommend installing the `poli__dockstring` environment (or making sure your current environment satisfies all of its prerequisites). See below.
+
+## How to run
+
+You can either run this objective function in your current environment (assuming that you have the correct dependencies installed), or you can run it in an isolated environment.
+
+::::{tab-set}
+
+:::{tab-item} In the `poli__dockstring` environment
+
+To run this black box function directly (which is useful for debugging, or when you are interested in setting breakpoints and inspecting the objects directly), we recommend you run it from inside the `poli__dockstring` environment, or make sure you satisfy all its requirements.
+
+To create this environment, run
+
+```bash
+# From the root of the `poli` repository
+conda env create --file src/poli/objective_repository/dockstring/environment.yml
+```
+
+Follow that with
+
+```bash
+conda activate poli__dockstring
+```
+
+Once you are in this environment (or in an environment that satisfies all its requirements), you can run the canonical example of risperidone and DRD2 docking:
+
+```python
+import numpy as np
+
+from poli import objective_factory
+
+if __name__ == "__main__":
+    problem_info, f, x0, y0, _ = objective_factory.create(
+        name="dockstring",
+        target_name="DRD2",
+        string_representation="SMILES",
+    )
+
+    # The SMILES string for risperidone
+    risperidone_smiles = "CC1=C(C(=O)N2CCCCC2=N1)CCN3CCC(CC3)C4=NOC5=C4C=CC(=C5)F"
+
+    x = np.array([list(risperidone_smiles)])
+    
+    # y is 11.9. Notice how this is -(what dockstring would give)
+    y = f(x)
+```
+
+:::
+
+:::{tab-item} In isolation
+
+If you want `poli` to handle dependencies and environments under the hood, you can simply run
+
+```python
+import numpy as np
+
+from poli import objective_factory
+
+if __name__ == "__main__":
+    problem_info, f, x0, y0, _ = objective_factory.create(
+        name="dockstring",
+        target_name="DRD2",
+        string_representation="SMILES",
+    )
+
+    # The SMILES string for risperidone
+    risperidone_smiles = "CC1=C(C(=O)N2CCCCC2=N1)CCN3CCC(CC3)C4=NOC5=C4C=CC(=C5)F"
+
+    x = np.array([list(risperidone_smiles)])
+    
+    # y is 11.9. Notice how this is -(what dockstring would give)
+    y = f(x)
+
+    f.terminate()
+```
+
+```{warning}
+Registering the objective function in this way will create a `conda` environment called `poli__dockstring` with the relevant dependencies. You can find the exact environment description in the following file: `src/poli/objective_repository/dockstring/environment.yml`
+
+```
+
+:::
+
+::::
+
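Review note on the new `dockstring.md` page: the two conventions it relies on, the character-level input (`np.array([list(smiles)])`) and the sign flip on the Vina score, can be illustrated without installing anything. The snippet below is a minimal sketch under stated assumptions, not `poli`'s implementation: `fake_vina_score`, `to_batch` and `negate_scores` are hypothetical helpers, plain lists stand in for the NumPy array, and the stand-in scorer simply returns the -11.9 implied by the comment in the page's example.

```python
from typing import Callable, List

# SMILES string for risperidone, copied from the example in dockstring.md.
RISPERIDONE = "CC1=C(C(=O)N2CCCCC2=N1)CCN3CCC(CC3)C4=NOC5=C4C=CC(=C5)F"


def to_batch(smiles: str) -> List[List[str]]:
    """Build a batch of one molecule, one character per column.

    This mirrors the shape of np.array([list(smiles)]) in the docs,
    using plain lists so that the sketch has no dependencies.
    """
    return [list(smiles)]


def negate_scores(
    dock: Callable[[str], float],
) -> Callable[[List[List[str]]], List[float]]:
    """Turn a lower-is-better docking score into a higher-is-better objective."""

    def objective(batch: List[List[str]]) -> List[float]:
        # Join each row back into a SMILES string, score it, and flip the sign.
        return [-dock("".join(row)) for row in batch]

    return objective


def fake_vina_score(smiles: str) -> float:
    """Hypothetical stand-in for a docking call; it performs no docking."""
    return -11.9


f = negate_scores(fake_vina_score)
x = to_batch(RISPERIDONE)
y = f(x)
print(len(x), len(x[0]), y)
```

Note that splitting a SMILES string with `list(...)` separates multi-character atoms such as `Cl` or `Br` into two tokens; joining the row back together, as done here, recovers the original string.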
diff --git a/docs/protein-optimization/using_poli/the_basics/defining_a_problem_solver.md b/docs/protein-optimization/using_poli/the_basics/defining_a_problem_solver.md
@@ -114,7 +114,7 @@ class RandomMutation(AbstractSolver):
 
 Pretty lean! Notice how **the `next_candidate` method could perform all sorts of complicated logic** like latent space Bayesian Optimization, evolutionary algorithms... Moreover, the conda environment where you do the optimization has nothing to do with the enviroment where the objective function was defined: `poli` is set up in such a way that you can query the objective functions without having to worry!
 
-:::{admonition}
+:::{note}
 Our implementation of `RandomMutation` is slightly different, since we allow users to query e.g. integer indices instead of strings.
 
 [Take a look at the exact implementation on `poli_baselines/solvers/simple/random_mutation.py`](https://github.com/MachineLearningLifeScience/poli-baselines/blob/main/src/poli_baselines/solvers/simple/random_mutation.py).