Adds documentation to the new lambo tasks

miguelgondu · miguelgondu · commit 7ba4b0972dd2 · 2023-09-06T10:54:54.000+02:00
diff --git a/docs/protein-optimization/using_poli/objective_repository/ddr3_docking.md b/docs/protein-optimization/using_poli/objective_repository/ddr3_docking.md
@@ -0,0 +1,118 @@
+# DDR3 docking (using TDC)
+
+![Type of objective function: discrete](https://img.shields.io/badge/Type-discrete_inputs-blue)
+![Environment to run this objective function: poli lambo](https://img.shields.io/badge/Environment-poli____base-teal
+)
+
+## About
+
+This objective function computes the docking score of a small molecule against the protein `3pbl`, [which is the canonical example in the Therapeutics Data Commons' docking oracles](https://tdcommons.ai/functions/oracles/#docking-scores). Under the hood, it uses `pyscreener`, AutoDock Vina and the ADFR suite.
+
+## Prerequisites
+
+### Installing AutoDock Vina
+
+#### Download the files
+
+[Download AutoDock Vina from the Center for Computational Structural Biology's website](https://vina.scripps.edu/downloads/) and uncompress the downloaded files.
+
+#### Add the binary folder to the path
+
+Add this to the path by including `export PATH=path/to/AutoDock_vina/bin:$PATH` in your `~/.bashrc` or `~/.zshrc`.
+
+```bash
+# In your ~/.bashrc or ~/.zshrc
+export PATH=path/to/AutoDock_vina/bin:$PATH
+```
+
+### Installing the ADFR suite
+
+#### Download the files
+
+[Download the installable files](https://ccsb.scripps.edu/adfr/downloads/). You will likely have to run the `./install.sh` script inside the folder, so you might first have to make it executable using `chmod +x install.sh`.
+
+#### Install it
+
+After running `./install.sh`, you should be able to find `.../bin/prepare_receptor`.
+
+#### Add `prepare_receptor` to the path
+
+For the docking to run, `pyscreener` needs access to the `prepare_receptor` binary. However, adding the whole ADFR `bin` folder to the path is sometimes problematic, since it ships its own version of Python.
+
+Thus, we recommend symlinking only `prepare_receptor` into a folder that is already on your path, such as the AutoDock Vina `bin` folder. The symlink only needs to be created once:
+
+```bash
+# Run once in your terminal
+ln -sf /path/to/ADFR/bin/prepare_receptor /path/to/AutoDock_vina/bin
+```
+
+### Create the `poli__lambo` environment
+
+#### Create the environment from the yml file
+
+This can easily be done by running
+
+```bash
+# From the base of the poli repo
+conda env create --file src/poli/objective_repository/ddr3_docking/environment.yml
+```
+
+This particular objective function _doesn't_ need the `lambo` package to be installed.
+
+### Making sure you're all set
+
+If the set-up above was successful, you should be able to run
+
+```bash
+which vina
+# /path/to/your/bin/vina
+```
+
+and
+
+```bash
+which prepare_receptor
+# /path/to/your/bin/prepare_receptor
+```
+
+## How to run
+
+This objective function only runs in the `poli__lambo` environment: either activate it yourself, or let `poli` run it as an isolated process (which uses this environment underneath).
+
+:::{warning}
+Running this objective function will create an `./oracle` folder in your working directory, where it will download the relevant `.pdb` files.
+:::
+
+::::{tab-set}
+
+:::{tab-item} (Isolated) in the `poli__lambo` environment
+
+After the setup described above, you can simply run the following code:
+
+```python
+from pathlib import Path
+
+import numpy as np
+
+from poli import objective_factory
+
+# How to create
+problem_info, f, x0, y0, run_info = objective_factory.create(
+    name="ddr3_docking",
+    force_register=True
+)
+
+# Example input:
+print(x0)  # [['c' '1' 'c' 'c' 'c' 'c' 'c' '1']]
+
+# Querying:
+print(y0)  # [[-4.1]]
+```
+
+:::
+
+::::
+
+## See also
+
+- [an internal link of sorts]()
diff --git a/docs/protein-optimization/using_poli/objective_repository/foldx_rfp_lambo.md b/docs/protein-optimization/using_poli/objective_repository/foldx_rfp_lambo.md
@@ -1,38 +1,58 @@
 # Protein (RFP) stability and SASA (using `foldx`,`lambo`)
 ![Type of objective function: discrete](https://img.shields.io/badge/Type-discrete_inputs-blue)
-![Environment to run this objective function: poli protein](https://img.shields.io/badge/Environment-poli____lambo
-)
+![Environment to run this objective function: poli protein](https://img.shields.io/badge/Environment-poli____lambo-teal)
 
 ## About
 
 This objective function returns stability using `foldx` and SASA, _exactly_ as done in the `lambo` implementation.
 
 ## Prerequisites
 
-- Have `foldx` installed, and available in your home directory. We expect the following files to be there:
+### `foldx`
+
+We need you to have `foldx` installed, and available in your home directory. We expect the following files to be there:
   - `~/foldx/foldx`: the binary. You might need to rename it.
   - `~/foldx/rotabase.txt`: a text file necessary for `foldx` to run.
-- Have `lambo` checked out, preferrably in the home directory, specifically containing: 
-  - `lambo.tasks.proxy_rfp.proxy_rfp.ProxyRFPTask`
-  - the rfp data: see `~/lambo/assets/fpbase`
 
-## How to run
+### Python environment
 
-You can either run this objective function in your current environment (assuming that you have the correct dependencies installed), or you can run it in an isolated environment.
+We expect you to have [cloned and installed the `lambo` repository](https://github.com/samuelstanton/lambo). Since there are some files we can't install automatically using `pip install git+...`, we ask you to create a `conda` environment for the lambo tasks:
 
-::::{tab-set}
+```bash
+# From the root of the poli repository
+conda env create --file src/poli/objective_repository/foldx_rfp_lambo/environment.yml
+```
 
-:::{tab-item} In current environment
+Activate the environment you just created using
+```bash
+conda activate poli__lambo
+```
+### `lambo`
 
-You will have to install the following two dependencies:
+We also need `lambo`'s tasks to be available in Python's path for `poli__lambo`:
 
 ```bash
-pip install -r ~/lambo/requirements.txt
+# In the poli__lambo environment
+git clone https://github.com/samuelstanton/lambo    # For reference, we use 431b052
+cd lambo
+pip install -e .
 ```
 
-This contains: `pytorch botorch python-levenshtein wandb biopython hydra-core pymoo pandas deepchem transformers selfies jupyter seaborn pyscreener` and other packages.
+In particular, we need
+- `lambo.tasks.proxy_rfp.proxy_rfp.ProxyRFPTask`
+- the rfp data: see `~/lambo/assets/fpbase`
+
+Make sure the data is available.
+
+## How to run
+
+This objective function only runs in the `poli__lambo` environment: either activate it yourself, or let `poli` run it as an isolated process (which uses this environment underneath).
+
+::::{tab-set}
+
+:::{tab-item} (Isolated) in the `poli__lambo` environment
 
-Then run
+After the setup described above, you can simply run the following code:
 
 ```python
 from pathlib import Path
@@ -50,7 +70,7 @@ problem_info, f, x0, y0, run_info = objective_factory.create(
 print(x0)
 
 # Querying:
-print(y0)  # The stability of your wildtype
+print(y0)  # [[-11189.00587946    -39.8155    ], ...]
 ```
 
 You could also pass an `problem: ProblemSetupInformation` to the create method. For the alphabet reference by default, [we use this encoding](https://github.com/MachineLearningLifeScience/poli/blob/44cad2a5c95f209aeb24d4893d162b3359ca91a3/src/poli/core/util/proteins/defaults.py#L1).
diff --git a/docs/protein-optimization/using_poli/objective_repository/penalized_logp_lambo.md b/docs/protein-optimization/using_poli/objective_repository/penalized_logp_lambo.md
@@ -0,0 +1,79 @@
+# Penalized logP (using `lambo`)
+
+![Type of objective function: discrete](https://img.shields.io/badge/Type-discrete_inputs-blue)
+![Environment to run this objective function: poli lambo](https://img.shields.io/badge/Environment-poli____lambo-teal
+)
+
+## About
+
+This objective function computes the penalized logP _exactly_ as is done in the `lambo` implementation.[^1]
+
+[^1]: If you check their implementation carefully, you might notice that it includes the empirical means and standard deviations of the ZINC dataset for the values it computes.
+
+To do so, we import their scoring function.
+
+## Prerequisites
+
+### Python environment
+
+We expect you to have [cloned and installed the `lambo` repository](https://github.com/samuelstanton/lambo). Since there are some files we can't install automatically using `pip install git+...`, we ask you to create a `conda` environment for the lambo tasks:
+
+```bash
+# From the root of the poli repository
+conda env create --file src/poli/objective_repository/foldx_rfp_lambo/environment.yml
+```
+
+Activate the environment you just created using
+```bash
+conda activate poli__lambo
+```
+
+### `lambo`
+
+We also need `lambo`'s tasks to be available in Python's path for `poli__lambo`:
+
+```bash
+# In the poli__lambo environment
+git clone https://github.com/samuelstanton/lambo    # For reference, we use 431b052
+cd lambo
+pip install -e .
+```
+
+## How to run
+
+This objective function only runs in the `poli__lambo` environment.
+
+Either activate it yourself, or let `poli` run it as an isolated process (which uses this environment underneath).
+
+::::{tab-set}
+
+:::{tab-item} (Isolated) in the `poli__lambo` environment
+
+After the setup described above, you can simply run the following code:
+
+```python
+import numpy as np
+
+from poli import objective_factory
+
+# Using create
+_, f, x0, y0, _ = objective_factory.create(name="penalized_logp_lambo")
+
+# An example input
+print(x0)
+
+# The example's output
+print(y0)
+
+# Terminating the isolated process (if it was created)
+f.terminate()
+```
+
+:::
+
+::::
+
+### Other keyword arguments
+
+- `penalized: bool = True`. Whether we are evaluating penalized logP or not.
+- `string_representation: str = "SMILES"`. Can be either `"SMILES"` or `"SELFIES"`.