Adds an intro to poli, cleans the registration chapter

miguelgondu · miguelgondu · commit d28b7286b8ce · 2023-08-14T16:43:55.000+02:00
diff --git a/docs/protein-optimization/_toc.yml b/docs/protein-optimization/_toc.yml
@@ -7,10 +7,11 @@ parts:
   - caption: Using poli - the basics
     numbered: true
     chapters:
-      - file: using_poli/desired_design_patterns/intro_to_poli.md
-      - file: using_poli/desired_design_patterns/registering_an_objective_function.md
-      - file: using_poli/desired_design_patterns/defining_a_problem_solver.md
-      - file: using_poli/desired_design_patterns/optimizing_an_objective_function.md
+      - file: using_poli/the_basics/intro_to_poli.md
+      - file: using_poli/the_basics/registering_an_objective_function.md
+      - file: using_poli/the_basics/defining_a_problem_solver.md
+      - file: using_poli/the_basics/optimizing_an_objective_function.md
+      - file: using_poli/the_basics/diving_deeper.md
   - caption: Using poli - examples
     numbered: true
     chapters:
diff --git a/docs/protein-optimization/understanding_foldx/01-single-mutation-using-foldx/tmp/Raw_101m_Repair.fxout b/docs/protein-optimization/understanding_foldx/01-single-mutation-using-foldx/tmp/Raw_101m_Repair.fxout
@@ -9,3 +9,5 @@ Output type: BuildModel
 Pdb	total energy	Backbone Hbond	Sidechain Hbond	Van der Waals	Electrostatics	Solvation Polar	Solvation Hydrophobic	Van der Waals clashes	entropy sidechain	entropy mainchain	sloop_entropy	mloop_entropy	cis_bond	torsional clash	backbone clash	helix dipole	water bridge	disulfide	electrostatic kon	partial covalent bonds	energy Ionisation	Entropy Complex
 101m_Repair_1.pdb	-31.7457	-141.841	-48.2413	-177.827	-8.5183	243.998	-235.896	3.3294	104.051	231.196	0	0	0	5.25497	157.841	-8.81857	0	0	0	0	1.5666	0
 WT_101m_Repair_1.pdb	-34.3436	-141.831	-47.9784	-179.662	-8.13848	243.99	-239.232	3.40664	105.266	231.722	0	0	0	5.28162	157.882	-8.73035	0	0	0	0	1.56224	0
+101m_Repair_1.pdb	-31.7457	-141.841	-48.2413	-177.827	-8.5183	243.998	-235.896	3.3294	104.051	231.196	0	0	0	5.25497	157.841	-8.81857	0	0	0	0	1.5666	0
+WT_101m_Repair_1.pdb	-34.3436	-141.831	-47.9784	-179.662	-8.13848	243.99	-239.232	3.40664	105.266	231.722	0	0	0	5.28162	157.882	-8.73035	0	0	0	0	1.56224	0
diff --git a/docs/protein-optimization/using_poli/the_basics/defining_a_problem_solver.md b/docs/protein-optimization/using_poli/the_basics/defining_a_problem_solver.md
@@ -114,4 +114,6 @@ class RandomMutation(AbstractSolver):
         return next_x
 ```
 
-Pretty lean! Notice how **the `next_candidate` method could perform all sorts of complicated logic** like latent space Bayesian Optimization, evolutionary algorithms...
+Pretty lean! Notice how **the `next_candidate` method could perform all sorts of complicated logic**, like latent space Bayesian optimization or evolutionary algorithms. Moreover, the conda environment where you run the optimization is independent of the environment where the objective function was defined: `poli` is set up so that you can query objective functions without worrying about their dependencies.
+
+In the next chapter, we apply this solver to the `aloha` problem we defined in [the first chapter](./registering_an_objective_function.md).
diff --git a/docs/protein-optimization/using_poli/the_basics/diving_deeper.md b/docs/protein-optimization/using_poli/the_basics/diving_deeper.md
@@ -0,0 +1,6 @@
+# Diving deeper: how does poli work under the hood?
+
+```{contents}
+```
+
+TODO: write.
diff --git a/docs/protein-optimization/using_poli/the_basics/intro_to_poli.md b/docs/protein-optimization/using_poli/the_basics/intro_to_poli.md
@@ -1,4 +1,9 @@
 # What is poli?
 
-[TODO: write an intro to the library, and discuss problems, how to register them, and how to call them.]
+`poli` is a library for registering black box optimization functions, with a special focus on *discrete* sequence optimization. It stands for *Protein Optimization Library*, since some drug design work represents proteins as discrete sequences, i.e. sentences of amino acids.
 
+We also build `poli-baselines` on top, allowing you to define black box optimization algorithms for discrete sequences.
+
+These next chapters detail a basic example of how to use `poli` and `poli-baselines`. Continue to [the next chapter](./registering_an_objective_function.md)!
+
+After these, feel free to dive deeper into how `poli` works underneath in [the chapter about the details](./diving_deeper.md).
diff --git a/docs/protein-optimization/using_poli/the_basics/optimizing_an_objective_function.md b/docs/protein-optimization/using_poli/the_basics/optimizing_an_objective_function.md
@@ -1,7 +1,101 @@
 # Optimizing an objective function
 
-In this chapter, we discuss how an objective function should be optimized in POLi _ideally_. The current implementation might diverge for now, but this is what we aim for.
+```{contents}
+```
 
-## Getting a list of registered objective functions
+In this chapter, we combine what we have discussed in the previous two chapters to optimize a black box objective function using `poli` and its baselines. In particular, we'll solve the `aloha` problem using discrete random mutations.
 
-[TODO: finish this]
+> This tutorial follows `optimizing_aloha.py` in `poli-baselines/examples`.
+
+## Prerequisites
+
+Before running this, make sure you have
+
+- installed both `poli` and `poli_baselines`,
+- run [the first chapter on registering black box functions](./registering_an_objective_function.md),
+- read [the second chapter on implementing solvers](./defining_a_problem_solver.md).
+
+After these steps, the `aloha` problem should be registered.
+
+## Is aloha registered?
+
+We can start by checking that the `aloha` problem is indeed among the registered objectives:
+
+```python
+# optimizing_aloha.py
+from poli.core.registry import get_problems
+
+if __name__ == "__main__":
+    assert "aloha" in get_problems()
+```
+
+This script should run without raising any errors.
+
+:::{admonition} Is aloha not registered?
+:class: dropdown
+
+If the previous snippet fails with an `AssertionError`, you most likely haven't registered `aloha` as a problem. Check [the first chapter for the process of registering this problem](./registering_an_objective_function.md).
+
+:::
+
+## Instantiating the problem and solver
+
+Since the problem is registered, optimizing it is really easy!
+
+```python
+# optimizing_aloha.py
+from poli import objective_factory
+from poli.core.registry import get_problems
+from poli_baselines.solvers.simple.random_mutation import RandomMutation
+
+if __name__ == "__main__":
+    assert "aloha" in get_problems()
+
+    # Creating an instance of the problem
+    problem_info, f, x0, y0, run_info = objective_factory.create(
+        name="aloha", caller_info=None, observer=None
+    )
+
+    # Creating an instance of the solver
+    solver = RandomMutation(
+        black_box=f,
+        x0=x0,
+        y0=y0,
+        alphabet=problem_info.get_alphabet(),
+    )
+```
+
+## Optimizing
+
+Solvers in `poli_baselines` have a `solve` method. Its only required argument is the number of iterations to run the optimization for (`max_iter: int`). Other keyword arguments include e.g. `break_at_performance: float = None` and `verbose: bool = False`.
+
+Once instantiated, the solver can optimize our `aloha` problem easily:
+
+```python
+# optimizing_aloha.py
+
+...
+
+if __name__ == "__main__":
+    ...
+
+    # Running the optimization for 1000 steps,
+    # breaking if we find a performance above 5.0,
+    # and printing a small summary at each step.
+    solver.solve(max_iter=1000, break_at_performance=5.0, verbose=True)
+    print(solver.get_best_solution())
+```
+
+Just by running random mutations, you can usually find the "ALOHA" string in fewer than 1000 steps.
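
To build some intuition for why random mutations suffice here, the following is a minimal, library-free sketch of the same idea. It does not use `poli` or `poli_baselines`: `aloha_score` and `random_mutation_search` are hypothetical stand-ins, and the sketch assumes that the solver mutates one position of the best word found so far and that `break_at_performance` stops the loop once the score reaches the threshold.

```python
import random
from string import ascii_uppercase

TARGET = "ALOHA"  # the word the toy objective rewards


def aloha_score(word: str) -> int:
    """Count the positions at which `word` matches TARGET."""
    return sum(a == b for a, b in zip(word, TARGET))


def random_mutation_search(x0, max_iter, break_at_performance=None, seed=0):
    """Hypothetical stand-in for a solver's `solve` loop (not poli_baselines code)."""
    rng = random.Random(seed)
    best_x, best_y = x0, aloha_score(x0)
    for _ in range(max_iter):
        # Mutate one random position of the best word found so far.
        pos = rng.randrange(len(best_x))
        letter = rng.choice(ascii_uppercase)
        candidate = best_x[:pos] + letter + best_x[pos + 1:]
        y = aloha_score(candidate)
        if y > best_y:
            best_x, best_y = candidate, y
        # Assumption: stop as soon as the threshold is reached.
        if break_at_performance is not None and best_y >= break_at_performance:
            break
    return best_x, best_y


best_x, best_y = random_mutation_search("ZZZZZ", max_iter=1000, break_at_performance=5)
print(best_x, best_y)
```

With the threshold set to the maximum score of 5, the loop ends as soon as the word is found; otherwise it stops after `max_iter` steps and returns the best word seen.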
+
+## Conclusion
+
+In this tutorial we used `RandomMutation` to optimize a toy example: the `aloha` problem described in [the first chapter](./registering_an_objective_function.md).
+
+This concludes the "Getting Started" section of this tutorial. The key takeaways are these:
+
+1. With `poli`, you can register black box objective functions which, when instantiated, run in an independent process inside the custom environment you specified at registration.
+2. `poli_baselines` allows you to define black box optimization algorithms that work well with `poli`'s registered problems.
+3. `poli_baselines` also comes with several solvers, including `RandomMutation`.
+
+The next chapter discusses a more advanced set-up: registering a black-box objective function with Java dependencies, as well as `torch`, and loading up certain autoencoders.
diff --git a/docs/protein-optimization/using_poli/the_basics/registering_an_objective_function.md b/docs/protein-optimization/using_poli/the_basics/registering_an_objective_function.md
@@ -37,7 +37,7 @@ class AlohaBlackBox(AbstractBlackBox):
         super().__init__(L=L)
     
     # The only method you have to define
-    def _black_box(self, x: np.ndarray) -> np.ndarray:
+    def _black_box(self, x: np.ndarray, context: dict = None) -> np.ndarray:
         """
         A function that takes 5-letter words
         in numpy arrays (as letters, not as
@@ -48,7 +48,7 @@ class AlohaBlackBox(AbstractBlackBox):
         return np.sum(matches, axis=1, keepdims=True)
 ```
 
-As the code says, the only method you need to define is `_black_box`, returning a numpy array of size `[1, 1]`. `AbstractBlackBox` takes it from there, making sure that the length of the inputs is correct and matches `L`. You can opt-out of length-checking by saying `L=np.inf` in the `__init__`.[^details-on-black-box]
+As the code says, the only method you need to define is `_black_box(x: np.ndarray, context: dict = None)`, returning a numpy array of size `[1, 1]`. `AbstractBlackBox` takes it from there, making sure that the length of the inputs is correct and matches `L`. You can opt out of length checking by passing `L=np.inf` in the `__init__`.[^details-on-black-box]
 
 [^details-on-black-box]: You can check the exact implementation in [TODOADD]().
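
As a rough illustration of the length check and the scoring described above, here is a small pure-Python sketch. It is not how `AbstractBlackBox` is implemented: `check_length` and `aloha_black_box` are hypothetical names, nested lists stand in for numpy arrays, and `math.inf` plays the role of `np.inf`.

```python
import math
from typing import List, Sequence

TARGET = ["A", "L", "O", "H", "A"]


def check_length(x: Sequence[Sequence[str]], L: float) -> None:
    """Raise if any row of `x` does not have length L (skipped when L is infinite)."""
    if math.isinf(L):
        return
    for row in x:
        if len(row) != L:
            raise ValueError(f"expected length {L}, got {len(row)}")


def aloha_black_box(x: Sequence[Sequence[str]], L: float = 5) -> List[List[int]]:
    """Score each row by its number of matches with TARGET; one [score] row per input."""
    check_length(x, L)
    return [[sum(a == b for a, b in zip(row, TARGET))] for row in x]


print(aloha_black_box([list("ALOOF")]))  # [[3]]
```

Passing `L=math.inf` skips the check, so inputs of any length are scored against the positions they share with the target.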
 
@@ -65,6 +65,7 @@ Let's build a problem factory for the `AlohaBlackBox`:
 ```python
 # registering_aloha.py
 from typing import Tuple
+from string import ascii_uppercase
 
 import numpy as np
 
@@ -113,7 +114,7 @@ class AlohaProblemFactory(AbstractProblemFactory):
 
 The first step is always **creating a conda environment for your problem**. In this case, the base environment would suffice. However, for completeness, we will create a conda environment called `poli_aloha`. This is the environment description (which can be found under `environment.yml` in the examples folder for `aloha`):
 
-TODO: remove the dependency on click, and move the dependency to our github after merging.
+TODO: move the dependency to our github after merging.
 
 ```yml
 # environment.yml
@@ -125,7 +126,6 @@ dependencies:
   - pip
   - pip:
     - numpy
-    - click
     - "git+https://github.com/miguelgondu/poli.git"
 ```
 
@@ -176,6 +176,14 @@ if __name__ == "__main__":
 
 :::
 
+:::{admonition} Where is this problem registered?
+
+`poli` registers this objective as a shell file `.sh` inside `~/.poli_objectives`.
+
+As you can check, this script runs `poli/objective.py` inside the conda environment you specified on registration. `objective.py` is the main workhorse of `poli`: it starts a process in which the objective function waits for next inputs.
+
+:::
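
If you want to see which objectives are registered on your machine, a short script along these lines can list the shell files. This is only a sketch: `list_registered_scripts` is a hypothetical helper, not part of `poli`, and it assumes the scripts live directly inside `~/.poli_objectives` as described above.

```python
from pathlib import Path
from typing import List


def list_registered_scripts(objectives_dir: Path) -> List[str]:
    """Return the names of the `.sh` files found in `objectives_dir`, sorted."""
    if not objectives_dir.is_dir():
        return []
    return sorted(p.name for p in objectives_dir.glob("*.sh"))


# Directory name taken from the text above; it may differ across poli versions.
print(list_registered_scripts(Path.home() / ".poli_objectives"))
```

An empty list means either that nothing has been registered yet or that the directory does not exist.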
+
 ## Calling the registered problem
 
 To check that we can indeed call the problem from somewhere else, let's write a second file called `querying_aloha.py` where we instantiate the objective function and query it. We emphasize that this second file can run on **any other conda environment** (as long as you have `poli` installed, and the problem registered).
@@ -197,7 +205,7 @@ If you are running this script from an environment that has `poli`, and that has
 True
 ```
 
-Amazing. Let's remove this print. Now we can create an instance of the problem, making absolutely sure the problem is registered:
+Amazing. Let's remove this print. Now we can create an instance of the problem:
 
 ```python
 # querying_aloha.py
@@ -236,4 +244,4 @@ In this tutorial you
 
 This is a trivial example, since the only dependency is numpy. In other examples you will see problems with more subtle dependencies (e.g. Java runtimes, torch, cheminformatics tools like `FoldX`, `RDKit`, or the therapeutics data commons...).
 
-A good next step: checking how to define problem solvers (i.e. black box optimization algorithms).
+In the next chapter, we will define a simple "Problem Solver" (i.e. a black box optimization algorithm), and in the one after that we will apply it to solve this `aloha` problem.