Commit d14554a

committed
Adapts one of the tutorials as instructions to create a new solver
1 parent 6cf1020 commit d14554a

5 files changed

+181
-37
lines changed

docs/protein-optimization/_autosummary/poli.tests.registry.proteins.test_foldx.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
test_registering_foldx_sasa
2222
test_registering_foldx_stability
2323
test_registering_foldx_stability_and_sasa
24+
test_registering_foldx_stability_and_sasa_with_verbose_output
2425

2526

2627

docs/protein-optimization/_toc.yml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,6 @@ parts:
3535
chapters:
3636
- file: using_poli_baselines/random_mutations.md
3737
- file: using_poli_baselines/cma_es.md
38-
- file: using_poli_baselines/nsga_2.md
3938
- file: using_poli_baselines/bayesian_optimization.md
4039
- file: using_poli_baselines/line_bayesian_optimization.md
4140
- caption: "Observers"

docs/protein-optimization/contributing/a_new_problem.md

Lines changed: 35 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -61,17 +61,24 @@ class YourBlackBox(AbstractBlackBox):
6161
def __init__(
6262
self,
6363
info: ProblemSetupInformation,
64+
your_arg: str,
65+
your_second_arg: List[float],
66+
your_kwarg: str = ...,
6467
batch_size: int = None,
6568
parallelize: bool = False,
66-
num_workers: int = None
69+
num_workers: int = None,
70+
evaluation_budget: int = float("inf")
6771
):
6872
super().__init__(
6973
info=info,
7074
batch_size=batch_size,
7175
parallelize=parallelize,
7276
num_workers=num_workers,
77+
evaluation_budget=evaluation_budget,
7378
)
7479

80+
#... your manipulation of args and kwargs.
81+
7582
# The only method you have to define
7683
def _black_box(self, x: np.ndarray, context: dict = None) -> np.ndarray:
7784
return ...
@@ -95,14 +102,16 @@ class YourProblemFactory(AbstractProblemFactory):
95102
def create(
96103
self,
97104
seed: int = None,
105+
your_arg: str = ...,
106+
your_second_arg: List[float] = ...,
107+
your_kwarg: str = ...,
98108
batch_size: int = None,
99109
parallelize: bool = False,
100110
num_workers: int = None,
101-
your_keyword_1: str = ...,
102-
your_keyword_2: int = ...,
103-
your_keyword_3: List[float] = ...,
111+
evaluation_budget: int = float("inf"),
104113
) -> Tuple[AbstractBlackBox, np.ndarray, np.ndarray]:
105-
# Manipulate keywords you might need at creation time...
114+
# Manipulate args and kwargs you might need at creation time...
106115
...
107116

108117
# Getting the problem information
@@ -111,12 +120,16 @@ class YourProblemFactory(AbstractProblemFactory):
111120
# Creating your black box function
112121
f = YourBlackBox(
113122
info=problem_info,
123+
your_arg=your_arg,
124+
your_second_arg=your_second_arg,
125+
your_kwarg=your_kwarg,
114126
batch_size=batch_size,
115127
parallelize=parallelize,
116128
num_workers=num_workers,
129+
evaluation_budget=evaluation_budget,
117130
)
118131

119-
# Your first input (an np.array[str])
132+
# Your first input (an np.array[str] of shape [b, L] or [b,])
120133
x0 = ...
121134

122135
return f, x0, f(x0)
@@ -146,7 +159,7 @@ It is important that name of your problem should be the name of the folder it's
146159

147160
:::{warning}
148161

149-
`poli` is experimental. The input kwargs to the abstract black box
162+
`poli` is under active development. The input kwargs to the abstract black box
150163
and to the create method may still change. Your IDE should
151164
tell you automatically, though!
152165

@@ -165,7 +178,7 @@ dependencies:
165178
- pip
166179
- pip:
167180
- numpy
168-
- "git+https://github.com/MachineLearningLifeScience/poli.git@master"
181+
- "git+https://github.com/MachineLearningLifeScience/poli.git@dev"
169182
- YOUR OTHER DEPENDENCIES
170183
```
171184
@@ -191,11 +204,11 @@ dependencies:
191204
- pip:
192205
- numpy
193206
- click
194-
- "git+https://github.com/MachineLearningLifeScience/poli.git@master"
207+
- "git+https://github.com/MachineLearningLifeScience/poli.git@dev"
195208
196209
```
197210

198-
It installs an `openjdk` that will be added to the path when the environment is active. Moreover, you can also hack your way around installing conda and creating conda environments inside Colab.
211+
It installs an `openjdk` that will be added to the path when the environment is active. Moreover, [`conda` is also installable in Google Colab, allowing you to use `poli` there](https://colab.research.google.com/drive/1-IISCebWYfu0QhuCJ11wOag8aKOiPtls?usp=sharing).
199212

200213
:::
201214

@@ -208,18 +221,11 @@ If you
208221

209222
then you should be set!
210223

211-
You can test that your problem is registerable by creating a fresh environment that includes poli, and running
224+
You can test that your problem is registerable by running
212225

213226
```bash
214227
$ python -c "from poli.core.registry import get_problems; print(get_problems())"
215-
[...] # A list, without your problem in it.
216-
```
217-
218-
Your problem is not registered yet, so don't fret. You can check _if_ you can register it by running
219-
220-
```bash
221-
$ python -c "from poli.core.registry import get_problems; print(get_problems(include_repository=True))"
222-
[..., "your_problem", ...] # If all goes well, you should see "your_problem" here.
228+
[..., "your_problem", ...] # A list with your problem in it
223229
```
224230

225231
If you can find your problem in this list, then you're set! You should be able to run
@@ -230,18 +236,20 @@ from poli import objective_factory
230236
problem_info, f, x0, y0, _ = objective_factory.create(
231237
name="your_problem",
232238
...,
233-
your_keyword_1=..., # <-- Keywords you (maybe) needed
234-
your_keyword_2=... # <-- at your_factory.create(...)
235-
# For now, only string kwargs are
236-
# supported.
239+
your_arg=..., # <-- Keywords you (maybe) needed
240+
your_second_arg=..., # <-- at your_factory.create(...)
241+
your_kwarg=..., # <--
242+
# For now, only certain types are
243+
# supported: str, int, bool, float,
244+
# None, and lists thereof.
237245
)
238246
```
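The comment in the call above says that only certain kwarg types are supported: `str`, `int`, `bool`, `float`, `None`, and lists thereof. A minimal sketch of a pre-flight check for that constraint follows. The helpers `is_supported_kwarg` and `unsupported_kwargs` are hypothetical and not part of `poli`, and reading "lists thereof" as flat lists of those scalars is an assumption.

```python
from typing import Any, Dict, List

# Scalar types the docs list as supported kwarg values.
_SUPPORTED_SCALARS = (str, int, bool, float, type(None))


def is_supported_kwarg(value: Any) -> bool:
    """Return True for str, int, bool, float, None, or a flat list of those."""
    if isinstance(value, _SUPPORTED_SCALARS):
        return True
    if isinstance(value, list):
        # Assumption: "lists thereof" means flat lists of supported scalars.
        return all(isinstance(item, _SUPPORTED_SCALARS) for item in value)
    return False


def unsupported_kwargs(kwargs: Dict[str, Any]) -> List[str]:
    """Return the names of kwargs whose values fall outside the supported types."""
    return [name for name, value in kwargs.items() if not is_supported_kwarg(value)]


# Hypothetical kwargs for a create(...) call; a tuple is not in the supported list.
example = {"your_arg": "abc", "your_second_arg": [0.1, 0.2], "your_kwarg": (1, 2)}
print(unsupported_kwargs(example))  # prints ['your_kwarg']
```

Running a check like this before calling `objective_factory.create` could surface an unsupported value early, instead of failing inside the isolated process.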
239247

240-
`poli` will ask you to confirm that you want to register your problem (you can force the registration by passing `force_register=True` to `objective_factory.create`).
241-
242248
## (Optional) Making your problem be available if dependencies are met
243249

244-
At this point, you can run your objective function in an isolated process (which will literally import the factory and the black box function from the `register.py` you wrote). A better alternative is to get direct access to the object itself. Having access to the actual class makes your life easy, especially when it comes to using debugging tools like the ones in VSCode.
250+
At this point, you can run your objective function in an isolated process (which will literally import the factory and the black box function from the `register.py` you wrote).
251+
252+
A better alternative is to get direct access to the object itself. Having access to the actual class makes your life easy, especially when it comes to using debugging tools like the ones in VSCode.
245253

246254
If you want to make your problem available if it can be imported, take a look at `src/poli/objective_repository/__init__.py`. Add a block like this one at the end of it:
247255

@@ -263,4 +271,4 @@ except ImportError: # Maybe you'll need to check for other errors.
263271

264272
## Submitting a pull request
265273

266-
If you want to share your problem with us, feel free to create a pull request in our repository: https://github.com/MachineLearningLifeScience/poli
274+
If you want to share your problem with us, feel free to create a pull request in our repository following the instructions in our `CONTRIBUTING.md`: https://github.com/MachineLearningLifeScience/poli
Lines changed: 134 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,135 @@
1-
# Adding a new black box optimization algorithm
1+
# Adding a new optimizer to `poli-baselines`
22

3-
[TODO: write] For now, check [the chapter on creating solvers](../using_poli/the_basics/defining_a_problem_solver.md).
3+
The main use-case for `poli_baselines` is **defining optimizers for objective functions**.
4+
5+
The main design objective of `poli` is for it to be almost trivial to **query** complicated black box objective functions; likewise, the design objective of `poli_baselines` is to allow developers of black-box optimization algorithms to test them on said objective functions.
6+
7+
This chapter explains how to define a "solver", or a black-box optimization algorithm.
8+
9+
:::{note}
10+
11+
By default, all our optimizers **maximize**.
12+
13+
:::
14+
15+
## An abstract problem solver
16+
17+
All problem solvers in `poli_baselines` inherit from an `AbstractSolver`, which is implemented as follows:
18+
19+
```python
20+
# poli_baselines/core/abstract_solver.py
21+
class AbstractSolver:
22+
def __init__(
23+
self,
24+
black_box: AbstractBlackBox,
25+
x0: np.ndarray,
26+
y0: np.ndarray,
27+
):
28+
self.black_box = black_box
29+
self.x0 = x0
30+
self.y0 = y0
31+
32+
self.history = {
33+
"x": [x0_i.reshape(1, -1) for x0_i in x0],
34+
"y": [y0_i.reshape(1, -1) for y0_i in y0],
35+
}
36+
37+
self.iteration = 0
38+
```
39+
40+
That is, the minimal ingredients required to instantiate a solver are a black-box function defined through `poli`, the initial design `x0`, and its evaluation `y0`.
41+
42+
**The only abstract method required** is a `next_candidate() -> np.ndarray`, which uses the `self.history` to propose a new candidate. Using this method, the abstract solver implements a `.solve(max_iter: int)` as follows:
43+
44+
```python
45+
# poli_baselines/core/abstract_solver.py
46+
class AbstractSolver:
47+
...
48+
49+
def next_candidate(self) -> np.ndarray:
50+
"""
51+
Returns the next candidate solution
52+
after checking the history.
53+
"""
54+
raise NotImplementedError(
55+
"This method is abstract, and should be implemented by a subclass."
56+
)
57+
58+
def solve(self, max_iter: int = 100):
59+
"""
60+
Runs the solver for the given number of iterations.
61+
"""
62+
for i in range(max_iter):
63+
# Call the pre-step callbacks
64+
if pre_step_callbacks is not None:
65+
for callback in pre_step_callbacks:
66+
callback(self)
67+
68+
# Take a step, which in turn updates the local history.
69+
_, y = self.step()
70+
71+
# Call the post-step callbacks
72+
if post_step_callbacks is not None:
73+
for callback in post_step_callbacks:
74+
callback(self)
75+
76+
if verbose:
77+
print(f"Iteration {i}: {y}, best so far: {self.get_best_performance()}")
78+
79+
if break_at_performance is not None:
80+
if y >= break_at_performance:
81+
break
82+
```
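The `solve` loop above relies on `self.step()` and `self.get_best_performance()`, and on the names `pre_step_callbacks`, `post_step_callbacks`, `verbose`, and `break_at_performance`, none of which are defined in the shortened listing. The following dependency-free sketch shows one way these pieces could fit together. It is not the `poli_baselines` implementation: the class names `SketchSolver` and `CountingSolver`, the bodies of `step` and `get_best_performance`, and the choice to pass the extra names as keyword arguments of `solve` are all assumptions made for illustration, and plain Python lists stand in for `np.ndarray`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = List[str]


class SketchSolver:
    """Illustrative stand-in for an abstract solver (not poli_baselines code)."""

    def __init__(self, black_box: Callable[[Candidate], float], x0: Candidate, y0: float):
        self.black_box = black_box
        self.history = {"x": [x0], "y": [y0]}
        self.iteration = 0

    def next_candidate(self) -> Candidate:
        raise NotImplementedError("Subclasses propose a candidate from self.history.")

    def step(self) -> Tuple[Candidate, float]:
        # Assumed behaviour: propose, evaluate, record, and count the iteration.
        x = self.next_candidate()
        y = self.black_box(x)
        self.history["x"].append(x)
        self.history["y"].append(y)
        self.iteration += 1
        return x, y

    def get_best_performance(self) -> float:
        return max(self.history["y"])

    def solve(
        self,
        max_iter: int = 100,
        break_at_performance: Optional[float] = None,
        verbose: bool = False,
        pre_step_callbacks: Optional[Sequence[Callable[["SketchSolver"], None]]] = None,
        post_step_callbacks: Optional[Sequence[Callable[["SketchSolver"], None]]] = None,
    ) -> None:
        for i in range(max_iter):
            for callback in pre_step_callbacks or ():
                callback(self)
            _, y = self.step()
            for callback in post_step_callbacks or ():
                callback(self)
            if verbose:
                print(f"Iteration {i}: {y}, best so far: {self.get_best_performance()}")
            # Maximization: stop as soon as the target performance is reached.
            if break_at_performance is not None and y >= break_at_performance:
                break


class CountingSolver(SketchSolver):
    """Toy subclass: each candidate is one symbol longer than the previous one."""

    def next_candidate(self) -> Candidate:
        return self.history["x"][-1] + ["A"]


# Toy run: the objective is the sequence length, so y grows by one per step.
solver = CountingSolver(black_box=len, x0=["A"], y0=1.0)
seen_iterations: List[int] = []
solver.solve(
    max_iter=10,
    break_at_performance=4,
    post_step_callbacks=[lambda s: seen_iterations.append(s.iteration)],
)
```

In this toy run the loop stops after three steps, because the third candidate reaches the `break_at_performance` threshold of 4.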
83+
84+
## An example: `RandomMutation`
85+
86+
Leveraging the fact that we are usually working with discrete sequences, we can implement the simplest version of an optimizer: one that takes the best performing sequence, and randomly mutates one of its positions.
87+
88+
The following is an implementation of exactly this:
89+
90+
```python
91+
# poli_baselines/solvers/simple/random_mutation.py (almost)
92+
class RandomMutation(AbstractSolver):
93+
def __init__(
94+
self,
95+
black_box: AbstractBlackBox,
96+
x0: np.ndarray,
97+
y0: np.ndarray,
98+
):
99+
super().__init__(black_box, x0, y0)
100+
self.alphabet = black_box.info.alphabet
101+
self.alphabet_size = len(self.alphabet)
102+
103+
def next_candidate(self) -> np.ndarray:
104+
"""
105+
Returns the next candidate solution
106+
after checking the history.
107+
108+
In this case, the RandomMutation solver
109+
simply returns a random mutation of the
110+
best performing solution so far.
111+
"""
112+
# Get the best performing solution so far
113+
best_x = self.history["x"][np.argmax(self.history["y"])]
114+
115+
# Perform a random mutation
116+
# (Assuming that x is always [1, L] in shape)
117+
next_x = best_x.copy()
118+
pos = np.random.randint(0, len(next_x.flatten()))
119+
mutant = np.random.choice(self.alphabet)
120+
next_x[0][pos] = mutant
121+
122+
return next_x
123+
```
124+
125+
Pretty lean! Notice how **the `next_candidate` method could perform all sorts of complicated logic** like latent space Bayesian Optimization, evolutionary algorithms... Moreover, the conda environment where you do the optimization has nothing to do with the environment where the objective function was defined: `poli` is set up in such a way that you can query the objective functions without having to worry!
126+
127+
:::{note}
128+
Our implementation of `RandomMutation` is slightly different, since we allow users to query e.g. integer indices instead of strings.
129+
130+
[Take a look at the exact implementation on `poli_baselines/solvers/simple/random_mutation.py`](https://github.com/MachineLearningLifeScience/poli-baselines/blob/main/src/poli_baselines/solvers/simple/random_mutation.py).
131+
:::
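The single-position mutation rule used by `next_candidate` above can be tried without `poli` or NumPy. The sketch below is a dependency-free illustration under stated assumptions: the helpers `mutate_one_position` and `random_mutation_search`, the toy objective `count_a` (the number of `"A"` symbols), and the four-letter alphabet are all hypothetical, standing in for a registered black box and `black_box.info.alphabet`.

```python
import random
from typing import Callable, List, Sequence, Tuple


def mutate_one_position(
    x: Sequence[str], alphabet: Sequence[str], rng: random.Random
) -> List[str]:
    """Copy x and replace one randomly chosen position with a random symbol."""
    mutant = list(x)
    pos = rng.randrange(len(mutant))
    mutant[pos] = rng.choice(alphabet)
    return mutant


def random_mutation_search(
    objective: Callable[[Sequence[str]], float],
    x0: Sequence[str],
    alphabet: Sequence[str],
    n_iter: int,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Repeatedly mutate the best sequence seen so far (maximization)."""
    rng = random.Random(seed)
    history_x = [list(x0)]
    history_y = [objective(x0)]
    for _ in range(n_iter):
        # Index of the best evaluation so far (first one on ties, like argmax).
        best = max(range(len(history_y)), key=history_y.__getitem__)
        candidate = mutate_one_position(history_x[best], alphabet, rng)
        history_x.append(candidate)
        history_y.append(objective(candidate))
    best = max(range(len(history_y)), key=history_y.__getitem__)
    return history_x[best], history_y[best]


def count_a(x: Sequence[str]) -> float:
    """Toy objective: how many positions hold the symbol 'A'."""
    return float(sum(symbol == "A" for symbol in x))


ALPHABET = ["A", "C", "G", "T"]
best_x, best_y = random_mutation_search(count_a, list("CCCC"), ALPHABET, n_iter=50)
print("".join(best_x), best_y)
```

Because every proposal is recorded and the best entry is always the one mutated, the returned value can never be worse than the starting point, which is the same property the `history`-based solver has.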
132+
133+
## Submitting a pull request
134+
135+
If you want to share your solver with us, feel free to create a pull request in our repository following the instructions in our `CONTRIBUTING.md`: https://github.com/MachineLearningLifeScience/poli-baselines

docs/protein-optimization/index.md

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,20 @@ This page contains documentation on how to use `poli`, a library of discrete obj
55
A core feature of `poli` is isolating calls to complicated objective functions which might, for example, depend on simulators, binaries, and highly specific package requirements.
66
Our promise is: if you can run your objective function reliably in a `conda` environment, then you can register it and call it from other projects and environments without having to worry about re-installing all the dependencies.
77

8-
## Get started!
8+
## Getting started
99

1010
A good place to start is the next chapter! [Go to Getting Started](./getting_started/getting_started.md).
1111

12+
To install `poli` and `poli-baselines`, we recommend creating a fresh conda environment:
13+
14+
```bash
15+
conda create -n poli-base python=3.9
16+
conda activate poli-base
17+
pip install git+https://github.com/MachineLearningLifeScience/poli.git@dev
18+
pip install git+https://github.com/MachineLearningLifeScience/poli-baselines.git@main
19+
```
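After running the commands above, one way to confirm that both packages are importable from the active environment is the check below. It is a small sketch using only the standard library. `installed_packages` is a hypothetical helper, and the module names `poli` and `poli_baselines` are taken from the import paths used in these docs.

```python
import importlib.util
from typing import Dict, Iterable


def installed_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Module names assumed from the import paths shown in the documentation.
    for name, found in installed_packages(["poli", "poli_baselines"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

`find_spec` only locates a module and does not import it, so the check stays cheap even when a package pulls in heavy dependencies.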
20+
21+
`poli` also [runs on Google Colab](https://colab.research.google.com/drive/1-IISCebWYfu0QhuCJ11wOag8aKOiPtls?usp=sharing).
1222

1323
## Black-box objective functions
1424

@@ -121,12 +131,6 @@ On top of `poli`, we provide `poli-baselines`, a collection of **black-box optim
121131
Optimizing a discrete sequence by performing random mutations
122132
:::
123133

124-
:::{grid-item-card} Discrete NSGA-2
125-
:link: ./using_poli_baselines/nsga_2.html
126-
:columns: 6
127-
A Genetic algorithm for multi-objective optimization of discrete sequences
128-
:::
129-
130134
:::{grid-item-card} CMA-ES
131135
:link: ./using_poli_baselines/cma_es.html
132136
:columns: 6
