
Commit c5e405d

Adds how to run white noise, aloha, foldx and rdkit related objective functions
1 parent f3595dd commit c5e405d

11 files changed: 572 additions and 35 deletions


docs/protein-optimization/_toc.yml

Lines changed: 5 additions & 2 deletions
@@ -20,14 +20,17 @@ parts:
2020
- caption: Some objective functions available
2121
chapters:
2222
- file: using_poli/objective_repository/white_noise.md
23+
- file: using_poli/objective_repository/aloha.md
24+
- file: using_poli/objective_repository/rdkit_qed.md
25+
- file: using_poli/objective_repository/rdkit_logp.md
2326
- file: using_poli/objective_repository/foldx_stability.md
27+
- file: using_poli/objective_repository/foldx_sasa.md
2428
- file: using_poli/objective_repository/super_mario_bros.md
25-
- file: using_poli/objective_repository/small_molecule.md
2629
- caption: "Contributing"
2730
chapters:
2831
- file: contributing/a_new_problem.md
2932
- file: contributing/a_new_solver.md
30-
- caption: "Appendix: Understanding FoldX"
33+
- caption: "Appendix: Understanding foldx"
3134
chapters:
3235
- file: understanding_foldx/00-installing-foldx.md
3336
- file: understanding_foldx/01-single-mutation-using-foldx/index.ipynb

docs/protein-optimization/index.md

Lines changed: 61 additions & 25 deletions
@@ -11,49 +11,85 @@ Our promise is: if you can run your objective function reliably in a `conda` env
1111

1212
These are some objective functions available on `poli`. **For a full list**, check [getting started](./getting_started/getting_started.md).
1313

14+
15+
### Toy problems
16+
1417
::::{grid}
1518
:gutter: 3
1619

1720
:::{grid-item-card} White noise
1821
:link: ./using_poli/objective_repository/white_noise.html
19-
:columns: 4
22+
:columns: 6
2023
White noise drawn from a unit Gaussian
2124
:::
2225

2326
:::{grid-item-card} Aloha
2427
:link: ./using_poli/objective_repository/aloha.html
25-
:columns: 4
28+
:columns: 6
2629
A toy example about optimizing 5-letter words to spell "ALOHA"
2730
:::
2831

29-
:::{grid-item-card} Protein Stability
30-
:link: ./using_poli/objective_repository/foldx_stability.html
31-
:columns: 4
32-
Stability of mutations of a wildtype using `foldx`
32+
::::
33+
34+
### Small molecules
35+
36+
::::{grid}
37+
:gutter: 3
38+
39+
:::{grid-item-card} Quantitative Estimate of Druglikeness (QED)
40+
:link: ./using_poli/objective_repository/rdkit_qed.html
41+
:columns: 6
42+
Computing the QED using `RDKit`.
3343
:::
3444

35-
:::{grid-item-card} Mario jumps
36-
:link: ./using_poli/objective_repository/super_mario_bros.html
37-
:columns: 4
38-
Optimizing the number of jumps on a Super Mario Bros level
45+
:::{grid-item-card} Log-solubility (LogP)
46+
:link: ./using_poli/objective_repository/rdkit_logp.html
47+
:columns: 6
48+
Computing the log-quotient of solubilities using `RDKit`.
49+
:::
50+
51+
:::{grid-item-card} TDC oracles [WIP]
52+
:link: ./using_poli/objective_repository/tdc_oracles.html
53+
:columns: 6
54+
Some of the oracles provided by the Therapeutics Data Commons. [WIP]
55+
:::
56+
57+
::::
58+
59+
### Proteins
60+
61+
::::{grid}
62+
:gutter: 3
63+
64+
:::{grid-item-card} Protein Stability (using `foldx`)
65+
:link: ./using_poli/objective_repository/foldx_stability.html
66+
:columns: 6
67+
Stability of mutations of a wildtype using `foldx`
3968
:::
4069

41-
:::{grid-item-card} Small molecule properties
42-
:link: ./using_poli/objective_repository/small_molecule.html
43-
:columns: 4
44-
Small molecule's synthesizability, druglikeness, and more. [WIP]
70+
:::{grid-item-card} Protein SASA score (using `foldx`)
71+
:link: ./using_poli/objective_repository/foldx_sasa.html
72+
:columns: 6
73+
Solvent accessibility of mutations of a wildtype using `foldx`
4574
:::
4675

47-
:::{grid-item-card} RaSP
76+
:::{grid-item-card} Protein Stability (using `RaSP`) [WIP]
4877
:link: ./using_poli/objective_repository/RaSP.html
49-
:columns: 4
78+
:columns: 6
5079
Rapid Stability Predictions of single mutations from a wildtype. [WIP]
5180
:::
5281

53-
:::{grid-item-card} TDC oracles
54-
:link: ./using_poli/objective_repository/tdc_oracles.html
55-
:columns: 4
56-
Some of the oracles provided by the Therapeutics Data Commons. [WIP]
82+
::::
83+
84+
### Other
85+
86+
::::{grid}
87+
:gutter: 3
88+
89+
:::{grid-item-card} Mario jumps
90+
:link: ./using_poli/objective_repository/super_mario_bros.html
91+
:columns: 6
92+
Optimizing the number of jumps on a Super Mario Bros level
5793
:::
5894

5995
::::
@@ -66,31 +102,31 @@ On top of `poli`, we provide `poli-baselines`, a collection of **black-box optim
66102

67103
:::{grid-item-card} Random Mutations
68104
:link: ./using_poli_baselines/random_mutations.html
69-
:columns: 4
105+
:columns: 6
70106
Optimizing a discrete sequence by performing random mutations
71107
:::
72108

73109
:::{grid-item-card} NSGA-2
74110
:link: ./using_poli_baselines/nsga_2.html
75-
:columns: 4
111+
:columns: 6
76112
A Genetic algorithm for optimizing more than one metric [WIP]
77113
:::
78114

79115
:::{grid-item-card} CMA-ES
80116
:link: ./using_poli_baselines/cma_es.html
81-
:columns: 4
117+
:columns: 6
82118
An evolutionary strategy for continuous problems [WIP]
83119
:::
84120

85121
:::{grid-item-card} Latent Space Bayesian Optimization
86122
:link: ./using_poli_baselines/latent_space_bo.html
87-
:columns: 4
123+
:columns: 6
88124
Learning continuous representations and optimizing in latent space. [WIP]
89125
:::
90126

91127
:::{grid-item-card} Graph GA
92128
:link: ./using_poli_baselines/graph_ga.html
93-
:columns: 4
129+
:columns: 6
94130
Graph Genetic Algorithms for small molecules. [WIP]
95131
:::
96132

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
1-
# Installing FoldX
1+
# Installing `foldx`
22

33
[TODO: write]
Lines changed: 27 additions & 1 deletion
@@ -1,3 +1,29 @@
11
# Aloha objective function
22

3-
[TODO: write]
3+
![Type of objective function: discrete](https://img.shields.io/badge/Input_type-discrete-blue)
4+
![Environment to run this objective function: poli base](https://img.shields.io/badge/Environment-poli____base-teal
5+
)
6+
7+
## About
8+
9+
This toy objective function takes 5-letter sequences (in all caps) and compares them to the string "ALOHA", returning the number of positions that match.
10+
11+
## Prerequisites
12+
13+
None. This function should always run out of the box.
14+
15+
## How to run
16+
17+
```python
18+
import numpy as np
19+
from poli import objective_factory
20+
21+
# How to create
22+
problem_info, f, x0, y0, run_info = objective_factory.create(name="aloha")
23+
24+
# Example input:
25+
x = np.array([["A", "L", "O", "O", "F"]])  # must be of shape [b, L], in this case [1, 5].
26+
27+
# Querying:
28+
print(f(x)) # Should be 3 (A, L, and the first O).
29+
```
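
To make the expected value in the example above easy to check by hand, here is a minimal sketch of the scoring rule in plain Python. It assumes the score is the per-row count of positions that match "ALOHA", which is what the comment in the example suggests. The helper name `aloha_score` is hypothetical and is not part of `poli`.

```python
TARGET = "ALOHA"


def aloha_score(batch):
    """Count, for each row of a [b, L] batch, the positions matching TARGET.

    Hypothetical helper for illustration only; not poli's implementation.
    """
    scores = []
    for row in batch:
        if len(row) != len(TARGET):
            raise ValueError(f"Expected rows of length {len(TARGET)}, got {len(row)}")
        # Compare position by position against "ALOHA".
        scores.append(sum(letter == target for letter, target in zip(row, TARGET)))
    return scores


print(aloha_score([["A", "L", "O", "O", "F"]]))  # [3]
print(aloha_score([list("ALOHA")]))  # [5]
```

The same function also accepts a NumPy array of single-character strings, since it only iterates over rows and compares elements.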
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
1+
# Protein solvent accessibility (using `foldx`)
2+
![Type of objective function: discrete](https://img.shields.io/badge/Type-discrete_inputs-blue)
3+
![Environment to run this objective function: poli protein](https://img.shields.io/badge/Environment-poli____protein-teal
4+
)
5+
6+
## About
7+
8+
This objective function returns the solvent accessible surface area (SASA) of a wildtype and its mutations using `foldx` and `biopython`.
9+
10+
## Prerequisites
11+
12+
- Have `foldx` installed and available in your home directory. We expect the following files to be there:
13+
- `~/foldx/foldx`: the binary. You might need to rename it.
14+
- `~/foldx/rotabase.txt`: a text file necessary for `foldx` to run.
15+
- A `wildtype_pdb_file`: a (repaired) pdb file of the wildtype.
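
Before creating the objective, it can help to confirm that the files listed above are where they are expected. The following is a minimal sketch using only the standard library. The function name `missing_foldx_files` and its `foldx_dir` parameter are assumptions made for illustration and are not part of `poli`.

```python
from pathlib import Path


def missing_foldx_files(foldx_dir=None):
    """Return the expected foldx files that are not present.

    Hypothetical helper: by default it looks in ~/foldx, as described
    in the prerequisites above.
    """
    foldx_dir = Path(foldx_dir) if foldx_dir is not None else Path.home() / "foldx"
    expected = [foldx_dir / "foldx", foldx_dir / "rotabase.txt"]
    return [path for path in expected if not path.is_file()]


missing = missing_foldx_files()
if missing:
    print("Missing foldx files:", ", ".join(str(path) for path in missing))
else:
    print("Found the foldx binary and rotabase.txt")
```

This only checks that the files exist. It does not verify that the binary is executable or that the pdb file has been repaired.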
16+
17+
## How to run
18+
19+
You can either run this objective function in your current environment (assuming that you have the correct dependencies installed), or you can run it in an isolated environment.
20+
21+
::::{tab-set}
22+
23+
:::{tab-item} In current environment
24+
25+
You will have to install the following two dependencies:
26+
27+
```bash
28+
pip install biopython python-levenshtein
29+
```
30+
31+
Then run
32+
33+
```python
34+
from pathlib import Path
35+
36+
import numpy as np
37+
38+
from poli import objective_factory
39+
40+
# The path to your wildtype pdb
41+
wildtype_pdb_file = Path("path/to/wildtype.pdb")
42+
43+
# How to create
44+
problem_info, f, x0, y0, run_info = objective_factory.create(
45+
name="foldx_sasa",
46+
wildtype_pdb_file=wildtype_pdb_file
47+
)
48+
49+
# Example input:
50+
print(x0)
51+
52+
# Querying:
53+
print(y0)  # The SASA of your wildtype
54+
```
55+
56+
You could also pass an `alphabet: Dict[str, int]` to the create method. By default, [we use this encoding](https://github.com/MachineLearningLifeScience/poli/blob/44cad2a5c95f209aeb24d4893d162b3359ca91a3/src/poli/core/util/proteins/defaults.py#L1).
57+
58+
:::
59+
60+
:::{tab-item} In isolation
61+
62+
If you want us to handle dependencies, run
63+
64+
```python
65+
from pathlib import Path
66+
67+
import numpy as np
68+
69+
from poli import objective_factory
70+
71+
# The path to your wildtype pdb
72+
wildtype_pdb_file = Path("path/to/wildtype.pdb")
73+
74+
# How to create
75+
problem_info, f, x0, y0, run_info = objective_factory.create(
76+
name="foldx_sasa",
77+
wildtype_pdb_file=wildtype_pdb_file,
78+
force_register=True,
79+
)
80+
81+
# Example input:
82+
print(x0)
83+
84+
# Querying:
85+
print(y0)  # The SASA of your wildtype
86+
```
87+
88+
```{warning}
89+
Registering the objective function in this way will create a `conda` environment called `poli__protein` with the relevant dependencies.
90+
```
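
If you want to know whether that environment already exists, one option is to parse the output of `conda env list --json`. The sketch below assumes `conda` is on your `PATH`. The helper names `has_conda_env` and `poli_protein_env_exists` are hypothetical and are not part of `poli`.

```python
import json
import subprocess
from pathlib import Path


def has_conda_env(env_list_json, name="poli__protein"):
    """Check whether an environment called `name` appears in the JSON
    printed by `conda env list --json` (a dict with an "envs" list of paths)."""
    envs = json.loads(env_list_json).get("envs", [])
    return any(Path(env_path).name == name for env_path in envs)


def poli_protein_env_exists():
    """Query conda and look for poli__protein. Assumes conda is installed."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_conda_env(result.stdout)
```

Keeping the parsing separate from the `subprocess` call means the check can be exercised without `conda` being available.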
91+
92+
:::
93+
94+
::::
Lines changed: 93 additions & 2 deletions
@@ -1,3 +1,94 @@
1-
# Protein stability using `foldx`
1+
# Protein stability (using `foldx`)
2+
![Type of objective function: discrete](https://img.shields.io/badge/Type-discrete_inputs-blue)
3+
![Environment to run this objective function: poli protein](https://img.shields.io/badge/Environment-poli____protein-teal
4+
)
25

3-
[TODO: write]
6+
## About
7+
8+
This objective function returns the stability (i.e. the negative energy) of a wildtype and its mutations, computed using `foldx`.
9+
10+
## Prerequisites
11+
12+
- Have `foldx` installed and available in your home directory. We expect the following files to be there:
13+
- `~/foldx/foldx`: the binary. You might need to rename it.
14+
- `~/foldx/rotabase.txt`: a text file necessary for `foldx` to run.
15+
- A `wildtype_pdb_file`: a (repaired) pdb file of the wildtype.
16+
17+
## How to run
18+
19+
You can either run this objective function in your current environment (assuming that you have the correct dependencies installed), or you can run it in an isolated environment.
20+
21+
::::{tab-set}
22+
23+
:::{tab-item} In current environment
24+
25+
You will have to install the following two dependencies:
26+
27+
```bash
28+
pip install biopython python-levenshtein
29+
```
30+
31+
Then run
32+
33+
```python
34+
from pathlib import Path
35+
36+
import numpy as np
37+
38+
from poli import objective_factory
39+
40+
# The path to your wildtype pdb
41+
wildtype_pdb_file = Path("path/to/wildtype.pdb")
42+
43+
# How to create
44+
problem_info, f, x0, y0, run_info = objective_factory.create(
45+
name="foldx_stability",
46+
wildtype_pdb_file=wildtype_pdb_file
47+
)
48+
49+
# Example input:
50+
print(x0)
51+
52+
# Querying:
53+
print(y0) # The stability of your wildtype
54+
```
55+
56+
You could also pass an `alphabet: Dict[str, int]` to the create method. By default, [we use this encoding](https://github.com/MachineLearningLifeScience/poli/blob/44cad2a5c95f209aeb24d4893d162b3359ca91a3/src/poli/core/util/proteins/defaults.py#L1).
57+
58+
:::
59+
60+
:::{tab-item} In isolation
61+
62+
If you want us to handle dependencies, run
63+
64+
```python
65+
from pathlib import Path
66+
67+
import numpy as np
68+
69+
from poli import objective_factory
70+
71+
# The path to your wildtype pdb
72+
wildtype_pdb_file = Path("path/to/wildtype.pdb")
73+
74+
# How to create
75+
problem_info, f, x0, y0, run_info = objective_factory.create(
76+
name="foldx_stability",
77+
wildtype_pdb_file=wildtype_pdb_file,
78+
force_register=True,
79+
)
80+
81+
# Example input:
82+
print(x0)
83+
84+
# Querying:
85+
print(y0) # The stability of your wildtype
86+
```
87+
88+
```{warning}
89+
Registering the objective function in this way will create a `conda` environment called `poli__protein` with the relevant dependencies.
90+
```
91+
92+
:::
93+
94+
::::
