Update docs to current state

miguelgondu · miguelgondu · commit 2714dc7e6c15 · 2023-10-12T13:52:00.000+02:00
diff --git a/docs/protein-optimization/contributing/a_new_problem.md b/docs/protein-optimization/contributing/a_new_problem.md
@@ -14,7 +14,7 @@ poli/objective_repository
 │   └── register.py           # Definition and registration of the problem
 ```
 
-You can also have as many other files as you want. Think of the folder `.../problem_name` as a small project as of itself: you can have Python dependencies that will get appended to the path at runtime.
+You can also have as many other files as you want. Think of the folder `.../problem_name` as a small project as of itself: you can use any internal code you write here, since it'll be carried with `poli` at installation time.
 
 **For example:** let's take a look at the problem folder of `super_mario_bros`
 
@@ -26,7 +26,7 @@ You can also have as many other files as you want. Think of the folder `.../prob
 │   ├── example.pt           # < --
 │   ├── level_utils.py       # < --
 │   ├── model.py             # < --  extra files needed
-│   ├── readme.md            # < --  for the simulation
+│   ├── readme.md            # < --  for the black box
 │   ├── simulator.jar        # < --  to run.
 │   └── simulator.py         # < --
 ```
@@ -41,7 +41,7 @@ The average `register.py` has the following structure
 
 ```python
 # An average register.py
-from typing import Tuple
+from typing import Tuple, List
 
 import numpy as np
 
@@ -52,12 +52,25 @@ from poli.core.problem_setup_information import ProblemSetupInformation
 # Files that are in the same folder as
 # register will get added to the
 # PYTHONPATH at runtime.
+# Imagining you have your_local_dependency.py
+# in the same folder as register.py...
 from your_local_dependency import ...
 
 
 class YourBlackBox(AbstractBlackBox):
-    def __init__(self, info: ProblemSetupInformation, batch_size: int = None):
-        super().__init__(info=info, batch_size=batch_size)
+    def __init__(
+        self,
+        info: ProblemSetupInformation,
+        batch_size: int = None,
+        parallelize: bool = False,
+        num_workers: int = None
+    ):
+        super().__init__(
+            info=info,
+            batch_size=batch_size,
+            parallelize=parallelize,
+            num_workers=num_workers,
+        )
 
     # The only method you have to define
     def _black_box(self, x: np.ndarray, context: dict = None) -> np.ndarray:
@@ -82,17 +95,26 @@ class YourProblemFactory(AbstractProblemFactory):
     def create(
         self,
         seed: int = None,
+        batch_size: int = None,
+        parallelize: bool = False,
+        num_workers: int = None,
         your_keyword_1: str = ...,
-        your_keyword_2: str = ...,
+        your_keyword_2: int = ...,
+        your_keyword_3: List[float] = ...,
     ) -> Tuple[AbstractBlackBox, np.ndarray, np.ndarray]:
         # Manipulate keywords you might need at creation time...
         ...
         
-        # The maximum length you defined above
+        # Getting the problem information
         problem_info = self.get_setup_information()
         
         # Creating your black box function
-        f = YourBlackBox(info=problem_info)
+        f = YourBlackBox(
+            info=problem_info,
+            batch_size=batch_size,
+            parallelize=parallelize,
+            num_workers=num_workers,
+        )
         
         # Your first input (an np.array[str])
         x0 = ...
@@ -122,6 +144,14 @@ That is, **the script creates and registers** your problem factory.
 It is important that name of your problem should be the name of the folder it's contained, **exactly**. (We advice using `camel_case`).
 :::
 
+:::{warning}
+
+`poli` is experimental. The input kwargs to the abstract black box
+and to the create method are under active development. Your IDE should
+tell you automatically, though!
+
+:::
+
 ## A generic `environment.yml`
 
 You will usually develop your black-box objective function inside an environment, say `your_env`. You need to specify all these requirements in the `environment.yml`, generically:
@@ -209,6 +239,28 @@ problem_info, f, x0, y0, _ = objective_factory.create(
 
 `poli` will ask you to confirm that you want to register your problem (you can force the registration by passing `force_register=True` to `objective_factory.create`).
 
+## (Optional) Making your problem be available if dependencies are met
+
+At this point, you can run your objective function in an isolated process (which will literally import the factory and the black box function from the `register.py` you wrote). A better alternative is to get direct access to the object itself. Having access to the actual class makes your life easy, especially when it comes to using debugging tools like the ones in VSCode.
+
+If you want to make your problem available if it can be imported, take a look at `src/poli/objective_repository/__init__.py`. Add a block like this one at the end of it:
+
+```python
+#... the rest of poli/objective_repository/__init__.py
+
+# A block you could add:
+try:
+    from .your_problem.register import YourProblemFactory
+
+    AVAILABLE_PROBLEM_FACTORIES["your_problem"] = YourProblemFactory
+except ImportError:  # Maybe you'll need to check for other errors.
+    pass
+
+```
+
+`AVAILABLE_PROBLEM_FACTORIES` is keeping track of all the problem factories we can import without needing to isolate the process. If a problem is in this dict, it will appear when querying `get_problems()`, and it will be passed at `objective_factory.create` time.
+
+
 ## Submitting a pull request
 
 If you want to share your problem with us, feel free to create a pull request in our repository: https://github.com/MachineLearningLifeScience/poli
diff --git a/docs/protein-optimization/getting_started/getting_started.md b/docs/protein-optimization/getting_started/getting_started.md
@@ -29,7 +29,7 @@ $ conda activate poli-base
 $ pip install numpy
 ```
 
-Right now, we only support two ways of installing `poli`: by cloning the repo and installing, or using `pip` and `git+`. [TODO: change from my fork to MLLS repo after merging]
+Right now, we only support two ways of installing `poli`: by cloning the repo and installing, or using `pip` and `git+`.
 
 ::::{tab-set}
 
diff --git a/docs/protein-optimization/using_poli/optimization_examples/protein-stability-foldx/optimizing_protein_stability.ipynb b/docs/protein-optimization/using_poli/optimization_examples/protein-stability-foldx/optimizing_protein_stability.ipynb
@@ -141,7 +141,7 @@
     "\n",
     "In general, it is a good idea to check how to create instances of individual problems in their documentation, since they might need extra inputs.\n",
     "\n",
-    "For example, [`foldx_stability` only needs one extra keyword argument: a `wildtype_pdb_path`](../../objective_repository/foldx_stability.md). `poli` will hopefully remind you what you forgot with its error messages.\n",
+    "For example, [`foldx_stability` only needs one extra keyword argument: a single/list of `wildtype_pdb_path`](../../objective_repository/foldx_stability.md). `poli` will hopefully remind you what you forgot with its error messages.\n",
     "\n",
     ":::"
    ]
@@ -229,11 +229,11 @@
    "source": [
     "`objective_factory.create` returns four things:\n",
     "\n",
-    "1. a `problem_info` with a description of the problem, including useful attributes like `alphabet` or `max_sequence_length`. (See more [here (TODO: ADD)]()).\n",
+    "1. a `problem_info` with a description of the problem, including useful attributes like `alphabet` or `max_sequence_length`. (See more [here (TODO: ADD API REFERENCE)]()).\n",
     "2. a black-box function `f: AbstractBlackBox` from `poli`.\n",
     "3. an initial design `x0: np.ndarray`, and\n",
     "4. an initial evaluation `y0: np.ndarray`.\n",
-    "5. `run_info`, or the output of the observer (?)."
+    "5. `run_info`, or the output of the observer's initialization (see more [in our chapter about making observers](../../the_basics/defining_an_observer.ipynb))."
    ]
   },
   {
diff --git a/docs/protein-optimization/using_poli/the_basics/defining_an_observer.ipynb b/docs/protein-optimization/using_poli/the_basics/defining_an_observer.ipynb
@@ -177,7 +177,7 @@
     "\n",
     ":::{warning}\n",
     "\n",
-    "Remember that this is a simple example! We are essentially re-inventing the wheel. You should write more complex logic for logging, or use libraries like `tensorflow`, `mlflow` or `wandb`.\n",
+    "Remember that this is a simple example! We are essentially re-inventing the wheel. You should write more complex logic for logging, or use libraries like `tensorboard`, `mlflow` or `wandb`.\n",
     "\n",
     ":::"
    ]
diff --git a/docs/protein-optimization/using_poli/the_basics/intro_to_poli.ipynb b/docs/protein-optimization/using_poli/the_basics/intro_to_poli.ipynb
@@ -36,7 +36,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Black-box optimization algorithms inside `poli-baselines` are treated as **solvers** of **problems** defined using `poli`.\n",
+    "Black-box optimization algorithms inside `poli-baselines` are treated as **solvers**, and the discrete objective functions of `poli` are described as **problems**.\n",
     "\n",
     "We propose to you the following process for using `poli-baselines`' optimizers, or developing your own:"
    ]
@@ -66,7 +66,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "['aloha', 'white_noise']\n"
+      "['aloha', 'gfp_select', 'white_noise']\n"
      ]
     }
    ],
@@ -91,7 +91,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "['aloha', 'foldx_rfp_lambo', 'foldx_sasa', 'foldx_stability', 'foldx_stability_and_sasa', 'rdkit_logp', 'rdkit_qed', 'super_mario_bros', 'white_noise']\n"
+      "['aloha', 'drd3_docking', 'foldx_rfp_lambo', 'foldx_sasa', 'foldx_stability', 'foldx_stability_and_sasa', 'gfp_select', 'penalized_logp_lambo', 'rdkit_logp', 'rdkit_qed', 'sa_tdc', 'super_mario_bros', 'white_noise']\n"
      ]
     }
    ],
@@ -103,7 +103,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Each one of these objective functions can be run without modifying your environment, but you might need to check their prerequisites. We do our best to keep the list updated in [the introduction page](../../index.md), where you can find links to the requirements and installation descriptions for each one of these.\n",
+    "Each one of these objective functions can be run without modifying your environment, but you might need to check their prerequisites. We do our best to keep the list updated in [the page on all objective functions](../objective_repository/all_objectives.md), where you can find links to the requirements and installation descriptions for each one of these.\n",
+    "\n",
+    ":::{note}\n",
+    "\n",
+    "Some objective functions have more requirements, like installing external dependencies. Check [the page on all objective functions](../objective_repository/all_objectives.md) and click on the objective function you are interested in to get a detailed set of instructions on how to install and run it.\n",
+    "\n",
+    ":::\n",
     "\n",
     "If the function still isn't there, **implement it yourself!** An example of how to do this can be found in `poli_baselines/examples/00_a_simple_objective_function_registration`, or in our chapter on [registering optimization functions](./registering_an_objective_function.md).\n",
     "\n",
@@ -112,7 +118,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 7,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -146,15 +152,15 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 8,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "x0: [['1' '2' '3']]\n",
-      "y0: [[-0.70206057]]\n"
+      "y0: [[-0.85037523]]\n"
      ]
     }
    ],
@@ -180,16 +186,16 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 9,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "array([['4', '2', '3']], dtype='<U1')"
+       "array([['1', '7', '3']], dtype='<U1')"
       ]
      },
-     "execution_count": 7,
+     "execution_count": 9,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -207,7 +213,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 10,
    "metadata": {},
    "outputs": [
     {
@@ -216,7 +222,7 @@
        "['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']"
       ]
      },
-     "execution_count": 11,
+     "execution_count": 10,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -248,7 +254,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 11,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -257,14 +263,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 12,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "[['1' '5' '6']]\n"
+      "[['1' '2' '5']]\n"
      ]
     }
    ],
diff --git a/docs/protein-optimization/using_poli/the_basics/registering_an_objective_function.md b/docs/protein-optimization/using_poli/the_basics/registering_an_objective_function.md
@@ -80,7 +80,11 @@ class OurAlohaProblemFactory(AbstractProblemFactory):
             alphabet=alphabet,
         )
 
-    def create(self, seed: int = None) -> Tuple[AbstractBlackBox, np.ndarray, np.ndarray]:
+    # Adding **kwargs is necessary, since several things usually
+    # get passed to the create method at initialization.
+    def create(
+        self, seed: int = None, **kwargs
+    ) -> Tuple[AbstractBlackBox, np.ndarray, np.ndarray]:
         problem_info = self.get_setup_information()
         f = OurAlohaBlackBox(info=problem_info)
         x0 = np.array([["A", "L", "O", "O", "F"]])
@@ -102,8 +106,6 @@ Check the exact implementation on [`poli/objective_repository/aloha/register.py`
 
 First step is always **creating a conda environment for your problem**. In this case, we could do with just the base enviroment. However, for completion in the presentation, we will create a conda enviroment called `poli_aloha`. This is the enviroment description (which can be found under `environment.yml` in the examples folder for `aloha`):
 
-TODO: move the dependency to our github after merging.
-
 ```yml
 # environment.yml
 name: poli_aloha_problem
