Updates readme with instructions on how to build the documentation, and adds an intro to poli

miguelgondu · miguelgondu · commit 793e06bdfd11 · 2023-08-16T14:37:02.000+02:00
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,6 @@
 .DS_Store
+.vscode/
+.ipynb_checkpoints/
 
 # Example's working directory
 examples/understanding_foldx/tmp
diff --git a/docs/protein-optimization/_toc.yml b/docs/protein-optimization/_toc.yml
@@ -9,6 +9,7 @@ parts:
       - file: getting_started/getting_started.md
   - caption: Using poli - the basics
     chapters:
+      - file: using_poli/the_basics/intro_to_poli.ipynb
       - file: using_poli/the_basics/registering_an_objective_function.md
       - file: using_poli/the_basics/defining_a_problem_solver.md
       - file: using_poli/the_basics/optimizing_an_objective_function.md
diff --git a/docs/protein-optimization/getting_started/getting_started.md b/docs/protein-optimization/getting_started/getting_started.md
@@ -99,4 +99,8 @@ for _ in range(5):
 
 If we run this script, `poli` will ask us to confirm that we want to register/install `"white_noise"` as an objective function (you can deactivate this confirmation step by passing the flag `force_register=True` to `.create`). Afterwards, it will print 5 evaluations of the objective function on the same input.
 
+## Conclusion
 
+This tutorial showed you how to install `poli` and register `white_noise`, an objective function that ships with `poli` itself.
+
+The next chapters explain the basics of `poli`, such as registering your own objective functions, defining problem solvers in `poli-baselines`, and optimizing.
diff --git a/docs/protein-optimization/using_poli/the_basics/intro_to_poli.ipynb b/docs/protein-optimization/using_poli/the_basics/intro_to_poli.ipynb
@@ -0,0 +1,347 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# What is poli?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```{contents}\n",
+    "```\n",
+    "\n",
+    "`poli` is a library for registering black-box objective functions, with a special focus on *discrete* sequence optimization. It stands for *Protein Optimization Library*, since part of the work in drug design represents proteins as discrete sequences, or sentences, of amino acids.\n",
+    "\n",
+    "We also provide `poli-baselines`, built on top of `poli`, which lets you define black-box optimization algorithms for discrete sequences.\n",
+    "\n",
+    "These next chapters detail a basic example of how to use `poli` and `poli-baselines`. If you want to start coding now, continue to [the next chapter](./registering_an_objective_function.md)!\n",
+    "\n",
+    "After these, feel free to dive deeper into how `poli` works under the hood in [the chapter about the details](./diving_deeper.md).\n",
+    "\n",
+    "The rest of this intro describes the usual development loop we assume you'll follow when using `poli` and `poli-baselines`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The usual development loop"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Black-box optimization algorithms inside `poli-baselines` are treated as **solvers** of **problems** defined using `poli`.\n",
+    "\n",
+    "We suggest the following process for using the optimizers in `poli-baselines`, or for developing your own:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Identify the objective function"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Start by identifying the black-box objective function you want to optimize, and check whether it's already registered in `poli` or available in `poli`'s objective repository.\n",
+    "\n",
+    "To list the objective functions that are already registered, run:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "vscode": {
+     "languageId": "plaintext"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['white_noise']\n"
+     ]
+    }
+   ],
+   "source": [
+    "from poli.core.registry import get_problems\n",
+    "print(get_problems())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The output is a list of registered problems. If the function you're interested in is not registered, you can check whether it is available in `poli`'s objective repository:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['aloha', 'super_mario_bros', 'white_noise']\n"
+     ]
+    }
+   ],
+   "source": [
+    "from poli.objective_repository import AVAILABLE_OBJECTIVES\n",
+    "print(AVAILABLE_OBJECTIVES)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If the function isn't there, **implement it yourself!** An example of how to do this can be found in `poli_baselines/examples/00_a_simple_objective_function_registration`, or in our chapter on [registering objective functions](./registering_an_objective_function.md).\n",
+    "\n",
+    "In what follows, we will use the `white_noise` objective function. You could drop in your own function instead."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from poli import objective_factory\n",
+    "\n",
+    "problem_info, f, x0, y0, _ = objective_factory.create(\n",
+    "    name=\"white_noise\",\n",
+    "    force_register=True,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "At this point, you can call `f` on arrays of shape `[b, L]`, where `b` is the batch size and `L` is the sequence length. In the specific case of `white_noise`, `L` can be any positive integer."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Using a solver, or creating your own"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`poli-baselines` also comes with black-box optimizers out of the box. You can find them under `poli_baselines.solvers`.\n",
+    "\n",
+    "For example, let's use the `RandomMutation` solver, which takes the initial `x0` and randomly mutates it according to the alphabet provided in `problem_info`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "x0: [['1' '2' '3']]\n",
+      "y0: [[1.58015034]]\n"
+     ]
+    }
+   ],
+   "source": [
+    "from poli_baselines.solvers.simple.random_mutation import RandomMutation\n",
+    "\n",
+    "solver = RandomMutation(\n",
+    "    black_box=f,\n",
+    "    x0=x0,\n",
+    "    y0=y0,\n",
+    "    alphabet=problem_info.alphabet\n",
+    ")\n",
+    "\n",
+    "print(f\"x0: {x0}\")\n",
+    "print(f\"y0: {y0}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Solvers implement a `next_candidate()` method, which proposes a new candidate based on their history:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'x': [array([['1', '2', '3']], dtype='<U1')], 'y': [array([[1.58015034]])]}\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "array([['1', '3', '3']], dtype='<U1')"
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "print(solver.history)\n",
+    "solver.next_candidate()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`RandomMutation` simply mutates the sequence with a token selected at random from the alphabet:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'0': 0,\n",
+       " '1': 1,\n",
+       " '2': 2,\n",
+       " '3': 3,\n",
+       " '4': 4,\n",
+       " '5': 5,\n",
+       " '6': 6,\n",
+       " '7': 7,\n",
+       " '8': 8,\n",
+       " '9': 9}"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "solver.alphabet"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**If you are interested in building your own solver**, check out [the chapter detailing how `RandomMutation` is implemented](./defining_a_problem_solver.md)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Optimizing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once you have a black-box objective function `f` and a solver on top, optimizing takes a single call to `solve`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "solver.solve(max_iter=100)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[['0' '2' '3']]\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(solver.get_best_solution())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Of course, this example is trivial. We dive deeper in the next chapters."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This chapter discussed the usual development loop using `poli` and `poli-baselines`:\n",
+    "1. Start by identifying/building your objective function,\n",
+    "2. continue by creating/using a solver in `poli_baselines`, and\n",
+    "3. use the `solve` method to run a number of iterations of the solver.\n",
+    "\n",
+    "The next three chapters walk through another trivial example, diving deeper into the process of defining your own objective functions and solvers. You can continue there, or by checking [the currently implemented repository of objective functions inside `poli` TODO: ADD]()."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.17"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/docs/protein-optimization/using_poli/the_basics/intro_to_poli.md b/docs/protein-optimization/using_poli/the_basics/intro_to_poli.md
diff --git a/readme.md b/readme.md