166 changes: 166 additions & 0 deletions docs/getting-started/cli.rst
@@ -0,0 +1,166 @@
======================
Command Line Interface
======================

Many features of the core library are accessible via the command line interface built
using the `Sacred <https://github.com/idsia/sacred>`_ package.

Sacred is used to configure and run the algorithms.
It is centered around the concept of `experiments <https://sacred.readthedocs.io/en/stable/experiment.html>`_
which are composed of reusable `ingredients <https://sacred.readthedocs.io/en/stable/ingredients.html>`_.
Each experiment and each ingredient has its own configuration namespace.
Named configurations are used to specify a coherent set of configuration values.
It is recommended to at least read the
`Sacred documentation about the command line interface <https://sacred.readthedocs.io/en/stable/command_line.html>`_.
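
The examples below rely on Sacred's ``with key=value`` override syntax, in which dotted keys address values inside nested ingredient namespaces. The following stdlib sketch illustrates the idea only (it is not Sacred's actual implementation; Sacred also evaluates the right-hand side, while this sketch keeps it as a string):

```python
def apply_overrides(config, overrides):
    """Merge dotted ``section.key=value`` overrides into a nested config dict."""
    for override in overrides:
        dotted_key, _, value = override.partition("=")
        *namespaces, key = dotted_key.split(".")
        node = config
        for namespace in namespaces:  # descend into (or create) each namespace
            node = node.setdefault(namespace, {})
        node[key] = value
    return config


config = apply_overrides(
    {}, ["demonstrations.n_expert_demos=50", "expert.policy_type=ppo"]
)
print(config)
# → {'demonstrations': {'n_expert_demos': '50'}, 'expert': {'policy_type': 'ppo'}}
```

Each ingredient thus owns its own sub-dictionary, which is why the overrides in the examples below are prefixed with the ingredient name (e.g. ``demonstrations.`` or ``expert.``).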

The :py:mod:`scripts <imitation.scripts>` package contains a number of Sacred experiments that either execute algorithms or perform utility tasks.
The most important :py:mod:`ingredients <imitation.scripts.ingredients>` for imitation learning are:

- :py:mod:`Environments <imitation.scripts.ingredients.environment>`
- :py:mod:`Expert Policies <imitation.scripts.ingredients.expert>`
- :py:mod:`Expert Demonstrations <imitation.scripts.ingredients.demonstrations>`
- :py:mod:`Reward Functions <imitation.scripts.ingredients.reward>`


Usage Examples
==============

Here we demonstrate some usage examples for the command line interface.
You can always find out all the configurable values of a script by running:

.. code-block:: bash

python -m imitation.scripts.<script> print_config

Run BC on the ``CartPole-v1`` environment with a pre-trained PPO policy as expert:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note:: Here the CartPole environment is specified via the ``cartpole`` named configuration.

.. code-block:: bash

python -m imitation.scripts.train_imitation bc with \
cartpole \
demonstrations.n_expert_demos=50 \
bc.train_kwargs.n_batches=2000 \
expert.policy_type=ppo \
expert.loader_kwargs.path=tests/testdata/expert_models/cartpole_0/policies/final/model.zip

Fifty expert demonstrations are sampled from the PPO policy included in the ``tests/testdata`` folder.
Training for 2000 batches is enough to obtain a good policy.

Run DAgger on the ``CartPole-v0`` environment with a random policy as expert:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

python -m imitation.scripts.train_imitation dagger with \
cartpole \
dagger.total_timesteps=2000 \
demonstrations.n_expert_demos=10 \
expert.policy_type=random

This will not produce any meaningful results, since a random policy is not a good expert.


Run AIRL on the ``MountainCar-v0`` environment with an expert from the HuggingFace Model Hub:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

python -m imitation.scripts.train_adversarial airl with \
seals_mountain_car \
total_timesteps=5000 \
expert.policy_type=ppo-huggingface \
demonstrations.n_expert_demos=500

.. note:: The small number of total timesteps is only for demonstration purposes and will not produce a good policy.


Run GAIL on the ``seals/Swimmer-v0`` environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we do not use the named configuration for the seals environment, but instead specify the ``gym_id`` directly.
The ``seals:`` prefix ensures that the ``seals`` package is imported and the environment is registered.

.. note:: The Swimmer environment needs ``mujoco_py`` to be installed.

.. code-block:: bash

python -m imitation.scripts.train_adversarial gail with \
environment.gym_id="seals:seals/Swimmer-v0" \
total_timesteps=5000 \
demonstrations.n_expert_demos=50
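
The colon prefix follows Gym's ``module:env_id`` convention: the part before the first colon is imported before the environment is looked up, which gives the package a chance to register its environments. A rough stdlib sketch of this mechanism (illustrative only; Gym's actual resolution logic differs in its details, and ``json`` is used below purely as a stand-in importable module):

```python
import importlib


def resolve_env_spec(spec):
    """Split a ``module:env_id`` spec and import the module, if one is given."""
    module_name, sep, env_id = spec.partition(":")
    if not sep:  # no prefix: nothing to import
        return None, spec
    importlib.import_module(module_name)  # side effect: environment registration
    return module_name, env_id


print(resolve_env_spec("CartPole-v1"))   # → (None, 'CartPole-v1')
print(resolve_env_spec("json:Fake-v0"))  # → ('json', 'Fake-v0')
```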


Algorithm Scripts
=================

Call the algorithm scripts like this:

.. code-block:: bash

python -m imitation.scripts.<script> [command] with <named_config> <config_values>
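
When launching these scripts programmatically (e.g., from a parameter sweep or a test harness), the invocation above can be assembled as an argument list for ``subprocess.run``. A minimal sketch, assuming only the general command shape shown above (``build_cli`` is an illustrative helper, not part of imitation):

```python
import sys


def build_cli(script, command=None, named_configs=(), overrides=None):
    """Assemble argv for ``python -m imitation.scripts.<script> [command] with ...``."""
    argv = [sys.executable, "-m", f"imitation.scripts.{script}"]
    if command is not None:
        argv.append(command)
    tail = list(named_configs)
    tail += [f"{key}={value}" for key, value in (overrides or {}).items()]
    if tail:  # Sacred expects config modifications after a literal "with"
        argv += ["with", *tail]
    return argv


argv = build_cli(
    "train_imitation",
    command="bc",
    named_configs=["cartpole"],
    overrides={"demonstrations.n_expert_demos": 50},
)
print(argv[1:])
# → ['-m', 'imitation.scripts.train_imitation', 'bc', 'with', 'cartpole', 'demonstrations.n_expert_demos=50']
```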

+---------------------------------+------------------------------+----------+
| algorithm | script | command |
+=================================+==============================+==========+
| BC | train_imitation | bc |
+---------------------------------+------------------------------+----------+
| DAgger | train_imitation | dagger |
+---------------------------------+------------------------------+----------+
| AIRL | train_adversarial | airl |
+---------------------------------+------------------------------+----------+
| GAIL | train_adversarial | gail |
+---------------------------------+------------------------------+----------+
| Preference Comparison | train_preference_comparisons | - |
+---------------------------------+------------------------------+----------+
| MCE IRL | none | - |
+---------------------------------+------------------------------+----------+
| Density Based Reward Estimation | none | - |
+---------------------------------+------------------------------+----------+

Utility Scripts
===============

Call the utility scripts like this:

.. code-block:: bash

python -m imitation.scripts.<script>

+-----------------------------------------+-----------------------------------------------------------+
| Functionality | Script |
+=========================================+===========================================================+
| Reinforcement Learning | :py:mod:`train_rl <imitation.scripts.train_rl>` |
+-----------------------------------------+-----------------------------------------------------------+
| Evaluating a Policy | :py:mod:`eval_policy <imitation.scripts.eval_policy>` |
+-----------------------------------------+-----------------------------------------------------------+
| Parallel Execution of Algorithm Scripts | :py:mod:`parallel <imitation.scripts.parallel>` |
+-----------------------------------------+-----------------------------------------------------------+
| Converting Trajectory Formats | :py:mod:`convert_trajs <imitation.scripts.convert_trajs>` |
+-----------------------------------------+-----------------------------------------------------------+
| Analyzing Experimental Results | :py:mod:`analyze <imitation.scripts.analyze>` |
+-----------------------------------------+-----------------------------------------------------------+


Output Directories
==================

The results of the script runs are stored in the following directory structure:

.. code-block::

output
├── <algo>
│ └── <environment>
│ └── <timestamp>
│ ├── log
│ ├── monitor
│ └── sacred -> ../../../sacred/<script_name>/1
└── sacred
└── <script_name>
├── 1
└── _sources

It contains the final model, TensorBoard logs, Sacred logs, and the Sacred source files.
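
The tree above, including the relative ``sacred`` symlink, can be reproduced with a short sketch (``make_run_dirs``, the timestamp format, and the directory names passed in are illustrative, not imitation's actual code):

```python
import pathlib
import tempfile


def make_run_dirs(root, algo, environment, timestamp, script_name, run_id="1"):
    """Create the documented output layout under ``root``."""
    run = root / algo / environment / timestamp
    for sub in ("log", "monitor"):
        (run / sub).mkdir(parents=True, exist_ok=True)
    (root / "sacred" / script_name / run_id).mkdir(parents=True, exist_ok=True)
    (root / "sacred" / script_name / "_sources").mkdir(exist_ok=True)
    # Relative link, as in the tree: sacred -> ../../../sacred/<script_name>/<run_id>
    (run / "sacred").symlink_to(pathlib.Path("../../../sacred") / script_name / run_id)
    return run


root = pathlib.Path(tempfile.mkdtemp()) / "output"
run = make_run_dirs(root, "bc", "CartPole-v1", "20230101_120000", "train_imitation")
print((run / "sacred").resolve() == (root / "sacred" / "train_imitation" / "1").resolve())
# → True
```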
1 change: 1 addition & 0 deletions docs/index.rst
@@ -47,6 +47,7 @@ If you use ``imitation`` in your research project, please cite our paper to help
getting-started/what-is-imitation
getting-started/variable-horizon
getting-started/first-steps
getting-started/cli

.. toctree::
:maxdepth: 2
2 changes: 1 addition & 1 deletion src/imitation/scripts/ingredients/__init__.py
@@ -1 +1 @@
"""Ingredients for scripts."""
"""Ingredients for Sacred experiments."""
5 changes: 4 additions & 1 deletion src/imitation/scripts/ingredients/bc.py
@@ -1,4 +1,7 @@
"""Ingredients for training a BC policy."""
"""This ingredient provides a BC algorithm instance.

It is either loaded from disk or constructed from scratch.
"""
import warnings
from typing import Optional, Sequence

6 changes: 5 additions & 1 deletion src/imitation/scripts/ingredients/demonstrations.py
@@ -1,4 +1,8 @@
"""Ingredient for scripts learning from demonstrations."""
"""This ingredient provides (expert) demonstrations to learn from.

The demonstrations are loaded from disk, downloaded from the HuggingFace Dataset Hub,
or sampled from the expert policy provided by the expert ingredient.
"""

import logging
from typing import Any, Dict, Optional, Sequence
2 changes: 1 addition & 1 deletion src/imitation/scripts/ingredients/environment.py
@@ -1,4 +1,4 @@
"""Environment Ingredient for sacred experiments."""
"""This ingredient provides a vectorized gym environment."""
import contextlib
from typing import Any, Generator, Mapping

17 changes: 16 additions & 1 deletion src/imitation/scripts/ingredients/expert.py
@@ -1,4 +1,19 @@
"""Common configuration elements for loading of expert policies."""
"""This ingredient provides an expert policy.

The expert policy is either loaded from disk, loaded from the HuggingFace Model Hub,
or is a test policy (e.g., random or zero).

The supported policy types are:

- :code:`ppo` and :code:`sac`: A policy trained with SB3.
  Needs a :code:`path` in the :code:`loader_kwargs`.
- :code:`<algo>-huggingface` (:code:`<algo>` can be :code:`ppo` or :code:`sac`):
  A policy trained with SB3 and uploaded to the HuggingFace Model Hub.
  Will load the model from the repo :code:`<organization>/<algo>-<env_name>`.
  You can set the organization with the :code:`organization` key in
  :code:`loader_kwargs`; the default is :code:`HumanCompatibleAI`.
- :code:`random`: A policy that takes random actions.
- :code:`zero`: A policy that takes zero actions.
"""
import sacred

from imitation.policies import serialize
6 changes: 5 additions & 1 deletion src/imitation/scripts/ingredients/logging.py
@@ -1,4 +1,8 @@
"""Logging ingredient for scripts."""
"""This ingredient provides a number of logging utilities.

It is responsible for logging to WandB, TensorBoard, and stdout.
It will also create a symlink to the sacred logging directory in the log directory.
"""

import logging
import pathlib
2 changes: 1 addition & 1 deletion src/imitation/scripts/ingredients/policy.py
@@ -1,4 +1,4 @@
"""Ingredient implementation for a SB3 policy."""
"""This ingredient provides a newly constructed stable-baselines3 policy."""

import logging
from typing import Any, Mapping, Type
6 changes: 5 additions & 1 deletion src/imitation/scripts/ingredients/policy_evaluation.py
@@ -1,4 +1,8 @@
"""Sacred ingredient for evaluating a policy on a VecEnv."""
"""This ingredient performs evaluation of a learned policy.

It applies the appropriate wrappers, performs rollouts,
and computes statistics over the rollouts.
"""

from typing import Mapping, Union

2 changes: 1 addition & 1 deletion src/imitation/scripts/ingredients/reward.py
@@ -1,4 +1,4 @@
"""Common configuration elements for reward network training."""
"""This ingredient provides a reward network."""

import logging
import typing
5 changes: 4 additions & 1 deletion src/imitation/scripts/ingredients/rl.py
@@ -1,4 +1,7 @@
"""Common configuration elements for reinforcement learning."""
"""This ingredient provides a reinforcement learning algorithm from stable-baselines3.

The algorithm instance is either freshly constructed or loaded from a file.
"""

import logging
import warnings
2 changes: 1 addition & 1 deletion src/imitation/scripts/ingredients/wb.py
@@ -1,4 +1,4 @@
"""Weights & Biases configuration elements for scripts."""
"""This ingredient provides Weights & Biases logging."""

import logging
from typing import Any, Mapping, Optional