
Commit b3856dc: "update docs"

1 parent 7ed412f

File tree

5 files changed: 28 additions, 13 deletions

README.md

Lines changed: 9 additions & 0 deletions
@@ -254,7 +254,16 @@ Offline RL on the (original) [D4RL](https://arxiv.org/pdf/2004.07219) datasets.
 
 [Example `wandb`](https://wandb.ai/jakegrigsby/amago-v3-reference/runs/9ab15rr8)
 
+<br>
+
+### **15. Human-Level Competitive Pokémon: Metamon**
+
+[`metamon`](https://github.com/UT-Austin-RPL/metamon) used `amago` to train top-decile agents in Pokémon Showdown from human and self-collected battle data.
+
+<img src="docs/media/metamon_icon.png" alt="metamon_diagram" width="100" align="left" />
 
+Check out our [project website](https://metamon.tech)!
+
+<br>
 
 <br>
 

amago/agent.py

Lines changed: 1 addition & 0 deletions
@@ -196,6 +196,7 @@ def V(state, critic, action_dist, k) -> float:
     use_multigamma: If True, train on multiple discount horizons (:py:class:`~amago.agent.Multigammas`) in parallel. Defaults to True.
     actor_type: Actor MLP head for producing action distributions. Defaults to :py:class:`~amago.nets.actor_critic.Actor`.
     critic_type: Critic MLP head for producing Q-values. Defaults to :py:class:`~amago.nets.actor_critic.NCritics`.
+    pass_obs_keys_to_actor: List of keys from the observation space to pass directly to the actor network's forward pass when needed (e.g., for action masking). Defaults to None.
 """
 
 def __init__(
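For context on the new kwarg: below is a minimal sketch of the action-masking use case the docstring mentions. The `MaskedActor` class, its `forward` signature, and the `legal_actions` observation key are illustrative assumptions, not `amago`'s actual actor API.

```python
import torch
import torch.nn as nn


class MaskedActor(nn.Module):
    """Hypothetical actor head that masks illegal actions using an observation key."""

    def __init__(self, d_model: int, n_actions: int):
        super().__init__()
        self.logits = nn.Linear(d_model, n_actions)

    def forward(self, state_emb: torch.Tensor, obs: dict) -> torch.distributions.Categorical:
        logits = self.logits(state_emb)
        # "legal_actions" is a hypothetical observation key (shape [..., n_actions],
        # 1 = legal, 0 = illegal) that would be forwarded via pass_obs_keys_to_actor.
        mask = obs["legal_actions"].bool()
        logits = logits.masked_fill(~mask, float("-inf"))
        return torch.distributions.Categorical(logits=logits)
```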

docs/examples/index.rst

Lines changed: 15 additions & 0 deletions
@@ -233,6 +233,21 @@ Offline RL on the (original) `D4RL <https://arxiv.org/pdf/2004.07219>`_ datasets
 
 `Example wandb <https://wandb.ai/jakegrigsby/amago-v3-reference/runs/9ab15rr8>`_
 
+
+15. Human-Level Competitive Pokémon: Metamon
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: ../media/metamon_icon.png
+   :alt: metamon_diagram
+   :width: 100
+   :align: left
+
+`metamon <https://github.com/UT-Austin-RPL/metamon>`_ used `amago` to train top-decile agents in Pokémon Showdown from human and self-collected battle data.
+
+Check out our `project website <https://metamon.tech>`_!
+
+|
+
 |
 |
 |

docs/media/metamon_icon.png

121 KB

docs/tutorial/async.rst

Lines changed: 3 additions & 13 deletions
@@ -36,9 +36,9 @@ And that's it! Let's say our ``Experiment.parallel_actors=32``, ``Experiment.tra
 Asynchronous Training/Rollouts
 --------------------------------
 
-Each ``epoch`` alternates between rollouts --> gradient updates. AMAGO saves environment data and checkpoints to disk, so changing some :py:class:`~amago.experiment.Experiment` kwargs would let these two steps be completely separate.
+There is rough support for fully asynchronous training/collection with an arbitrary number of processes. Each ``epoch`` alternates between rollouts --> gradient updates. AMAGO saves environment data and checkpoints to disk, so changing a few :py:class:`~amago.experiment.Experiment` kwargs lets these two steps run completely separately.
 
-After we create an ``experiment = Experiment()``, but before ``experiment.start()``, :py:func:`~amago.cli_utils.switch_async_mode` can override settings to ``"learn"``, ``"collect"``, or ``"both"`` (the default). This leads to a very hacky but fun way to add extra data collection or do training/learning asynchronously. For example, we can ``accelerate launch`` a multi-gpu script that only does gradient updates, and collect data for that model to train on with as many collect-only processes as we want. All we need to do is make sure the ``dset_root``, ``dset_name``, ``run_name`` are the same (so that all the experiments are working from the same directory), and the network architecture settings are the same (so that checkpoints load correctly). For example:
+After we create an ``experiment = Experiment()``, but before ``experiment.start()``, :py:func:`~amago.cli_utils.switch_async_mode` can override settings to ``"learn"``, ``"collect"``, or ``"both"`` (the default). We can ``accelerate launch`` a multi-GPU script that only does gradient updates, and collect data for that model to train on with as many collect-only processes as we want. For example:
 
 .. code-block:: python
 
@@ -49,21 +49,11 @@ After we create an ``experiment = Experiment()``, but before ``experiment.start(
     parser = ArgumentParser()
     parser.add_argument("--mode", choices=["learn", "collect", "both"])
     args = parser.parse_args()
-
     config = {
         ...
     }
     use_config(config)
-
-
-    experiment = Experiment(
-        dset_root="~/amago_dsets",
-        dset_name="agi_training_data",
-        run_name="v1",
-        tstep_encoder_type=FFTstepEncoder,
-        traj_encoder_type=TformerTrajEncoder,
-        agent_type=MultiTaskAgent,
-        ...
-    )
+    experiment = Experiment(...)
     switch_async_mode(experiment, args.mode)
     experiment.start()
     experiment.learn()
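To make the launch pattern concrete, here is a sketch of a complete script assembled from the snippet above. The filename ``train.py`` in the trailing comments is hypothetical, the ``use_config`` import path is an assumption, and the ``Experiment`` kwargs are partially elided; the requirement that all processes share ``dset_root``/``dset_name``/``run_name`` and architecture settings comes from the deleted paragraph above.

```python
from argparse import ArgumentParser

from amago.cli_utils import switch_async_mode, use_config  # import path assumed
from amago.experiment import Experiment

parser = ArgumentParser()
parser.add_argument("--mode", choices=["learn", "collect", "both"], default="both")
args = parser.parse_args()

config = {}  # gin overrides, identical in every process (elided)
use_config(config)

# Every process must use the same dset_root / dset_name / run_name (so they
# all read and write one on-disk dataset directory) and the same architecture
# settings (so checkpoints load correctly).
experiment = Experiment(
    dset_root="~/amago_dsets",
    dset_name="agi_training_data",
    run_name="v1",
    # ...architecture and training kwargs, identical across processes...
)
switch_async_mode(experiment, args.mode)
experiment.start()
experiment.learn()

# Hypothetical launches ("train.py" is this file):
#   accelerate launch train.py --mode learn    # multi-GPU, gradient updates only
#   python train.py --mode collect             # repeat for each extra collector
```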
