use_multigamma: If True, train on multiple discount horizons (:py:class:`~amago.agent.Multigammas`) in parallel. Defaults to True.
actor_type: Actor MLP head for producing action distributions. Defaults to :py:class:`~amago.nets.actor_critic.Actor`.
critic_type: Critic MLP head for producing Q-values. Defaults to :py:class:`~amago.nets.actor_critic.NCritics`.
pass_obs_keys_to_actor: List of keys from the observation space to pass directly to the actor network's forward pass (e.g., to enable action masking). Defaults to None.
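A rough sketch of how the options above fit together. It assumes these are keyword arguments of :py:class:`~amago.agent.Agent` and omits that class's remaining required arguments:

.. code-block:: python

    # Hypothetical wiring of the kwargs documented above; the exact
    # Agent constructor signature is an assumption, not taken from this page.
    from amago.nets.actor_critic import Actor, NCritics

    agent_kwargs = dict(
        use_multigamma=True,          # train on multiple discount horizons in parallel
        actor_type=Actor,             # MLP head producing action distributions
        critic_type=NCritics,         # MLP head producing Q-values
        pass_obs_keys_to_actor=None,  # or e.g. ["action_mask"] to expose a key to the actor
        # ... remaining required Agent kwargs omitted
    )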
`metamon <https://github.com/UT-Austin-RPL/metamon>`_ used `amago` to train top-decile agents in Pokémon Showdown from human and self-collected battle data.
Check out our `project website <https://metamon.tech>`_!
docs/tutorial/async.rst
Asynchronous Training/Rollouts
--------------------------------
There is rough support for fully asynchronous training/collection with an arbitrary number of processes. Each ``epoch`` alternates between rollouts --> gradient updates, but AMAGO saves environment data and checkpoints to disk, so changing a few :py:class:`~amago.experiment.Experiment` kwargs lets these two steps run completely separately.
After we create an ``experiment = Experiment()``, but before ``experiment.start()``, :py:func:`~amago.cli_utils.switch_async_mode` can override settings to ``"learn"``, ``"collect"``, or ``"both"`` (the default). For example, we can ``accelerate launch`` a multi-GPU script that only does gradient updates, and collect data for that model to train on with as many collect-only processes as we want.
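All processes need the same ``dset_root``, ``dset_name``, and ``run_name`` (so they work from the same directory) and the same network architecture settings (so checkpoints load correctly). Below is a rough sketch of the pattern; the exact :py:func:`~amago.cli_utils.switch_async_mode` signature and the remaining ``Experiment`` kwargs are assumptions for illustration: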
.. code-block:: python
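
    # Rough sketch: exact signatures and required kwargs are assumed,
    # not confirmed by this page.
    from amago import Experiment
    from amago.cli_utils import switch_async_mode

    experiment = Experiment(
        run_name="async_demo",        # must match across every process
        dset_name="async_demo_dset",  # shared dataset/buffer name
        dset_root="/shared/amago",    # shared directory on disk
        # ... identical architecture kwargs in every process (omitted)
    )

    # In the `accelerate launch` script that only runs gradient updates:
    experiment = switch_async_mode(experiment, "learn")
    # In each extra collect-only process, use "collect" instead.

    experiment.start()
    experiment.learn()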