
Commit b3856dc: "update docs"

1 parent 7ed412f

File tree

5 files changed: 28 additions, 13 deletions

README.md

Lines changed: 9 additions & 0 deletions
@@ -254,7 +254,16 @@ Offline RL on the (original) [D4RL](https://arxiv.org/pdf/2004.07219) datasets.
 
 [Example `wandb`](https://wandb.ai/jakegrigsby/amago-v3-reference/runs/9ab15rr8)
 
+<br>
+
+### **15. Human-Level Competitive Pokémon: Metamon**
+
+[`metamon`](https://github.com/UT-Austin-RPL/metamon) used `amago` to train top-decile agents in Pokémon Showdown from human and self-collected battle data.
+
+<img src="docs/media/metamon_icon.png" alt="metamon_diagram" width="100" align="left" />
 
+Check out our [project website](https://metamon.tech)!
+
+<br>
 
 <br>
 

amago/agent.py

Lines changed: 1 addition & 0 deletions
@@ -196,6 +196,7 @@ def V(state, critic, action_dist, k) -> float:
     use_multigamma: If True, train on multiple discount horizons (:py:class:`~amago.agent.Multigammas`) in parallel. Defaults to True.
     actor_type: Actor MLP head for producing action distributions. Defaults to :py:class:`~amago.nets.actor_critic.Actor`.
     critic_type: Critic MLP head for producing Q-values. Defaults to :py:class:`~amago.nets.actor_critic.NCritics`.
+    pass_obs_keys_to_actor: List of keys from the observation space to pass directly to the actor network's forward pass when needed (e.g., for action masking). Defaults to None.
 """
 
 def __init__(
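For context on the new kwarg: below is a minimal sketch of the action-masking use case the docstring mentions. The `MaskedActor` class, its `forward` signature, and the `legal_actions` observation key are illustrative assumptions, not `amago`'s actual actor API.

```python
import torch
import torch.nn as nn


class MaskedActor(nn.Module):
    """Hypothetical actor head that masks illegal actions using an observation key."""

    def __init__(self, d_model: int, n_actions: int):
        super().__init__()
        self.logits = nn.Linear(d_model, n_actions)

    def forward(self, state_emb: torch.Tensor, obs: dict) -> torch.distributions.Categorical:
        logits = self.logits(state_emb)
        # "legal_actions" is a hypothetical observation key (shape [..., n_actions],
        # 1 = legal, 0 = illegal) that would be forwarded via pass_obs_keys_to_actor.
        mask = obs["legal_actions"].bool()
        logits = logits.masked_fill(~mask, float("-inf"))
        return torch.distributions.Categorical(logits=logits)
```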

docs/examples/index.rst

Lines changed: 15 additions & 0 deletions
@@ -233,6 +233,21 @@ Offline RL on the (original) `D4RL <https://arxiv.org/pdf/2004.07219>`_ datasets
 
 `Example wandb <https://wandb.ai/jakegrigsby/amago-v3-reference/runs/9ab15rr8>`_
 
+
+15. Human-Level Competitive Pokémon: Metamon
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: ../media/metamon_icon.png
+   :alt: metamon_diagram
+   :width: 100
+   :align: left
+
+`metamon <https://github.com/UT-Austin-RPL/metamon>`_ used `amago` to train top-decile agents in Pokémon Showdown from human and self-collected battle data.
+
+Check out our `project website <https://metamon.tech>`_!
+
+|
+
 |
 |
 |

docs/media/metamon_icon.png

121 KB

docs/tutorial/async.rst

Lines changed: 3 additions & 13 deletions
@@ -36,9 +36,9 @@ And that's it! Let's say our ``Experiment.parallel_actors=32``, ``Experiment.tra
 Asynchronous Training/Rollouts
 --------------------------------
 
-Each ``epoch`` alternates between rollouts --> gradient updates. AMAGO saves environment data and checkpoints to disk, so changing some :py:class:`~amago.experiment.Experiment` kwargs would let these two steps be completely separate.
+There is rough support for fully asynchronous training/collection with an arbitrary number of processes. Each ``epoch`` alternates between rollouts --> gradient updates. AMAGO saves environment data and checkpoints to disk, so changing a few :py:class:`~amago.experiment.Experiment` kwargs lets these two steps run completely separately.
 
-After we create an ``experiment = Experiment()``, but before ``experiment.start()``, :py:func:`~amago.cli_utils.switch_async_mode` can override settings to ``"learn"``, ``"collect"``, or ``"both"`` (the default). This leads to a very hacky but fun way to add extra data collection or do training/learning asynchronously. For example, we can ``accelerate launch`` a multi-gpu script that only does gradient updates, and collect data for that model to train on with as many collect-only processes as we want. All we need to do is make sure the ``dset_root``, ``dset_name``, ``run_name`` are the same (so that all the experiments are working from the same directory), and the network architecture settings are the same (so that checkpoints load correctly). For example:
+After we create an ``experiment = Experiment()``, but before ``experiment.start()``, :py:func:`~amago.cli_utils.switch_async_mode` can override settings to ``"learn"``, ``"collect"``, or ``"both"`` (the default). We can ``accelerate launch`` a multi-GPU script that only does gradient updates, and collect data for that model to train on with as many collect-only processes as we want. For example:
 
 .. code-block:: python
 
@@ -49,21 +49,11 @@ After we create an ``experiment = Experiment()``, but before ``experiment.start(
     parser = ArgumentParser()
     parser.add_argument("--mode", choices=["learn", "collect", "both"])
     args = parser.parse_args()
-
     config = {
         ...
     }
     use_config(config)
-
-
-    experiment = Experiment(
-        dset_root="~/amago_dsets",
-        dset_name="agi_training_data",
-        run_name="v1",
-        tstep_encoder_type=FFTstepEncoder,
-        traj_encoder_type=TformerTrajEncoder,
-        agent_type=MultiTaskAgent,
-        ...
-    )
+    experiment = Experiment(...)
     switch_async_mode(experiment, args.mode)
     experiment.start()
     experiment.learn()
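To make the launch pattern concrete, here is a sketch of a complete script assembled from the snippet above. The filename ``train.py`` in the trailing comments is hypothetical, the ``use_config`` import path is an assumption, and the ``Experiment`` kwargs are partially elided; the requirement that all processes share ``dset_root``/``dset_name``/``run_name`` and architecture settings comes from the deleted paragraph above.

```python
from argparse import ArgumentParser

from amago.cli_utils import switch_async_mode, use_config  # import path assumed
from amago.experiment import Experiment

parser = ArgumentParser()
parser.add_argument("--mode", choices=["learn", "collect", "both"], default="both")
args = parser.parse_args()

config = {}  # gin overrides, identical in every process (elided)
use_config(config)

# Every process must use the same dset_root / dset_name / run_name (so they
# all read and write one on-disk dataset directory) and the same architecture
# settings (so checkpoints load correctly).
experiment = Experiment(
    dset_root="~/amago_dsets",
    dset_name="agi_training_data",
    run_name="v1",
    # ...architecture and training kwargs, identical across processes...
)
switch_async_mode(experiment, args.mode)
experiment.start()
experiment.learn()

# Hypothetical launches ("train.py" is this file):
#   accelerate launch train.py --mode learn    # multi-GPU, gradient updates only
#   python train.py --mode collect             # repeat for each extra collector
```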
