1 change: 1 addition & 0 deletions docs/index.md
@@ -62,6 +62,7 @@ and how to implement new MDPs and new algorithms.
RL2 <user/algo_rl2>
SAC <user/algo_sac>
TD3 <user/algo_td3>
TEPPO <user/algo_teppo>
TRPO <user/algo_trpo>
REINFORCE <user/algo_vpg>

76 changes: 76 additions & 0 deletions docs/user/algo_teppo.md
@@ -0,0 +1,76 @@
# Proximal Policy Optimization with Task Embedding (TEPPO)


```eval_rst
.. list-table::
:header-rows: 0
:stub-columns: 1
:widths: auto

* - **Paper**
- Learning an Embedding Space for Transferable Robot Skills :cite:`hausman2018learning`
* - **Framework(s)**
- .. figure:: ./images/tf.png
:scale: 20%
:class: no-scaled-link

Tensorflow
* - **API Reference**
- `garage.tf.algos.TEPPO <../_autoapi/garage/tf/algos/index.html#garage.tf.algos.TEPPO>`_
* - **Code**
- `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
* - **Examples**
- :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
```

Proximal Policy Optimization (PPO) is a family of policy gradient methods that alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO extends PPO by parameterizing the policy via a shared skill embedding space, so that skills learned across tasks can be composed and transferred.
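The clipped surrogate objective at the core of PPO can be sketched as follows. This is a minimal NumPy illustration, not garage's implementation; the `clip_range` argument here plays the role that the `lr_clip_range` default plays in garage (an assumption about its naming):

```python
import numpy as np

def clipped_surrogate(ratio, advantage, clip_range):
    """PPO clipped surrogate objective for a (batch of) sample(s).

    ratio: probability ratio pi_new(a|s) / pi_old(a|s)
    advantage: estimated advantage A(s, a)
    clip_range: the PPO clipping epsilon
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantage
    # Take the pessimistic (lower) bound of the two, which removes the
    # incentive to move the ratio far outside the clipping interval.
    return np.minimum(unclipped, clipped)
```

The objective maximized in practice is the mean of this quantity over sampled state-action pairs; TEPPO additionally adds policy and encoder entropy bonuses and an inference cross-entropy term, weighted by the coefficients listed below.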

## Default Parameters

```py
discount=0.99,
gae_lambda=0.98,
lr_clip_range=0.01,
max_kl_step=0.01,
policy_ent_coeff=1e-3,
encoder_ent_coeff=1e-3,
inference_ce_coeff=1e-3
```

## Examples

### te_ppo_metaworld_mt1_push

```eval_rst
.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
```

### te_ppo_metaworld_mt10

```eval_rst
.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt10.py
```

### te_ppo_metaworld_mt50

```eval_rst
.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt50.py
```

### te_ppo_point

```eval_rst
.. literalinclude:: ../../examples/tf/te_ppo_point.py
```

## References

```eval_rst
.. bibliography:: references.bib
:style: unsrt
:filter: docname in docnames
```

----

*This page was authored by Nicole Shin Ying Ng ([@nicolengsy](https://github.com/nicolengsy)).*
9 changes: 9 additions & 0 deletions docs/user/references.bib
@@ -83,6 +83,15 @@ @article{yu2019metaworld
journal={arXiv:1910.10897},
}

@inproceedings{hausman2018learning,
title={Learning an Embedding Space for Transferable Robot Skills},
author={Karol Hausman and Jost Tobias Springenberg and Ziyu Wang and Nicolas Heess and Martin Riedmiller},
booktitle={International Conference on Learning Representations},
year={2018},
url={https://openreview.net/forum?id=rk07ZXZRb},
}

@article{lillicrap2015continuous,
title={Continuous control with deep reinforcement learning},
author={Lillicrap, Timothy P and Hunt, Jonathan J and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan},