4 files changed: +51 -2 lines changed

@@ -62,6 +62,7 @@ and how to implement new MDPs and new algorithms.
    user/algo_vpg
    user/algo_td3
    user/algo_ddpg
+   user/algo_cem
 
 .. toctree::
    :maxdepth: 2
+# Cross Entropy Method
+
+``` eval_rst
++-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
+| **Paper**         | The cross-entropy method: A unified approach to Monte Carlo simulation, randomized optimization and machine learning :cite:`rubinstein2004cross` |
++-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
+| **Framework(s)**  | .. figure:: ./images/numpy.png                                                                                                                   |
+|                   |    :scale: 40%                                                                                                                                   |
+|                   |    :class: no-scaled-link                                                                                                                        |
++-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
+| **API Reference** | `garage.np.algos.CEM <../_autoapi/garage/np/algos/index.html#garage.np.algos.CEM>`_                                                              |
++-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
+| **Code**          | `garage/np/algos/cem.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/np/algos/cem.py>`_                                         |
++-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
+```
+
+Cross Entropy Method (CEM) works by iteratively optimizing a Gaussian
+distribution over policy parameters.
+
+In each epoch, CEM does the following:
+
+1. Sample n_samples policies from a Gaussian distribution with mean cur_mean
+   and standard deviation cur_std.
+
+2. Collect episodes for each sampled policy.
+
+3. Update cur_mean and cur_std by maximum likelihood estimation over the
+   n_best policies with the highest returns.
+
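The loop described in the steps above can be sketched in plain NumPy on a toy objective that stands in for average episode return. This is an illustrative sketch, not the garage API: the `cem` function, its parameter defaults, and the quadratic objective are all made up for the example.

```python
import numpy as np

def cem(f, dim, n_samples=50, n_best=10, n_epochs=100, init_std=2.0, seed=0):
    """Maximize f over R^dim with the Cross Entropy Method (toy sketch)."""
    rng = np.random.default_rng(seed)
    cur_mean = np.zeros(dim)
    cur_std = np.full(dim, init_std)
    for _ in range(n_epochs):
        # 1. Sample candidate parameter vectors from N(cur_mean, cur_std**2).
        samples = rng.normal(cur_mean, cur_std, size=(n_samples, dim))
        # 2. Evaluate each candidate; f plays the role of episode return.
        returns = np.array([f(x) for x in samples])
        # 3. MLE fit of the Gaussian to the n_best highest-return samples.
        elite = samples[np.argsort(returns)[-n_best:]]
        cur_mean = elite.mean(axis=0)
        cur_std = elite.std(axis=0)
    return cur_mean

# Toy objective maximized at (1, -2); CEM should recover it.
best = cem(lambda x: -np.sum((x - np.array([1.0, -2.0])) ** 2), dim=2)
```

Note that plain CEM like this can collapse cur_std prematurely; practical implementations often mix in extra, slowly decaying exploration noise when refitting the distribution.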
+## Examples
+
+### NumPy
+
+``` eval_rst
+.. literalinclude:: ../../examples/np/cem_cartpole.py
+```
+
+## References
+
+``` eval_rst
+.. bibliography:: references.bib
+   :style: unsrt
+   :filter: docname in docnames
+```
+
+----
+
+*This page was authored by Ruofu Wang ([@yeukfu](https://github.com/yeukfu)).*
@@ -35,13 +35,13 @@ regularization adds the mean entropy to the surrogate objective. See
 
 Garage has implementations of PPO with PyTorch and TensorFlow.
 
-## PyTorch
+### PyTorch
 
 ``` eval_rst
 .. literalinclude:: ../../examples/torch/ppo_pendulum.py
 ```
 
-## TensorFlow
+### TensorFlow
 
 ``` eval_rst
 .. literalinclude:: ../../examples/tf/ppo_pendulum.py