| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
| num_layers | The number of hidden layers in the neural network. | PPO, BC |
| pretraining | Use demonstrations to bootstrap the policy neural network. See [Pretraining Using Demonstrations](Training-PPO.md#optional-pretraining-using-demonstrations). | PPO |
| reward_signals | The reward signals used to train the policy. Enable Curiosity and GAIL here. See [Reward Signals](Training-RewardSignals.md) for configuration options. | PPO |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, (online)BC |
| trainer | The type of training to perform: "ppo" or "imitation". | PPO, BC |
| trainer | The type of training to perform: "ppo", "offline_bc" or "online_bc". | PPO, BC |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
\*PPO = Proximal Policy Optimization, BC = Behavioral Cloning (Imitation)
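The settings above are specified per Brain in the trainer configuration YAML file. The snippet below is a minimal sketch of how several of them fit together; the Brain name `ExampleBrain` and all values are illustrative placeholders rather than recommended defaults, and the exact layout may vary between releases.

```yaml
# Illustrative trainer configuration entry (values are placeholders, not tuned defaults).
ExampleBrain:
    trainer: ppo            # "ppo", "offline_bc", or "online_bc"
    num_epoch: 3            # passes through the experience buffer per optimization step
    num_layers: 2           # hidden layers in the neural network
    time_horizon: 64        # steps of experience collected per agent before buffering
    use_recurrent: false    # set to true to train with a recurrent neural network
    sequence_length: 64     # only used when use_recurrent is true
    summary_freq: 1000      # how often, in steps, to save training statistics
    reward_signals:         # enable Curiosity and GAIL here (see Reward Signals docs)
        extrinsic:
            strength: 1.0
            gamma: 0.99
        curiosity:
            strength: 0.01
            gamma: 0.99
```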
For specific advice on setting hyperparameters based on the type of training you