
Commit b0f3bfe

fix the training doc (#1193)
1 parent d907223 commit b0f3bfe

2 files changed: +9 -7 lines


docs/Training-ML-Agents.md

Lines changed: 4 additions & 4 deletions
@@ -162,20 +162,20 @@ Sections for the example environments are included in the provided config file.
 | :-- | :-- | :-- |
 | batch_size | The number of experiences in each iteration of gradient descent. | PPO, BC |
 | batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model. | BC |
-| beta | The strength of entropy regularization. | PPO, BC |
+| beta | The strength of entropy regularization. | PPO |
 | brain\_to\_imitate | For imitation learning, the name of the GameObject containing the Brain component to imitate. | BC |
 | buffer_size | The number of experiences to collect before updating the policy model. | PPO |
 | curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curiosity module. | PPO |
 | curiosity_strength | Magnitude of the intrinsic reward generated by the Intrinsic Curiosity Module. | PPO |
-| epsilon | Influences how rapidly the policy can evolve during training. | PPO, BC |
+| epsilon | Influences how rapidly the policy can evolve during training. | PPO |
 | gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |
 | hidden_units | The number of units in the hidden layers of the neural network. | PPO, BC |
 | lambd | The regularization parameter. | PPO |
 | learning_rate | The initial learning rate for gradient descent. | PPO, BC |
 | max_steps | The maximum number of simulation steps to run during a training session. | PPO, BC |
 | memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
-| normalize | Whether to automatically normalize observations. | PPO, BC |
-| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO, BC |
+| normalize | Whether to automatically normalize observations. | PPO |
+| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
 | num_layers | The number of hidden layers in the neural network. | PPO, BC |
 | sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
 | summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
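
The net effect of this hunk is that beta, epsilon, normalize, and num_epoch are now documented as PPO-only hyperparameters. As a rough illustration of how the trainers consume these keys, here is a minimal sketch of a trainer_parameters dict for a PPO brain; the key names follow the table above, but every value is an illustrative assumption rather than something taken from this commit:

```python
# Minimal sketch of a PPO trainer_parameters dict; key names follow the
# documentation table above, all values are illustrative assumptions.
ppo_parameters = {
    'batch_size': 1024,       # experiences per gradient-descent iteration
    'beta': 5.0e-3,           # entropy regularization strength (PPO only)
    'buffer_size': 10240,     # experiences collected before each policy update
    'epsilon': 0.2,           # bounds how rapidly the policy can evolve (PPO only)
    'gamma': 0.99,            # reward discount rate for GAE
    'hidden_units': 128,      # units per hidden layer
    'lambd': 0.95,            # regularization parameter
    'learning_rate': 3.0e-4,  # initial learning rate for gradient descent
    'max_steps': 5.0e4,       # simulation steps per training session
    'normalize': False,       # automatic observation normalization (PPO only)
    'num_epoch': 3,           # passes through the experience buffer (PPO only)
    'num_layers': 2,          # hidden layers in the network
    'summary_freq': 1000,     # steps between TensorBoard data points
}
```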

ml-agents/mlagents/trainers/bc/trainer.py

Lines changed: 5 additions & 3 deletions
@@ -28,9 +28,11 @@ def __init__(self, sess, brain, trainer_parameters, training, seed, run_id):
         """
         super(BehavioralCloningTrainer, self).__init__(sess, brain, trainer_parameters, training, run_id)

-        self.param_keys = ['brain_to_imitate', 'batch_size', 'time_horizon', 'graph_scope',
-                           'summary_freq', 'max_steps', 'batches_per_epoch', 'use_recurrent', 'hidden_units',
-                           'num_layers', 'sequence_length', 'memory_size']
+        self.param_keys = ['brain_to_imitate', 'batch_size', 'time_horizon',
+                           'graph_scope', 'summary_freq', 'max_steps',
+                           'batches_per_epoch', 'use_recurrent',
+                           'hidden_units', 'learning_rate', 'num_layers',
+                           'sequence_length', 'memory_size']

         for k in self.param_keys:
             if k not in trainer_parameters:
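
The hunk ends at the start of the key-validation loop. For context, the pattern it implies looks roughly like the sketch below; the exception class and message wording are assumptions, not shown in this diff:

```python
# Sketch of the validation loop the diff truncates; the exception class
# and message wording are assumptions, not part of this commit.
class UnityTrainerException(Exception):
    """Raised when a trainer is misconfigured."""

def validate_params(param_keys, trainer_parameters, brain_name):
    # Fail fast if any required hyperparameter (now including
    # 'learning_rate' for the BC trainer) is missing from the config.
    for k in param_keys:
        if k not in trainer_parameters:
            raise UnityTrainerException(
                "Hyperparameter {0} could not be found for brain {1}."
                .format(k, brain_name))
```

With 'learning_rate' added to param_keys, a behavioral-cloning configuration that omits it is now caught by this check up front rather than failing later in training.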
