
Commit b0f3bfe

fix the training doc (#1193)
1 parent d907223 commit b0f3bfe

2 files changed: +9 -7 lines


docs/Training-ML-Agents.md

Lines changed: 4 additions & 4 deletions
@@ -162,20 +162,20 @@ Sections for the example environments are included in the provided config file.
 | :-- | :-- | :-- |
 | batch_size | The number of experiences in each iteration of gradient descent. | PPO, BC |
 | batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model. | BC |
-| beta | The strength of entropy regularization. | PPO, BC |
+| beta | The strength of entropy regularization. | PPO |
 | brain\_to\_imitate | For imitation learning, the name of the GameObject containing the Brain component to imitate. | BC |
 | buffer_size | The number of experiences to collect before updating the policy model. | PPO |
 | curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curiosity module. | PPO |
 | curiosity_strength | Magnitude of the intrinsic reward generated by the Intrinsic Curiosity Module. | PPO |
-| epsilon | Influences how rapidly the policy can evolve during training. | PPO, BC |
+| epsilon | Influences how rapidly the policy can evolve during training. | PPO |
 | gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |
 | hidden_units | The number of units in the hidden layers of the neural network. | PPO, BC |
 | lambd | The regularization parameter. | PPO |
 | learning_rate | The initial learning rate for gradient descent. | PPO, BC |
 | max_steps | The maximum number of simulation steps to run during a training session. | PPO, BC |
 | memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
-| normalize | Whether to automatically normalize observations. | PPO, BC |
-| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO, BC |
+| normalize | Whether to automatically normalize observations. | PPO |
+| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
 | num_layers | The number of hidden layers in the neural network. | PPO, BC |
 | sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
 | summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
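
The net effect of this hunk is that beta, epsilon, normalize, and num_epoch are now documented as PPO-only hyperparameters. As a rough illustration of how the trainers consume these keys, here is a minimal sketch of a trainer_parameters dict for a PPO brain; the key names follow the table above, but every value is an illustrative assumption rather than something taken from this commit:

```python
# Minimal sketch of a PPO trainer_parameters dict; key names follow the
# documentation table above, all values are illustrative assumptions.
ppo_parameters = {
    'batch_size': 1024,       # experiences per gradient-descent iteration
    'beta': 5.0e-3,           # entropy regularization strength (PPO only)
    'buffer_size': 10240,     # experiences collected before each policy update
    'epsilon': 0.2,           # bounds how rapidly the policy can evolve (PPO only)
    'gamma': 0.99,            # reward discount rate for GAE
    'hidden_units': 128,      # units per hidden layer
    'lambd': 0.95,            # regularization parameter
    'learning_rate': 3.0e-4,  # initial learning rate for gradient descent
    'max_steps': 5.0e4,       # simulation steps per training session
    'normalize': False,       # automatic observation normalization (PPO only)
    'num_epoch': 3,           # passes through the experience buffer (PPO only)
    'num_layers': 2,          # hidden layers in the network
    'summary_freq': 1000,     # steps between TensorBoard data points
}
```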

ml-agents/mlagents/trainers/bc/trainer.py

Lines changed: 5 additions & 3 deletions
@@ -28,9 +28,11 @@ def __init__(self, sess, brain, trainer_parameters, training, seed, run_id):
         """
         super(BehavioralCloningTrainer, self).__init__(sess, brain, trainer_parameters, training, run_id)

-        self.param_keys = ['brain_to_imitate', 'batch_size', 'time_horizon', 'graph_scope',
-                           'summary_freq', 'max_steps', 'batches_per_epoch', 'use_recurrent', 'hidden_units',
-                           'num_layers', 'sequence_length', 'memory_size']
+        self.param_keys = ['brain_to_imitate', 'batch_size', 'time_horizon',
+                           'graph_scope', 'summary_freq', 'max_steps',
+                           'batches_per_epoch', 'use_recurrent',
+                           'hidden_units', 'learning_rate', 'num_layers',
+                           'sequence_length', 'memory_size']

         for k in self.param_keys:
             if k not in trainer_parameters:
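
The hunk ends at the start of the key-validation loop. For context, the pattern it implies looks roughly like the sketch below; the exception class and message wording are assumptions, not shown in this diff:

```python
# Sketch of the validation loop the diff truncates; the exception class
# and message wording are assumptions, not part of this commit.
class UnityTrainerException(Exception):
    """Raised when a trainer is misconfigured."""

def validate_params(param_keys, trainer_parameters, brain_name):
    # Fail fast if any required hyperparameter (now including
    # 'learning_rate' for the BC trainer) is missing from the config.
    for k in param_keys:
        if k not in trainer_parameters:
            raise UnityTrainerException(
                "Hyperparameter {0} could not be found for brain {1}."
                .format(k, brain_name))
```

With 'learning_rate' added to param_keys, a behavioral-cloning configuration that omits it is now caught by this check up front rather than failing later in training.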
