
Commit 3dfd852

Hotfix 0.3.0c (#618)
Fixes the following issues:
* Missing component reference in BananaRL environment.
* Neural Network for multiple visual observations was not properly generated.
* Episode time-out value estimate bootstrapping used incorrect observation as input.
1 parent 0be36ec commit 3dfd852

File tree

9 files changed: +67 −41 lines

* docs/Getting-Started-with-Balance-Ball.md
* docs/Learning-Environment-Examples.md
* python/trainer_config.yaml
* python/unitytrainers/bc/trainer.py
* python/unitytrainers/models.py
* python/unitytrainers/ppo/trainer.py
* python/unitytrainers/trainer.py
* python/unitytrainers/trainer_controller.py
* unity-environment/Assets/ML-Agents/Examples/BananaCollectors/BananaRL.unity

docs/Getting-Started-with-Balance-Ball.md

Lines changed: 1 addition & 1 deletion
@@ -269,7 +269,7 @@ on the same graph.
 
 To summarize, go to your command line, enter the `ml-agents` directory and type:
 
-```python
+```
 python3 python/learn.py <env_file_path> --run-id=<run-identifier> --train
 ```
 **Note**: If you're using Anaconda, don't forget to activate the ml-agents environment first.

docs/Learning-Environment-Examples.md

Lines changed: 5 additions & 5 deletions
@@ -154,15 +154,15 @@ If you would like to contribute environments, please see our
 ![Banana](images/banana.png)
 
 * Set-up: A multi-agent environment where agents compete to collect bananas.
-* Goal: The agents must learn to move to as many yellow bananas as possible while avoiding red bananas.
-* Agents: The environment contains 10 agents linked to a single brain.
+* Goal: The agents must learn to move to as many yellow bananas as possible while avoiding blue bananas.
+* Agents: The environment contains 5 agents linked to a single brain.
 * Agent Reward Function (independent):
   * +1 for interaction with yellow banana
-  * -1 for interaction with red banana.
+  * -1 for interaction with blue banana.
 * Brains: One brain with the following observation/action space.
-  * Vector Observation space: (Continuous) 51 corresponding to velocity of agent, plus ray-based perception of objects around agent's forward direction.
+  * Vector Observation space: (Continuous) 53 corresponding to velocity of agent (2), whether agent is frozen and/or shot its laser (2), plus ray-based perception of objects around agent's forward direction (49; 7 raycast angles with 7 measurements for each).
   * Vector Action space: (Continuous) Size of 3, corresponding to forward movement, y-axis rotation, and whether to use laser to disable other agents.
-  * Visual Observations (Optional): First-person view for each agent.
+  * Visual Observations (Optional; None by default): First-person view for each agent.
 * Reset Parameters: None
 
 ## Hallway
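For reference, the updated 53-value vector observation above decomposes as 2 velocity values, 2 status flags (frozen, laser fired), and 7 ray angles with 7 measurements per ray. Below is a minimal sketch of that layout in Python; the names and the per-ray encoding are illustrative assumptions, not the actual C# agent code.

```python
import numpy as np

NUM_RAY_ANGLES = 7        # raycast directions around the agent's forward axis
MEASUREMENTS_PER_RAY = 7  # assumed per-ray encoding (e.g. hit-type one-hots plus distance)

def banana_vector_observation(velocity_xz, frozen, shot_laser, ray_perception):
    """Illustrative layout of the 53-value observation: 2 + 2 + 7 * 7 = 53."""
    obs = np.concatenate([
        np.asarray(velocity_xz, dtype=np.float32),             # 2 velocity values
        np.asarray([float(frozen), float(shot_laser)]),        # 2 status flags
        np.asarray(ray_perception, dtype=np.float32).ravel(),  # 49 ray-perception values
    ])
    assert obs.shape == (53,)
    return obs

# Dummy example
obs = banana_vector_observation(
    velocity_xz=[0.5, -0.1],
    frozen=False,
    shot_laser=True,
    ray_perception=np.zeros((NUM_RAY_ANGLES, MEASUREMENTS_PER_RAY)),
)
print(obs.shape)  # (53,)
```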

python/trainer_config.yaml

Lines changed: 6 additions & 0 deletions
@@ -80,6 +80,12 @@ GoalieBrain:
 
 Ball3DBrain:
     normalize: true
+    batch_size: 1200
+    buffer_size: 12000
+    summary_freq: 1000
+    time_horizon: 1000
+    gamma: 0.995
+    beta: 0.001
 
 BouncerBrain:
     normalize: true
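The new Ball3DBrain keys above override the trainer defaults for that brain only. A rough sketch of how such per-brain overrides can be merged over a shared default section when the YAML is loaded; `load_config` and the `default` key are illustrative assumptions, not the exact ml-agents loader.

```python
import yaml

def load_config(path, brain_name):
    """Merge brain-specific hyperparameters over a default section (illustrative)."""
    with open(path) as f:
        config = yaml.safe_load(f)
    params = dict(config.get("default", {}))   # start from shared defaults
    params.update(config.get(brain_name, {}))  # brain-specific keys take precedence
    return params

# Example: Ball3DBrain would pick up batch_size=1200, buffer_size=12000, gamma=0.995, etc.
# print(load_config("python/trainer_config.yaml", "Ball3DBrain"))
```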

python/unitytrainers/bc/trainer.py

Lines changed: 5 additions & 4 deletions
@@ -229,13 +229,14 @@ def add_experiences(self, curr_info: AllBrainInfo, next_info: AllBrainInfo, take
                 self.episode_steps[agent_id] = 0
             self.episode_steps[agent_id] += 1
 
-    def process_experiences(self, info: AllBrainInfo):
+    def process_experiences(self, current_info: AllBrainInfo, next_info: AllBrainInfo):
         """
         Checks agent histories for processing condition, and processes them as necessary.
         Processing involves calculating value and advantage targets for model updating step.
-        :param info: Current AllBrainInfo
+        :param current_info: Current AllBrainInfo
+        :param next_info: Next AllBrainInfo
         """
-        info_teacher = info[self.brain_to_imitate]
+        info_teacher = next_info[self.brain_to_imitate]
         for l in range(len(info_teacher.agents)):
             if ((info_teacher.local_done[l] or
                  len(self.training_buffer[info_teacher.agents[l]]['actions']) > self.trainer_parameters[
@@ -246,7 +247,7 @@ def process_experiences(self, info: AllBrainInfo):
                                                        training_length=self.sequence_length)
                 self.training_buffer[agent_id].reset_agent()
 
-        info_student = info[self.brain_name]
+        info_student = next_info[self.brain_name]
         for l in range(len(info_student.agents)):
             if info_student.local_done[l]:
                 agent_id = info_student.agents[l]

python/unitytrainers/models.py

Lines changed: 4 additions & 3 deletions
@@ -80,15 +80,16 @@ def create_continuous_state_encoder(self, h_size, activation, num_layers):
                                      kernel_initializer=c_layers.variance_scaling_initializer(1.0))
         return hidden
 
-    def create_visual_encoder(self, h_size, activation, num_layers):
+    def create_visual_encoder(self, image_input, h_size, activation, num_layers):
         """
         Builds a set of visual (CNN) encoders.
+        :param image_input: The placeholder for the image input to use.
         :param h_size: Hidden layer size.
         :param activation: What type of activation function to use for layers.
         :param num_layers: number of hidden layers to create.
         :return: List of hidden layer tensors.
         """
-        conv1 = tf.layers.conv2d(self.visual_in[-1], 16, kernel_size=[8, 8], strides=[4, 4],
+        conv1 = tf.layers.conv2d(image_input, 16, kernel_size=[8, 8], strides=[4, 4],
                                  activation=tf.nn.elu)
         conv2 = tf.layers.conv2d(conv1, 32, kernel_size=[4, 4], strides=[2, 2],
                                  activation=tf.nn.elu)
@@ -136,7 +137,7 @@ def create_new_obs(self, num_streams, h_size, num_layers):
             hidden_state, hidden_visual = None, None
             if brain.number_visual_observations > 0:
                 for j in range(brain.number_visual_observations):
-                    encoded_visual = self.create_visual_encoder(h_size, activation_fn, num_layers)
+                    encoded_visual = self.create_visual_encoder(self.visual_in[j], h_size, activation_fn, num_layers)
                     visual_encoders.append(encoded_visual)
                 hidden_visual = tf.concat(visual_encoders, axis=1)
             if brain.vector_observation_space_size > 0:
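The change above fixes the second issue in the commit message: `create_visual_encoder` previously always read `self.visual_in[-1]`, so with multiple cameras every encoder stream was built from the same (last) visual observation. A small self-contained sketch of the before/after behaviour; `encode()` and the observation list are stand-ins for the TensorFlow graph construction.

```python
def encode(image):
    """Stand-in for the CNN encoder; records which input it was built from."""
    return f"encoded({image})"

visual_in = ["cam_0", "cam_1", "cam_2"]  # placeholders for three visual observations

# Before the fix: every iteration encoded the last placeholder.
buggy = [encode(visual_in[-1]) for _ in visual_in]
# -> ['encoded(cam_2)', 'encoded(cam_2)', 'encoded(cam_2)']

# After the fix: each visual observation feeds its own encoder.
fixed = [encode(visual_in[j]) for j in range(len(visual_in))]
# -> ['encoded(cam_0)', 'encoded(cam_1)', 'encoded(cam_2)']

print(buggy)
print(fixed)
```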

python/unitytrainers/ppo/trainer.py

Lines changed: 17 additions & 12 deletions
@@ -260,34 +260,39 @@ def add_experiences(self, curr_all_info: AllBrainInfo, next_all_info: AllBrainIn
                 self.episode_steps[agent_id] = 0
             self.episode_steps[agent_id] += 1
 
-
-    def process_experiences(self, all_info: AllBrainInfo):
+    def process_experiences(self, current_info: AllBrainInfo, new_info: AllBrainInfo):
         """
         Checks agent histories for processing condition, and processes them as necessary.
         Processing involves calculating value and advantage targets for model updating step.
-        :param all_info: Dictionary of all current brains and corresponding BrainInfo.
+        :param current_info: Dictionary of all current brains and corresponding BrainInfo.
+        :param new_info: Dictionary of all next brains and corresponding BrainInfo.
         """
 
-        info = all_info[self.brain_name]
+        info = new_info[self.brain_name]
+        last_info = current_info[self.brain_name]
         for l in range(len(info.agents)):
             agent_actions = self.training_buffer[info.agents[l]]['actions']
             if ((info.local_done[l] or len(agent_actions) > self.trainer_parameters['time_horizon'])
                     and len(agent_actions) > 0):
                 if info.local_done[l] and not info.max_reached[l]:
                     value_next = 0.0
                 else:
-                    feed_dict = {self.model.batch_size: len(info.vector_observations), self.model.sequence_length: 1}
+                    if info.max_reached[l]:
+                        bootstrapping_info = last_info
+                    else:
+                        bootstrapping_info = info
+                    feed_dict = {self.model.batch_size: len(bootstrapping_info.vector_observations), self.model.sequence_length: 1}
                     if self.use_observations:
-                        for i in range(len(info.visual_observations)):
-                            feed_dict[self.model.visual_in[i]] = info.visual_observations[i]
+                        for i in range(len(bootstrapping_info.visual_observations)):
+                            feed_dict[self.model.visual_in[i]] = bootstrapping_info.visual_observations[i]
                     if self.use_states:
-                        feed_dict[self.model.vector_in] = info.vector_observations
+                        feed_dict[self.model.vector_in] = bootstrapping_info.vector_observations
                     if self.use_recurrent:
-                        if info.memories.shape[1] == 0:
-                            info.memories = np.zeros((len(info.vector_observations), self.m_size))
-                        feed_dict[self.model.memory_in] = info.memories
+                        if bootstrapping_info.memories.shape[1] == 0:
+                            bootstrapping_info.memories = np.zeros((len(bootstrapping_info.vector_observations), self.m_size))
+                        feed_dict[self.model.memory_in] = bootstrapping_info.memories
                     if not self.is_continuous_action and self.use_recurrent:
-                        feed_dict[self.model.prev_action] = np.reshape(info.previous_vector_actions, [-1])
+                        feed_dict[self.model.prev_action] = np.reshape(bootstrapping_info.previous_vector_actions, [-1])
                     value_next = self.sess.run(self.model.value, feed_dict)[l]
                 agent_id = info.agents[l]
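This is the fix for the episode time-out bootstrapping issue: when an episode ends because the agent hit its step limit (`max_reached`), the next observation already comes from the reset environment, so the value used to bootstrap the return must be estimated from the last pre-reset observation instead. A condensed sketch of that decision follows; `estimate_value` and the arguments are simplified stand-ins for the model call in the trainer.

```python
def bootstrap_value(done, max_reached, last_obs, next_obs, estimate_value):
    """Pick the observation used to bootstrap the return when an episode is cut off."""
    if done and not max_reached:
        return 0.0  # true terminal state: nothing left to bootstrap
    # Horizon cut or time-out: the task continues conceptually, but when
    # max_reached the next observation is post-reset, so use the last valid one.
    bootstrapping_obs = last_obs if max_reached else next_obs
    return estimate_value(bootstrapping_obs)

# Dummy example with a stand-in value function
value = bootstrap_value(
    done=True, max_reached=True,
    last_obs=[0.2, 0.4], next_obs=[0.0, 0.0],
    estimate_value=lambda obs: sum(obs),  # stand-in for self.sess.run(self.model.value, ...)
)
print(value)  # ~0.6, bootstrapped from the pre-reset observation rather than the reset one
```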

python/unitytrainers/trainer.py

Lines changed: 3 additions & 2 deletions
@@ -103,11 +103,12 @@ def add_experiences(self, curr_info: AllBrainInfo, next_info: AllBrainInfo, take
         """
         raise UnityTrainerException("The add_experiences method was not implemented.")
 
-    def process_experiences(self, info: AllBrainInfo):
+    def process_experiences(self, current_info: AllBrainInfo, next_info: AllBrainInfo):
         """
         Checks agent histories for processing condition, and processes them as necessary.
         Processing involves calculating value and advantage targets for model updating step.
-        :param info: Dictionary of all current brains and corresponding BrainInfo.
+        :param current_info: Dictionary of all current-step brains and corresponding BrainInfo.
+        :param next_info: Dictionary of all next-step brains and corresponding BrainInfo.
         """
         raise UnityTrainerException("The process_experiences method was not implemented.")

python/unitytrainers/trainer_controller.py

Lines changed: 3 additions & 5 deletions
@@ -253,13 +253,11 @@ def start_learning(self):
 
                 for brain_name, trainer in self.trainers.items():
                     trainer.add_experiences(curr_info, new_info, take_action_outputs[brain_name])
-                curr_info = new_info
-                for brain_name, trainer in self.trainers.items():
-                    trainer.process_experiences(curr_info)
+                    trainer.process_experiences(curr_info, new_info)
                     if trainer.is_ready_update() and self.train_model and trainer.get_step <= trainer.get_max_steps:
                         # Perform gradient descent with experience buffer
                         trainer.update_model()
-                    # Write training statistics to tensorboard.
+                    # Write training statistics to Tensorboard.
                     trainer.write_summary(self.env.curriculum.lesson_number)
                     if self.train_model and trainer.get_step <= trainer.get_max_steps:
                         trainer.increment_step()
@@ -269,7 +267,7 @@ def start_learning(self):
                 if global_step % self.save_freq == 0 and global_step != 0 and self.train_model:
                     # Save Tensorflow model
                     self._save_model(sess, steps=global_step, saver=saver)
-
+                curr_info = new_info
             # Final save Tensorflow model
             if global_step != 0 and self.train_model:
                 self._save_model(sess, steps=global_step, saver=saver)
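The controller now passes both the pre-step and post-step infos to `process_experiences` and only advances `curr_info` at the end of the iteration, so the pre-reset observations remain available for the bootstrapping case above. A stripped-down sketch of the corrected loop ordering; `env`, `trainers`, and the info objects are simplified placeholders, not the real controller.

```python
def training_loop(env, trainers, max_steps):
    """Illustrative ordering: process experiences before advancing curr_info."""
    curr_info = env.reset()
    for _ in range(max_steps):
        actions = {name: t.take_action(curr_info) for name, t in trainers.items()}
        new_info = env.step(actions)
        for name, trainer in trainers.items():
            trainer.add_experiences(curr_info, new_info, actions[name])
            # curr_info still holds the pre-step observations here, so a trainer
            # can bootstrap from them when an agent hit its episode time limit.
            trainer.process_experiences(curr_info, new_info)
            if trainer.is_ready_update():
                trainer.update_model()
        curr_info = new_info  # advance only after every trainer has processed the step
```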

unity-environment/Assets/ML-Agents/Examples/BananaCollectors/BananaRL.unity

Lines changed: 23 additions & 9 deletions
@@ -13,7 +13,7 @@ OcclusionCullingSettings:
 --- !u!104 &2
 RenderSettings:
   m_ObjectHideFlags: 0
-  serializedVersion: 9
+  serializedVersion: 8
   m_Fog: 0
   m_FogColor: {r: 0.5, g: 0.5, b: 0.5, a: 1}
   m_FogMode: 3
@@ -39,12 +39,11 @@ RenderSettings:
   m_CustomReflection: {fileID: 0}
   m_Sun: {fileID: 0}
   m_IndirectSpecularColor: {r: 0, g: 0, b: 0, a: 1}
-  m_UseRadianceAmbientProbe: 0
 --- !u!157 &3
 LightmapSettings:
   m_ObjectHideFlags: 0
   serializedVersion: 11
-  m_GIWorkflowMode: 0
+  m_GIWorkflowMode: 1
   m_GISettings:
     serializedVersion: 2
     m_BounceScale: 1
@@ -55,10 +54,11 @@ LightmapSettings:
   m_EnableBakedLightmaps: 1
   m_EnableRealtimeLightmaps: 1
   m_LightmapEditorSettings:
-    serializedVersion: 10
+    serializedVersion: 9
     m_Resolution: 2
     m_BakeResolution: 40
-    m_AtlasSize: 1024
+    m_TextureWidth: 1024
+    m_TextureHeight: 1024
     m_AO: 1
     m_AOMaxDistance: 1
     m_CompAOExponent: 1
@@ -678,8 +678,13 @@ Prefab:
       objectReference: {fileID: 0}
     - target: {fileID: 1819751139121548, guid: 38400a68c4ea54b52998e34ee238d1a7, type: 2}
       propertyPath: m_IsActive
-      value: 0
+      value: 1
       objectReference: {fileID: 0}
+    - target: {fileID: 114508049814297234, guid: 38400a68c4ea54b52998e34ee238d1a7,
+        type: 2}
+      propertyPath: myAcademyObj
+      value:
+      objectReference: {fileID: 1574236047}
     m_RemovedComponents: []
   m_ParentPrefab: {fileID: 100100000, guid: 38400a68c4ea54b52998e34ee238d1a7, type: 2}
   m_IsPrefabParent: 0
@@ -776,8 +781,13 @@ Prefab:
       objectReference: {fileID: 0}
     - target: {fileID: 1819751139121548, guid: 38400a68c4ea54b52998e34ee238d1a7, type: 2}
      propertyPath: m_IsActive
-      value: 0
+      value: 1
       objectReference: {fileID: 0}
+    - target: {fileID: 114508049814297234, guid: 38400a68c4ea54b52998e34ee238d1a7,
+        type: 2}
+      propertyPath: myAcademyObj
+      value:
+      objectReference: {fileID: 1574236047}
     m_RemovedComponents: []
   m_ParentPrefab: {fileID: 100100000, guid: 38400a68c4ea54b52998e34ee238d1a7, type: 2}
   m_IsPrefabParent: 0
@@ -841,7 +851,6 @@ Camera:
   m_TargetEye: 3
   m_HDR: 1
   m_AllowMSAA: 1
-  m_AllowDynamicResolution: 0
   m_ForceIntoRT: 1
   m_OcclusionCulling: 1
   m_StereoConvergence: 10
@@ -1204,8 +1213,13 @@ Prefab:
       objectReference: {fileID: 0}
     - target: {fileID: 1819751139121548, guid: 38400a68c4ea54b52998e34ee238d1a7, type: 2}
       propertyPath: m_IsActive
-      value: 0
+      value: 1
       objectReference: {fileID: 0}
+    - target: {fileID: 114508049814297234, guid: 38400a68c4ea54b52998e34ee238d1a7,
+        type: 2}
+      propertyPath: myAcademyObj
+      value:
+      objectReference: {fileID: 1574236047}
     m_RemovedComponents: []
   m_ParentPrefab: {fileID: 100100000, guid: 38400a68c4ea54b52998e34ee238d1a7, type: 2}
   m_IsPrefabParent: 0
