
Commit 0b55cb6

Fix discrete state (#33)
* Made BrainParameters a class to set default values; modified the error message if the state is discrete
* Add discrete state support to PPO and provide discrete state example environment
* Add flexibility to continuous control as well
* Finish PPO flexible model generation implementation
* Fix formatting
* Support color observations
* Add best practices document
* Bug fix for non-square observations
* Update Readme.md
* Remove scipy dependency
* Add installation doc
1 parent: faeee16 · commit: 0b55cb6

24 files changed: +1286 −115 lines

docs/Getting-Started-with-Balance-Ball.md

Lines changed: 0 additions & 1 deletion
@@ -51,7 +51,6 @@ If you are a Windows user who is new to Python/TensorFlow, follow [this guide](h
 * numpy
 * Pillow
 * Python (2 or 3)
-* scipy
 * TensorFlow (1.0+)

 ### Installing Dependencies

docs/Readme.md

Lines changed: 2 additions & 0 deletions
@@ -2,11 +2,13 @@
 
 ## Basic
 * [Unity ML Agents Overview](Unity-Agents-Overview.md)
+* [Installation & Set-up](installation.md)
 * [Getting Started with the Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
 * [Example Environments](Example-Environments.md)

 ## Advanced
 * [How to make a new Unity Environment](Making-a-new-Unity-Environment.md)
+* [Best practices when designing an Environment](best-practices.md)
 * [How to organize the Scene](Organizing-the-Scene.md)
 * [How to use the Python API](Unity-Agents---Python-API.md)
 * [How to use TensorflowSharp inside Unity [Experimental]](Using-TensorFlow-Sharp-in-Unity-(Experimental).md)

docs/best-practices.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Environment Design Best Practices
+
+## General
+* It is often helpful to begin with the simplest version of the problem, to ensure the agent can learn it. From there, increase
+complexity over time.
+* When possible, it is often helpful to ensure that you can complete the task using a Player Brain to control the agent.
+
+## Rewards
+* The magnitude of any given reward should typically not be greater than 1.0, in order to ensure a more stable learning process.
+* Positive rewards are often more helpful for shaping the desired behavior of an agent than negative rewards.
+* For locomotion tasks, a small positive reward (+0.1) for forward progress is typically used.
+* If you want the agent to finish a task quickly, it is often helpful to provide a small penalty every step (-0.1).
+
+## States
+* The magnitude of each state variable should be normalized to around 1.0.
+* States should include all variables relevant to allowing the agent to make an optimally informed decision.
+* Categorical state variables, such as the type of an object (Sword, Shield, Bow), should be encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`).
+
+## Actions
+* When using continuous control, action values should be clipped to an appropriate range.
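The one-hot encoding and action-clipping advice above maps directly onto a few lines of NumPy. A minimal sketch (the `one_hot` helper and the [-1, 1] range are illustrative, not part of this repository):

```python
import numpy as np

def one_hot(category_index, num_categories):
    # Encode a categorical state variable (e.g. Sword=0, Shield=1, Bow=2)
    # as a one-hot vector, per the States guidance above.
    encoding = np.zeros(num_categories)
    encoding[category_index] = 1.0
    return encoding

print(one_hot(2, 3))  # Bow (index 2 of 3) -> [0. 0. 1.]

# Clip continuous action values to an appropriate range before applying them.
raw_action = np.array([1.7, -0.4, -3.2])
print(np.clip(raw_action, -1.0, 1.0))  # -> [ 1.  -0.4 -1. ]
```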

docs/installation.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# Installation & Set-up
+
+## Install **Unity 2017.1** or later (required)
+
+Download link available [here](https://store.unity.com/download?ref=update).
+
+## Clone the repository
+Once installed, you will want to clone the Agents GitHub repository. References will be made
+throughout to the `unity-environment` and `python` directories. Both are located at the root of the repository.
+
+## Installing the Python API
+In order to train an agent within the framework, you will need to install Python 2 or 3, and the dependencies described below.
+
+### Windows Users
+
+If you are a Windows user who is new to Python/TensorFlow, follow [this guide](https://nitishmutha.github.io/tensorflow/2017/01/22/TensorFlow-with-gpu-for-windows.html) to set up your Python environment.
+
+### Requirements
+* Jupyter
+* Matplotlib
+* numpy
+* Pillow
+* Python (2 or 3)
+* docopt (training)
+* TensorFlow (1.0+) (training)
+
+### Installing Dependencies
+To install dependencies, go into the `python` directory and run (depending on your Python version):
+
+`pip install .`
+
+or
+
+`pip3 install .`
+
+If your Python environment doesn't include `pip`, see these [instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers) on installing it.
+
+Once the requirements are successfully installed, the next step is to check out the [Getting Started guide](Getting-Started-with-Balance-Ball.md).
+
+## Installation Help
+
+### Using Jupyter Notebook
+
+For a walkthrough of how to use Jupyter Notebook, see [here](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html).
+
+### General Issues
+
+If you run into issues while attempting to install and run Unity ML Agents, see [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Limitations-&-Common-Issues.md) for a list of common issues and solutions.
+
+If you have an issue that isn't covered here, feel free to contact us at [email protected]. Alternatively, feel free to create an issue on the repository.
+Be sure to include relevant information on your OS, Python version, and the exact error message if possible.
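After `pip install .` completes, a quick way to confirm that the requirements listed above import cleanly is a short check script. This is an illustrative sketch, not part of the repository (note that Pillow is imported as `PIL`):

```python
import importlib

# Importable module names corresponding to the Requirements list above.
for module in ["jupyter", "matplotlib", "numpy", "PIL", "docopt", "tensorflow"]:
    try:
        importlib.import_module(module)
        print("OK:", module)
    except ImportError as error:
        print("MISSING:", module, "-", error)
```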

python/PPO.ipynb

Lines changed: 4 additions & 6 deletions
@@ -49,7 +49,7 @@
 "train_model = True # Whether to train the model.\n",
 "summary_freq = 10000 # Frequency at which to save training statistics.\n",
 "save_freq = 50000 # Frequency at which to save model.\n",
-"env_name = \"simple\" # Name of the training environment file.\n",
+"env_name = \"environment\" # Name of the training environment file.\n",
 "\n",
 "### Algorithm-specific parameters for tuning\n",
 "gamma = 0.99 # Reward discount rate.\n",
@@ -74,9 +74,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": true
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "env = UnityEnvironment(file_name=env_name)\n",
@@ -95,7 +93,6 @@
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
-"collapsed": true,
 "scrolled": true
 },
 "outputs": [],
@@ -109,6 +106,7 @@
 "\n",
 "is_continuous = (env.brains[brain_name].action_space_type == \"continuous\")\n",
 "use_observations = (env.brains[brain_name].number_observations > 0)\n",
+"use_states = (env.brains[brain_name].state_space_size > 0)\n",
 "\n",
 "model_path = './models/{}'.format(run_path)\n",
 "summary_path = './summaries/{}'.format(run_path)\n",
@@ -133,7 +131,7 @@
 "    steps = sess.run(ppo_model.global_step)\n",
 "    summary_writer = tf.summary.FileWriter(summary_path)\n",
 "    info = env.reset(train_mode=train_model)[brain_name]\n",
-"    trainer = Trainer(ppo_model, sess, info, is_continuous, use_observations)\n",
+"    trainer = Trainer(ppo_model, sess, info, is_continuous, use_observations, use_states)\n",
 "    while steps <= max_steps:\n",
 "        if env.global_done:\n",
 "            info = env.reset(train_mode=train_model)[brain_name]\n",

python/ppo.py

Lines changed: 3 additions & 2 deletions
@@ -15,7 +15,7 @@
 
 Options:
     --help                Show this message.
-    --max-steps=<n>       Maximum number of steps to run environment [default: 5e6].
+    --max-steps=<n>       Maximum number of steps to run environment [default: 1e6].
     --run-path=<path>     The sub-directory name for model and summary statistics [default: ppo].
     --load                Whether to load the model or randomly initialize [default: False].
     --train               Whether to train model, or only run inference [default: True].
@@ -73,6 +73,7 @@
 
 is_continuous = (env.brains[brain_name].action_space_type == "continuous")
 use_observations = (env.brains[brain_name].number_observations > 0)
+use_states = (env.brains[brain_name].state_space_size > 0)

 if not os.path.exists(model_path):
     os.makedirs(model_path)
@@ -94,7 +95,7 @@
     steps = sess.run(ppo_model.global_step)
     summary_writer = tf.summary.FileWriter(summary_path)
     info = env.reset(train_mode=train_model)[brain_name]
-    trainer = Trainer(ppo_model, sess, info, is_continuous, use_observations)
+    trainer = Trainer(ppo_model, sess, info, is_continuous, use_observations, use_states)
     while steps <= max_steps or not train_model:
         if env.global_done:
             info = env.reset(train_mode=train_model)[brain_name]
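The `[default: 1e6]` edit above works because docopt reads defaults out of the Options text and returns them as strings, which the script then converts to a number. A minimal, self-contained sketch (this usage string is illustrative, not the repository's full one):

```python
"""Usage:
  ppo_sketch.py [options]

Options:
    --max-steps=<n>       Maximum number of steps to run environment [default: 1e6].
"""
from docopt import docopt

options = docopt(__doc__)
# docopt returns the default as the string "1e6"; convert before use.
max_steps = float(options['--max-steps'])
print(max_steps)  # 1000000.0
```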
