Commit 03b0493

Author: Marwan Mattar
Merge pull request #471 from Unity-Technologies/docs-improvements
Docs improvements
2 parents f7ab9ba + 3a26d8f commit 03b0493

20 files changed: +27 -36 lines

README.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ to the wider research and game developer communities.
 * Built-in support for Imitation Learning
 * Flexible Agent control with On Demand Decision Making
 * Visualizing network outputs within the environment
-* Simplified set-up with Docker _(Experimental)_
+* Simplified set-up with Docker (Experimental)
 
 ## Documentation and References

docs/Background-Machine-Learning.md

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@ natural choice for reinforcement learning tasks when a large amount of data
 can be generated, say through the use of a simulator or engine such as Unity.
 By generating hundreds of thousands of simulations of
 the environment within Unity, we can learn policies for very complex environments
-(a complex environment is one where the number of observations an agent percieves
+(a complex environment is one where the number of observations an agent perceives
 and the number of actions they can take are large).
 Many of the algorithms we provide in ML-Agents use some form of deep learning,
 built on top of the open-source library, [TensorFlow](Background-TensorFlow.md).

docs/Getting-Started-with-Balance-Ball.md

Lines changed: 1 addition & 1 deletion
@@ -313,7 +313,7 @@ during a successful training session.
 
 ![Example TensorBoard Run](images/mlagents-TensorBoard.png)
 
-## Embedding the Trained Brain into the Unity Environment _[Experimental]_
+## Embedding the Trained Brain into the Unity Environment (Experimental)
 
 Once the training process completes, and the training process saves the model
 (denoted by the `Saved Model` message) you can add it to the Unity project and

docs/Learning-Environment-Best-Practices.md

Lines changed: 2 additions & 2 deletions
@@ -15,12 +15,12 @@ complexity over time. This can either be done manually, or via Curriculum Learni
 
 ## Vector Observations
 * Vector Observations should include all variables relevant to allowing the agent to take the optimally informed decision.
-* Categorical variables such as type of object (Sword, Shield, Bow) should be encoded in one-hot fashion (ie `3` -> `0, 0, 1`).
+* Categorical variables such as type of object (Sword, Shield, Bow) should be encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`).
 * Besides encoding non-numeric values, all inputs should be normalized to be in the range 0 to +1 (or -1 to 1). For example, the `x` position information of an agent where the maximum possible value is `maxValue` should be recorded as `AddVectorObs(transform.position.x / maxValue);` rather than `AddVectorObs(transform.position.x);`. See the equation below for one approach of normalization.
 * Positional information of relevant GameObjects should be encoded in relative coordinates wherever possible. This is often relative to the agent position.
 
 ![normalization](images/normalization.png)
 
 ## Vector Actions
 * When using continuous control, action values should be clipped to an appropriate range.
-* Be sure to set the Vector Action's Space Size to the number of used Vector Actions, and not greater, as doing the latter can interfere with the efficency of the training process.
+* Be sure to set the Vector Action's Space Size to the number of used Vector Actions, and not greater, as doing the latter can interfere with the efficiency of the training process.
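As a rough sketch of the two observation guidelines touched in this hunk (one-hot encoding and normalization), the snippet below shows how they might look inside an agent's `CollectObservations()`. It is illustrative only and not part of this commit; `ItemType`, `heldItem`, and `maxValue` are made-up names.

```csharp
using UnityEngine;

// Illustrative sketch only: one-hot encoding of a categorical variable and
// normalization of a numeric input, as recommended in the best-practices doc.
// (Depending on the ML-Agents version, a using directive for its namespace may also be needed.)
public class ExampleObservationAgent : Agent
{
    public enum ItemType { Sword, Shield, Bow }   // hypothetical categorical variable

    public ItemType heldItem;
    public float maxValue = 10f;                  // assumed maximum x position

    public override void CollectObservations()
    {
        // One-hot encoding: one element per possible category (e.g. Bow -> 0, 0, 1).
        AddVectorObs(heldItem == ItemType.Sword ? 1f : 0f);
        AddVectorObs(heldItem == ItemType.Shield ? 1f : 0f);
        AddVectorObs(heldItem == ItemType.Bow ? 1f : 0f);

        // Normalization: scale the raw position into the 0 to +1 range.
        AddVectorObs(transform.position.x / maxValue);
    }
}
```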

docs/Learning-Environment-Create-New.md

Lines changed: 1 addition & 2 deletions
@@ -161,7 +161,6 @@ So far, our RollerAgent script looks like:
 
 public class RollerAgent : Agent
 {
-
 Rigidbody rBody;
 void Start () {
 rBody = GetComponent<Rigidbody>();
@@ -195,7 +194,7 @@ The Agent sends the information we collect to the Brain, which uses it to make a
 
 In our case, the information our agent collects includes:
 
-* Position of the target. In general, it is better to use the relative position of other objects rather than the absolute position for more generalizable training. Note that the agent only collects the x and z coordinates since the floor is aligned with the xz plane and the y component of the target's position never changes.
+* Position of the target. In general, it is better to use the relative position of other objects rather than the absolute position for more generalizable training. Note that the agent only collects the x and z coordinates since the floor is aligned with the x-z plane and the y component of the target's position never changes.
 
 // Calculate relative position
 Vector3 relativePosition = Target.position - this.transform.position;
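For context, the relative-position observation described in this hunk might be collected roughly as sketched below. This is not part of the commit; `Target` and the divisor used for normalization (an approximate floor size) are assumptions.

```csharp
using UnityEngine;

// Sketch only: a RollerAgent-style observation method using relative coordinates.
// Target is assumed to be assigned in the Inspector; 5f is an assumed floor size
// used to roughly normalize the observation.
public class RollerAgentSketch : Agent
{
    public Transform Target;

    public override void CollectObservations()
    {
        // Calculate relative position
        Vector3 relativePosition = Target.position - this.transform.position;

        // Only x and z are observed, since the floor lies in the x-z plane.
        AddVectorObs(relativePosition.x / 5f);
        AddVectorObs(relativePosition.z / 5f);
    }
}
```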

docs/Learning-Environment-Design-Agents.md

Lines changed: 2 additions & 9 deletions
@@ -58,8 +58,6 @@ For examples of various state observation functions, you can look at the [Exampl
 AddVectorObs(ball.transform.GetComponent<Rigidbody>().velocity.z);
 }
 
-<!-- Note that the above values aren't normalized, which we recommend! -->
-
 The feature vector must always contain the same number of elements and observations must always be in the same position within the list. If the number of observed entities in an environment can vary you can pad the feature vector with zeros for any missing entities in a specific observation or you can limit an agent's observations to a fixed subset. For example, instead of observing every enemy agent in an environment, you could only observe the closest five.
 
 When you set up an Agent's brain in the Unity Editor, set the following properties to use a continuous vector observation:
@@ -94,11 +92,6 @@ Type enumerations should be encoded in the _one-hot_ style. That is, add an elem
 }
 
 
-<!--
-How to handle things like large numbers of words or symbols? Should you use a very long one-hot vector? Or a single index into a table?
-Colors? Better to use a single color number or individual components?
--->
-
 #### Normalization
 
 For the best results when training, you should normalize the components of your feature vector to the range [-1, +1] or [0, 1]. When you normalize the values, the PPO neural network can often converge to a solution faster. Note that it isn't always necessary to normalize to these recommended ranges, but it is considered a best practice when using neural networks. The greater the variation in ranges between the components of your observation, the more likely that training will be affected.
@@ -129,7 +122,7 @@ In addition, make sure that the Agent's Brain expects a visual observation. In t
 
 ### Discrete Vector Observation Space: Table Lookup
 
-You can use the discrete vector observation space when an agent only has a limited number of possible states and those states can be enumerated by a single number. For instance, the [Basic example environment](Learning-Environment-Examples.md) in the ML Agent SDK defines an agent with a discrete vector observation space. The states of this agent are the integer steps between two linear goals. In the Basic example, the agent learns to move to the goal that provides the greatest reward.
+You can use the discrete vector observation space when an agent only has a limited number of possible states and those states can be enumerated by a single number. For instance, the [Basic example environment](Learning-Environment-Examples.md) in ML-Agents defines an agent with a discrete vector observation space. The states of this agent are the integer steps between two linear goals. In the Basic example, the agent learns to move to the goal that provides the greatest reward.
 
 More generally, the discrete vector observation identifier could be an index into a table of the possible states. However, tables quickly become unwieldy as the environment becomes more complex. For example, even a simple game like [tic-tac-toe has 765 possible states](https://en.wikipedia.org/wiki/Game_complexity) (far more if you don't reduce the number of observations by combining those that are rotations or reflections of each other).
 
@@ -310,7 +303,7 @@ To add an Agent to an environment at runtime, use the Unity `GameObject.Instanti
 
 ## Destroying an Agent
 
-Before destroying an Agent Gameobject, you must mark it as done (and wait for the next step in the simulation) so that the Brain knows that this agent is no longer active. Thus, the best place to destroy an agent is in the `Agent.AgentOnDone()` function:
+Before destroying an Agent GameObject, you must mark it as done (and wait for the next step in the simulation) so that the Brain knows that this agent is no longer active. Thus, the best place to destroy an agent is in the `Agent.AgentOnDone()` function:
 
 ```csharp
 public override void AgentOnDone()
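The diff context is truncated at the start of that code block. For reference, a typical `AgentOnDone()` override of the kind described above might look like the sketch below; it is illustrative only and not part of this commit or the file's actual snippet.

```csharp
// Sketch only: destroy the agent's GameObject once the Brain knows it is done.
public override void AgentOnDone()
{
    Destroy(gameObject);
}
```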

docs/Learning-Environment-Design-Brains.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ The Brain Inspector window in the Unity Editor displays the properties assigned
 * `Space Type` - Corresponds to whether the observation vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
 * `Space Size` - Length of vector observation for brain (In _Continuous_ space type). Or number of possible values (in _Discrete_ space type).
 * `Stacked Vectors` - The number of previous vector observations that will be stacked before being sent to the brain.
-* `Visual Observations` - Describes height, width, and whether to greyscale visual observations for the Brain.
+* `Visual Observations` - Describes height, width, and whether to grayscale visual observations for the Brain.
 * `Vector Action`
 * `Space Type` - Corresponds to whether action vector contains a single integer (Discrete) or a series of real-valued floats (Continuous).
 * `Space Size` - Length of action vector for brain (In _Continuous_ state space). Or number of possible values (in _Discrete_ action space).

docs/Learning-Environment-Design-External-Internal-Brains.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ The training algorithms included in the ML-Agents SDK produce TensorFlow graph m
 
 To use a graph model:
 
-1. Select the Brain GameObject in the **Hierarchy** window of the Unity Editor. (The Brain GameObject must be a child of the Academy Gameobject and must have a Brain component.)
+1. Select the Brain GameObject in the **Hierarchy** window of the Unity Editor. (The Brain GameObject must be a child of the Academy GameObject and must have a Brain component.)
 2. Set the **Brain Type** to **Internal**.
 
 **Note:** In order to see the **Internal** Brain Type option, you must [enable TensorFlowSharp](Using-TensorFlow-Sharp-in-Unity.md).
@@ -44,7 +44,7 @@ The default values of the TensorFlow graph parameters work with the model produc
 ![Internal Brain Inspector](images/internal_brain.png)
 
 
-* `Graph Model` : This must be the `bytes` file corresponding to the pretrained Tensorflow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the inspector)
+* `Graph Model` : This must be the `bytes` file corresponding to the pre-trained TensorFlow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the inspector)
 
 Only change the following Internal Brain properties if you have created your own TensorFlow model and are not using an ML-Agents model:
docs/Learning-Environment-Design.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ The ML-Agents Academy class orchestrates the agent simulation loop as follows:
2525

2626
To create a training environment, extend the Academy and Agent classes to implement the above methods. The `Agent.CollectObservations()` and `Agent.AgentAction()` functions are required; the other methods are optional — whether you need to implement them or not depends on your specific scenario.
2727

28-
**Note:** The API used by the Python PPO training process to communicate with and control the Academy during training can be used for other purposes as well. For example, you could use the API to use Unity as the simulation engine for your own machine learning algorithms. See [External ML API](Python-API.md) for more information.
28+
**Note:** The API used by the Python PPO training process to communicate with and control the Academy during training can be used for other purposes as well. For example, you could use the API to use Unity as the simulation engine for your own machine learning algorithms. See [Python API](Python-API.md) for more information.
2929

3030
## Organizing the Unity Scene
3131
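As a hedged illustration of the extension pattern this hunk's context describes (overriding the required `Agent.CollectObservations()` and `Agent.AgentAction()`), a minimal Agent subclass might look like the sketch below. The class name, observation, and reward logic are placeholders, not part of this commit, and the exact `AgentAction` parameter list depends on the ML-Agents version.

```csharp
using UnityEngine;

// Minimal sketch of the two required Agent overrides mentioned above.
// (Depending on the ML-Agents version, a using directive for its namespace may also be needed.)
public class MinimalAgent : Agent
{
    // Required: collect the observations the Brain uses to choose an action.
    public override void CollectObservations()
    {
        AddVectorObs(transform.position.x);
    }

    // Required: apply the Brain's chosen action and assign any reward.
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        transform.Translate(vectorAction[0] * Time.deltaTime, 0f, 0f);
        AddReward(0.01f);
    }
}
```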

docs/Python-API.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ A BrainInfo object contains the following fields:
 * **`text_observations`** : A list of string corresponding to the agents text observations.
 * **`memories`** : A two dimensional numpy array of dimension `(batch size, memory size)` which corresponds to the memories sent at the previous step.
 * **`rewards`** : A list as long as the number of agents using the brain containing the rewards they each obtained at the previous step.
-* **`local_done`** : A list as long as the number of agents using the brain containing `done` flags (wether or not the agent is done).
+* **`local_done`** : A list as long as the number of agents using the brain containing `done` flags (whether or not the agent is done).
 * **`max_reached`** : A list as long as the number of agents using the brain containing true if the agents reached their max steps.
 * **`agents`** : A list of the unique ids of the agents using the brain.
 * **`previous_actions`** : A two dimensional numpy array of dimension `(batch size, vector action size)` if the vector action space is continuous and `(batch size, 1)` if the vector action space is discrete.
