com.unity.ml-agents/Documentation~/Learning-Environment-Create-New.md
1 addition & 1 deletion
@@ -94,7 +94,7 @@ Then, edit the new `RollerAgent` script:
    using Unity.MLAgents.Actuators;
    ```
    then change the base class from `MonoBehaviour` to `Agent`.
- 1. Delete `Update()` since we are not using it, but keep `Start()`.
+ 3. Delete `Update()` since we are not using it, but keep `Start()`.
So far, these are the basic steps that you would use to add ML-Agents to any Unity project. Next, we will add the logic that will let our Agent learn to roll to the cube using reinforcement learning. More specifically, we will need to extend three methods from the `Agent` base class:
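For orientation, a skeleton of those three overrides (typically `OnEpisodeBegin()`, `CollectObservations()`, and `OnActionReceived()`, as described in the agent-design page below) might look roughly like the sketch that follows. This is illustrative only — the field names and method bodies are not the tutorial's final `RollerAgent` code:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

public class RollerAgent : Agent
{
    [SerializeField] Transform target;   // hypothetical reference to the cube, assigned in the Inspector
    Rigidbody rBody;

    void Start()
    {
        rBody = GetComponent<Rigidbody>();
    }

    // Reset the agent at the start of each episode.
    public override void OnEpisodeBegin()
    {
        transform.localPosition = Vector3.zero;
    }

    // Hand numeric observations to the provided VectorSensor.
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(target.localPosition);     // 3 values
        sensor.AddObservation(transform.localPosition);  // 3 values
    }

    // Apply the chosen action; rewards are also commonly assigned here.
    public override void OnActionReceived(ActionBuffers actions)
    {
        var force = new Vector3(actions.ContinuousActions[0], 0f, actions.ContinuousActions[1]);
        rBody.AddForce(force * 10f);  // illustrative force multiplier
    }
}
```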
com.unity.ml-agents/Documentation~/Learning-Environment-Design-Agents.md
14 additions & 29 deletions
@@ -8,23 +8,16 @@ The `Policy` class abstracts out the decision making logic from the Agent itself
When you create an Agent, you should usually extend the base Agent class. This includes implementing the following methods:

- - `Agent.OnEpisodeBegin()` — Called at the beginning of an Agent's episode,
-   including at the beginning of the simulation.
- - `Agent.CollectObservations(VectorSensor sensor)` — Called every step that the Agent
-   requests a decision. This is one possible way for collecting the Agent's observations of the environment; see [Generating Observations](#generating-observations) below for more options.
- - `Agent.OnActionReceived()` — Called every time the Agent receives an action to
-   take. Receives the action chosen by the Agent. It is also common to assign a reward in this method.
- - `Agent.Heuristic()` - When the `Behavior Type` is set to `Heuristic Only` in
-   the Behavior Parameters of the Agent, the Agent will use the `Heuristic()` method to generate the actions of the Agent. As such, the `Heuristic()` method writes to the array of floats provided to the Heuristic method as argument. __Note__: Do not create a new float array of action in the `Heuristic()` method, as this will prevent writing floats to the original action array.
+ - `Agent.OnEpisodeBegin()` — Called at the beginning of an Agent's episode, including at the beginning of the simulation.
+ - `Agent.CollectObservations(VectorSensor sensor)` — Called every step that the Agent requests a decision. This is one possible way for collecting the Agent's observations of the environment; see [Generating Observations](#generating-observations) below for more options.
+ - `Agent.OnActionReceived()` — Called every time the Agent receives an action to take. Receives the action chosen by the Agent. It is also common to assign a reward in this method.
+ - `Agent.Heuristic()` - When the `Behavior Type` is set to `Heuristic Only` in the Behavior Parameters of the Agent, the Agent will use the `Heuristic()` method to generate the actions of the Agent. As such, the `Heuristic()` method writes to the array of floats provided to the Heuristic method as argument. __Note__: Do not create a new float array of action in the `Heuristic()` method, as this will prevent writing floats to the original action array.
As a concrete example, here is how the Ball3DAgent class implements these methods:
- - `Agent.OnEpisodeBegin()` — Resets the agent cube and ball to their starting
-   positions. The function randomizes the reset values so that the training generalizes to more than a specific starting position and agent cube orientation.
- - `Agent.CollectObservations(VectorSensor sensor)` — Adds information about the
-   orientation of the agent cube, the ball velocity, and the relative position between the ball and the cube. Since the `CollectObservations()` method calls `VectorSensor.AddObservation()` such that vector size adds up to 8, the Behavior Parameters of the Agent are set with vector observation space with a state size of 8.
- - `Agent.OnActionReceived()` — The action results
-   in a small change in the agent cube's rotation at each step. In this example, an Agent receives a small positive reward for each step it keeps the ball on the agent cube's head and a larger, negative reward for dropping the ball. An Agent's episode is also ended when it drops the ball so that it will reset with a new ball for the next simulation step.
+ - `Agent.OnEpisodeBegin()` — Resets the agent cube and ball to their starting positions. The function randomizes the reset values so that the training generalizes to more than a specific starting position and agent cube orientation.
+ - `Agent.CollectObservations(VectorSensor sensor)` — Adds information about the orientation of the agent cube, the ball velocity, and the relative position between the ball and the cube. Since the `CollectObservations()` method calls `VectorSensor.AddObservation()` such that vector size adds up to 8, the Behavior Parameters of the Agent are set with vector observation space with a state size of 8.
+ - `Agent.OnActionReceived()` — The action results in a small change in the agent cube's rotation at each step. In this example, an Agent receives a small positive reward for each step it keeps the ball on the agent cube's head and a larger, negative reward for dropping the ball. An Agent's episode is also ended when it drops the ball so that it will reset with a new ball for the next simulation step.
- `Agent.Heuristic()` - Converts the keyboard inputs into actions.
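To make the note about `Heuristic()` concrete, here is a minimal sketch of a keyboard-driven heuristic for a two-axis continuous action (the class name is illustrative). The key point is that it writes into the buffer it is given rather than allocating a new one:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class KeyboardControlledAgent : Agent
{
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Write into the provided buffer; do NOT replace it with a newly allocated array.
        var continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxis("Horizontal");
        continuousActions[1] = Input.GetAxis("Vertical");
    }
}
```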
## Decisions
@@ -36,16 +29,12 @@ In order for an agent to learn, the observations should include all the informat
### Generating Observations
ML-Agents provides multiple ways for an Agent to make observations:
- 1. Overriding the `Agent.CollectObservations()` method and passing the
-    observations to the provided `VectorSensor`.
- 1. Adding the `[Observable]` attribute to fields and properties on the Agent.
- 1. Implementing the `ISensor` interface, using a `SensorComponent` attached to
-    the Agent to create the `ISensor`.
+ 1. Overriding the `Agent.CollectObservations()` method and passing the observations to the provided `VectorSensor`.
+ 2. Adding the `[Observable]` attribute to fields and properties on the Agent.
+ 3. Implementing the `ISensor` interface, using a `SensorComponent` attached to the Agent to create the `ISensor`.
#### Agent.CollectObservations()
- Agent.CollectObservations() is best used for aspects of the environment which are numerical and non-visual. The Policy class calls the
- `CollectObservations(VectorSensor sensor)` method of each Agent. Your
- implementation of this function must call `VectorSensor.AddObservation` to add vector observations.
+ Agent.CollectObservations() is best used for aspects of the environment which are numerical and non-visual. The Policy class calls the `CollectObservations(VectorSensor sensor)` method of each Agent. Your implementation of this function must call `VectorSensor.AddObservation` to add vector observations.
The `VectorSensor.AddObservation` method provides a number of overloads for adding common types of data to your observation vector. You can add Integers and booleans directly to the observation vector, as well as some common Unity data types such as `Vector2`, `Vector3`, and `Quaternion`.
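For instance, a `CollectObservations()` implementation might mix several of these overloads. This is a sketch with hypothetical fields, not code from the documentation:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class OverloadExampleAgent : Agent
{
    Rigidbody ballRigidbody;   // hypothetical references, assumed to be assigned elsewhere
    bool isGrounded;
    int remainingSteps;

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localRotation); // Quaternion -> 4 values
        sensor.AddObservation(ballRigidbody.velocity);  // Vector3    -> 3 values
        sensor.AddObservation(isGrounded);              // bool       -> 1 value (0 or 1)
        sensor.AddObservation(remainingSteps);          // int        -> 1 value
    }
}
```

With these calls the observation vector has 9 entries, so the Space Size in the Agent's Behavior Parameters would be set to 9 to match.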
@@ -90,8 +79,7 @@ public class Ball3DHardAgent : Agent {
  }
}
```
- `ObservableAttribute` currently supports most basic types (e.g. floats, ints,
- bools), as well as `Vector2`, `Vector3`, `Vector4`, `Quaternion`, and enums.
+ `ObservableAttribute` currently supports most basic types (e.g. floats, ints, bools), as well as `Vector2`, `Vector3`, `Vector4`, `Quaternion`, and enums.
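Since the hunk above only shows the tail of the `Ball3DHardAgent` example, here is a separate minimal sketch of exposing observations via the attribute. The class and member names are illustrative, and the namespace for `ObservableAttribute` is an assumption about the package layout:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors.Reflection;  // assumed location of ObservableAttribute

public class ObservableExampleAgent : Agent
{
    // Each [Observable] member is read via reflection and added to the observation vector.
    [Observable]
    float m_DistanceToTarget;                     // hypothetical field, updated elsewhere

    [Observable]
    public Vector3 TargetVelocity { get; set; }   // properties can be marked as well
}
```

For these members to be used, the Agent's "Observable Attribute Handling" must be set to something other than Ignore, as described next.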
The behavior of `ObservableAttribute`s are controlled by the "Observable Attribute Handling" in the Agent's `Behavior Parameters`. The possible values for this are:
* **Ignore** (default) - All ObservableAttributes on the Agent will be ignored. If there are no ObservableAttributes on the Agent, this will result in the fastest initialization time.
@@ -102,9 +90,7 @@ The behavior of `ObservableAttribute`s are controlled by the "Observable Attribu
Internally, ObservableAttribute uses reflection to determine which members of the Agent have ObservableAttributes, and also uses reflection to access the fields or invoke the properties at runtime. This may be slower than using CollectObservations or an ISensor, although this might not be enough to noticeably affect performance.
- **NOTE**: you do not need to adjust the Space Size in the Agent's
- `Behavior Parameters` when you add `[Observable]` fields or properties to an
- Agent, since their size can be computed before they are used.
+ **NOTE**: you do not need to adjust the Space Size in the Agent's `Behavior Parameters` when you add `[Observable]` fields or properties to an Agent, since their size can be computed before they are used.
#### ISensor interface and SensorComponents
The `ISensor` interface is generally intended for advanced users. The `Write()` method is used to actually generate the observation, but some other methods such as returning the shape of the observations must also be implemented.
@@ -616,8 +602,7 @@ Multi Agent Groups should be used with the MA-POCA trainer, which is explicitly
See the [Cooperative Push Block](Learning-Environment-Examples.md#cooperative-push-block) environment for an example of how to use Multi Agent Groups, and the [Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) environment for an example of how the Multi Agent Group can be used with agents that are removed from the scene mid-episode.
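The Cooperative Push Block example linked above shows the full pattern; a stripped-down sketch of registering a team with `SimpleMultiAgentGroup` might look like this (the controller class and its method layout are illustrative, not the example's actual code):

```csharp
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;

public class TeamController : MonoBehaviour
{
    public List<Agent> teamAgents;          // agents that cooperate on one playing field
    SimpleMultiAgentGroup m_Group;

    void Start()
    {
        m_Group = new SimpleMultiAgentGroup();
        foreach (var agent in teamAgents)
        {
            m_Group.RegisterAgent(agent);   // all cooperating agents share one group
        }
    }

    public void OnGoalScored()
    {
        m_Group.AddGroupReward(1f);         // reward the whole group
        m_Group.EndGroupEpisode();          // end the episode for every registered agent
    }

    public void OnTimeRanOut()
    {
        m_Group.GroupEpisodeInterrupted();  // end the episode without treating it as success or failure
    }
}
```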
- **NOTE**: Groups differ from Teams (for competitive settings) in the following way - Agents
- working together should be added to the same Group, while agents playing against each other should be given different Team Ids. If in the Scene there is one playing field and two teams, there should be two Groups, one for each team, and each team should be assigned a different Team Id. If this playing field is duplicated many times in the Scene (e.g. for training speedup), there should be two Groups _per playing field_, and two unique Team Ids _for the entire Scene_. In environments with both Groups and Team Ids configured, MA-POCA and self-play can be used together for training. In the diagram below, there are two agents on each team, and two playing fields where teams are pitted against each other. All the blue agents should share a Team Id (and the orange ones a different ID), and there should be four group managers, one per pair of agents.
+ **NOTE**: Groups differ from Teams (for competitive settings) in the following way - Agents working together should be added to the same Group, while agents playing against each other should be given different Team Ids. If in the Scene there is one playing field and two teams, there should be two Groups, one for each team, and each team should be assigned a different Team Id. If this playing field is duplicated many times in the Scene (e.g. for training speedup), there should be two Groups _per playing field_, and two unique Team Ids _for the entire Scene_. In environments with both Groups and Team Ids configured, MA-POCA and self-play can be used together for training. In the diagram below, there are two agents on each team, and two playing fields where teams are pitted against each other. All the blue agents should share a Team Id (and the orange ones a different ID), and there should be four group managers, one per pair of agents.
<p align="center"> <img src="images/groupmanager_teamid.png" alt="Group Manager vs Team Id" width="650" border="10" /> </p>
com.unity.ml-agents/Documentation~/ML-Agents-Overview.md
1 addition & 2 deletions
@@ -245,8 +245,7 @@ Unlike other platforms, where the agent’s observation might be limited to a si
When visual observations are utilized, the ML-Agents Toolkit leverages convolutional neural networks (CNN) to learn from the input images. We offer three network architectures:
- a simple encoder which consists of two convolutional layers
- - the implementation proposed by
-   [Mnih et al.](https://www.nature.com/articles/nature14236), consisting of three convolutional layers,
+ - the implementation proposed by [Mnih et al.](https://www.nature.com/articles/nature14236), consisting of three convolutional layers,
- the [IMPALA Resnet](https://arxiv.org/abs/1802.01561) consisting of three stacked layers, each with two residual blocks, making a much larger network than the other two.
The choice of the architecture depends on the visual complexity of the scene and the available computational resources.
com.unity.ml-agents/Documentation~/Package-Settings.md
0 additions & 6 deletions
@@ -21,9 +21,3 @@ By clicking the gear on the top right you'll see all available settings listed i
This allows you to create different settings for different scenarios. For example, you can create two separate settings for training and inference, and specify which one you want to use according to what you're currently running.