
Commit b0fdffc

Merge pull request #158 from Unity-Technologies/keypoints
Keypoints (Human Pose Estimate)
2 parents 2c5472f + 3683604 commit b0fdffc

35 files changed

+54180
-40
lines changed

com.unity.perception/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ Before upgrading a project to this version of the Perception package, make sure
1313

1414
### Added
1515

16+
Added keypoint ground truth labeling
17+
18+
Added animation randomization
19+
1620
Added ScenarioConstants base class for all scenario constants objects
1721

1822
Added ScenarioBase.SerializeToConfigFile()

com.unity.perception/Documentation~/PerceptionCamera.md

Lines changed: 79 additions & 0 deletions
@@ -77,6 +77,85 @@ _Example rendered object info for a single object_
7777

7878
The RenderedObjectInfoLabeler records a list of all objects visible in the Camera image, including its instance ID, resolved label ID and visible pixels. If Unity cannot resolve objects to a label in the IdLabelConfig, it does not record these objects.
7979

80+
### KeypointLabeler
81+
82+
The keypoint labeler captures keypoints of a labeled GameObject. The typical use of this labeler is capturing human pose
83+
estimation data. The labeler uses a [keypoint template](#keypoint-template) which defines the keypoints to capture for the
84+
model and the skeletal connections between those keypoints. The positions of the keypoints are recorded in pixel coordinates
85+
and saved to the captures JSON file.
86+
87+
```
88+
keypoints {
89+
label_id: <int> -- Integer identifier of the label
90+
instance_id: <str> -- UUID of the instance.
91+
template_guid: <str> -- UUID of the keypoint template
92+
pose: <str> -- Pose ground truth information
93+
keypoints [ -- Array of keypoint data, one entry for each keypoint defined in associated template file.
94+
{
95+
index: <int> -- Index of keypoint in template
96+
x: <float> -- X pixel coordinate of keypoint
97+
y: <float> -- Y pixel coordinate of keypoint
98+
state: <int> -- 0: keypoint does not exist, 1: keypoint exists
99+
}, ...
100+
]
101+
}
102+
```
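
As a rough illustration of how a dataset consumer might read this record, the sketch below keeps only the keypoints that exist. The sample record and the helper name `existing_keypoints` are hypothetical; only the field names come from the format above, and the sketch assumes that any non-zero `state` means the keypoint exists.

```python
def existing_keypoints(annotation):
    """Return {index: (x, y)} for every keypoint whose state is non-zero."""
    return {
        kp["index"]: (kp["x"], kp["y"])
        for kp in annotation["keypoints"]
        if kp["state"] != 0
    }


# Hypothetical sample record that follows the field layout shown above.
sample = {
    "label_id": 1,
    "instance_id": "instance-1",
    "template_guid": "template-1",
    "pose": "unset",
    "keypoints": [
        {"index": 0, "x": 120.5, "y": 64.0, "state": 1},
        {"index": 1, "x": 0.0, "y": 0.0, "state": 0},
    ],
}

print(existing_keypoints(sample))
```

Keying the result by `index` keeps each entry aligned with the keypoint template even after missing keypoints are dropped.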
103+
104+
#### Keypoint Template
105+
106+
Keypoint templates are used to define the keypoints and skeletal connections captured by the KeypointLabeler. The keypoint
107+
template takes advantage of Unity's humanoid animation rig, and allows the user to automatically associate template keypoints
108+
to animation rig joints. Additionally, the user can choose to ignore the rigged points, or add points not defined in the rig.
109+
A COCO keypoint template is included in the Perception package.
110+
111+
##### Editor
112+
113+
The keypoint template editor allows the user to create/modify a keypoint template. The editor consists of the header information,
114+
the keypoint array, and the skeleton array.
115+
116+
![Header section of the keypoint template](images/keypoint_template_header.png)
117+
<br/>_Header section of the keypoint template_
118+
119+
In the header section, a user can change the name of the template and supply textures that they would like to use for the keypoint
120+
visualization.
121+
122+
![The keypoint section of the keypoint template](images/keypoint_template_keypoints.png)
123+
<br/>_Keypoint section of the keypoint template_
124+
125+
The keypoint section allows the user to create/edit keypoints and associate them with Unity animation rig points. Each keypoint record
126+
has 4 fields: label (the name of the keypoint), Associate to Rig (a boolean value which, if true, automatically maps the keypoint to
127+
the gameobject defined by the rig), Rig Label (only needed if Associate To Rig is true, defines which rig component to associate with
128+
the keypoint), and Color (RGB color value of the keypoint in the visualization).
129+
130+
![Skeleton section of the keypoint template](images/keypoint_template_skeleton.png)
131+
<br/>_Skeleton section of the keypoint template_
132+
133+
The skeleton section allows the user to create connections between joints, which together define the skeleton of a labeled object.
134+
135+
##### Format
136+
```
137+
annotation_definition.spec {
138+
template_id: <str> -- The UUID of the template
139+
template_name: <str> -- Human readable name of the template
140+
key_points [ -- Array of joints defined in this template
141+
{
142+
label: <str> -- The label of the joint
143+
index: <int> -- The index of the joint
144+
}, ...
145+
]
146+
skeleton [ -- Array of skeletal connections (which joints have connections between one another) defined in this template
147+
{
148+
joint1: <int> -- The first joint of the connection
149+
joint2: <int> -- The second joint of the connection
150+
}, ...
151+
]
152+
}
153+
```
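
To make the relationship between `key_points` and `skeleton` concrete, the following sketch resolves each skeletal connection to a pair of joint labels. The template contents and the helper name `skeleton_as_labels` are made up for illustration; only the field names are taken from the format above.

```python
def skeleton_as_labels(spec):
    """Translate skeleton connections from joint indices to joint labels."""
    labels = {kp["index"]: kp["label"] for kp in spec["key_points"]}
    return [
        (labels[bone["joint1"]], labels[bone["joint2"]])
        for bone in spec["skeleton"]
    ]


# Hypothetical template with three joints and two connections.
template = {
    "template_id": "template-1",
    "template_name": "ExampleTemplate",
    "key_points": [
        {"label": "nose", "index": 0},
        {"label": "neck", "index": 1},
        {"label": "right_shoulder", "index": 2},
    ],
    "skeleton": [
        {"joint1": 0, "joint2": 1},
        {"joint1": 1, "joint2": 2},
    ],
}

print(skeleton_as_labels(template))
```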
154+
155+
#### Animation Pose Label
156+
157+
This file maps timestamps in an animation clip to pose labels.
158+
80159
## Limitations
81160

82161
Ground truth is not compatible with all rendering features, especially those that modify the visibility or shape of objects in the frame.

com.unity.perception/Documentation~/Schema/Synthetic_Dataset_Schema.md

Lines changed: 106 additions & 40 deletions
@@ -172,21 +172,21 @@ A grayscale PNG file that stores integer values (label pixel_value in [annotatio
172172

173173
#### capture.annotation.values
174174

175-
<!-- Not yet implemented annotations
176-
##### instance segmentation - polygon
175+
##### instance segmentation - color image
177176

178-
A json object that stores collections of polygons. Each polygon record maps a tuple of (instance, label) to a list of
179-
K pixel coordinates that forms a polygon. This object can be directly stored in annotation.values
177+
A color PNG file that stores instance IDs as a color value per pixel. The PNG files are located at the "filename" location.
180178

181179
```
182-
semantic_segmentation_polygon {
183-
label_id: <int> -- Integer identifier of the label
184-
label_name: <str> -- String identifier of the label
185-
instance_id: <str> -- UUID of the instance.
186-
polygon: [<int, int>,...] -- List of points in pixel coordinates of the outer edge. Connecting these points in order should create a polygon that identifies the object.
180+
instance_segmentation {
181+
instance_id: <int> -- The instance ID of the labeled object
182+
color { -- The pixel color that correlates with the instance ID
183+
r: <int> -- The red value of the pixel between 0 and 255
184+
g: <int> -- The green value of the pixel between 0 and 255
185+
b: <int> -- The blue value of the pixel between 0 and 255
186+
a: <int> -- The alpha value of the pixel between 0 and 255
187+
}
187188
}
188189
```
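
One possible way to use these records is to build a lookup from pixel color to instance ID, which can then be applied to pixels read from the PNG. This is a minimal sketch: the sample values and the helper name `color_to_instance` are assumptions, and loading the image itself is left out.

```python
def color_to_instance(records):
    """Build a {(r, g, b, a): instance_id} lookup from instance_segmentation records."""
    lookup = {}
    for record in records:
        color = record["color"]
        key = (color["r"], color["g"], color["b"], color["a"])
        lookup[key] = record["instance_id"]
    return lookup


# Hypothetical records following the format above.
records = [
    {"instance_id": 1, "color": {"r": 255, "g": 0, "b": 0, "a": 255}},
    {"instance_id": 2, "color": {"r": 0, "g": 255, "b": 0, "a": 255}},
]

lookup = color_to_instance(records)
print(lookup[(255, 0, 0, 255)])
```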
189-
-->
190190

191191
##### 2D bounding box
192192

@@ -196,36 +196,77 @@ We follow the OpenCV 2D coordinate [system](https://github.com/vvvv/VL.OpenCV/wi
196196

197197
```
198198
bounding_box_2d {
199-
label_id: <int> -- Integer identifier of the label
200-
label_name: <str> -- String identifier of the label
201-
instance_id: <str> -- UUID of the instance.
199+
label_id: <int> -- Integer identifier of the label
200+
label_name: <str> -- String identifier of the label
201+
instance_id: <str> -- UUID of the instance.
202202
x: <float> -- x coordinate of the upper left corner.
203203
y: <float> -- y coordinate of the upper left corner.
204204
width: <float> -- number of pixels in the x direction
205205
height: <float> -- number of pixels in the y direction
206206
}
207207
```
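
Many tools expect corner coordinates rather than an origin plus size. The sketch below converts a `bounding_box_2d` record to `(x_min, y_min, x_max, y_max)`, assuming the upper-left origin described above. The sample record and the helper name `bbox_corners` are hypothetical.

```python
def bbox_corners(box):
    """Convert an upper-left origin plus width/height into corner coordinates."""
    x_min, y_min = box["x"], box["y"]
    return (x_min, y_min, x_min + box["width"], y_min + box["height"])


# Hypothetical record following the format above.
box = {
    "label_id": 1,
    "label_name": "car",
    "instance_id": "instance-1",
    "x": 10.0,
    "y": 20.0,
    "width": 30.0,
    "height": 40.0,
}

print(bbox_corners(box))
```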
208-
<!-- Not yet implemented annotations
209208

210209
##### 3D bounding box
211210

212-
A json file that stored collections of 3D bounding boxes.
213-
Each bounding box record maps a tuple of (instance, label) to translation, size and rotation that draws a 3D bounding box, as well as velocity and acceleration (optional) of the 3D bounding box.
214-
All location data is given with respect to the **sensor coordinate system**.
211+
3D bounding box information. Unlike 2D bounding boxes, 3D bounding box coordinates are captured in the **sensor coordinate system**.
212+
Each bounding box record maps a tuple of (instance, label) to translation, size and rotation that draws a 3D bounding box, as well as velocity and acceleration (optional) of the 3D bounding box.
215213

216214
```
217215
bounding_box_3d {
218-
label_id: <int> -- Integer identifier of the label
219-
label_name: <str> -- String identifier of the label
220-
instance_id: <str> -- UUID of the instance.
221-
translation: <float, float, float> -- 3d bounding box's center location in meters as center_x, center_y, center_z with respect to global coordinate system.
222-
size: <float, float, float> -- 3d bounding box size in meters as width, length, height.
223-
rotation: <float, float, float, float> -- 3d bounding box orientation as quaternion: w, x, y, z.
224-
velocity: <float, float, float> -- 3d bounding box velocity in meters per second as v_x, v_y, v_z.
225-
acceleration: <float, float, float> [optional] -- 3d bounding box acceleration in meters per second^2 as a_x, a_y, a_z.
216+
label_id: <int> -- Integer identifier of the label
217+
label_name: <str> -- String identifier of the label
218+
instance_id: <str> -- UUID of the instance.
219+
translation { -- 3d bounding box's center location in meters with respect to global coordinate system.
220+
x: <float> -- The x coordinate
221+
y: <float> -- The y coordinate
222+
z: <float> -- The z coordinate
223+
}
224+
size { -- 3d bounding box size in meters
225+
x: <float> -- The size along the x axis
225+
y: <float> -- The size along the y axis
226+
z: <float> -- The size along the z axis
228+
}
229+
rotation { -- 3d bounding box orientation as quaternion: w, x, y, z.
230+
x: <float> -- The x component of the quaternion
230+
y: <float> -- The y component of the quaternion
231+
z: <float> -- The z component of the quaternion
232+
w: <float> -- The w component of the quaternion
234+
}
235+
velocity { -- [Optional] 3d bounding box velocity in meters per second.
236+
x: <float> -- The x component
236+
y: <float> -- The y component
237+
z: <float> -- The z component
238+
}
239+
acceleration { -- [Optional] 3d bounding box acceleration in meters per second^2.
240+
x: <float> -- The x component
241+
y: <float> -- The y component
242+
z: <float> -- The z component
244+
}
245+
}
246+
```
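
To show how `translation`, `size` and `rotation` combine, here is a sketch that computes the eight corners of a box. It rests on several assumptions that the schema does not state outright: `size` is the full extent along the box's local axes, `rotation` is a unit quaternion, and `translation` is the box center. The helper names `rotate` and `box_corners` are hypothetical.

```python
from itertools import product


def rotate(q, v):
    """Rotate vector v by the unit quaternion q = (x, y, z, w)."""
    x, y, z, w = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2 * (y * vz - z * vy)
    ty = 2 * (z * vx - x * vz)
    tz = 2 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def box_corners(box):
    """Return the 8 corners of a bounding_box_3d record as (x, y, z) tuples."""
    t, s, r = box["translation"], box["size"], box["rotation"]
    q = (r["x"], r["y"], r["z"], r["w"])
    corners = []
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        # Offset from the center in the box's local frame, then rotate and translate.
        ox, oy, oz = rotate(q, (sx * s["x"], sy * s["y"], sz * s["z"]))
        corners.append((t["x"] + ox, t["y"] + oy, t["z"] + oz))
    return corners


# Hypothetical record with an identity rotation.
box = {
    "translation": {"x": 1.0, "y": 2.0, "z": 3.0},
    "size": {"x": 2.0, "y": 4.0, "z": 6.0},
    "rotation": {"x": 0.0, "y": 0.0, "z": 0.0, "w": 1.0},
}

for corner in box_corners(box):
    print(corner)
```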
247+
##### Keypoints
248+
249+
Keypoint data, commonly used for human pose estimation. A keypoint capture is associated with a template that defines the keypoints (see annotation.definition file).
250+
Each keypoint record maps a tuple of (instance, label) to template, pose, and an array of keypoints. A keypoint will exist in this record for each keypoint defined in the template file.
251+
If a given keypoint doesn't exist in the labeled GameObject, then that keypoint will have a state value of 0; if it does exist then it will have a state value of 2.
252+
```
253+
keypoints {
254+
label_id: <int> -- Integer identifier of the label
255+
instance_id: <str> -- UUID of the instance.
256+
template_guid: <str> -- UUID of the keypoint template
257+
pose: <str> -- Pose ground truth information
258+
keypoints [ -- Array of keypoint data, one entry for each keypoint defined in associated template file.
259+
{
260+
index: <int> -- Index of keypoint in template
261+
x: <float> -- X pixel coordinate of keypoint
262+
y: <float> -- Y pixel coordinate of keypoint
263+
state: <int> -- 0: keypoint does not exist, 2: keypoint exists
264+
}, ...
265+
]
226266
}
227267
```
228268

269+
<!-- Not yet implemented annotations
229270
230271
#### instances (V2, WIP)
231272
@@ -303,27 +344,52 @@ Each record describes a particular type of annotation and contains an annotation
303344
Typically, the `spec` key describes all labels_id and label_name used by the annotation.
304345
Some special cases like semantic segmentation might assign additional values (e.g. pixel value) to record the mapping between label_id/label_name and pixel color in the annotated PNG files.
305346

347+
##### annotation definition header
306348
```
307349
annotation_definition {
308-
id: <int> -- Integer identifier of the annotation definition.
309-
name: <str> -- Human readable annotation spec name (e.g. sementic_segmentation, instance_segmentation, etc.)
310-
description: <str, optional> -- Description of this annotation specifications.
311-
format: <str> -- The format of the annotation files. (e.g. png, json, etc.)
312-
spec: [<obj>...] -- Format-specific specification for the annotation values (ex. label-value mappings for semantic segmentation images)
350+
id: <int> -- Integer identifier of the annotation definition.
351+
name: <str> -- Human readable annotation spec name (e.g. semantic_segmentation, instance_segmentation, etc.)
352+
description: <str> -- [Optional] Description of this annotation specification.
353+
format: <str> -- The format of the annotation files. (e.g. png, json, etc.)
354+
spec: [<obj>...] -- Format-specific specification for the annotation values (ex. label-value mappings for semantic segmentation images)
313355
}
314-
315-
# semantic segmentation
356+
```
357+
##### semantic segmentation
358+
Annotation spec for the [semantic segmentation labeler](#semantic-segmentation---grayscale-image)
359+
```
316360
annotation_definition.spec {
317-
label_id: <int> -- Integer identifier of the label
318-
label_name: <str> -- String identifier of the label
319-
pixel_value: <int> -- Grayscale pixel value
320-
color_pixel_value: <int, int, int> [optional] -- Color pixel value
361+
label_id: <int> -- Integer identifier of the label
362+
label_name: <str> -- String identifier of the label
363+
pixel_value: <int> -- Grayscale pixel value
364+
color_pixel_value: <int, int, int> -- [Optional] Color pixel value
321365
}
322-
323-
# label enumeration spec, used for annotations like bounding box 2d. This might be a subset of all labels used in simulation.
366+
```
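
A consumer of the grayscale PNG will typically want to go from a pixel value back to a label. The sketch below builds that lookup from the spec entries. The sample entries and the helper name `pixel_value_to_label` are hypothetical; only the field names come from the spec above.

```python
def pixel_value_to_label(spec_entries):
    """Build a {pixel_value: label_name} lookup from semantic segmentation spec entries."""
    return {entry["pixel_value"]: entry["label_name"] for entry in spec_entries}


# Hypothetical spec entries following the format above.
spec_entries = [
    {"label_id": 1, "label_name": "car", "pixel_value": 30},
    {"label_id": 2, "label_name": "person", "pixel_value": 60},
]

labels = pixel_value_to_label(spec_entries)
print(labels[60])
```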
367+
##### label enumeration spec
368+
This spec is used for annotations like [bounding box 2d](#2d-bounding-box). This might be a subset of all labels used in simulation.
369+
```
324370
annotation_definition.spec {
325-
label_id: <int> -- Integer identifier of the label
326-
label_name: <str> -- String identifier of the label
371+
label_id: <int> -- Integer identifier of the label
372+
label_name: <str> -- String identifier of the label
373+
}
374+
```
375+
##### keypoint template
376+
Keypoint templates are used to define the keypoints and skeletal connections captured by the [keypoint labeler](#keypoints).
377+
```
378+
annotation_definition.spec {
379+
template_id: <str> -- The UUID of the template
380+
template_name: <str> -- Human readable name of the template
381+
key_points [ -- Array of joints defined in this template
382+
{
383+
label: <str> -- The label of the joint
384+
index: <int> -- The index of the joint
385+
}, ...
386+
]
387+
skeleton [ -- Array of skeletal connections (which joints have connections between one another) defined in this template
388+
{
389+
joint1: <int> -- The first joint of the connection
390+
joint2: <int> -- The second joint of the connection
391+
}, ...
392+
]
327393
}
328394
```
329395

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
using UnityEditor;
2+
using UnityEditor.UIElements;
3+
using UnityEngine.Experimental.Perception.Randomization.Randomizers;
4+
using UnityEngine.UIElements;
5+
6+
namespace UnityEngine.Experimental.Perception.Randomization.Editor
7+
{
8+
[CustomEditor(typeof(RandomizerTag), true)]
9+
public class RandomizerTagEditor : UnityEditor.Editor
10+
{
11+
public override VisualElement CreateInspectorGUI()
12+
{
13+
var rootElement = new VisualElement();
14+
CreatePropertyFields(rootElement);
15+
return rootElement;
16+
}
17+
18+
void CreatePropertyFields(VisualElement rootElement)
19+
{
20+
var iterator = serializedObject.GetIterator();
21+
iterator.NextVisible(true);
22+
do
23+
{
24+
if (iterator.name == "m_Script")
25+
continue;
26+
var propertyField = new PropertyField(iterator.Copy());
27+
propertyField.Bind(serializedObject);
28+
rootElement.Add(propertyField);
29+
} while (iterator.NextVisible(false));
30+
}
31+
}
32+
}

com.unity.perception/Editor/Randomization/Editors/RandomizerTagEditor.cs.meta

Lines changed: 3 additions & 0 deletions
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
using System;
2+
using System.Collections.Generic;
3+
using System.Linq;
4+
5+
namespace UnityEngine.Perception.GroundTruth
6+
{
7+
/// <summary>
8+
/// Record that maps a pose to a timestamp
9+
/// </summary>
10+
[Serializable]
11+
public class PoseTimestampRecord
12+
{
13+
/// <summary>
14+
/// The percentage within the clip that the pose starts, a value from 0 (beginning) to 1 (end)
15+
/// </summary>
16+
[Tooltip("The percentage within the clip that the pose starts, a value from 0 (beginning) to 1 (end)")]
17+
public float startOffsetPercent;
18+
/// <summary>
19+
/// The label to use for any captures inside of this time period
20+
/// </summary>
21+
public string poseLabel;
22+
}
23+
24+
/// <summary>
25+
/// The animation pose label is a mapping file that maps a time range in an animation clip to a ground truth
26+
/// pose. Each timestamp record is defined by a pose label and a start offset, expressed as a fraction of the clip
27+
/// from 0 (the beginning) to 1 (the end). The timestamp records are order dependent and build on the previous entries:
28+
/// a record's label applies from its start offset until the start offset of the next record. The label of the
29+
/// final record applies to whatever remains of the clip.
30+
/// </summary>
31+
[CreateAssetMenu(fileName = "AnimationPoseTimestamp", menuName = "Perception/Animation Pose Timestamps")]
32+
public class AnimationPoseLabel : ScriptableObject
33+
{
34+
/// <summary>
35+
/// The animation clip used for all of the timestamps
36+
/// </summary>
37+
public AnimationClip animationClip;
38+
/// <summary>
39+
/// The list of timestamps, order dependent
40+
/// </summary>
41+
public List<PoseTimestampRecord> timestamps;
42+
43+
/// <summary>
44+
/// Retrieves the pose for the clip at the current time.
45+
/// </summary>
46+
/// <param name="time">The time in question</param>
47+
/// <returns>The pose for the passed in time</returns>
48+
public string GetPoseAtTime(float time)
49+
{
50+
if (timestamps == null || timestamps.Count == 0 || time < 0 || time > 1) return "unset";
51+
52+
var i = 1;
53+
for (i = 1; i < timestamps.Count; i++)
54+
{
55+
if (timestamps[i].startOffsetPercent > time) break;
56+
}
57+
58+
return timestamps[i - 1].poseLabel;
59+
}
60+
}
61+
}
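
To make the lookup in `GetPoseAtTime` easier to follow, here is a small Python port of the same logic. It is an illustration rather than part of the package. Timestamps are modeled as hypothetical `(start_offset_percent, pose_label)` tuples, the name `pose_at_time` is made up, and the check for an empty list is an extra guard added for this sketch.

```python
def pose_at_time(timestamps, time):
    """Return the pose label for a normalized clip time in [0, 1].

    timestamps is an ordered list of (start_offset_percent, pose_label) tuples.
    """
    # Out-of-range times (and, in this sketch, an empty list) yield "unset".
    if not timestamps or time < 0 or time > 1:
        return "unset"

    # Advance past every record that has already started at this time.
    i = 1
    while i < len(timestamps) and timestamps[i][0] <= time:
        i += 1

    # The record just before the first one that has not started yet is active.
    return timestamps[i - 1][1]


# Hypothetical timestamp list.
timestamps = [(0.0, "idle"), (0.4, "walk"), (0.8, "run")]

print(pose_at_time(timestamps, 0.5))
```

Because the scan starts at the second record, the first record's label covers the start of the clip regardless of its own start offset.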

com.unity.perception/Runtime/GroundTruth/Labelers/AnimationPoseLabel.cs.meta

Lines changed: 3 additions & 0 deletions
