docs/source/project_structure.md
Lines changed: 96 additions & 41 deletions
@@ -8,7 +8,11 @@ We mark requirements with italicised *keywords* that should be interpreted as de

 ## Overview

-A benchmark dataset is organised into a `Train` and a `Test` split. Each split contains one or more **projects** (i.e. datasets contributed by different groups). Each project contains one or more **sessions**. A session centres on a single video file (the **session video**), from which **frames** (individually sampled images) and optionally **clips** (short video segments) are extracted. In the `Train` split, frames and clips are accompanied by keypoint annotations.
+- A benchmark dataset is organised into a `Train` and a `Test` split.
+- Each split contains one or more [projects](#project) (i.e. datasets contributed by different groups).
+- Each project contains one or more [sessions](#session).
+- A session centres on a single video file (the [session video](#session-video)), from which [frames](#frames) (individually sampled images) and optionally [clips](#clips) (short video segments) are extracted.
+- Frames and clips are accompanied by [label files](#label-format) in COCO keypoints format, which differ between the `Train` and `Test` splits.

 The current scope is limited to **single-animal pose estimation** from a **single camera view**. Support for multi-camera setups is planned for a future version.

@@ -32,20 +36,24 @@ The current scope is limited to **single-animal pose estimation** from a **singl
-The `Test` split follows the same structure as `Train`, but label files (`framelabels.json` and `cliplabels.json`) *must* not be included so that they can be used for evaluation.
+The `Test` split follows the same structure as `Train`, but includes different label files (see [Label format](#label-format) for details).
 :::

 ### Train / Test

 * The top level *must* contain a `Train` and a `Test` folder.
 * Each split *must* contain at least one project folder.
+* Each session *must* belong to exactly one split.

 ### Project

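The two top-level rules in the hunk above can be checked mechanically: a `Train` and a `Test` folder, each holding at least one project folder. The following is a minimal sketch, assuming that every subdirectory of a split folder is a project folder. `check_splits` is a hypothetical name and is not part of any existing tooling.

```python
import tempfile
from pathlib import Path


def check_splits(root: Path) -> list[str]:
    """Return problems with the top-level Train/Test layout (empty if none).

    Hypothetical helper: treats every subdirectory of a split folder as a
    project folder and does not check project folder names.
    """
    problems = []
    for split in ("Train", "Test"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing {split} folder")
            continue
        # Each split must contain at least one project folder.
        if not any(p.is_dir() for p in split_dir.iterdir()):
            problems.append(f"{split} contains no project folders")
    return problems


# Usage on a throwaway directory tree whose Test split is empty
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Train" / "projectA").mkdir(parents=True)
    (root / "Test").mkdir()
    print(check_splits(root))  # ['Test contains no project folders']
```
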
@@ -79,24 +87,26 @@ The `Test` split follows the same structure as `Train`, but label files (`framel

 ### Frames

-The `Frames` folder contains individually sampled images and their annotations.
+The `Frames` folder contains individually sampled images. In the `Train` split, it also contains a label file with keypoint annotations.

 * Frames *must* be extracted from the session video.
-* Frame images *must* be in PNG format.
-* Frame image filenames *must* follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png`.
+* Frame images *should* be in PNG format (`.png`). JPEG format (`.jpg` or `.jpeg`) *may* also be used.
+* Frame image filenames *must* follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.<ext>`, where `<ext>` is `.png`, `.jpg`, or `.jpeg`.
   * `<frameID>` *must* be the 0-based index of the frame in the session video.
   * `<frameID>` *must* be padded to a consistent width across all frame files within a session (e.g. `0000`, `1000`).
-* In the `Train` split, a single label file *must* be provided per camera view, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json`. At present, only one camera view is included, so the split contains exactly one such label file. See [Label format](#label-format) for details.
+* In the `Train` split, a single label file *must* be provided per camera view, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json`. At present, only one camera view is included, so each session's `Frames` folder contains exactly one such label file. See [Frame labels](target-framelabels) for details.

 ### Clips

-A session *may* include a `Clips` folder containing short video segments and their annotations.
+A session *may* include a `Clips` folder containing short video segments and their label files.

 * Clips *must* be extracted from the session video and *must* have the same file format.
 * Clip filenames *must* follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4`.
   * `<frameID>` in the `start` field *must* be the 0-based index of the first frame of the clip in the session video, padded to a consistent width (e.g. `0500`, `1000`).
   * `<nFrames>` in the `dur` field *must* be the duration of the clip in number of frames (e.g. `5`, `30`).
-* In the `Train` split, a single label file *must* be provided per clip, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json`. See [Label format](#label-format) for details.
+* A single label file *must* be provided per clip:
+  * In the `Train` split, the file is named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json` and contains keypoint annotations for every frame in the clip. See [Clip labels](target-cliplabels) for details.
+  * In the `Test` split, the file is named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json` and contains keypoint annotations only for the first frame of the clip. See [Clip start labels](target-startlabels) for details.

 ## File naming

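The frame and clip naming rules in the hunk above map directly onto regular expressions. The sketch below assumes that subject, session and camera IDs are alphanumeric, as the File naming section requires. `FRAME_RE`, `CLIP_RE`, `clip_frames` and `padding_consistent` are hypothetical names chosen for illustration.

```python
import re

# Frame images: sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.<ext>
FRAME_RE = re.compile(
    r"sub-(?P<sub>[A-Za-z0-9]+)_ses-(?P<ses>[A-Za-z0-9]+)_cam-(?P<cam>[A-Za-z0-9]+)"
    r"_frame-(?P<frame>[0-9]+)\.(?P<ext>png|jpg|jpeg)"
)

# Clips: sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
CLIP_RE = re.compile(
    r"sub-(?P<sub>[A-Za-z0-9]+)_ses-(?P<ses>[A-Za-z0-9]+)_cam-(?P<cam>[A-Za-z0-9]+)"
    r"_start-(?P<start>[0-9]+)_dur-(?P<dur>[0-9]+)\.mp4"
)


def clip_frames(clip_name: str) -> range:
    """Return the 0-based session-video frame indices covered by a clip."""
    m = CLIP_RE.fullmatch(clip_name)
    if m is None:
        raise ValueError(f"not a valid clip filename: {clip_name}")
    start, dur = int(m["start"]), int(m["dur"])
    return range(start, start + dur)


def padding_consistent(frame_names: list[str]) -> bool:
    """Check that all <frameID> values within one session share one width."""
    widths = set()
    for name in frame_names:
        m = FRAME_RE.fullmatch(name)
        if m is None:
            raise ValueError(f"not a valid frame filename: {name}")
        widths.add(len(m["frame"]))
    return len(widths) <= 1


print(list(clip_frames("sub-01_ses-A_cam-topdown_start-1000_dur-5.mp4")))
# [1000, 1001, 1002, 1003, 1004]
print(padding_consistent(["sub-01_ses-A_cam-topdown_frame-0500.png",
                          "sub-01_ses-A_cam-topdown_frame-1000.jpg"]))
# True
```
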
@@ -107,17 +117,22 @@ All filenames follow a key-value pair convention, similar to the [BIDS standard]
 <key>-<value>_<key>-<value>.<extension>
 <key>-<value>_<key>-<value>_<suffix>.<extension>
 ```
-The recognised suffixes are `framelabels` (for frame label files) and `cliplabels` (for clip label files).
+The recognised suffixes are:
+
+* `framelabels` for [frame label files](target-framelabels).
+* `cliplabels` for [clip label files](target-cliplabels).
+* `startlabels` for [clip start label files](target-startlabels).
 | `cam` | Camera identifier | `cam-topdown`, `cam-side2` |
+| `frame` | 0-based frame index in the session video | `frame-0000`, `frame-0500`, `frame-1000` |
+| `start` | 0-based frame index of the first frame of a clip in the session video | `start-0000`, `start-0500`, `start-1000` |
+| `dur` | Clip duration in number of frames | `dur-5`, `dur-30` |

 * The keys `sub`, `ses`, and `cam` *must* appear in every filename, in that order.
 * Key values *must* be strictly alphanumeric for `sub`, `ses` and `cam` (i.e. only `A-Z`, `a-z`, `0-9`).
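
The key-value convention in the hunk above can be parsed generically. The approach splits the filename stem on `_` and then splits each part on its first `-`. The sketch below is one possible reading of the rules. It assumes that a trailing part without a `-` is the suffix, and that `sub`, `ses` and `cam` come first, as they do in every pattern shown. `parse_name` and `SUFFIXES` are hypothetical names. The sketch enforces only the stated rules for those three keys and the recognised suffixes.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Suffixes recognised by the naming convention above.
SUFFIXES = {"framelabels", "cliplabels", "startlabels"}


def parse_name(filename: str) -> tuple[dict[str, str], str | None, str]:
    """Split a filename into (key-value pairs, suffix or None, extension).

    Raises ValueError if the name breaks the rules for sub, ses and cam.
    """
    path = PurePosixPath(filename)
    parts = path.stem.split("_")
    suffix = None
    # Assumption: a final part without a "-" is the suffix.
    if "-" not in parts[-1]:
        suffix = parts.pop()
        if suffix not in SUFFIXES:
            raise ValueError(f"unrecognised suffix: {suffix}")
    pairs: dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition("-")
        if not sep or not value or key in pairs:
            raise ValueError(f"malformed or repeated key-value pair: {part}")
        pairs[key] = value
    # sub, ses and cam must appear in that order; every pattern puts them first.
    if list(pairs)[:3] != ["sub", "ses", "cam"]:
        raise ValueError("filename must start with sub, ses and cam, in that order")
    for key in ("sub", "ses", "cam"):
        value = pairs[key]
        # str.isalnum() alone would also accept non-ASCII letters and digits.
        if not (value.isascii() and value.isalnum()):
            raise ValueError(f"{key} value must be alphanumeric: {value}")
    return pairs, suffix, path.suffix


pairs, suffix, ext = parse_name(
    "sub-01_ses-A_cam-topdown_start-1000_dur-5_cliplabels.json"
)
print(pairs, suffix, ext)
# {'sub': '01', 'ses': 'A', 'cam': 'topdown', 'start': '1000', 'dur': '5'} cliplabels .json
```
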
@@ -126,20 +141,24 @@ All filenames follow a key-value pair convention, similar to the [BIDS standard]

 ## Label format

-* Labels (also referred to as annotations) are only included in the `Train` split, and *must* be stored in the same folder as the corresponding frames or clips.
-* Annotations *must* be stored in [COCO keypoints format](https://cocodataset.org/), with some additional requirements described below. Each label file is a JSON file with `images`, `annotations`, and `categories` arrays. Image, annotation and category `id` values *must* be unique integers within a label file.
+* The `Train` split includes ground-truth keypoint annotations both for the sampled frames (`framelabels.json`) and for entire clips (`cliplabels.json`), if present.
+* The `Test` split includes keypoint annotations only for the first frame of each clip (`startlabels.json`), if clips are present. Labels for frames and entire clips are withheld to support evaluation of pose estimation and point tracking methods.
+* Labels *must* be stored in the same folder as the corresponding frames or clips.
+* Labels *must* be stored in [COCO keypoints format](https://cocodataset.org/#format-data), with additional requirements described below. Each label file is a JSON file with `images`, `annotations`, and `categories` arrays. Image, annotation and category `id` values *must* be unique integers within a label file.

 :::{note}
 Annotation and category `id` values *should* be 1-indexed. This convention follows sleap-io's [`save_coco`](https://io.sleap.ai/latest/reference/sleap_io/io/coco/) function and avoids conflicts with models that treat category `0` as background.

-Image `id` values are always 0-indexed. However, the indexing origin differs between frame and clip labels — see below for details.
+Image `id` values are always 0-indexed, but what they index differs: in frame labels, the `id` is the frame's index in the session video, whereas in clip labels (and clip start labels, which follow the same conventions) it is the frame's index within the clip. Details are provided below.
 :::

+(target-framelabels)=
 ### Frame labels (`framelabels.json`)

-* There *must* be one `framelabels.json` per camera view within the `Frames` folder.
+* Frame labels *must* only exist in the `Train` split.
+* Within the `Frames` folder, there *must* be one frame label file per camera view, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json`.
 * Each entry in the `images` array *must* have an `id` equal to the 0-based frame index in the session video (matching the `<frameID>` in the corresponding image filename).
-* Each entry in the `images` array *must* have a `file_name` that matches the full filename (including the `.png` extension) of an existing frame image in the `Frames` folder.
+* Each entry in the `images` array *must* have a `file_name` that exactly matches the name of an existing [frame image](#frames) in the `Frames` folder (including the extension).

 :::{admonition} Example
 :class: tip
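
The frame-label rules in the hunk above can be checked against the contents of a `Frames` folder. These are unique integer `id`s, an `id` equal to `<frameID>`, and a `file_name` that names an existing image. This is a sketch that takes an already-loaded label dictionary (for example from `json.load`) and the set of image filenames in the folder. `check_frame_labels` is a hypothetical name. The sketch does not validate the `annotations` or `categories` arrays beyond their `id` values. The sample data, including the category name, is made up for illustration.

```python
import re

# <frameID> as it appears at the end of a frame image filename.
FRAME_ID_RE = re.compile(r"_frame-([0-9]+)\.(?:png|jpg|jpeg)$")


def check_frame_labels(labels: dict, frame_files: set[str]) -> list[str]:
    """Return problems found in a loaded framelabels.json dictionary."""
    problems = []
    # id values must be unique integers within each array.
    for array in ("images", "annotations", "categories"):
        ids = [entry["id"] for entry in labels.get(array, [])]
        if not all(isinstance(i, int) for i in ids):
            problems.append(f"non-integer id in {array}")
        if len(ids) != len(set(ids)):
            problems.append(f"duplicate id in {array}")
    for image in labels.get("images", []):
        name = image["file_name"]
        # file_name must name an existing frame image, extension included.
        if name not in frame_files:
            problems.append(f"{name}: no such frame image in Frames folder")
        # id must equal the 0-based <frameID> in the filename.
        match = FRAME_ID_RE.search(name)
        if match is None or int(match.group(1)) != image["id"]:
            problems.append(f"{name}: id {image['id']} does not match <frameID>")
    return problems


# Hypothetical, trimmed labels: real COCO keypoint annotations also carry
# `keypoints` and `num_keypoints`.
labels = {
    "images": [
        {"id": 500, "file_name": "sub-01_ses-A_cam-topdown_frame-0500.png"},
        {"id": 1000, "file_name": "sub-01_ses-A_cam-topdown_frame-1001.png"},
    ],
    "annotations": [{"id": 1, "image_id": 500, "category_id": 1}],
    "categories": [{"id": 1, "name": "mouse"}],
}
files = {"sub-01_ses-A_cam-topdown_frame-0500.png"}
# Reports the second image twice: missing file and id/<frameID> mismatch.
for problem in check_frame_labels(labels, files):
    print(problem)
```
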
@@ -159,13 +178,15 @@ For a session with 5 labelled frames sampled from different parts of the video,
 Here each `id` is the 0-based frame index in the session video (matching the `<frameID>` in the filename), and each `file_name` includes the `.png` extension.
 :::

+(target-cliplabels)=
 ### Clip labels (`cliplabels.json`)

-* There *must* be one `cliplabels.json` per clip.
+* Clip labels *must* only exist in the `Train` split.
+* If a `Clips` folder is present, there *must* be one clip label file per clip, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json`.
 * The `images` array *must* contain an entry for every frame in the clip, in consecutive, monotonically increasing order (covering the entire clip duration).
 * Clip labels follow the same COCO keypoints format as frame labels, but with different conventions for image `id` and `file_name` values:
   * Each image `id` *must* be the **0-based index of the frame within the clip** (i.e. `0`, `1`, `2`, ...), not the index in the session video.
-  * Each `file_name` *must* follow the same pattern as frame image filenames, but **without the `.png` extension**. The `frame` field in the `file_name` *must* hold the index of that frame in the **session video**.
+  * Each `file_name` *must* follow the same pattern as [frame image filenames](#frames), but **without the extension**. The `frame` field in the `file_name` *must* correspond to the index of that frame in the **session video**.

 This means that each entry in the `images` array encodes two pieces of information: the `id` gives the local position within the clip, while the `frame` field in `file_name` gives the global position in the session video. Note that in both cases the indices are 0-based.

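Each clip-label image entry carries two indices: its local position in the clip and its global position in the session video. Both can be generated together. This is a minimal sketch, assuming the caller knows the padding width used for `<frameID>` in that session. `clip_images` and its parameters are hypothetical, and the subject, session and camera values in the usage line are made up.

```python
def clip_images(sub: str, ses: str, cam: str, start: int, dur: int,
                width: int) -> list[dict]:
    """Build the COCO `images` array for a clip label file.

    `id` is the 0-based index within the clip; the `frame` field of
    `file_name` is the 0-based index in the session video, zero-padded to
    `width` digits. `file_name` carries no extension.
    """
    return [
        {
            "id": i,
            "file_name": f"sub-{sub}_ses-{ses}_cam-{cam}_frame-{start + i:0{width}d}",
        }
        for i in range(dur)
    ]


# A clip starting at frame 1000 with a duration of 5 frames
for image in clip_images("01", "A", "topdown", start=1000, dur=5, width=4):
    print(image)
```

With `start=1000` and `dur=5`, the entries run from `id: 0` / `frame-1000` to `id: 4` / `frame-1004`.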
@@ -187,6 +208,26 @@ For a clip starting at frame 1000 with a duration of 5 frames, the `images` arra
 Here `id: 0` through `id: 4` are the local clip indices, while `frame-1000` through `frame-1004` in the `file_name` values refer to the original frame positions in the session video.
 :::

+(target-startlabels)=
+### Clip start labels (`startlabels.json`)
+
+* Clip start labels *must* only exist in the `Test` split.
+* If a `Clips` folder is present, there *must* be one clip start label file per clip, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json`.
+* Clip start labels provide keypoint annotations for the **first frame of the clip only**. They are intended for point-tracker evaluation, where the annotated points serve as the initial positions from which a tracker should propagate.
+* Clip start labels follow the same format as [Clip labels](target-cliplabels), except that the `images` array *must* contain exactly one entry, corresponding to the first frame of the clip, which therefore has `id: 0`.
+
+:::{admonition} Example
+:class: tip
+
+For a clip starting at frame 1000 with a duration of 5 frames, the `images` array would be:
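
Clip start labels are clip labels restricted to the first frame. One way to derive them from a fully labelled clip is therefore to filter a clip-label dictionary. This is a sketch, assuming standard COCO `image_id` fields on annotations. `to_start_labels` is a hypothetical helper, not part of any existing tool, and the sample data is made up.

```python
import copy


def to_start_labels(clip_labels: dict) -> dict:
    """Keep only the first clip frame (image id 0) and its annotations."""
    images = [im for im in clip_labels["images"] if im["id"] == 0]
    if len(images) != 1:
        raise ValueError("clip labels must contain exactly one image with id 0")
    annotations = [a for a in clip_labels["annotations"] if a["image_id"] == 0]
    # Other top-level keys (e.g. categories) are kept unchanged; deep-copy so
    # that editing the result never mutates the input dictionary.
    return copy.deepcopy({**clip_labels, "images": images, "annotations": annotations})


# Hypothetical, trimmed clip labels (no keypoints) for a 5-frame clip
clip = {
    "images": [
        {"id": i, "file_name": f"sub-01_ses-A_cam-topdown_frame-{1000 + i}"}
        for i in range(5)
    ],
    "annotations": [{"id": i + 1, "image_id": i, "category_id": 1} for i in range(5)],
    "categories": [{"id": 1, "name": "mouse"}],
}
start = to_start_labels(clip)
print(start["images"], len(start["annotations"]))
# [{'id': 0, 'file_name': 'sub-01_ses-A_cam-topdown_frame-1000'}] 1
```
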