
Commit 8f17ad7

niksirbi and lochhh authored
Spec: support clip start labels (startlabels.json) (#34)
* allow PNG and JPG images
* draft startlabels text
* fix links to startlabels
* fully adapt spec to accommodate startlabels
* preclude shared sessions between Train/Test
* disable primary sidebar on spec page
* Indent keys table
* Apply suggestions from code review

  Co-authored-by: Chang Huan Lo <changhuanlo@yahoo.com>
* configure linkcheck

---------

Co-authored-by: lochhh <changhuan.lo@ucl.ac.uk>
Co-authored-by: Chang Huan Lo <changhuanlo@yahoo.com>
1 parent 00159ea commit 8f17ad7

3 files changed: 120 additions and 42 deletions

.github/workflows/docs_build_and_deploy.yml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ jobs:
3131
with:
3232
python-version: "3.13"
3333
use-requirements-txt: false
34+
github-token: ${{ secrets.GITHUB_TOKEN }}
3435

3536
deploy_sphinx_docs:
3637
name: Deploy Sphinx Docs

docs/source/conf.py

Lines changed: 23 additions & 1 deletion
@@ -8,7 +8,6 @@
88

99
import os
1010
import sys
11-
1211
from importlib.metadata import version as get_version
1312

1413
# Used when building API docs, put the dependencies
@@ -93,6 +92,11 @@
9392
html_theme = "pydata_sphinx_theme"
9493
html_title = "poseinterface"
9594

95+
# Remove the primary (left) sidebar for specific pages
96+
html_sidebars = {
97+
"project_structure": [],
98+
}
99+
96100
# Customize the theme
97101
html_theme_options = {
98102
"icon_links": [
@@ -142,3 +146,21 @@
142146
# To re-enable an example, remove its pattern from this list.
143147
"ignore_pattern": r"SWC-plusmaze_to_benchmark",
144148
}
149+
150+
# -- linkcheck configuration -------------------------------------------------
151+
linkcheck_timeout = 60 # default is 30
152+
linkcheck_retries = 3 # default is 1
153+
154+
# The linkcheck builder will skip verifying that anchors exist when checking
155+
# these URLs (because they are generated dynamically)
156+
linkcheck_anchors_ignore_for_url = [
157+
"https://cocodataset.org/",
158+
]
159+
# A list of regular expressions that match URIs that should not be checked
160+
linkcheck_ignore = []
161+
# Add request headers for specific domains (e.g. to avoid rate-limiting)
162+
linkcheck_request_headers = {
163+
"https://github.com": {
164+
"Authorization": f"Bearer {os.environ.get('GITHUB_TOKEN', '')}",
165+
},
166+
}
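One point about the `linkcheck_request_headers` block above: when `GITHUB_TOKEN` is unset (for example in a local docs build), the f-string still produces an `Authorization` header with an empty bearer token. The following is a minimal sketch of one way to omit the header in that case. The helper name `github_request_headers` is hypothetical and not part of this commit, and whether GitHub treats an empty bearer token differently from no header at all is an assumption that has not been verified here.

```python
import os


def github_request_headers(environ=os.environ):
    """Return linkcheck headers for github.com, or {} when no token is set.

    Hypothetical helper: it only builds the dictionary that would be
    assigned to Sphinx's ``linkcheck_request_headers`` option.
    """
    token = environ.get("GITHUB_TOKEN", "").strip()
    if not token:
        # No token available: send no Authorization header at all.
        return {}
    return {"https://github.com": {"Authorization": f"Bearer {token}"}}


linkcheck_request_headers = github_request_headers()
```

In CI, where the workflow passes `github-token`, the resulting dictionary matches the one in the diff; locally it would simply be empty.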

docs/source/project_structure.md

Lines changed: 96 additions & 41 deletions
@@ -8,7 +8,11 @@ We mark requirements with italicised *keywords* that should be interpreted as de
88

99
## Overview
1010

11-
A benchmark dataset is organised into a `Train` and a `Test` split. Each split contains one or more **projects** (i.e. datasets contributed by different groups). Each project contains one or more **sessions**. A session centres on a single video file (the **session video**), from which **frames** (individually sampled images) and optionally **clips** (short video segments) are extracted. In the `Train` split, frames and clips are accompanied by keypoint annotations.
11+
- A benchmark dataset is organised into a `Train` and a `Test` split.
12+
- Each split contains one or more [projects](#project) (i.e. datasets contributed by different groups).
13+
- Each project contains one or more [sessions](#session).
14+
- A session centres on a single video file (the [session video](#session-video)), from which [frames](#frames) (individually sampled images) and optionally [clips](#clips) (short video segments) are extracted.
15+
- Frames and clips are accompanied by [label files](#label-format) in COCO keypoints format.
1216

1317
The current scope is limited to **single-animal pose estimation** from a **single camera view**. Support for multi-camera setups is planned for a future version.
1418

@@ -32,20 +36,24 @@ The current scope is limited to **single-animal pose estimation** from a **singl
3236
└── <ProjectName>/
3337
└── sub-<subjectID>_ses-<sessionID>/
3438
├── Frames/
35-
│ └── sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png
39+
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png
40+
│ └── ...
3641
├── Clips/ (optional)
37-
│ └── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
42+
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4
43+
│ ├── sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json
44+
│ └── ...
3845
└── sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4
3946
```
4047

4148
:::{note}
42-
The `Test` split follows the same structure as `Train`, but label files (`framelabels.json` and `cliplabels.json`) *must* not be included so that they can be used for evaluation.
49+
The `Test` split follows the same structure as `Train`, but includes different label files (see [Label format](#label-format) for details).
4350
:::
4451

4552
### Train / Test
4653

4754
* The top level *must* contain a `Train` and a `Test` folder.
4855
* Each split *must* contain at least one project folder.
56+
* Each session *must* belong to exactly one split.
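The new "exactly one split" rule could be checked mechanically. The sketch below is one possible approach, not part of the commit: it assumes the `<split>/<ProjectName>/sub-<subjectID>_ses-<sessionID>/` layout shown earlier and treats a session as identified by its project name plus its folder name, which is an interpretation of the spec rather than something it states. The function name `shared_sessions` is hypothetical.

```python
from pathlib import Path


def shared_sessions(root):
    """Return (project, session) pairs that appear in both Train and Test."""
    root = Path(root)

    def sessions(split):
        # Session folders sit two levels below the split folder.
        return {
            (path.parent.name, path.name)
            for path in (root / split).glob("*/sub-*_ses-*")
            if path.is_dir()
        }

    return sessions("Train") & sessions("Test")
```

An empty result means no session folder is shared between the two splits.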
4957

5058
### Project
5159

@@ -79,24 +87,26 @@ The `Test` split follows the same structure as `Train`, but label files (`framel
7987

8088
### Frames
8189

82-
The `Frames` folder contains individually sampled images and their annotations.
90+
The `Frames` folder contains individually sampled images. In the `Train` split, it also contains a label file with keypoint annotations.
8391

8492
* Frames *must* be extracted from the session video.
85-
* Frame images *must* be in PNG format.
86-
* Frame image filenames *must* follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.png`.
93+
* Frame images *should* be in PNG format (`.png`). JPEG format (`.jpg` or `.jpeg`) *may* also be used.
94+
* Frame image filenames *must* follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>_frame-<frameID>.<ext>`, where `<ext>` is `.png`, `.jpg`, or `.jpeg`.
8795
* `<frameID>` *must* be the 0-based index of the frame in the session video.
8896
* `<frameID>` *must* be padded to a consistent width across all frame files within a session (e.g. `0000`, `1000`).
89-
* In the `Train` split, a single label file *must* be provided per camera view, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json`. At present, only one camera view is included, so the split contains exactly one such label file. See [Label format](#label-format) for details.
97+
* In the `Train` split, a single label file *must* be provided per camera view, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json`. At present, only one camera view is included, so the split contains exactly one such label file. See [Frame labels](target-framelabels) for details.
9098

9199
### Clips
92100

93-
A session *may* include a `Clips` folder containing short video segments and their annotations.
101+
A session *may* include a `Clips` folder containing short video segments and their label files.
94102

95103
* Clips *must* be extracted from the session video and *must* have the same file format.
96104
* Clip filenames *must* follow the pattern: `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>.mp4`.
97105
* `<frameID>` in the `start` field *must* be the 0-based index of the first frame of the clip in the session video, padded to a consistent width (e.g. `0500`, `1000`).
98106
* `<nFrames>` in the `dur` field *must* be the duration of the clip in number of frames (e.g. `5`, `30`).
99-
* In the `Train` split, a single label file *must* be provided per clip, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json`. See [Label format](#label-format) for details.
107+
* A single label file *must* be provided per clip:
108+
* In the `Train` split, the file is named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json` and contains keypoint annotations for every frame in the clip. See [Clip labels](target-cliplabels) for details.
109+
* In the `Test` split, the file is named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json` and contains keypoint annotations only for the first frame of the clip. See [Clip start labels](target-startlabels) for details.
100110

101111
## File naming
102112

@@ -107,17 +117,22 @@ All filenames follow a key-value pair convention, similar to the [BIDS standard]
107117
<key>-<value>_<key>-<value>.<extension>
108118
<key>-<value>_<key>-<value>_<suffix>.<extension>
109119
```
110-
The recognised suffixes are `framelabels` (for frame label files) and `cliplabels` (for clip label files).
120+
The recognised suffixes are:
121+
122+
* `framelabels` for [frame label files](target-framelabels).
123+
* `cliplabels` for [clip label files](target-cliplabels).
124+
* `startlabels` for [clip start label files](target-startlabels).
125+
111126
* The following keys are used:
112127

113-
| Key | Description | Examples |
114-
|---------|------------------------------------------------|-----------------|
115-
| `sub` | Subject identifier | `sub-001`, `sub-M708149` |
116-
| `ses` | Session identifier | `ses-02`, `ses-25`, `ses-20200317` |
117-
| `cam` | Camera identifier | `cam-topdown`, `cam-side2` |
118-
| `frame` | 0-based frame index in the session video | `frame-0000`, `frame-0500`, `frame-1000` |
119-
| `start` | 0-based frame index of the first frame of a clip in the session video | `start-0000`, `start-0500`, `start-1000` |
120-
| `dur` | Clip duration in number of frames | `dur-5`, `dur-30` |
128+
| Key | Description | Examples |
129+
|---------|------------------------------------------------|-----------------|
130+
| `sub` | Subject identifier | `sub-001`, `sub-M708149` |
131+
| `ses` | Session identifier | `ses-02`, `ses-25`, `ses-20200317` |
132+
| `cam` | Camera identifier | `cam-topdown`, `cam-side2` |
133+
| `frame` | 0-based frame index in the session video | `frame-0000`, `frame-0500`, `frame-1000` |
134+
| `start` | 0-based frame index of the first frame of a clip in the session video | `start-0000`, `start-0500`, `start-1000` |
135+
| `dur` | Clip duration in number of frames | `dur-5`, `dur-30` |
121136

122137
* The keys `sub`, `ses`, and `cam` *must* appear in every filename, in that order.
123138
* Key values *must* be strictly alphanumeric for `sub`, `ses` and `cam` (i.e. only `A-Z`, `a-z`, `0-9`).
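To illustrate the key-value convention and the three recognised suffixes, here is a rough parsing sketch. It covers only the rules visible in this hunk (pairs joined by `_`, an optional trailing suffix, and `sub`, `ses`, `cam` first and in that order); any further rules in the elided part of the spec are not handled. The function name `parse_filename` and the regular expression are assumptions made for illustration.

```python
import re

SUFFIXES = {"framelabels", "cliplabels", "startlabels"}
# One <key>-<value> pair: lowercase key, strictly alphanumeric value.
PAIR = re.compile(r"([a-z]+)-([A-Za-z0-9]+)")


def parse_filename(filename):
    """Split a filename into (entities, suffix, extension)."""
    stem, _, extension = filename.partition(".")
    parts = stem.split("_")
    # A trailing part without a hyphen is a suffix if it is a recognised one.
    suffix = parts.pop() if parts[-1] in SUFFIXES else None
    entities = {}
    for part in parts:
        match = PAIR.fullmatch(part)
        if match is None:
            raise ValueError(f"not a <key>-<value> pair: {part!r}")
        entities[match.group(1)] = match.group(2)
    if list(entities)[:3] != ["sub", "ses", "cam"]:
        raise ValueError("filename must start with sub, ses, cam in that order")
    return entities, suffix, extension
```

For instance, `parse_filename("sub-001_ses-02_cam-side2_frame-0500.png")` yields the four entities, no suffix, and the extension `png`. Values are kept as strings so that zero padding is preserved.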
@@ -126,20 +141,24 @@ All filenames follow a key-value pair convention, similar to the [BIDS standard]
126141

127142
## Label format
128143

129-
* Labels (also referred to as annotations) are only included in the `Train` split, and *must* be stored in the same folder as the corresponding frames or clips.
130-
* Annotations *must* be stored in [COCO keypoints format](https://cocodataset.org/), with some additional requirements described below. Each label file is a JSON file with `images`, `annotations`, and `categories` arrays. Image, annotation and category `id` values *must* be unique integers within a label file.
144+
* The `Train` split includes ground-truth keypoint annotations both for the sampled frames (`framelabels.json`) and for entire clips (`cliplabels.json`), if present.
145+
* The `Test` split includes keypoint annotations only for the first frame of each clip (`startlabels.json`), if clips are present. Labels for frames and entire clips are withheld to support evaluation of pose estimation and point tracking methods.
146+
* Labels *must* be stored in the same folder as the corresponding frames or clips.
147+
* Labels *must* be stored in [COCO keypoints format](https://cocodataset.org/#format-data), with additional requirements described below. Each label file is a JSON file with `images`, `annotations`, and `categories` arrays. Image, annotation and category `id` values *must* be unique integers within a label file.
131148

132149
:::{note}
133150
Annotation and category `id` values *should* be 1-indexed. This convention follows sleap-io's [`save_coco`](https://io.sleap.ai/latest/reference/sleap_io/io/coco/) function and avoids conflicts with models that treat category `0` as background.
134151

135-
Image `id` values are always 0-indexed. However, the indexing origin differs between frame and clip labels — see below for details.
152+
Image `id` values are always 0-indexed. The indexing origin differs for frame labels and clip labels, and clip start labels follow the same conventions as clip labels. Details are provided below.
136153
:::
137154

155+
(target-framelabels)=
138156
### Frame labels (`framelabels.json`)
139157

140-
* There *must* be one `framelabels.json` per camera view within the `Frames` folder.
158+
* Frame labels *must* only exist in the `Train` split.
159+
* Within the `Frames` folder, there *must* be one frame label file per camera view, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_framelabels.json`.
141160
* Each entry in the `images` array *must* have an `id` equal to the 0-based frame index in the session video (matching the `<frameID>` in the corresponding image filename).
142-
* Each entry in the `images` array *must* have a `file_name` that matches the full filename (including the `.png` extension) of an existing frame image in the `Frames` folder.
161+
* Each entry in the `images` array *must* have a `file_name` that exactly matches the name of an existing [frame image](#frames) in the `Frames` folder (including the extension).
143162

144163
:::{admonition} Example
145164
:class: tip
@@ -159,13 +178,15 @@ For a session with 5 labelled frames sampled from different parts of the video,
159178
Here each `id` is the 0-based frame index in the session video (matching the `<frameID>` in the filename), and each `file_name` includes the `.png` extension.
160179
:::
161180

181+
(target-cliplabels)=
162182
### Clip labels (`cliplabels.json`)
163183

164-
* There *must* be one `cliplabels.json` per clip.
184+
* Clip labels *must* only exist in the `Train` split.
185+
* If a `Clips` folder is present, there *must* be one clip label file per clip, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_cliplabels.json`.
165186
* The `images` array *must* contain an entry for every frame in the clip, in consecutive, monotonically increasing order (covering the entire clip duration).
166187
* Clip labels follow the same COCO keypoints format as frame labels, but with different conventions for image `id` and `file_name` values:
167188
* Each image `id` *must* be the **0-based index of the frame within the clip** (i.e. `0`, `1`, `2`, ...), not the index in the session video.
168-
* Each `file_name` *must* follow the same pattern as frame image filenames, but **without the `.png` extension**. The `frame` field in the `file_name` *must* hold the index of that frame in the **session video**.
189+
* Each `file_name` *must* follow the same pattern as [frame image filenames](#frames), but **without the extension**. The `frame` field in the `file_name` *must* correspond to the index of that frame in the **session video**.
169190

170191
This means that each entry in the `images` array encodes two pieces of information: the `id` gives the local position within the clip, while the `frame` field in `file_name` gives the global position in the session video. Note that in both cases the indices are 0-based.
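The local/global distinction can be sketched in a few lines of Python. This is an illustrative helper only (the name `clip_images` and its parameters are hypothetical): it builds the `images` array for a clip from its start index and duration, assuming a zero-padding width of 4 for the `frame` field.

```python
def clip_images(prefix, start, dur, width, height, pad=4):
    """Build the COCO `images` array for a clip.

    `id` is the 0-based position within the clip; the `frame` field in
    `file_name` is the 0-based index in the session video (no extension).
    """
    return [
        {
            "id": offset,
            "file_name": f"{prefix}_frame-{start + offset:0{pad}d}",
            "width": width,
            "height": height,
        }
        for offset in range(dur)
    ]
```

Calling it with `start=1000` and `dur=5` gives entries with `id` values 0 to 4 and `frame` values 1000 to 1004. Taking only the first element of the result corresponds to the single entry required in a `startlabels.json` file.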
171192

@@ -187,6 +208,26 @@ For a clip starting at frame 1000 with a duration of 5 frames, the `images` arra
187208
Here `id: 0` through `id: 4` are the local clip indices, while `frame-1000` through `frame-1004` in the `file_name` values refer to the original frame positions in the session video.
188209
:::
189210

211+
(target-startlabels)=
212+
### Clip start labels (`startlabels.json`)
213+
214+
* Clip start labels *must* only exist in the `Test` split.
215+
* If a `Clips` folder is present, there *must* be one clip start label file per clip, named `sub-<subjectID>_ses-<sessionID>_cam-<camID>_start-<frameID>_dur-<nFrames>_startlabels.json`.
216+
* Clip start labels provide keypoint annotations for the **first frame of the clip only**. They are intended for point-tracker evaluation, where the annotated points serve as the initial positions from which a tracker should propagate.
217+
* Clip start labels are identical to [Clip labels](target-cliplabels), except that the `images` array *must* contain exactly one entry corresponding to the first frame of the clip, and therefore must have `id: 0`.
218+
219+
:::{admonition} Example
220+
:class: tip
221+
222+
For a clip starting at frame 1000 with a duration of 5 frames, the `images` array would be:
223+
224+
```json
225+
[
226+
{"id": 0, "file_name": "sub-M708149_ses-20200317_cam-topdown_frame-1000", "width": 1300, "height": 1028}
227+
]
228+
```
229+
:::
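The constraints on clip start labels lend themselves to a small consistency check. The sketch below is an assumption-laden illustration, not a validator shipped with this commit: the function name `check_startlabels` is hypothetical, and it inspects only the `images` array (exactly one entry, `id` equal to 0, and a `frame` index equal to the `start` index in the label filename), leaving `annotations` and `categories` unchecked.

```python
import re


def check_startlabels(label_filename, coco):
    """Return a list of problems found in a startlabels `images` array."""
    name = re.search(r"_start-(\d+)_dur-(\d+)_startlabels\.json$", label_filename)
    if name is None:
        return ["filename does not end in _start-<frameID>_dur-<nFrames>_startlabels.json"]
    images = coco.get("images", [])
    if len(images) != 1:
        return [f"expected exactly 1 image entry, found {len(images)}"]
    problems = []
    image = images[0]
    if image.get("id") != 0:
        problems.append("image id must be 0")
    # The file_name carries the global frame index and has no extension.
    frame = re.search(r"_frame-(\d+)$", image.get("file_name", ""))
    if frame is None or int(frame.group(1)) != int(name.group(1)):
        problems.append("file_name frame index must equal the clip start index")
    return problems
```

An empty list means the `images` array is consistent with the label filename under these assumptions.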
230+
190231
### Visibility encoding
191232

192233
* Keypoint visibility *must* use ternary encoding:
@@ -196,21 +237,35 @@ Here `id: 0` through `id: 4` are the local clip indices, while `frame-1000` thro
196237

197238
## Example
198239

199-
Below is a concrete example project structure (only the `Train` split is shown):
240+
Below is a concrete example project structure:
200241

201242
```
202-
Train/
203-
└── SWC-plusmaze/
204-
└── sub-M708149_ses-20200317/
205-
├── Frames/
206-
│ ├── sub-M708149_ses-20200317_cam-topdown_frame-01000.png
207-
│ ├── sub-M708149_ses-20200317_cam-topdown_frame-02300.png
208-
│ ├── sub-M708149_ses-20200317_cam-topdown_frame-03500.png
209-
│ ├── sub-M708149_ses-20200317_cam-topdown_frame-07200.png
210-
│ ├── sub-M708149_ses-20200317_cam-topdown_frame-19800.png
211-
│ └── sub-M708149_ses-20200317_cam-topdown_framelabels.json
212-
├── Clips/
213-
│ ├── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5.mp4
214-
│ └── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5_cliplabels.json
215-
└── sub-M708149_ses-20200317_cam-topdown.mp4
243+
.
244+
├── Train/
245+
│ └── SWC-plusmaze/
246+
│ └── sub-M708149_ses-20200317/
247+
│ ├── Frames/
248+
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-01000.png
249+
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-02300.png
250+
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-03500.png
251+
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-07200.png
252+
│ │ ├── sub-M708149_ses-20200317_cam-topdown_frame-19800.png
253+
│ │ └── sub-M708149_ses-20200317_cam-topdown_framelabels.json
254+
│ ├── Clips/
255+
│ │ ├── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5.mp4
256+
│ │ └── sub-M708149_ses-20200317_cam-topdown_start-1000_dur-5_cliplabels.json
257+
│ └── sub-M708149_ses-20200317_cam-topdown.mp4
258+
└── Test/
259+
└── SWC-plusmaze/
260+
└── sub-M235678_ses-20210415/
261+
├── Frames/
262+
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-00500.png
263+
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-01200.png
264+
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-04800.png
265+
│ ├── sub-M235678_ses-20210415_cam-topdown_frame-09100.png
266+
│ └── sub-M235678_ses-20210415_cam-topdown_frame-15300.png
267+
├── Clips/
268+
│ ├── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5.mp4
269+
│ └── sub-M235678_ses-20210415_cam-topdown_start-0500_dur-5_startlabels.json
270+
└── sub-M235678_ses-20210415_cam-topdown.mp4
216271
```
