Support loading and saving keypoint annotations #112

@sfmig

Description

Context / mental model
In the development of ethology we roughly consider 3 types of data: annotations, detections and tracks. Each of these can consist of bounding boxes, keypoints or segmentation masks. Currently ethology focuses on bounding boxes, so the following paragraphs focus mostly on them. However, the definitions apply equally to keypoints and segmentation masks.

  • Annotations are labels drawn manually by a human. They are usually considered ground truth. They refer to images, and are often defined on non-consecutive frames of a video (but not necessarily; we may have annotations for consecutive frames too). Annotations may or may not have an "identity" associated with them. If they do, annotations with the same identity across images refer to the same individual. In ethology we consider that annotations have no identities associated with them (at least for now).

  • Detections are predictions generated by a trained computer vision model on images. As such, they have a confidence value associated with them (unlike annotations). They can refer to consecutive or non-consecutive frames of a video. In ethology we consider that detections have no identities associated with them (at least for now).

  • Tracks are detections that refer to consecutive frames in a video, with identities associated with them. This is the type of data that movement deals with (see here).

Current status
Currently ethology supports loading and saving bounding box annotations datasets (see this example for a realistic use case). An ethology bounding box annotations dataset is defined as follows:

  • it has image_id, space, id as dimensions
  • it has position and shape as data variables / arrays.

This definition is captured in the ethology.io.annotations.validate.ValidBboxesDataset class. An annotations dataset may have more dimensions or arrays, but these are the minimum requirements for it to be considered a "bounding box annotations dataset". The dimensions in the annotations dataset roughly correspond to the time, space and individual dimensions in a movement dataset.

As mentioned, ethology currently considers that annotations have no identities associated with them. The id dimension in the annotations dataset stores an ID for each annotation in an image, but these IDs are not consistent across frames. The length of the id dimension equals the maximum number of annotations per image across the full dataset.
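The dataset definition above can be sketched as a minimal xarray Dataset (a toy construction with arbitrary sizes and random values, assuming xarray; only the required dims and arrays from the definition are shown):

```python
import numpy as np
import xarray as xr

n_images, n_ids = 3, 2  # toy sizes: 3 images, at most 2 annotations per image

ds = xr.Dataset(
    data_vars={
        # centre of each bounding box, per image and per-image annotation ID
        "position": (("image_id", "space", "id"), np.random.rand(n_images, 2, n_ids)),
        # extent (e.g. width/height) of each bounding box
        "shape": (("image_id", "space", "id"), np.random.rand(n_images, 2, n_ids)),
    },
    coords={
        "image_id": np.arange(n_images),
        "space": ["x", "y"],
        "id": np.arange(n_ids),
    },
)

print(sorted(ds.dims))       # ['id', 'image_id', 'space']
print(sorted(ds.data_vars))  # ['position', 'shape']
```

Images with fewer annotations than the maximum would simply carry NaN values for the unused id slots.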

We would like to support loading and saving keypoint annotations into ethology.

Describe the solution you'd like
One way to support keypoint annotations in ethology could be as follows:

  1. Define a ValidKeypointsDataset class
    We define an ethology keypoint annotations dataset to represent the data. Following the bounding boxes case, it could be defined as having:

    • image_id, space, keypoint and id as required dimensions;
    • position as required data variable / array.

    This definition could be captured in a ValidKeypointsDataset class in ethology/io/annotations/validate.py (which would closely follow ValidBboxesDataset).
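A rough, hypothetical sketch of what such a validator could check (the class body and constant names are invented; the real implementation would mirror ValidBboxesDataset, and `dataset` would be an xarray.Dataset in practice — here a lightweight stand-in is used so the sketch is self-contained):

```python
from dataclasses import dataclass
from types import SimpleNamespace

REQUIRED_DIMS = {"image_id", "space", "keypoint", "id"}
REQUIRED_DATA_VARS = {"position"}

@dataclass
class ValidKeypointsDataset:
    """Check that a dataset has the required dims and data variables.

    Hypothetical sketch of the proposed class; only duck-typed access
    to ``dims`` and ``data_vars`` is assumed.
    """
    dataset: object

    def __post_init__(self):
        missing_dims = REQUIRED_DIMS - set(self.dataset.dims)
        if missing_dims:
            raise ValueError(f"Missing required dimensions: {missing_dims}")
        missing_vars = REQUIRED_DATA_VARS - set(self.dataset.data_vars)
        if missing_vars:
            raise ValueError(f"Missing required data variables: {missing_vars}")

# stand-in for an xarray.Dataset with the required structure
fake = SimpleNamespace(
    dims={"image_id": 3, "space": 2, "keypoint": 5, "id": 2},
    data_vars={"position": None},
)
ValidKeypointsDataset(fake)  # passes silently
```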

  2. Define loaders and exporters
    We could then add two modules under ethology.io.annotations called load_keypoints.py and save_keypoints.py (maybe kpts?).

    In load_keypoints.py we would have functionality to read keypoint annotations files (say .slp) as an ethology keypoint annotations dataset. It is probably a good idea to use sleap-io under the hood, since it supports a large variety of file formats and is a well-maintained and nicely written repo. We can use their load_file function to read a variety of keypoint files as Labels objects. Then we can implement the transformation from this Labels object into an ethology annotations dataset.
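The core of that transformation is padding a variable number of instances per image into a fixed-size array. A hypothetical numpy-only sketch (the function name is invented, and the per-instance (n_keypoints, 2) arrays stand in for whatever coordinate arrays the sleap-io Labels object exposes per labeled frame):

```python
import numpy as np

def stack_keypoint_annotations(per_image_instances):
    """Pad per-image lists of keypoint arrays into one position array.

    per_image_instances: list (over images) of lists (over instances)
    of (n_keypoints, 2) coordinate arrays. Assumes the first image has
    at least one instance (to read off n_keypoints).

    Returns an array of shape (n_images, max_ids, n_keypoints, 2),
    NaN-padded where an image has fewer annotations than the maximum,
    ready to wrap as the ``position`` variable of an xarray Dataset
    with dims (image_id, id, keypoint, space).
    """
    n_images = len(per_image_instances)
    max_ids = max(len(insts) for insts in per_image_instances)
    n_keypoints = per_image_instances[0][0].shape[0]
    position = np.full((n_images, max_ids, n_keypoints, 2), np.nan)
    for i, instances in enumerate(per_image_instances):
        for j, kpts in enumerate(instances):
            position[i, j] = kpts
    return position
```

This also makes the per-image, non-persistent nature of the id dimension explicit: id `j` in image `i` is just the j-th annotation of that image.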

    Similarly, in save_keypoints.py we can use sleap-io's Labels object as an intermediate representation, and then use their save_file function to export to a variety of keypoint file formats.
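The export path would need the inverse of the padding step: recovering per-image instance arrays from the NaN-padded position array before handing them to sleap-io (a hypothetical numpy sketch; the conversion of each recovered array into sleap-io objects is not shown):

```python
import numpy as np

def unstack_keypoint_annotations(position):
    """Invert the NaN padding: recover per-image instance arrays.

    position: array of shape (n_images, max_ids, n_keypoints, 2),
    NaN-padded for unused id slots.

    Returns a list (over images) of (n_instances, n_keypoints, 2)
    arrays, dropping id slots whose keypoints are all NaN.
    """
    out = []
    for img in position:
        # keep only id slots that contain at least one real coordinate
        keep = ~np.isnan(img).all(axis=(1, 2))
        out.append(img[keep])
    return out
```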

Describe alternatives you've considered
Suggestions are more than welcome.

Additional context
It would be nice to also include a usage example for the gallery. Maybe a workflow of loading a keypoint annotations file and doing some sanity checks to detect erroneous labels (e.g., expressing all keypoints in an egocentric coordinate system to try to quickly identify outliers; see here for an example).
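The egocentric sanity check mentioned above could look roughly like this (a toy numpy sketch; the function name and choice of reference keypoint are invented, and no rotational alignment is applied):

```python
import numpy as np

def egocentric(position, origin_keypoint=0):
    """Express keypoints relative to a reference keypoint per instance.

    position: array of shape (n_images, n_ids, n_keypoints, 2).
    Subtracting the reference keypoint from every keypoint of the same
    instance puts all instances in a shared egocentric frame, so
    mislabelled keypoints stand out as outliers when plotted together.
    """
    origin = position[:, :, origin_keypoint:origin_keypoint + 1, :]
    return position - origin
```

In the egocentric frame the reference keypoint sits at the origin for every instance, so all other keypoints can be overlaid and screened for outliers in one scatter plot.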
