Description
Context / mental model
In the development of `ethology` we roughly consider three types of data: annotations, detections and tracks. All three can consist of bounding boxes, keypoints or segmentation masks. Currently `ethology` focuses on bounding boxes, so the following paragraphs focus mostly on them. However, the definitions apply equally to keypoints and segmentation masks.
- **Annotations** are labels drawn manually by a human. They are usually considered ground truth. They refer to images, and are often defined on non-consecutive frames of a video (but not necessarily; we may have annotations for consecutive frames too). Annotations may or may not have an "identity" associated with them. If they do, annotations with the same identity across images refer to the same individual. In `ethology` we consider annotations to have no identities associated with them (at least for now).
- **Detections** are predictions generated by a trained computer vision model on images. As such, they have a confidence value associated with them (unlike annotations). They can refer to consecutive or non-consecutive frames of a video. In `ethology` we consider detections to have no identities associated with them (at least for now).
- **Tracks** are detections that refer to consecutive frames in a video, with identities associated with them. This is the type of data that `movement` deals with (see here).
Current status
Currently `ethology` supports loading and saving bounding box annotations datasets (see this example for a realistic use case). An `ethology` bounding box annotations dataset is defined as follows:
- it has `image_id`, `space`, `id` as dimensions;
- it has `position` and `shape` as data variables / arrays.
This definition is captured in the `ethology.io.annotations.validate.ValidBboxesDataset` class. An annotations dataset may have more dimensions or arrays, but these are the minimum requirements to consider it a "bounding box annotations dataset". The dimensions in the annotations dataset roughly correspond to the time, space and individual dimensions in a `movement` dataset.
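For concreteness, here is a minimal sketch of a dataset matching this schema, assuming `ethology` datasets are `xarray`-based (as `movement` datasets are); the sizes and values are hypothetical:

```python
import numpy as np
import xarray as xr

n_images, n_ids = 5, 3  # hypothetical: 5 annotated images, up to 3 boxes each

bboxes_ds = xr.Dataset(
    data_vars={
        # centre of each bounding box, in pixels (assumed convention)
        "position": (("image_id", "space", "id"), np.zeros((n_images, 2, n_ids))),
        # width and height of each bounding box, in pixels
        "shape": (("image_id", "space", "id"), np.ones((n_images, 2, n_ids))),
    },
    coords={
        "image_id": np.arange(n_images),
        "space": ["x", "y"],
        "id": np.arange(n_ids),
    },
)
```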
As mentioned, `ethology` currently considers that annotations have no identities associated with them. The `id` dimension in the annotations dataset stores an ID for each annotation in an image, but this is not consistent across frames. The `id` dimension ranges from 0 to the maximum number of annotations per image in the full dataset.
We would like to support loading and saving keypoint annotations in `ethology`.
Describe the solution you'd like
One way to support keypoint annotations in `ethology` could be as follows:
- **Define a `ValidKeypointsDataset` class**

  We define an `ethology` keypoint annotations dataset to represent the data. Following the bounding boxes case, it could be defined as having:

  - `image_id`, `space`, `keypoint` and `id` as required dimensions;
  - `position` as the required data variable / array.

  This definition could be captured in a `ValidKeypointsDataset` class in `ethology/io/annotations/validate.py`, which would closely follow `ValidBboxesDataset`. A sketch of this schema is included after this list.
- **Define loaders and exporters**

  We would maybe then add two modules under `ethology.io.annotations` called `load_keypoints.py` and `save_keypoints.py` (maybe `kpts`?).

  In `load_keypoints.py` we would have functionality to read keypoint annotations files (say `.slp`) as an `ethology` keypoint annotations dataset. It is probably a good idea to use `sleap-io` under the hood, since it supports a large variety of files and is a well-maintained and nicely written repo. We can use its `load_file` function to read a variety of keypoint files as `Labels` objects, and then implement the transform from this `Labels` object into an `ethology` annotations dataset (see the loader sketch after this list).

  Similarly, in `save_keypoints.py` we can use `sleap-io`'s `Labels` object as an intermediate representation, and then use its `save_file` function to export to a variety of keypoint file formats.
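As referenced in the first step, here is a minimal sketch of the proposed keypoint annotations schema. It assumes an `xarray`-based dataset (as in the bounding boxes case); the sizes and keypoint names are hypothetical:

```python
import numpy as np
import xarray as xr

n_images, n_keypoints, n_ids = 5, 4, 3  # hypothetical sizes

keypoints_ds = xr.Dataset(
    data_vars={
        "position": (
            ("image_id", "space", "keypoint", "id"),
            np.zeros((n_images, 2, n_keypoints, n_ids)),
        ),
    },
    coords={
        "image_id": np.arange(n_images),
        "space": ["x", "y"],
        "keypoint": ["nose", "left_ear", "right_ear", "tail_base"],
        "id": np.arange(n_ids),
    },
)
```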
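And here is a rough sketch of what the loading function in `load_keypoints.py` could look like. The `sleap-io` calls (`load_file`, `Labels.labeled_frames`, `Instance.numpy()`) exist in that library, but the function name and the NaN-padding over the `id` dimension are hypothetical design choices, not existing `ethology` API:

```python
import numpy as np
import sleap_io as sio
import xarray as xr


def load_keypoints(file_path: str) -> xr.Dataset:
    """Read a keypoint annotations file as an ethology-style dataset."""
    labels = sio.load_file(file_path)  # sleap-io infers the file format

    node_names = labels.skeleton.node_names  # assumes a single skeleton
    n_images = len(labels.labeled_frames)
    max_ids = max(len(lf.instances) for lf in labels.labeled_frames)

    # (image_id, space, keypoint, id); NaN where no annotation exists
    position = np.full((n_images, 2, len(node_names), max_ids), np.nan)
    for i, lf in enumerate(labels.labeled_frames):
        for j, instance in enumerate(lf.instances):
            # Instance.numpy() returns an (n_nodes, 2) array of x, y coords
            position[i, :, :, j] = instance.numpy().T

    return xr.Dataset(
        data_vars={
            "position": (("image_id", "space", "keypoint", "id"), position),
        },
        coords={
            "image_id": [lf.frame_idx for lf in labels.labeled_frames],
            "space": ["x", "y"],
            "keypoint": node_names,
            "id": np.arange(max_ids),
        },
    )
```

For export, `save_keypoints.py` would do the inverse transform into a `Labels` object and then call `sio.save_file(labels, path)`.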
Describe alternatives you've considered
Suggestions are more than welcome.
Additional context
It would be nice to also include a usage example for the gallery. Maybe a workflow of loading a keypoint annotations file and running some sanity checks to detect erroneous labels (e.g., expressing all keypoints in an egocentric coordinate system to quickly identify outliers; see here for an example).
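For instance, the outlier check could look something like this hypothetical sketch, which centres each instance's keypoints on its centroid and flags points that are unusually far from it (the dataset and threshold are made up for illustration):

```python
import numpy as np
import xarray as xr

# Stand-in for a loaded keypoint annotations dataset
rng = np.random.default_rng(0)
keypoints_ds = xr.Dataset(
    data_vars={
        "position": (
            ("image_id", "space", "keypoint", "id"),
            rng.normal(size=(10, 2, 4, 3)),
        ),
    },
    coords={"space": ["x", "y"]},
)

# Express keypoints in an egocentric frame: centre each instance's
# keypoints on that instance's centroid.
centroid = keypoints_ds["position"].mean(dim="keypoint", skipna=True)
egocentric = keypoints_ds["position"] - centroid

# Keypoints much farther from the centroid than is typical for that
# keypoint are candidates for mislabelled annotations.
dist = np.sqrt((egocentric**2).sum(dim="space"))
threshold = dist.mean(dim="image_id") + 3 * dist.std(dim="image_id")
outliers = dist > threshold  # boolean (image_id, keypoint, id) array
```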