
Introduction of anemoi-metadata? #99

@HCookie

Anemoi-Metadata

Unification of metadata standards and interfaces in a lightweight, self-contained package

Proposal

This issue proposes, and invites discussion on, a new package for anemoi that holds the metadata interfaces and standards. The first goal is to encapsulate the expectation of "what is a checkpoint", providing the interfaces and abstractions that training needs to write one and inference needs to read one. A secondary goal is to migrate the Variable class from transforms, making this package a more holistic home for the objects that need a unified representation throughout anemoi.

Structure

A rough directory layout is shown below, showing how the raw schema is hidden behind a top-level interface with clear versioning.

src/anemoi/metadata/
├── __init__.py          # Public API
│
│  # Layer 1: Raw Schema
├── base.py              # BaseMetadata ABC
├── registry.py          # Version registry (semver)
├── versions/
│   └── v1.py            # V1 schema (from existing spec)
├── migrations/          # Migration functions
│   └── v1_to_v2.py      # (when v2 exists)
│
│  # Layer 2: Abstraction
├── interface.py         # Metadata class (user API)
│
│  # Infrastructure
├── checkpoint.py        # I/O (moved from anemoi-utils)
├── migration.py         # MetadataMigrator (sequential chaining)
└── commands/
    └── inspect.py
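
To make the registry and sequential migration chaining in the layout above more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the names `register_migration` and `migrate`, the `schema_version` key, the field renames, and the use of integer versions in place of full semver. It is not the existing spec or a proposed final API.

```python
from typing import Any, Callable, Dict

Migration = Callable[[Dict[str, Any]], Dict[str, Any]]

# Hypothetical registry: source schema version -> function producing the next version.
_MIGRATIONS: Dict[int, Migration] = {}


def register_migration(from_version: int) -> Callable[[Migration], Migration]:
    """Register a function that upgrades raw metadata from `from_version`."""

    def decorator(func: Migration) -> Migration:
        _MIGRATIONS[from_version] = func
        return func

    return decorator


@register_migration(1)
def v1_to_v2(raw: Dict[str, Any]) -> Dict[str, Any]:
    # Illustrative change only: rename a field and bump the version.
    out = dict(raw)
    out["variables"] = out.pop("params", [])
    out["schema_version"] = 2
    return out


@register_migration(2)
def v2_to_v3(raw: Dict[str, Any]) -> Dict[str, Any]:
    # Illustrative change only: add a field with a default value.
    out = dict(raw)
    out.setdefault("provenance", {})
    out["schema_version"] = 3
    return out


def migrate(raw: Dict[str, Any], target_version: int) -> Dict[str, Any]:
    """Apply registered migrations one after another until `target_version`."""
    current = dict(raw)  # never mutate the caller's dictionary
    version = current.get("schema_version", 1)
    while version < target_version:
        step = _MIGRATIONS.get(version)
        if step is None:
            raise ValueError(f"No migration registered from version {version}")
        current = step(current)
        new_version = current["schema_version"]
        if new_version <= version:
            raise ValueError(f"Migration from version {version} did not advance")
        version = new_version
    return current
```

With this shape, a checkpoint written against an old schema can be read by calling `migrate(raw, target_version=3)` before the result is handed to the user-facing Metadata class, so only the latest schema has to be understood by the abstraction layer.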

Reasoning

While the current interface in inference works, there is no shared contract between training and inference, which causes issues when the model or its associated configuration undergoes a major update. Additionally, other tools that want to use or inspect a checkpoint have to rely on inference, which is not a light package and pulls in torch, among many other dependencies. This is a burden for thin web interfaces and other tools that do not need torch.
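
As a rough illustration of what a torch-free read could look like, the sketch below uses only the standard library. It assumes the checkpoint is a zip archive that carries its metadata as a JSON member; the function name `read_embedded_metadata` and the member name `metadata.json` are hypothetical and would need to match whatever the agreed checkpoint layout is.

```python
import json
import zipfile
from typing import Any, Dict


def read_embedded_metadata(path: str, member_suffix: str = "metadata.json") -> Dict[str, Any]:
    """Return the JSON metadata stored inside a zip-based checkpoint.

    Assumption: the archive contains one member whose name ends with
    `member_suffix`. Neither torch nor any model code is imported.
    """
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith(member_suffix):
                with archive.open(name) as handle:
                    return json.load(handle)
    raise FileNotFoundError(f"No member ending with {member_suffix!r} in {path}")
```

A thin web interface or an inspect command could call this directly and would only depend on the metadata package itself.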

Questions

The following questions about this package remain to be answered:

  • Exact scope of included tools
  • Location: inside core, or as a separate repo?
