Anemoi-Metadata
Unification of metadata standards and interfaces into a lightweight, self-contained package
Proposal
This issue proposes, and invites discussion on, a new anemoi package to hold the metadata interfaces and standards. The first goal is to encapsulate the expectation of "what is a checkpoint", providing the interfaces and abstractions that training needs to write metadata and inference needs to read it. A secondary goal is to migrate the Variable class from transforms, so that this package becomes a more holistic home for the objects that need a unified representation throughout anemoi.
Structure
A rough directory layout is shown below, illustrating how the raw schema is hidden behind a top-level interface with clear versioning.
src/anemoi/metadata/
├── __init__.py          # Public API
│
│   # Layer 1: Raw Schema
├── base.py              # BaseMetadata ABC
├── registry.py          # Version registry (semver)
├── versions/
│   └── v1.py            # V1 schema (from existing spec)
├── migrations/          # Migration functions
│   └── v1_to_v2.py      # (when v2 exists)
│
│   # Layer 2: Abstraction
├── interface.py         # Metadata class (user API)
│
│   # Infrastructure
├── checkpoint.py        # I/O (moved from anemoi-utils)
├── migration.py         # MetadataMigrator (sequential chaining)
└── commands/
    └── inspect.py
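To make the "sequential chaining" idea concrete, here is a minimal sketch of how a MetadataMigrator could walk old metadata forward one version at a time. Everything in it is an assumption for discussion: the `register` decorator, the integer `version` key (the proposal mentions semver, which is simplified to whole numbers here), and the renamed fields are hypothetical and not part of any existing anemoi API.

```python
from typing import Any, Callable, Dict

Metadata = Dict[str, Any]
Migration = Callable[[Metadata], Metadata]


class MetadataMigrator:
    """Hypothetical sketch: applies single-step migrations in sequence."""

    def __init__(self) -> None:
        # Maps a source version to the function that upgrades it by one step.
        self._steps: Dict[int, Migration] = {}

    def register(self, from_version: int) -> Callable[[Migration], Migration]:
        """Decorator that registers a migration starting at `from_version`."""

        def decorator(func: Migration) -> Migration:
            self._steps[from_version] = func
            return func

        return decorator

    def migrate(self, metadata: Metadata, target: int) -> Metadata:
        """Chain registered steps until the metadata reaches `target`."""
        current = dict(metadata)  # shallow copy, so the input is not mutated
        while current["version"] < target:
            version = current["version"]
            step = self._steps.get(version)
            if step is None:
                raise ValueError(f"no migration registered from version {version}")
            current = step(current)
            if current["version"] <= version:
                raise ValueError("migration did not advance the version")
        return current


migrator = MetadataMigrator()


@migrator.register(from_version=1)
def v1_to_v2(metadata: Metadata) -> Metadata:
    upgraded = dict(metadata)
    # Hypothetical schema change: rename "params" to "variables".
    upgraded["variables"] = upgraded.pop("params", [])
    upgraded["version"] = 2
    return upgraded


@migrator.register(from_version=2)
def v2_to_v3(metadata: Metadata) -> Metadata:
    upgraded = dict(metadata)
    # Hypothetical schema change: add an empty "provenance" section.
    upgraded.setdefault("provenance", {})
    upgraded["version"] = 3
    return upgraded


old = {"version": 1, "params": ["2t", "msl"]}
new = migrator.migrate(old, target=3)
```

Keeping each migration as a single-step function means a new schema version only needs one new file in migrations/, and older checkpoints are upgraded by composing the existing steps.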
Reasoning
While the current interface in inference works, there is no shared contract between training and inference, which causes problems whenever the model or its associated configuration undergoes a major update. Additionally, other tools that want to use or inspect a checkpoint must depend on inference, which is not a light package and pulls in torch, among many other dependencies. This places an unnecessary burden on thin web interfaces and similar tools, which do not need torch.
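As an illustration of what a torch-free read path could look like, the sketch below uses only the Python standard library. It assumes the checkpoint is a zip archive with the metadata stored as a JSON entry; the entry name `anemoi-metadata.json`, the helper names, and the sample fields are hypothetical placeholders, not the existing layout or API.

```python
import io
import json
import zipfile
from typing import Any, BinaryIO, Dict

# Hypothetical entry name; the real location inside a checkpoint may differ.
METADATA_NAME = "anemoi-metadata.json"


def read_metadata(archive: BinaryIO) -> Dict[str, Any]:
    """Read the JSON metadata entry from a zip archive without importing torch."""
    with zipfile.ZipFile(archive, "r") as zf:
        with zf.open(METADATA_NAME) as fh:
            return json.load(fh)


# Build a stand-in "checkpoint" in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.pkl", b"placeholder for model weights")
    zf.writestr(METADATA_NAME, json.dumps({"version": 1, "variables": ["2t"]}))

metadata = read_metadata(buffer)
```

A tool built this way only needs to open the archive and parse one small entry, which is the kind of lightweight inspection the proposed package is meant to support.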
Questions
The following questions about this package remain open:
- Exact scope of included tools
- Location: inside core, or as a separate repo?