Commit 2bca533

Merge pull request #91 from stac-extensions/ml-model-migration
add ML-Model migration guide
2 parents b887344 + f265253 commit 2bca533

4 files changed

+138
-5
lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
99

1010
### Added
1111

12+
- Add [ML-Model Legacy](./docs/legacy/ml-model.md) document providing migration guidance
13+
from the deprecated [ML-Model](https://github.com/stac-extensions/ml-model) extension
14+
(relates to [stac-extensions/ml-model#16](https://github.com/stac-extensions/ml-model/pull/16)).
15+
- Move [DLM Legacy](./docs/legacy/dlm.md) document.
1216
- Add `embedding` as suggested dimension name
1317
(relates to [#77](https://github.com/stac-extensions/mlm/discussions/77)).
1418
- Add [`huggingface/safetensors`](https://github.com/huggingface/safetensors)

README.md

Lines changed: 4 additions & 3 deletions
@@ -82,11 +82,12 @@ reusability and avoid metadata duplication whenever possible. A properly defined
8282
never have the Machine Learning Model Extension exclusively in `stac_extensions`.
8383

8484
For details about the earlier (legacy) version of the MLM Extension, formerly known as
85-
the *Deep Learning Model Extension* (DLM), please refer to the [DLM LEGACY](README_DLM_LEGACY.md) document.
85+
the *Deep Learning Model Extension* (DLM), please refer to the [DLM LEGACY](./docs/legacy/dlm.md) document.
8686
DLM was renamed to the current MLM Extension and refactored to form a cohesive definition across all machine
8787
learning approaches, regardless of whether the approach constitutes a deep neural network or other statistical approach.
8888
It also combines multiple definitions from the predecessor [ML-Model](https://github.com/stac-extensions/ml-model)
89-
extension to synthesize common use cases into a single reference for Machine Learning Models.
89+
extension to synthesize common use cases into a single reference for "*Machine Learning Models*". For migration
90+
details from `ml-model` to `mlm`, please refer to the [ML-Model Legacy](./docs/legacy/ml-model.md) document.
9091

9192
For more details about the [`stac-model`](./stac_model) Python package, which provides definitions of the MLM extension
9293
using both [`Pydantic`](https://docs.pydantic.dev/latest/) and [`PySTAC`](https://pystac.readthedocs.io/en/stable/)
@@ -116,7 +117,7 @@ connectors, please refer to the [STAC Model](./README_STAC_MODEL.md) document.
116117
version [`1.3.0`](https://github.com/stac-extensions/mlm/blob/main/CHANGELOG.md#v130) of the extension.
117118
- [SigSpatial 2024 GeoSearch Workshop presentation](/docs/static/sigspatial_2024_mlm.pdf)
118119
- **Tools**:
119-
- [MLM Form Filler](https://mlm-form.vercel.app/) a two page app to fill out and validate MLM STAC Item metadata. <br>
120+
- [MLM Form Filler](https://mlm-form.vercel.app/) a two-page app to fill out and validate MLM STAC Item metadata. <br>
120121
Check out the [wherobots/mlm-form](https://github.com/wherobots/mlm-form) repository if you have questions, issues,
121122
or want to contribute.
122123

README_DLM_LEGACY.md renamed to docs/legacy/dlm.md

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
22

33
<!-- lint disable no-undefined-references -->
44

5-
> [!NOTE]
6-
> This is legacy documentation references of [Deep Learning Model extension](https://github.com/crim-ca/dlm-extension)
5+
> [!WARNING]
6+
> This is legacy documentation reference of [Deep Learning Model extension](https://github.com/crim-ca/dlm-extension)
77
> preceding the current Machine Learning Model (MLM) extension.
88
99
<!-- lint enable no-undefined-references -->

docs/legacy/ml-model.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
# ML Model Extension Specification
2+
3+
<!-- lint disable no-undefined-references -->
4+
5+
> [!WARNING]
6+
> This is a legacy documentation reference for [ML-Model][ml-model]
7+
> preceding the current Machine Learning Model ([MLM][mlm-spec]) extension.
8+
9+
<!-- lint enable no-undefined-references -->
10+
11+
## Notable Differences
12+
13+
- The [MLM][mlm-spec] extension covers more details at both the [Item](#item-properties) and [Asset](#asset-objects)
14+
levels, making it easier to describe and use model metadata.
15+
16+
- The [MLM][mlm-spec] extension covers runtime requirements using distinct [Asset Roles](#roles)
17+
([Model][mlm-asset-model], [Container][mlm-asset-container] and [Source Code][mlm-asset-code]) which allows
18+
for more flexibility in describing how and which operations are performed by a given model.
19+
This is in contrast to the [ML-Model][ml-model] extension that records [similar information][ml-model-runtimes]
20+
in `ml-model:inference-runtime` or `ml-model:training-runtime` __*all at once*__, which leads to runtime ambiguities
21+
and limited reusability.
22+
23+
- The [MLM][mlm-spec] extension provides additional fields to better describe the model properties, such as
24+
the [Model Inputs][mlm-inputs] to describe the input features, bands, data transforms, or any
25+
other relevant data sources and preparation steps required by the model, the [Model Outputs][mlm-outputs] to describe
26+
the output predictions, regression values, classes or other relevant information about what the model produces, and
27+
the [Model Hyperparameters][mlm-hyperparam] to better describe training configuration
28+
that led to the model definition. All of these fields are __*undefined*__ in the [ML-Model][ml-model] extension.
29+
30+
- The [MLM][mlm-spec] extension has a corresponding Python library [`stac-model`][mlm-stac-model],
31+
which can be used to create and validate MLM metadata using [pydantic][pydantic].
32+
An example of the library in action is [provided in examples](./../../stac_model/examples.py).
33+
The extension also provides [pystac MLM][pystac-mlm] for easier integration with the STAC ecosystem.
34+
The [MLM Form Filler][mlm-form] is also available to help users create valid MLM metadata in a no-code fashion.
35+
In contrast, the [ML-Model][ml-model] extension does not provide any support for Python integration and requires the JSON
36+
to be written manually.
37+
38+
## Migration Tables
39+
40+
Following are the corresponding fields between the legacy [ML-Model][ml-model] and the current [MLM][mlm-spec]
41+
extension, which can be used to completely migrate to the newer *Machine Leaning Model* extension providing
42+
enhanced features and interconnectivity with other STAC extensions (see also [Best Practices][mlm-bp]).
43+
44+
<!-- lint disable no-undefined-references -->
45+
46+
> [!IMPORTANT]
47+
> Only the limited set of [`ml-model`][ml-model] fields is listed below as migration guidance.
48+
> See the full [MLM Specification](./../../README.md) for all additional fields provided to further describe models.
49+
50+
<!-- lint enable no-undefined-references -->
51+
52+
### Item Properties
53+
54+
| ML-Model Field | MLM Field | Migration Details |
55+
|----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
56+
| `ml-model:type` <br> (`"ml-model"` constant) | *n/a* | Including the MLM URI in `stac_extensions` is sufficient to indicate that the Item is a model. |
57+
| `ml-model:learning_approach` | *n/a* | No direct mapping. Machine Learning training approaches can be very convoluted to describe. Instead, it is recommended to employ `derived_from` collection and other STAC Extension references to describe explicitly how the model was obtained. See [Best Practices][mlm-bp] for more details. |
58+
| `ml-model:prediction_type` <br> (`string`) | `mlm:tasks` <br> (`[string]`) | ML-Model is limited to a single task, whereas MLM allows multiple. Use `["<original-ml-model-task>"]` (the original `ml-model:prediction_type` value wrapped in a list) to migrate directly. |
59+
| `ml-model:architecture` | `mlm:architecture` | Direct mapping. |
60+
| `ml-model:training-processor-type` <br> `ml-model:training-os` | `mlm:framework` <br> `mlm:framework_version` <br> `mlm:accelerator` <br> `mlm:accelerator_constrained` <br> `mlm:accelerator_summary` <br> `mlm:accelerator_count` | More fields are provided to describe the subtleties of compute hardware and ML frameworks, which can be closely intertwined. If compute hardware imposes OS dependencies, they are typically reflected through the framework version and/or the specific accelerator. Further subtleties are permitted with [complex accelerator values][mlm-acc-type]. |
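
The Item property mappings in the table above can be sketched as a plain dictionary transformation. This is only an illustrative sketch, not part of the extension or of `stac-model`: the function name `migrate_item_properties`, the `ML_MODEL_PREFIX` constant, and the pinned `MLM_URI` version are assumptions made for the example, and fields without a direct mapping are returned separately for manual review rather than converted.

```python
from copy import deepcopy

# Assumed schema locations; verify them against the extension releases you target.
ML_MODEL_PREFIX = "https://stac-extensions.github.io/ml-model/"
MLM_URI = "https://stac-extensions.github.io/mlm/v1.3.0/schema.json"

# ML-Model fields that have no one-to-one MLM equivalent (see the table above).
UNMAPPED_FIELDS = (
    "ml-model:learning_approach",
    "ml-model:training-processor-type",
    "ml-model:training-os",
)


def migrate_item_properties(item: dict) -> tuple[dict, dict]:
    """Return a migrated copy of a STAC Item dict and the fields left for manual review."""
    migrated = deepcopy(item)
    props = migrated.setdefault("properties", {})

    # The constant type marker is dropped: the MLM URI in `stac_extensions` is sufficient.
    props.pop("ml-model:type", None)

    # A single prediction type becomes a one-element task list.
    prediction_type = props.pop("ml-model:prediction_type", None)
    if prediction_type is not None:
        props["mlm:tasks"] = [prediction_type]

    # Direct mapping.
    architecture = props.pop("ml-model:architecture", None)
    if architecture is not None:
        props["mlm:architecture"] = architecture

    # Collect fields that need a human decision instead of guessing a conversion.
    needs_review = {
        name: props.pop(name) for name in UNMAPPED_FIELDS if name in props
    }

    # Swap the extension URI.
    extensions = [
        uri
        for uri in migrated.get("stac_extensions", [])
        if not uri.startswith(ML_MODEL_PREFIX)
    ]
    if MLM_URI not in extensions:
        extensions.append(MLM_URI)
    migrated["stac_extensions"] = extensions

    return migrated, needs_review
```

The migrated task values should still be checked against the task names accepted by MLM, and the result validated against the MLM schema, since this sketch only renames fields.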
61+
62+
### Asset Objects
63+
64+
#### Roles
65+
66+
All [ML-Model Asset Roles](https://github.com/stac-extensions/ml-model/blob/main/README.md#roles)
67+
are available under a new prefix with the same semantic meaning.
68+
69+
Further roles are also proposed in [MLM Asset Roles](./../../README.md#mlm-asset-roles).
70+
71+
| ML-Model Field | MLM Field | Migration Details |
72+
|------------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
73+
| `ml-model:inference-runtime` | `mlm:inference-runtime` | Prefix change. |
74+
| `ml-model:training-runtime` | `mlm:training-runtime` | Prefix change. |
75+
| `ml-model:checkpoint` | `mlm:checkpoint` | Prefix change. Recommended addition of further `mlm` properties for [Model Asset](./../../README.md#model-asset) to describe the artifact. |
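
Since the role changes in the table above are pure prefix renames, they can be sketched as a lookup over each Asset's `roles` list. The helper name `migrate_asset_roles` is hypothetical and introduced only for this example; the sketch does not add the further `mlm` properties recommended for model artifacts.

```python
# Role renames taken from the table above.
ROLE_MAP = {
    "ml-model:inference-runtime": "mlm:inference-runtime",
    "ml-model:training-runtime": "mlm:training-runtime",
    "ml-model:checkpoint": "mlm:checkpoint",
}


def migrate_asset_roles(assets: dict) -> dict:
    """Return a copy of an Item's `assets` mapping with ML-Model roles renamed."""
    migrated = {}
    for key, asset in assets.items():
        new_asset = dict(asset)  # shallow copy so the input is not mutated
        if "roles" in asset:
            # Roles outside the map (e.g. "data") are kept unchanged.
            new_asset["roles"] = [ROLE_MAP.get(role, role) for role in asset["roles"]]
        migrated[key] = new_asset
    return migrated
```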
76+
77+
<!-- lint disable no-undefined-references -->
78+
79+
> [!TIP]
80+
> In the context of [ML-Model][ml-model], Assets providing [Inference/Training Runtimes][ml-model-runtimes]
81+
> are strictly provided as [Docker Compose][docker-compose-file] definitions. While this is still permitted,
82+
> the MLM extension offers alternatives using any relevant definition for the model, as long as it is properly
83+
> identified by its applicable media-type. Additional recommendations and Asset property fields are provided
84+
> under [MLM Assets Objects](./../../README.md#assets-objects) for specific cases.
85+
86+
<!-- lint enable no-undefined-references -->
87+
88+
### Relation Types
89+
90+
| ML-Model Field | MLM Field | Migration Details |
91+
|-------------------------------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
92+
| `ml-model:inferencing-image` | *n/a* | Deemed redundant with `mlm:inference-runtime` Asset Role. |
93+
| `ml-model:training-image` | *n/a* | Deemed redundant with `mlm:training-runtime` Asset Role. |
94+
| `ml-model:train-data` <br> `ml-model:test-data` | `derived_from` | Use one or more `derived_from` links (as many as needed for the data involved in the model creation). Linked data should employ `ml-aoi` as appropriate (see [ML-AOI Best Practices][mlm-ml-aoi]). |
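
The relation type changes in the table above can be sketched as a filter over an Item's `links` list. The function name `migrate_links` is hypothetical, and dropping the image links outright is an assumption of this sketch: if the runtime they point to is not already described by an Asset with the matching `mlm` runtime role, that information should be moved there first.

```python
# Relation types deemed redundant with the MLM runtime Asset Roles.
REDUNDANT_RELS = {"ml-model:inferencing-image", "ml-model:training-image"}
# Relation types replaced by `derived_from`.
DATA_RELS = {"ml-model:train-data", "ml-model:test-data"}


def migrate_links(links: list) -> list:
    """Return a new `links` list with ML-Model relation types migrated."""
    migrated = []
    for link in links:
        rel = link.get("rel")
        if rel in REDUNDANT_RELS:
            # Assumption: an equivalent runtime Asset already exists, so the link is dropped.
            continue
        if rel in DATA_RELS:
            # Keep href and other attributes, only change the relation type.
            migrated.append({**link, "rel": "derived_from"})
        else:
            migrated.append(link)
    return migrated
```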
95+
96+
[mlm-acc-type]: ./../../README.md#accelerator-type-enum
97+
98+
[mlm-asset-model]: ./../../README.md#model-asset
99+
100+
[mlm-asset-container]: ./../../README.md#container-asset
101+
102+
[mlm-asset-code]: ./../../README.md#source-code-asset
103+
104+
[mlm-inputs]: ./../../README.md#model-input-object
105+
106+
[mlm-outputs]: ./../../README.md#model-output-object
107+
108+
[mlm-hyperparam]: ./../../README.md#model-hyperparameters-object
109+
110+
[mlm-stac-model]: https://pypi.org/project/stac-model/
111+
112+
[mlm-form]: https://mlm-form.vercel.app/
113+
114+
[mlm-spec]: ./../../README.md
115+
116+
[mlm-bp]: ./../../best-practices.md
117+
118+
[mlm-ml-aoi]: ./../../best-practices.md#ml-aoi-and-label-extensions
119+
120+
[ml-model]: https://github.com/stac-extensions/ml-model
121+
122+
[ml-model-runtimes]: https://github.com/stac-extensions/ml-model/blob/main/README.md#inferencetraining-runtimes
123+
124+
[pydantic]: https://docs.pydantic.dev/latest/
125+
126+
[pystac-mlm]: https://github.com/stac-utils/pystac/blob/main/pystac/extensions/mlm.py
127+
128+
[docker-compose-file]: https://github.com/compose-spec/compose-spec/blob/master/spec.md#compose-file
