Commit 50389c0

Peter Van Bouwel (pvbouwel) authored and committed
feature: artifact-helper client
Allow uploading artifacts using:

```
from openeo.extra.artifacts import build_artifact_helper

artifact_helper = build_artifact_helper(connection)
storage_uri = artifact_helper.upload_file(path, object_name)
presigned_uri = artifact_helper.get_presigned_url(storage_uri)
```
1 parent d86b4e7 commit 50389c0

23 files changed: +1588 -2 lines changed

docs/api-artifacts.rst

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
1+
.. _api-openeo-extra-artifacts:
2+
3+
====================================
4+
API: openeo.extra.artifacts
5+
====================================
6+
7+
.. warning::
8+
This is a new experimental API, subject to change.
9+
10+
11+
.. important::
12+
The artifacts functionality relies on extra Python packages. They can be installed using:
13+
14+
.. code-block:: shell
15+
16+
pip install "openeo[artifacts]" --upgrade
17+
18+
19+
When running openEO jobs it is not uncommon to need artifacts during job execution. This
20+
requires the artifacts to be accessible from within the openEO processing environment. :py:mod:`openeo.extra.artifacts` tries
21+
to do the heavy lifting for this use case by letting you stage artifacts in a secure but temporary location in 3
22+
simple steps:
23+
24+
1. Connect to your openEO backend
25+
2. Create an artifact helper from your openEO connection
26+
3. Upload your file using the artifact helper and optionally get a presigned URI
27+
28+
So in code this looks like:
29+
30+
.. code-block:: python
31+
32+
import openeo
33+
from openeo.extra.artifacts import build_artifact_helper
34+
35+
connection = openeo.connect("my-openeo.prod.example").authenticate_oidc()
36+
37+
artifact_helper = build_artifact_helper(connection)
38+
storage_uri = artifact_helper.upload_file(path, object_name)
39+
presigned_uri = artifact_helper.get_presigned_url(storage_uri)
40+
41+
Note:
42+
43+
* The presigned_uri should be used for accessing the objects. It has authentication details embedded so if your data is
44+
sensitive you must make sure to keep this URL secret. You can lower expires_in_seconds in
45+
:py:meth:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC.get_presigned_url`
46+
to limit the time window in which the URI can be used.
47+
48+
* The openEO backend must expose additional metadata in its capabilities doc to make this possible. Implementers of a
49+
backend can check the extra documentation :ref:`advertising-capabilities`.
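
A minimal sketch of picking a shorter validity window for a presigned URI. The 7-day default is taken from the ``get_presigned_url`` signature in this commit; the ``expires_in`` helper is a hypothetical name introduced here for illustration, and the final call is left as a comment because it needs a live connection and artifact helper.

```python
from datetime import timedelta

# Default taken from the get_presigned_url signature: 7 days expressed in seconds.
DEFAULT_EXPIRES_IN_SECONDS = 7 * 3600 * 24


def expires_in(**duration) -> int:
    """Hypothetical helper: convert timedelta-style keyword arguments to whole seconds."""
    return int(timedelta(**duration).total_seconds())


one_hour = expires_in(hours=1)

# With a real artifact helper this value would be passed as:
# presigned_uri = artifact_helper.get_presigned_url(storage_uri, expires_in_seconds=one_hour)
```

Expressing the window with ``timedelta`` keeps the intent readable while still handing the method the plain integer it expects.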
50+
51+
52+
User facing API
53+
===============
54+
55+
56+
.. autofunction:: openeo.extra.artifacts.build_artifact_helper
57+
58+
59+
.. autoclass:: openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC
60+
:members: upload_file, get_presigned_url
61+
:no-index:
62+
63+
64+
How does it work?
65+
==================
66+
67+
1) :py:meth:`openeo.extra.artifacts.build_artifact_helper` is a factory method that
68+
will create an artifact helper where the type is defined by the config type. The openEO connection object is used to
69+
see if the openEO backend advertises a preferred config.
70+
2) :py:meth:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC.upload_file` and
71+
:py:meth:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC.get_presigned_url` do the heavy lifting to
72+
store your artifact in provider-managed storage and to return references that can be used. If the backend uses
73+
object storage that has an S3 API, it will:
74+
75+
1. Get temporary S3 credentials based on config advertised by the backend and the session from your connection
76+
2. Upload the file into object storage and return an S3 URI which the backend can resolve
77+
3. Optionally, :py:meth:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC.get_presigned_url` creates a
78+
URI signed with the temporary credentials so that it works standalone. (Some tools and execution steps do not
79+
support handling of internal references; presigned URLs should work in any tool.)
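
To illustrate why a presigned URI "works standalone", here is a simplified sketch of the general idea: the expiry time and a signature derived from a secret are embedded in the URL itself, so whoever holds the URL needs no further credentials. This is not the AWS Signature Version 4 scheme that S3 uses and not what the artifact helper does internally; ``presign``, ``verify``, the query parameter names and the object name are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def _sign(url: str, expires: int, secret: bytes) -> str:
    """HMAC-SHA256 over the URL and its expiry timestamp."""
    return hmac.new(secret, f"{url}\n{expires}".encode(), hashlib.sha256).hexdigest()


def presign(url: str, secret: bytes, expires_in_seconds: int, now: Optional[float] = None) -> str:
    """Return the URL with an expiry timestamp and signature appended as query parameters."""
    issued = int(time.time() if now is None else now)
    expires = issued + expires_in_seconds
    query = urlencode({"expires": expires, "signature": _sign(url, expires, secret)})
    return f"{url}?{query}"


def verify(signed_url: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Check the signature and the expiry using only the URL and the server-side secret."""
    parts = urlsplit(signed_url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    expected = _sign(base_url, expires, secret)
    return hmac.compare_digest(signature.encode(), expected.encode()) and current < expires


# Hypothetical object location, valid for one hour from now.
signed_url = presign("https://my.s3.test/openeo-artifacts/model.onnx", b"temporary-secret", 3600)
```

Because everything needed for the check travels in the URL, anyone who obtains it can fetch the object until it expires, which is why the URI must be kept secret.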
80+
81+
82+
Documentation for backend providers
83+
===================================
84+
85+
This section and its subsections are for engineers who operate an openEO backend. If you are a user of an openEO platform
86+
this is unlikely to be of value to you.
87+
88+
.. _advertising-capabilities:
89+
90+
Advertising capabilities from the backend
91+
-----------------------------------------
92+
93+
It is expected that the backend advertises in its capabilities a section on artifacts. The following is an example
94+
for the S3STSConfig (of the :py:mod:`openeo.extra.artifacts._s3sts` package).
95+
96+
.. code-block:: json
97+
98+
{
99+
// ...
100+
"artifacts": {
101+
"providers": [
102+
{
103+
// This id is a logical name
104+
"id": "s3",
105+
// The config type of the ArtifactHelper
106+
"type": "S3STSConfig",
107+
// The keys of the config block can differ for other config types
108+
"config": {
109+
// The bucket where the artifacts will be stored
110+
"bucket": "openeo-artifacts",
111+
// The role that will be assumed via STS
112+
"role": "arn:aws:iam::000000000000:role/S3Access",
113+
// Where S3 API calls are sent
114+
"s3_endpoint": "https://my.s3.test",
115+
// Where STS API calls are sent
116+
"sts_endpoint": "https://my.sts.test"
117+
}
118+
}
119+
]
120+
},
121+
// ...
122+
}
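
As a rough sketch of how a client could read this section, the snippet below works on a plain dictionary that mirrors the example above. The function names are hypothetical, ``LookupError`` stands in for the package's own exceptions, and treating the first listed provider as the preferred one is an assumption made here, not something this documentation states.

```python
from typing import Any, Dict, List

# The capabilities example above, as a Python dictionary (comments removed).
capabilities: Dict[str, Any] = {
    "artifacts": {
        "providers": [
            {
                "id": "s3",
                "type": "S3STSConfig",
                "config": {
                    "bucket": "openeo-artifacts",
                    "role": "arn:aws:iam::000000000000:role/S3Access",
                    "s3_endpoint": "https://my.s3.test",
                    "sts_endpoint": "https://my.sts.test",
                },
            }
        ]
    }
}


def list_artifact_providers(capabilities: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the advertised providers, or an empty list if the section is missing."""
    return capabilities.get("artifacts", {}).get("providers", [])


def preferred_provider(capabilities: Dict[str, Any]) -> Dict[str, Any]:
    """Assumption: the first advertised provider is the preferred one."""
    providers = list_artifact_providers(capabilities)
    if not providers:
        raise LookupError("backend does not advertise any artifact providers")
    return providers[0]


provider = preferred_provider(capabilities)
config_type = provider["type"]  # selects which artifact helper implementation to build
```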
123+
124+
125+
Extending support for other types of artifacts
126+
----------------------------------------------
127+
128+
.. warning::
129+
This is a section for developers of the `openeo-python-client` Python package. If you want to walk this road it is
130+
best to create an issue on GitHub and detail what support you are planning to add to get input on feasibility and
131+
whether it will be mergeable early on.
132+
133+
Ideally the user-interface is simple and stable. Unfortunately implementations themselves come with more complexity.
134+
This section explains what is needed to provide support for additional types of artifacts. The API involved is
135+
shown below the steps.
136+
137+
1. Create another internal package for the implementation. The following steps should be done inside that package.
138+
This package resides under :py:mod:`openeo.extra.artifacts`
139+
2. Create a config implementation which extends :py:class:`openeo.extra.artifacts._config.ArtifactsStorageConfigABC`
140+
and is a frozen dataclass. This class implements the logic to determine the configuration used by the
141+
implementation; `_load_connection_provided_config(self, provider_config: ProviderConfig) -> None` is used for that.
142+
143+
When this method is called, the explicit config is already in place or, if none was provided, the default config
144+
is.
145+
Because frozen dataclasses are used for config, `object.__setattr__(self, ...)` must be used to manipulate the
146+
values.
147+
148+
The same pattern is used for every attribute. For example, an attribute `foo` with a constant default `bar` would
149+
be handled as:
150+
151+
.. code-block:: python
152+
153+
if self.foo is None:
154+
try:
155+
object.__setattr__(self, "foo", provider_config["foo"])
156+
except NoDefaultConfig:
157+
object.__setattr__(self, "foo", "bar")
158+
159+
Here we use :py:exc:`openeo.extra.artifacts.exceptions.NoDefaultConfig`
160+
161+
3. Create an implementation of :py:class:`openeo.extra.artifacts._uri.StorageURI` to model the internal URIs to the
162+
stored artifact
163+
4. Create an ArtifactHelper implementation which extends :py:class:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC`
164+
5. Add a key-value pair to the :py:obj:`openeo.extra.artifacts.artifact_helper.config_to_helper` dictionary. The key is
165+
the class created in step 2 and the value is the class created in step 4
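
Steps 2 and 5 can be sketched in a self-contained way as follows. Everything here is a stand-in so the snippet runs without the package: ``NoDefaultConfig`` and ``ProviderConfig`` only mimic the behaviour implied by the pattern above (a lookup that raises ``NoDefaultConfig`` when the backend advertises nothing), and ``FooStorageConfig`` and ``FooArtifactHelper`` are hypothetical names. A real implementation would extend the abstract base classes listed below.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class NoDefaultConfig(Exception):
    """Stand-in for openeo.extra.artifacts.exceptions.NoDefaultConfig."""


class ProviderConfig:
    """Stand-in: looking up a key the backend did not advertise raises NoDefaultConfig."""

    def __init__(self, advertised: Dict[str, str]):
        self._advertised = advertised

    def __getitem__(self, key: str) -> str:
        try:
            return self._advertised[key]
        except KeyError:
            raise NoDefaultConfig(key) from None


@dataclass(frozen=True)
class FooStorageConfig:
    """Hypothetical config (step 2); None means no explicit value was given."""

    foo: Optional[str] = None

    def _load_connection_provided_config(self, provider_config: ProviderConfig) -> None:
        if self.foo is None:
            try:
                # Prefer the value advertised by the backend.
                object.__setattr__(self, "foo", provider_config["foo"])
            except NoDefaultConfig:
                # Fall back to the constant default.
                object.__setattr__(self, "foo", "bar")


class FooArtifactHelper:
    """Hypothetical helper (step 4), reduced to holding its config."""

    def __init__(self, config: FooStorageConfig):
        self.config = config


# Step 5: map the config class to the helper class that consumes it.
config_to_helper: Dict[type, type] = {FooStorageConfig: FooArtifactHelper}

explicit = FooStorageConfig(foo="explicit")
explicit._load_connection_provided_config(ProviderConfig({"foo": "advertised"}))

advertised = FooStorageConfig()
advertised._load_connection_provided_config(ProviderConfig({"foo": "advertised"}))

fallback = FooStorageConfig()
fallback._load_connection_provided_config(ProviderConfig({}))
```

The three instances show the precedence the pattern gives: an explicit value wins, then the advertised value, then the constant default.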
166+
167+
.. autoclass:: openeo.extra.artifacts._config.ArtifactsStorageConfigABC
168+
:members:
169+
:private-members: _load_connection_provided_config
170+
171+
.. autoclass:: openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC
172+
:members:
173+
:private-members: _get_default_storage_config, _from_openeo_connection
174+
175+
.. autoclass:: openeo.extra.artifacts._uri.StorageURI
176+
:members:
177+
178+
179+
Artifacts exceptions
180+
--------------------
181+
182+
When using artifacts your interactions can result in the following exceptions.
183+
184+
.. autoexception:: openeo.extra.artifacts.exceptions.ArtifactsException
185+
:members:
186+
187+
.. autoexception:: openeo.extra.artifacts.exceptions.NoAdvertisedProviders
188+
:members:
189+
190+
.. autoexception:: openeo.extra.artifacts.exceptions.UnsupportedArtifactsType
191+
:members:
192+
193+
.. autoexception:: openeo.extra.artifacts.exceptions.NoDefaultConfig
194+
:members:
195+
196+
.. autoexception:: openeo.extra.artifacts.exceptions.InvalidProviderConfig
197+
:members:
198+
199+
.. autoexception:: openeo.extra.artifacts.exceptions.ProviderSpecificException
200+
:members:

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ Table of contents
61 61
cookbook/index
62 62
api
63 63
api-processes
64+
api-artifacts
64 65
process_mapping
65 66
development
66 67
best_practices

openeo/extra/artifacts/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
from openeo.extra.artifacts.artifact_helper import build_artifact_helper
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
1+
from __future__ import annotations
2+
3+
from abc import ABC, abstractmethod
4+
from pathlib import Path
5+
from typing import Optional
6+
7+
from openeo.extra.artifacts._backend import ProviderConfig
8+
from openeo.extra.artifacts._config import ArtifactsStorageConfigABC
9+
from openeo.extra.artifacts._uri import StorageURI
10+
from openeo.rest.connection import Connection
11+
12+
13+
class ArtifactHelperABC(ABC):
14+
"""
15+
This class defines the *interface* that an artifact helper should implement and support. It is used by OpenEO users
16+
who want to manage artifacts.
17+
18+
Instances that implement it get created by the `openeo.extra.artifacts.build_artifact_helper` factory
19+
"""
20+
21+
@classmethod
22+
def from_openeo_connection(
23+
cls,
24+
connection: Connection,
25+
provider_config: ProviderConfig,
26+
*,
27+
config: Optional[ArtifactsStorageConfigABC] = None,
28+
) -> ArtifactHelperABC:
29+
"""
30+
Create a new Artifact helper from the OpenEO connection. This is the starting point to upload artifacts.
31+
Each implementation has its own builder
32+
"""
33+
if config is None:
34+
config = cls._get_default_storage_config()
35+
config.load_connection_provided_config(provider_config)
36+
return cls._from_openeo_connection(connection, config)
37+
38+
@abstractmethod
39+
def upload_file(self, path: str | Path, object_name: str = "") -> StorageURI:
40+
"""
41+
A method to store an artifact remotely and get a StorageURI which points to the stored data.
42+
43+
:param path: Location of the file to be uploaded: an absolute path, or a path relative to the current
44+
working directory.
45+
:param object_name: Optional name you want to give to the object. If not specified the filename will be
46+
used.
47+
48+
:return: A StorageURI pointing to the stored data. To use it in a process graph, convert it with Python's built-in
49+
`str()` function; the resulting string is understood by the OpenEO processor.
50+
"""
51+
52+
@abstractmethod
53+
def get_presigned_url(self, storage_uri: StorageURI, expires_in_seconds: int = 7 * 3600 * 24) -> str:
54+
"""
55+
A method to get a signed https URL for a given StorageURI which can be accessed via normal http libraries.
56+
57+
These URIs should be kept secret as they provide access to the data.
58+
59+
:param storage_uri: URI to the artifact that is stored by a previous `upload_file` call
60+
:param expires_in_seconds: Optional; how long, in seconds, the returned signed URL remains valid
61+
62+
:return: The signed https URI.
63+
64+
"""
65+
66+
def __init__(self, config: ArtifactsStorageConfigABC):
67+
if not config.is_openeo_connection_metadata_loaded():
68+
raise RuntimeError("config should have openeo connection metadata loaded prior to initialization.")
69+
self._config = config
70+
71+
@classmethod
72+
@abstractmethod
73+
def _get_default_storage_config(cls) -> ArtifactsStorageConfigABC:
74+
"""
75+
A method that provides a default storage config for the Artifact Helper. It returns an instance of a class that
76+
extends `ArtifactsStorageConfigABC` and only provides default values defined in code; nothing is resolved
77+
from the backend yet. The config does not need to be usable by itself yet.
78+
79+
If a config value can be advertised by the backend, it should be initialized to a sentinel value. The actual
80+
default is put in place only if the backend does not advertise a value, which happens in an implementation of
81+
:func:`~openeo.extra.artifacts._config.ArtifactsStorageConfigABC._load_connection_provided_config`
82+
"""
83+
84+
@classmethod
85+
@abstractmethod
86+
def _from_openeo_connection(cls, connection: Connection, config: ArtifactsStorageConfigABC) -> ArtifactHelperABC:
87+
"""
88+
The implementation that creates an artifact helper. This method takes a config which has already been
89+
initialized from the metadata of the OpenEO connection.
90+
91+
This method is internal as it is always called via `ArtifactHelperABC.from_openeo_connection`
92+
93+
:param connection: A valid instance of a connection object to an OpenEOBackend
94+
:param config: object that specifies configuration for Artifact storage.
95+
"""
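
The construction flow of the class above can be sketched with self-contained stand-ins: the public classmethod picks a default config when none is given, loads the backend-provided settings into it, and only then delegates to the implementation-specific builder, while ``__init__`` refuses a config that was never loaded. ``SketchConfig``, ``SketchHelperABC`` and ``InMemoryHelper`` are hypothetical names, the ``memory://`` URI scheme is invented for illustration, and ``upload_file`` only records the path instead of transferring data.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchConfig:
    """Stand-in for ArtifactsStorageConfigABC; tracks whether backend metadata was loaded."""

    def __init__(self):
        self.bucket = None
        self._loaded = False

    def load_connection_provided_config(self, provider_config: dict) -> None:
        self.bucket = provider_config.get("bucket", "default-bucket")
        self._loaded = True

    def is_openeo_connection_metadata_loaded(self) -> bool:
        return self._loaded


class SketchHelperABC(ABC):
    def __init__(self, config: SketchConfig):
        if not config.is_openeo_connection_metadata_loaded():
            raise RuntimeError("config should have openeo connection metadata loaded prior to initialization.")
        self._config = config

    @classmethod
    def from_openeo_connection(cls, connection, provider_config: dict, *, config=None):
        # Default config first, then backend-provided values, then the concrete builder.
        if config is None:
            config = cls._get_default_storage_config()
        config.load_connection_provided_config(provider_config)
        return cls._from_openeo_connection(connection, config)

    @classmethod
    @abstractmethod
    def _get_default_storage_config(cls) -> SketchConfig: ...

    @classmethod
    @abstractmethod
    def _from_openeo_connection(cls, connection, config: SketchConfig): ...

    @abstractmethod
    def upload_file(self, path, object_name: str = "") -> str: ...


class InMemoryHelper(SketchHelperABC):
    """Hypothetical implementation that only remembers what was 'uploaded'."""

    def __init__(self, config: SketchConfig):
        super().__init__(config)
        self.objects = {}

    @classmethod
    def _get_default_storage_config(cls) -> SketchConfig:
        return SketchConfig()

    @classmethod
    def _from_openeo_connection(cls, connection, config: SketchConfig):
        return cls(config)

    def upload_file(self, path, object_name: str = "") -> str:
        name = object_name or Path(path).name  # fall back to the filename
        self.objects[name] = str(path)
        return f"memory://{self._config.bucket}/{name}"


helper = InMemoryHelper.from_openeo_connection(None, {"bucket": "openeo-artifacts"})
uri = helper.upload_file("/tmp/model.onnx")
```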
