S3/boto based StacIO? #1541

@soxofaan

The pystac docs have this interesting S3/boto based StacIO implementation:

pystac/docs/concepts.rst

Lines 321 to 367 in 4dc0e0f

For example, the following code examples will allow
for reading from AWS's S3 cloud object storage using `boto3
<https://boto3.amazonaws.com/v1/documentation/api/latest/index.html>`__
or Azure Blob Storage using the `Azure SDK for Python
<https://learn.microsoft.com/en-us/python/api/overview/azure/storage-blob-readme?view=azure-python>`__:

.. tab-set::
    .. tab-item:: AWS S3

        .. code-block:: python

            from urllib.parse import urlparse
            import boto3
            from pystac import Link
            from pystac.stac_io import DefaultStacIO, StacIO
            from typing import Union, Any

            class CustomStacIO(DefaultStacIO):
                def __init__(self):
                    self.s3 = boto3.resource("s3")
                    super().__init__()

                def read_text(
                    self, source: Union[str, Link], *args: Any, **kwargs: Any
                ) -> str:
                    parsed = urlparse(source)
                    if parsed.scheme == "s3":
                        bucket = parsed.netloc
                        key = parsed.path[1:]

                        obj = self.s3.Object(bucket, key)
                        return obj.get()["Body"].read().decode("utf-8")
                    else:
                        return super().read_text(source, *args, **kwargs)

                def write_text(
                    self, dest: Union[str, Link], txt: str, *args: Any, **kwargs: Any
                ) -> None:
                    parsed = urlparse(dest)
                    if parsed.scheme == "s3":
                        bucket = parsed.netloc
                        key = parsed.path[1:]
                        self.s3.Object(bucket, key).put(Body=txt, ContentEncoding="utf-8")
                    else:
                        super().write_text(dest, txt, *args, **kwargs)

            StacIO.set_default(CustomStacIO)

I fully understand that this is kept out of the core pystac project to minimize third-party dependencies, but I wonder whether there is any interest in, or plans for, extracting it from the docs and packaging it as a separate/extra package instead?

In the openEO GeoPySpark driver, where we need this, we currently just copy-pasted that snippet in an ad-hoc way, but it would be better for various reasons to properly decouple it.
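To make the idea of decoupling a bit more concrete, here is a minimal sketch of how the S3-specific part of the snippet could live on its own, with `boto3` imported lazily so it stays an optional dependency. The names `split_s3_url` and `S3TextIO` are hypothetical and not part of pystac; the only boto3 calls assumed are the ones already used in the docs snippet (`boto3.resource("s3")`, `.Object(bucket, key)`, `.get()` and `.put()`).

```python
from typing import Any, Optional, Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split "s3://bucket/some/key" into ("bucket", "some/key").

    Hypothetical helper mirroring the parsing done in the docs snippet.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"Not an s3:// URL: {url!r}")
    # Drop the leading "/" of the path to get the object key.
    return parsed.netloc, parsed.path[1:]


class S3TextIO:
    """Read and write text at s3:// URLs (hypothetical, not a pystac class)."""

    def __init__(self, s3_resource: Optional[Any] = None) -> None:
        if s3_resource is None:
            # Imported lazily, so boto3 is only needed when no
            # resource object is injected by the caller.
            import boto3

            s3_resource = boto3.resource("s3")
        self.s3 = s3_resource

    def read_text(self, url: str) -> str:
        bucket, key = split_s3_url(url)
        obj = self.s3.Object(bucket, key)
        return obj.get()["Body"].read().decode("utf-8")

    def write_text(self, url: str, txt: str) -> None:
        bucket, key = split_s3_url(url)
        self.s3.Object(bucket, key).put(Body=txt.encode("utf-8"))
```

A `DefaultStacIO` subclass like the one in the docs could then delegate to such a helper for `s3://` URLs and fall back to `super()` otherwise. Accepting an injected resource object also makes it possible to test the logic without real AWS access.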
