Use ContextManager to support multi S3 object access #626
Unanswered
emmanuelmathot
asked this question in
Q&A
Replies: 1 comment
@emmanuelmathot in theory you could pass a session, as in:

    with rasterio.Env(
        session=AWSSession(
            aws_access_key_id="MyDevseedId",
            aws_secret_access_key="MyDevseedKey",
        )
    ):
        ...

so you could make a custom version of:

rio-tiler/rio_tiler/io/stac.py
Lines 280 to 313 in 584ecdb

    def _get_asset_info(self, asset: str) -> AssetInfo:
        """Validate asset names and return asset's url.

        Args:
            asset (str): STAC asset name.

        Returns:
            str: STAC asset href.

        """
        if asset not in self.assets:
            raise InvalidAssetName(f"{asset} is not valid")

        asset_info = self.item.assets[asset]
        extras = asset_info.extra_fields

        info = AssetInfo(
            url=asset_info.get_absolute_href(),
            metadata=extras,
            env={},
        )

        if head := extras.get("file:header_size"):
            info["env"].update({"GDAL_INGESTED_BYTES_AT_OPEN": head})

        if bands := extras.get("raster:bands"):
            stats = [
                (b["statistics"]["minimum"], b["statistics"]["maximum"])
                for b in bands
                if {"minimum", "maximum"}.issubset(b.get("statistics", {}))
            ]
            if len(stats) == len(bands):
                info["dataset_statistics"] = stats

        if extras.get("storage:requester_pays", None):
            info["env"].update({"AWS_REQUEST_PAYER": "requester"})

        return info
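As a rough illustration of that suggestion, the sketch below overrides `_get_asset_info` in a subclass and merges bucket-specific GDAL options into the returned `env`. It does not use rio-tiler itself: `BaseReader` is a stand-in that only mimics the shape of the method quoted above, and `ENV_BY_BUCKET`, `MultiS3Reader`, and the bucket names are hypothetical. It also assumes the reader applies the returned `env` when it opens the asset, as the quoted code suggests.

```python
from urllib.parse import urlparse

# Hypothetical per-bucket GDAL options; bucket names and values are placeholders.
ENV_BY_BUCKET = {
    "usgs-landsat": {"AWS_REQUEST_PAYER": "requester"},
    "custom-bucket": {
        "AWS_S3_ENDPOINT": "s3.company.com",
        "AWS_VIRTUAL_HOSTING": "FALSE",
    },
}


class BaseReader:
    """Stand-in for a STAC reader; it only mimics the shape of `_get_asset_info`."""

    def __init__(self, assets):
        self.assets = assets  # asset name -> href

    def _get_asset_info(self, asset):
        return {"url": self.assets[asset], "env": {}}


class MultiS3Reader(BaseReader):
    """Adds bucket-specific options to the env returned for each asset."""

    def _get_asset_info(self, asset):
        info = super()._get_asset_info(asset)
        # "s3://bucket/key" -> "bucket"
        bucket = urlparse(info["url"]).netloc
        info["env"].update(ENV_BY_BUCKET.get(bucket, {}))
        return info
```

With a real reader the same override would call the parent method first, so the existing handling of `file:header_size`, `raster:bands`, and `storage:requester_pays` stays intact.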
We have use cases where we need to access data on multiple S3 object stores. I previously added the possibility to configure a custom S3 endpoint for STAC (#394), but now we need to configure it dynamically, according to the S3 URL of each asset.
Typically, you could have a STAC Item with multiple assets stored with different S3 providers.
Each asset would need a different S3 configuration, so that the image can be accessed by setting the proper env variables for the GDAL /vsis3/ driver. One option would be to have a set of URL patterns, each with its specific access config. For instance:

    [
      {
        "usgs-landsat": {
          "url_pattern": "s3://usgs-landsat/.*",
          "access_key": "key",
          "secret_key": "secret",
          "request_pays": "true"
        },
        "custom-provider": {
          "url_pattern": "s3://custom.*",
          "endpoint_url": "https://s3.company.com",
          "access_key": "key",
          "secret_key": "secret",
          "region": "cust-om",
          "force_path_style": true
        }
      }
    ]

We could also leverage the options set using the storage STAC extension.
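The URL-pattern idea above could be sketched roughly as follows. The rule keys mirror the JSON proposal; the function name `gdal_options_for` and the mapping from those keys to GDAL /vsis3/ config options are assumptions made for illustration, not an existing rio-tiler feature. Note that `AWS_S3_ENDPOINT` takes a host name without the scheme, so the sketch splits `endpoint_url` into `AWS_S3_ENDPOINT` and `AWS_HTTPS`.

```python
import re
from urllib.parse import urlparse

# Hypothetical rule set, mirroring the JSON proposal above.
ACCESS_RULES = {
    "usgs-landsat": {
        "url_pattern": r"s3://usgs-landsat/.*",
        "access_key": "key",
        "secret_key": "secret",
        "request_pays": "true",
    },
    "custom-provider": {
        "url_pattern": r"s3://custom.*",
        "endpoint_url": "https://s3.company.com",
        "access_key": "key",
        "secret_key": "secret",
        "region": "cust-om",
        "force_path_style": True,
    },
}


def gdal_options_for(url, rules=ACCESS_RULES):
    """Return GDAL config options for the first rule whose pattern matches url."""
    for rule in rules.values():
        if not re.match(rule["url_pattern"], url):
            continue
        options = {}
        if "access_key" in rule:
            options["AWS_ACCESS_KEY_ID"] = rule["access_key"]
        if "secret_key" in rule:
            options["AWS_SECRET_ACCESS_KEY"] = rule["secret_key"]
        if "region" in rule:
            options["AWS_REGION"] = rule["region"]
        # Accept both the string "true" and the boolean True.
        if str(rule.get("request_pays", "")).lower() == "true":
            options["AWS_REQUEST_PAYER"] = "requester"
        if "endpoint_url" in rule:
            endpoint = urlparse(rule["endpoint_url"])
            options["AWS_S3_ENDPOINT"] = endpoint.netloc
            options["AWS_HTTPS"] = "YES" if endpoint.scheme == "https" else "NO"
        if rule.get("force_path_style"):
            # Path-style addressing means no virtual-hosted bucket names.
            options["AWS_VIRTUAL_HOSTING"] = "FALSE"
        return options
    # No rule matched: leave the default GDAL configuration untouched.
    return {}
```

The first matching rule wins, so more specific patterns should come before broader ones.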
I saw that rio-tiler uses contextlib, and I was wondering whether there is a good entry point for setting the GDAL env variables dynamically at a low level, so that the config can be applied not only to STAC assets (rio-tiler/rio_tiler/io/base.py, line 497 in 584ecdb) but also to any dataset URL (rio-tiler/rio_tiler/io/rasterio.py, line 95 in 584ecdb).
Please let me know whether this is the right path for such functionality and, if so, how to inject env variables for GDAL in this context.
Thank you.
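For readers unfamiliar with the context-manager approach the question refers to, here is a minimal, standard-library-only sketch of scoping configuration to a block of code. The name `temporary_env` is hypothetical, and the sketch relies on GDAL config options also being readable from process environment variables. In a rasterio-based stack the same role is usually played by `rasterio.Env`, which accepts config options as keyword arguments.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(options):
    """Set environment variables for the duration of the block, then restore them."""
    # Remember the current values (None means the variable was not set).
    previous = {key: os.environ.get(key) for key in options}
    os.environ.update({key: str(value) for key, value in options.items()})
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```

A dataset would then be opened inside `with temporary_env({"AWS_REQUEST_PAYER": "requester"}):`. Because `os.environ` is process-wide, this sketch is not safe when several threads open datasets with different configurations at the same time.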