Document best practice: I/O-free STAC item generation

This issue discusses what is (IMO) a best practice for stactools packages: the ability to generate a STAC item without any I/O.

Currently, most stactools packages have a high-level `stac.create_item(asset_href: str, ...) -> pystac.Item` function that generates a STAC item from an href string. If the function needs to read any data or metadata, it handles that I/O itself. This is very convenient, and ideally every stactools package offers it (it's especially useful from a CLI).

Some of the more complicated stactools packages also generate cloud-optimized assets from the "source" asset at `asset_href`. In some of these packages, whether the output STAC item catalogs the cloud-optimized asset is directly tied to that function creating the cloud-optimized asset itself (see https://github.com/stactools-packages/goes-glm/blob/c9c3bc42685e66e0eaace599096ef6050c05eb57/src/stactools/goes_glm/stac.py#L46-L47 for example).

At a minimum, it should be easy to regenerate STAC metadata (including metadata for the cloud-optimized assets) without having to regenerate the cloud-optimized assets.

Now, we have a couple of ways to handle this:

1. The user passes the hrefs for both the source asset and the cloud-optimized assets. The `create_item` function is responsible for reading the data:
```python
def create_item(source_asset_href, cloud_optimized_asset_hrefs=None, **kwargs):
    ...
```
If the user provides `cloud_optimized_asset_hrefs`, then cloud-optimized asset (re)generation can be skipped.
2. The user passes in the data (and perhaps the hrefs, to easily set the `href` for each asset).

```python
def create_item(source_data, cloud_optimized_data):
    ...
```
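Here's a minimal sketch of what option 2 could look like, assuming the caller has already opened the source data and pulled out the few fields the item needs. To stay dependency-free it builds a plain STAC-item-shaped dictionary rather than a `pystac.Item`. `SourceMetadata`, `create_item_from_data`, the asset keys, and the example URLs are all hypothetical names for illustration, not part of any stactools package.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class SourceMetadata:
    """Hypothetical container for metadata the caller has already read."""

    item_id: str
    bbox: List[float]  # [west, south, east, north]
    acquired: datetime


def create_item_from_data(
    metadata: SourceMetadata,
    source_href: str,
    cloud_optimized_hrefs: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a STAC-item-shaped dict without performing any I/O."""
    west, south, east, north = metadata.bbox
    item = {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": metadata.item_id,
        "bbox": metadata.bbox,
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": metadata.acquired.isoformat()},
        "links": [],
        "assets": {"source": {"href": source_href, "roles": ["data"]}},
    }
    # Hrefs are only recorded, never opened, so existing cloud-optimized
    # assets can be cataloged without being regenerated.
    for key, href in (cloud_optimized_hrefs or {}).items():
        item["assets"][key] = {"href": href, "roles": ["data"]}
    return item


item = create_item_from_data(
    SourceMetadata(
        item_id="example-item",
        bbox=[-10.0, -5.0, 10.0, 5.0],
        acquired=datetime(2022, 1, 1, tzinfo=timezone.utc),
    ),
    source_href="https://example.com/source.nc",
    cloud_optimized_hrefs={"cog": "https://example.com/source.tif"},
)
```

A package could still keep its href-based `create_item` as a thin wrapper that does the reading and then delegates to a function like this one.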

Of these, I think we should steer package developers towards option 2, but I'm curious to hear others' thoughts. That's the approach taken by [stac-table](https://github.com/TomAugspurger/stac-table) and [xstac](https://github.com/TomAugspurger/xstac), and I think it works pretty well. Users can provide (essentially) any DataFrame or Dataset and we can generate STAC metadata for it. Crucially, rasterio, pyarrow / `dask.dataframe`, and xarray can all read data lazily, so creating and passing around a DataFrame or Dataset doesn't actually read data (unless the function requires it).
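The laziness point can be illustrated without any of those libraries. The sketch below uses a hypothetical `LazyDataset` stand-in (not a real xarray, rasterio, or dask class) whose metadata is available up front and whose values are only loaded on first access. Under that assumption, a metadata-only `create_item` never triggers the expensive read.

```python
from typing import Callable, Dict, List, Optional


class LazyDataset:
    """Hypothetical stand-in for a lazily loaded dataset."""

    def __init__(self, attrs: Dict[str, str], loader: Callable[[], List[float]]):
        self.attrs = attrs      # cheap metadata, available immediately
        self._loader = loader   # expensive read, deferred until needed
        self._values: Optional[List[float]] = None
        self.reads = 0          # counts how often data was actually loaded

    @property
    def values(self) -> List[float]:
        if self._values is None:
            self.reads += 1
            self._values = self._loader()
        return self._values


def create_item(dataset: LazyDataset, href: str) -> dict:
    """Build item metadata from attributes only; never touches `values`."""
    return {
        "id": dataset.attrs["id"],
        "properties": {"title": dataset.attrs["title"]},
        "assets": {"data": {"href": href}},
    }


dataset = LazyDataset(
    attrs={"id": "example", "title": "Example dataset"},
    loader=lambda: [1.0, 2.0, 3.0],  # pretend this is a slow remote read
)
item = create_item(dataset, "https://example.com/example.zarr")
reads_after_item = dataset.reads  # no data has been loaded at this point
total = sum(dataset.values)       # first access triggers the single read
```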

Document best practice: I/O-free STAC item generation #369

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Document best practice: I/O-free STAC item generation #369

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions