
Append to data store functionality (or provide example of reasonable approach) #139

@GondekNP

Description


Below paraphrased from:
https://cloudnativegeo.slack.com/archives/C060YAB0FHV/p1747780003580869?thread_ts=1747250715.883169&cid=C060YAB0FHV

I'd like to continuously update a geoparquet-backed STAC datastore, but I don't see an obvious append workflow. Maybe this isn't idiomatic for a STAC client; perhaps we expect a STAC server to implement this logic on the backend? In a relatively quick-and-dirty prototype I put together, I addressed this by reading all the items into memory, subsetting to 'features', extending with my new items, and writing the combined result back to disk (or to blob storage). See the example below:

    # Requires: import os, from typing import Any, Dict, List, and import rustac
    async def add_items_to_parquet(
        self, fire_event_name: str, items: List[Dict[str, Any]]
    ) -> str:
        """
        Add STAC items to the consolidated GeoParquet file.

        Returns:
            Path to the updated GeoParquet file
        """
        # Validate all items using `stac_pydantic`
        for item in items:
            self.validate_stac_item(item)

        # If the parquet file doesn't exist yet, just write the items directly
        if not os.path.exists(self.parquet_path):
            await rustac.write(self.parquet_path, items, format="geoparquet")
            return self.parquet_path

        # Read the existing item collection and keep its "features" list
        all_items = await rustac.read(self.parquet_path)
        all_items = all_items["features"]

        # Combine with the new items and rewrite the whole file
        all_items.extend(items)
        await rustac.write(self.parquet_path, all_items, format="geoparquet")

        return self.parquet_path

@gadomski quickly and helpfully replied that:

... appends are not supported at the moment. right now the "recommended" approach is "just re-write it" (https://www.gadom.ski/presentations/2025-04-30-CNG.html#/4/3) as that can be pretty fast (a second or two) for decent numbers of items (tens of thousands, at least)
this is as much of a limitation in stac-geoparquet generally as it is with rustac in particular. because parquet requires a fixed schema, and STAC is very flexible, adding new STAC items to an existing parquet file is susceptible to schema mismatches (and therefore errors). that's why, for now, we've punted
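To make the schema concern concrete: Parquet derives one fixed schema from the items it is given, so items whose `properties` keys diverge are exactly what breaks a naive append. A minimal, rustac-free sketch (pure Python; the helper name `check_property_keys` and the sample items are hypothetical) that surfaces the divergence before a rewrite:

```python
from typing import Any, Dict, List


def check_property_keys(items: List[Dict[str, Any]]) -> Dict[str, set]:
    """Report which property keys each item is missing, relative to the
    union of keys across all items. Keys present on some items but not
    others are what trigger schema-mismatch errors on a Parquet write."""
    all_keys: set = set()
    for item in items:
        all_keys.update(item.get("properties", {}).keys())
    return {
        item["id"]: all_keys - item.get("properties", {}).keys()
        for item in items
    }


# Two minimal STAC-like items whose properties diverge (made-up data)
items = [
    {"id": "a", "properties": {"datetime": "2025-01-01T00:00:00Z"}},
    {"id": "b", "properties": {"datetime": "2025-01-02T00:00:00Z",
                               "eo:cloud_cover": 3.2}},
]

missing = check_property_keys(items)
# item "a" lacks "eo:cloud_cover", so appending "b" to a file written
# from "a" alone would require a schema change
```

A check like this could run before the rewrite step, either to fail fast or to backfill missing keys with nulls.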

Regardless, even if this is not rustac-py's job (it seems quite reasonable not to open this can of worms), it might be nice to include a clean example of the "just rewrite it" strategy in the docs, perhaps with some editorializing on the potential for errors with schema mismatches.
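For such a docs example, here is a hedged sketch of the "just re-write it" pattern. To keep it runnable without rustac, it persists the item collection as JSON via the stdlib; in a real geoparquet workflow the `json.load`/`json.dump` calls would be replaced by `await rustac.read(...)` / `await rustac.write(..., format="geoparquet")`, which rewrite the whole file the same way. The function name and file layout are illustrative only.

```python
import json
import os
from typing import Any, Dict, List


def rewrite_with_new_items(path: str, new_items: List[Dict[str, Any]]) -> str:
    """Append by rewriting: read everything, extend, write everything back.

    This is the whole strategy; there is no in-place append. The stdlib
    JSON calls stand in for rustac.read / rustac.write on geoparquet.
    """
    if os.path.exists(path):
        with open(path) as f:
            # Expected shape: {"type": "FeatureCollection", "features": [...]}
            collection = json.load(f)
        features = collection["features"]
    else:
        features = []

    features.extend(new_items)

    # Rewrite the whole store in one shot
    with open(path, "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)
    return path
```

Calling it twice with one item each leaves a two-feature collection on disk, at the cost of re-reading and re-writing everything on every call, which is why the approach stays fast only up to moderate item counts.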
