This repository was archived by the owner on Jul 21, 2025. It is now read-only.

Commit d663df8

gadomski and zacdezgeo authored
feat: add blog post (#82)
The "results" section is extremely light ... we might need to add some "what does it mean" text? I also don't talk about the benefit of having the **stac-geoparquet** in blob storage so folks can query directly against that (instead of going through a server layer) or just download the file themselves. @zacdezgeo do you think that's critical to include, or just a confuser? In service of developmentseed/communications#819 --------- Co-authored-by: Zac Deziel <[email protected]>
1 parent 05a24da commit d663df8

5 files changed

+102
-0
lines changed

docs/blog-post.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
1+
# Right-sizing STAC
2+
3+
[Cloud-Native Geospatial](https://guide.cloudnativegeo.org/) is a collection of specifications, tools, and ideas around how **geospatial data** can be queried, visualized, and analyzed from its storage location, without heavy infrastructure like a database or API server.
4+
We're bringing the same philosophy to **geospatial metadata** via [stac-geoparquet](https://github.com/stac-utils/stac-geoparquet/blob/main/spec/stac-geoparquet-spec.md).
5+
We hope that these new technologies and tools will provide more flexibility and efficiency in storing and querying metadata.
6+
7+
## What the STAC?
8+
9+
The [SpatioTemporal Asset Catalog (STAC)](https://stacspec.org) specification is a **common language to describe geospatial information**.
10+
Built on battle-tested geospatial standards and specifications such as [GeoJSON](https://geojson.org/) and [OGC API - Features](https://ogcapi.ogc.org/features/), STAC has a vast (and growing) number of [implementations](https://stacindex.org/catalogs), with single instances containing [hundreds of millions of items](https://developers.planet.com/blog/2022/Aug/31/state-of-stac/).
11+
But STAC isn't just for large organizations and companies; it can be used in any system where geospatial assets need to be stored and indexed for later use by humans, machines, or interfaces.
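
To make "a common language" concrete, here is a minimal sketch of a STAC item as a plain GeoJSON Feature, with a small check for the fields the STAC Item specification requires. The item `id`, geometry, and asset `href` are hypothetical values for illustration, and `missing_item_fields` is our own helper, not part of any STAC library.

```python
import json


def missing_item_fields(item: dict) -> list[str]:
    """Return the names of required STAC Item fields that are absent."""
    required = ["type", "stac_version", "id", "geometry", "properties", "links", "assets"]
    missing = [field for field in required if field not in item]
    # The spec requires a bbox whenever the geometry is not null.
    if item.get("geometry") is not None and "bbox" not in item:
        missing.append("bbox")
    # "datetime" must be present in properties (its value may be null).
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    return missing


# Hypothetical item: the id, footprint, and asset href are made up.
example_item = {
    "type": "Feature",
    "stac_version": "1.1.0",
    "id": "example-naip-tile",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-105.13, 40.12], [-105.06, 40.12], [-105.06, 40.19],
            [-105.13, 40.19], [-105.13, 40.12],
        ]],
    },
    "bbox": [-105.13, 40.12, -105.06, 40.19],
    "properties": {"datetime": "2021-07-01T00:00:00Z"},
    "links": [],
    "assets": {
        "image": {"href": "s3://example-bucket/example-naip-tile.tif"},
    },
}

print(json.dumps(missing_item_fields(example_item)))
```

Because an item is just GeoJSON plus a few agreed-upon fields, any tool that can read JSON can produce or consume one.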
12+
13+
Most existing [STAC API](https://github.com/radiantearth/stac-api-spec) backends use customized instances of existing data store systems, such as [pgstac (for PostgreSQL)](https://github.com/stac-utils/pgstac) or [Elasticsearch/OpenSearch](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch).
14+
Each of those backends supports huge (>100 million items) instances, such as [Microsoft's Planetary Computer](https://planetarycomputer.microsoft.com/) or [AWS's Earth Search](https://earth-search.aws.element84.com/v1).
15+
However, because these backends are designed to scale, they can be awkward to use for smaller datasets.
16+
They can be expensive when deployed through a cloud provider's off-the-shelf services, such as [Amazon RDS](https://aws.amazon.com/rds/), and that is before counting the cost of configuring and maintaining those backends.
17+
18+
> Postgres needs to be managed
19+
>
20+
> @bitner
21+
22+
## Cloud-Native Geospatial Metadata
23+
24+
Enter [geoparquet](https://geoparquet.org/), a geospatial-specific flavor of the powerful column-oriented data format [parquet](https://parquet.apache.org/).
25+
**geoparquet** is natively _queryable_, meaning that clients such as [DuckDB](https://duckdb.org/) can query a **geoparquet** file directly.
26+
DuckDB has an officially supported [spatial extension](https://duckdb.org/docs/stable/extensions/spatial/overview.html) for doing precisely this.
27+
28+
```sql
D install spatial;
D load spatial;
D select * from read_parquet('s3://stac-fastapi-geoparquet-labs-375/naip.parquet')
    where st_intersects(geometry, st_geomfromgeojson('{"type":"Point","coordinates":[-105.1019,40.1672]}'));
┌─────────┬──────────────┬──────────────────────┬──────────────────────┬───────────────┬───┬───────────┬──────────────────────┬───────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│  type   │ stac_version │   stac_extensions    │          id          │  proj:shape   │ … │ naip:year │      proj:bbox       │ proj:epsg │      providers       │         bbox         │       geometry       │
│ varchar │   varchar    │      varchar[]       │       varchar        │    int64[]    │   │  varchar  │       double[]       │   int64   │ struct(url varchar…  │ struct(xmin double…  │       geometry       │
├─────────┼──────────────┼──────────────────────┼──────────────────────┼───────────────┼───┼───────────┼──────────────────────┼───────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ Feature │ 1.1.0        │ [https://stac-exte…  │ co_m_4010556_sw_13…  │ [12240, 9550] │ … │ 2021      │ [489150.0, 4441434…  │     26913 │ [{'url': https://w…  │ {'xmin': -105.1274…  │ POLYGON ((-105.060…  │
├─────────┴──────────────┴──────────────────────┴──────────────────────┴───────────────┴───┴───────────┴──────────────────────┴───────────┴──────────────────────┴──────────────────────┴──────────────────────┤
│ 1 rows                                                                                                                                                                                 19 columns (11 shown) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
42+
43+
The only missing piece was to bridge the gap between STAC and **geoparquet**.
44+
[Tom Augspurger](https://github.com/TomAugspurger) began work in [May 2022](https://github.com/stac-utils/stac-geoparquet/commit/8b39b72a5694ea08ec9aaeea37d53bf589969787) with an implementation that used a [GeoDataFrame](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html) as an intermediate representation.
45+
Since then, the original [stac-geoparquet](https://github.com/stac-utils/stac-geoparquet) code has matured to be more performant and support alternative storage mechanisms, including [Delta Lake](https://delta.io/).
46+
In parallel, we've added an intuitive **stac-geoparquet** interface to [rustac](https://github.com/stac-utils/rustac-py), which binds more directly to the underlying Rust libraries such as [geoarrow-rs](https://github.com/geoarrow/geoarrow-rs/).
47+
48+
## But does it work?
49+
50+
Microsoft's Planetary Computer showed that [stac-geoparquet can be useful for bulk STAC item queries](https://planetarycomputer.microsoft.com/docs/quickstarts/stac-geoparquet/).
51+
But we wondered if we could adapt **stac-geoparquet** to work with existing STAC tooling, such as [pystac-client](https://pystac-client.readthedocs.io/) or [stac-browser](https://radiantearth.github.io/stac-browser).
52+
To do so, we built a prototype [stac-fastapi-geoparquet](https://github.com/stac-utils/stac-fastapi-geoparquet/) to put a "serverless" API layer in front of **stac-geoparquet**.
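
From a client's point of view, the API layer looks like any other STAC API. As a rough sketch, the request below is what a STAC API item search boils down to: a `POST` to `/search` with a JSON body containing `intersects` and `limit`. The API root `https://stac.example.com` is a placeholder, not our deployment, and `build_search_request` is a hypothetical helper that only builds the request without sending it.

```python
import json
import urllib.request


def build_search_request(api_root: str, point: list[float], limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a STAC API item search for items intersecting a point."""
    body = {
        "intersects": {"type": "Point", "coordinates": point},
        "limit": limit,
    }
    return urllib.request.Request(
        url=api_root.rstrip("/") + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder API root; swap in a real STAC API to actually run a search.
request = build_search_request("https://stac.example.com/", [-105.1019, 40.1672])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; in practice a client such as pystac-client handles this, along with paging, for you.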
53+
54+
![stac-fastapi-geoparquet architecture](./img/stac-fastapi-geoparquet-architecture.excalidraw.png)
55+
56+
This architecture should be extremely affordable, since we're only utilizing light "serverless" services and blob storage.
57+
To see if it worked, we ran a series of experiments comparing **stac-fastapi-geoparquet** with a **stac-fastapi-pgstac** instance serving the same data.
58+
These results are preliminary, but encouraging.
59+
60+
### Results
61+
62+
Our benchmarks reveal a nuanced performance profile between [`stac-fastapi-geoparquet`](https://github.com/stac-utils/stac-fastapi-geoparquet) and [`stac-fastapi-pgstac`](https://github.com/stac-utils/stac-fastapi-pgstac), shaped by dataset size and query type.
63+
64+
For small to **medium-sized catalogs** (under approximately 100,000 items), `stac-fastapi-geoparquet` consistently outperforms `pgstac` when returning large pages of items. This makes it a strong choice for scenarios like paginated browsing or lightweight faceted search, particularly when operating in serverless environments or on a constrained budget.
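
For context on what "returning pages of items" involves, here is a small sketch of how a client walks a paginated STAC API search: each response is a GeoJSON FeatureCollection, and a link with `rel` set to `next` points at the following page. The URLs and the in-memory `FAKE_PAGES` table are stand-ins so the example runs without a network, and the sketch assumes GET-style `next` links that carry everything in `href`.

```python
from typing import Callable, Iterator, Optional


def iter_items(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item from a paginated search by following rel="next" links."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page.get("features", [])
        # Stop when the page has no "next" link.
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )


# Stand-in responses keyed by URL; a real client would issue HTTP requests instead.
FAKE_PAGES = {
    "https://stac.example.com/search?limit=2": {
        "type": "FeatureCollection",
        "features": [{"id": "a"}, {"id": "b"}],
        "links": [
            {"rel": "next", "href": "https://stac.example.com/search?limit=2&token=page-2"}
        ],
    },
    "https://stac.example.com/search?limit=2&token=page-2": {
        "type": "FeatureCollection",
        "features": [{"id": "c"}],
        "links": [],
    },
}

ids = [item["id"] for item in iter_items("https://stac.example.com/search?limit=2", FAKE_PAGES.__getitem__)]
print(ids)
```

Larger pages mean fewer round trips for the same number of items, which is why page size matters so much in this kind of comparison.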
65+
66+
![paging speed](./img/search-page-speed.png)
67+
68+
However, for **targeted lookups** (e.g., retrieving a single STAC item by ID or matching on exact attributes), traditional databases like `pgstac` still shine. Their indexing and query planning provide much faster access times for "needle-in-a-haystack" searches.
69+
70+
![search by attributes](./img/searc-by-attributes.png)
71+
72+
**At large scales** (over approximately 2 million items), we reach the limits of our current serverless architecture. In particular, our Lambda deployment of `stac-fastapi-geoparquet` times out during single-item searches. This isn’t a fundamental limit of DuckDB or GeoParquet, but a practical boundary of compute limits in AWS Lambda. These limits highlight the need for thoughtful deployment strategies as data volumes grow.
73+
74+
These results demonstrate the promise of analyzing metadata "at rest" using simple, efficient tooling — but also highlight where robust databases or other scalable query systems still play a key role.
75+
Cloud-native doesn't mean _no_ infrastructure — it means **choosing the right infrastructure** for the job.
76+
77+
### But do you need a server at all?
78+
79+
One of the compelling stories for Cloud-Native Geospatial (Meta)data is that they can be used without a server at all.
80+
**rustac** can use the same STAC API search parameters (including [cql2](https://developmentseed.org/cql2-rs/)) to search the **stac-geoparquet** file directly:
81+
82+
```python
from rustac import DuckdbClient

client = DuckdbClient()
# Configure AWS credentials
client.execute("CREATE SECRET (TYPE S3, PROVIDER CREDENTIAL_CHAIN)")
items = client.search(
    "s3://stac-fastapi-geoparquet-labs-375/naip.parquet",
    intersects={"type": "Point", "coordinates": [-105.1019, 40.1672]},
)
```
93+
94+
![stac-fastapi-geoparquet and rustac arch](./img/stac-fastapi-geoparquet-and-rustac-architecture.excalidraw.png)
95+
96+
## What's Next?
97+
98+
We're looking to take these experiments beyond our labs and into real-world applications.
99+
100+
If you're working with small- to medium-sized geospatial datasets — from thousands to a few hundred thousand assets — we’d love to explore how the free and open-source [`stac-fastapi-geoparquet`](https://github.com/stac-utils/stac-fastapi-geoparquet) can support your use case.
101+
102+
Curious about how we ran these tests and what we found? Dive into the details in our [labs repository](https://github.com/developmentseed/labs-375-stac-geoparquet-backend/).

docs/img/searc-by-attributes.png

43.5 KB

docs/img/search-page-speed.png

52 KB
194 KB
186 KB
