This repository was archived by the owner on Jul 21, 2025. It is now read-only.

Commit 7c6b9aa

feat: duckdb extensions in lambda (#75)
@ceholden I took your good work in #60 and reshaped it a bit to use the targeted extension fetching from stac-utils/rustac-py#81.

Lambda sizes:
- Before (current **main**): 29.2 MB
- With the **uv** and other tweaks, but no DuckDB extensions: 26.2 MB
- With DuckDB extensions: 93.6 MB

Things are broken, because I haven't wired up the server to actually use those extensions, but I _think_ this is a proof of concept for using pre-fetched DuckDB extensions. Lambda is here: https://us-west-2.console.aws.amazon.com/lambda/home?region=us-west-2#/functions/stac-fastapi-geoparquet-labs-375-de-lambda8B5974B5-W1CWbXEHRA1Y?tab=code

> [!WARNING]
> When I tried to use the wheels directly, the size got too big. For now, I'm sticking with building in the container (which is slow, ~10 minutes on my machine), but I need to dig into why the build is smaller than the wheel.
1 parent 32651f5 commit 7c6b9aa


7 files changed, with 133 additions and 118 deletions


.github/workflows/ci.yml

Lines changed: 12 additions & 0 deletions
@@ -9,13 +9,25 @@ on:
   release:
     types: [published]

+env:
+  duckdb-version: "1.2.0"
+
 jobs:
   lint:
     name: Lint
     runs-on: ubuntu-latest
+    env:
+      DUCKDB_LIB_DIR: ${{ github.workspace }}/opt/duckdb
+      LD_LIBRARY_PATH: ${{ github.workspace }}/opt/duckdb
+      DYLD_LIBRARY_PATH: ${{ github.workspace }}/opt/duckdb
     steps:
       - uses: actions/checkout@v4
       - uses: astral-sh/setup-uv@v5
+      - name: Install libduckdb
+        run: |
+          wget https://github.com/duckdb/duckdb/releases/download/v${{ env.duckdb-version }}/libduckdb-linux-amd64.zip
+          mkdir -p ${{ github.workspace }}/opt/duckdb
+          unzip libduckdb-linux-amd64.zip -d ${{ github.workspace }}/opt/duckdb
       - name: Sync uv
         run: uv sync --all-extras --all-groups
       - name: Install yarn deps
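
The `Install libduckdb` step in this hunk builds its download URL from the workflow-level `duckdb-version` variable, and the job-level `env` block points three variables at the unpacked directory. For anyone reproducing that setup outside CI, a minimal Python sketch of the same values might look like the following. The function names `libduckdb_url` and `libduckdb_env` are hypothetical (they do not exist in this repository), and nothing is downloaded.

```python
from pathlib import PurePosixPath


def libduckdb_url(version: str, asset: str = "libduckdb-linux-amd64.zip") -> str:
    """Build the release URL that the workflow passes to wget."""
    return f"https://github.com/duckdb/duckdb/releases/download/v{version}/{asset}"


def libduckdb_env(workspace: str) -> dict[str, str]:
    """Mirror the job-level env block: all three variables share one directory."""
    lib_dir = str(PurePosixPath(workspace) / "opt" / "duckdb")
    return {
        "DUCKDB_LIB_DIR": lib_dir,
        "LD_LIBRARY_PATH": lib_dir,
        "DYLD_LIBRARY_PATH": lib_dir,
    }


if __name__ == "__main__":
    # "1.2.0" is the value of duckdb-version in the workflow above.
    print(libduckdb_url("1.2.0"))
    # The workspace path here is only a placeholder.
    print(libduckdb_env("/path/to/workspace"))
```

Keeping the version in one place, as the workflow does with `env.duckdb-version`, means a DuckDB upgrade touches a single line.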

README.md

Lines changed: 1 addition & 1 deletion
@@ -33,8 +33,8 @@ Then:

 ```shell
 source .venv/bin/activate
-cp .env.local .env
 cd infrastructure/aws
+cp .env.local .env
 # Make sure you're using the eoAPI sub-account
 aws sso login --profile eoapi && eval "$(aws configure export-credentials --profile eoapi --format env)" # or however you configure your AWS sessions
 cdk diff # to show any differences

infrastructure/aws/app.py

Lines changed: 2 additions & 0 deletions
@@ -174,6 +174,8 @@ def __init__(
             ),
             environment={
                 "STAC_FASTAPI_GEOPARQUET_HREF": f"s3://{bucket.bucket_name}/{config.geoparquet_key}",
+                # find pre-fetched extensions
+                "STAC_FASTAPI_DUCKDB_EXTENSION_DIRECTORY": "/tmp/duckdb-extensions",
                 "HOME": "/tmp",  # for duckdb's home_directory
             },
         )
Lines changed: 21 additions & 15 deletions
@@ -1,29 +1,35 @@
 ARG PYTHON_VERSION=3.12
+ARG BUILDPLATFORM=x86_64

-FROM public.ecr.aws/lambda/python:${PYTHON_VERSION}
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+FROM --platform=${BUILDPLATFORM} ghcr.io/astral-sh/uv:0.6.6 AS uv
+FROM --platform=${BUILDPLATFORM} public.ecr.aws/lambda/python:${PYTHON_VERSION} AS builder

-# Install required utilities
-RUN dnf install -y findutils binutils git && \
+ENV UV_COMPILE_BYTECODE=1
+ENV UV_NO_INSTALLER_METADATA=1
+ENV UV_LINK_MODE=copy
+ENV PATH="/root/.cargo/bin:${PATH}"
+ENV MATURIN_PEP517_ARGS="--features=duckdb-bundled"
+
+RUN dnf install -y findutils binutils git gcc g++ && \
     dnf clean all && \
     rm -rf /var/cache/dnf
+RUN curl https://sh.rustup.rs -sSf | bash -s -- -y

-WORKDIR /tmp
-COPY uv.lock uv.lock
-COPY pyproject.toml pyproject.toml
-COPY README.md README.md
-COPY src/ src/
-
-RUN uv pip install --compile-bytecode .[lambda] --target /asset
-
-# Reduce package size and remove useless files
+RUN --mount=from=uv,source=/uv,target=/bin/uv \
+    --mount=type=cache,target=/root/.cache/uv \
+    --mount=type=bind,source=uv.lock,target=uv.lock \
+    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
+    uv export --frozen --no-emit-workspace --no-dev --no-editable --extra lambda -o requirements.txt && \
+    uv pip install -r requirements.txt --target /asset
 WORKDIR /asset
+RUN python -c "from stacrs import DuckdbClient; DuckdbClient(use_s3_credential_chain=True, use_azure_credential_chain=False, install_extensions=True, extension_directory='/asset/duckdb-extensions')"
 RUN find . -type f -name '*.pyc' | while read f; do n=$(echo $f | sed 's/__pycache__\///' | sed 's/.cpython-[0-9]*//'); cp $f $n; done;
 RUN find . -type d -a -name '__pycache__' -print0 | xargs -0 rm -rf
 RUN find . -type f -a -name '*.py' -print0 | xargs -0 rm -f
 RUN find . -type d -a -name 'tests' -print0 | xargs -0 rm -rf
-
-# Strip debug symbols from compiled C/C++ code
 RUN find . -type f -name '*.so*' -exec strip --strip-unneeded {} \;

+FROM --platform=${BUILDPLATFORM} public.ecr.aws/lambda/python:${PYTHON_VERSION}
+WORKDIR /asset
+COPY --from=builder /asset /asset
 COPY infrastructure/aws/lambda/handler.py /asset/handler.py
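
The `find . -type f -name '*.pyc'` step in the builder stage copies each compiled file out of its `__pycache__` directory and drops the interpreter tag from the filename, so that compiled modules sit where the `.py` sources were before the next steps delete those sources. A minimal Python sketch of only the path rewrite done by the two `sed` expressions follows. `relocated_pyc_path` is a hypothetical helper written for illustration; the Dockerfile does not call it.

```python
import re


def relocated_pyc_path(path: str) -> str:
    """Mimic the two sed substitutions applied to each .pyc path."""
    # sed 's/__pycache__\///' removes the first "__pycache__/" segment.
    without_cache_dir = path.replace("__pycache__/", "", 1)
    # sed 's/.cpython-[0-9]*//' removes the first ".cpython-<digits>" tag.
    # The leading "." is an unescaped regex dot in the original and is kept as-is here.
    return re.sub(r".cpython-[0-9]*", "", without_cache_dir, count=1)


if __name__ == "__main__":
    # The example path is illustrative, not taken from the build output.
    print(relocated_pyc_path("./mangum/__pycache__/adapter.cpython-312.pyc"))
    # prints ./mangum/adapter.pyc
```

Because the dot in the second expression is unescaped, it matches any character before `cpython-`. In practice that character is the literal `.` in the filename, so the result is the same.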

infrastructure/aws/lambda/handler.py

Lines changed: 5 additions & 0 deletions
@@ -1,10 +1,15 @@
 """AWS Lambda handler."""

 import logging
+import os
+import shutil

 from mangum import Mangum
 from stac_fastapi.geoparquet.main import app

+if not os.path.exists("/tmp/duckdb-extensions") and os.path.isdir("/duckdb-extensions"):
+    shutil.copytree("/duckdb-extensions", "/tmp/duckdb-extensions")
+
 logging.getLogger("mangum.lifespan").setLevel(logging.ERROR)
 logging.getLogger("mangum.http").setLevel(logging.ERROR)
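
The guard added to the handler copies the pre-fetched extensions into `/tmp` the first time the module is imported, presumably because `/tmp` is the writable location in a Lambda execution environment. The same copy-if-missing logic can be sketched as a small function that is testable on its own. `ensure_extension_directory` is a hypothetical name, and taking the destination from `STAC_FASTAPI_DUCKDB_EXTENSION_DIRECTORY` is an assumption; the handler in this commit hard-codes both paths.

```python
import os
import shutil
import tempfile


def ensure_extension_directory(source: str, destination: str) -> bool:
    """Copy `source` to `destination` unless the destination already exists.

    Returns True when a copy was made and False otherwise, which includes
    the case where `source` is not a directory.
    """
    if os.path.exists(destination) or not os.path.isdir(source):
        return False
    shutil.copytree(source, destination)
    return True


if __name__ == "__main__":
    # Assumed wiring, not part of the commit: the handler could call
    # ensure_extension_directory(
    #     "/duckdb-extensions",
    #     os.environ.get("STAC_FASTAPI_DUCKDB_EXTENSION_DIRECTORY", "/tmp/duckdb-extensions"),
    # )
    with tempfile.TemporaryDirectory() as root:
        source = os.path.join(root, "duckdb-extensions")
        destination = os.path.join(root, "tmp", "duckdb-extensions")
        os.makedirs(source)
        print(ensure_extension_directory(source, destination))  # first call copies
        print(ensure_extension_directory(source, destination))  # second call is a no-op
```

Returning a boolean makes the cold-start copy easy to log or assert on, while warm invocations skip the copy because the destination already exists.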

pyproject.toml

Lines changed: 5 additions & 3 deletions
@@ -4,9 +4,7 @@ version = "0.0.0"
 description = "Test the performance of stac-fastapi-geoparquet"
 readme = "README.md"
 requires-python = ">=3.12"
-dependencies = [
-    "stac-fastapi-geoparquet @ git+https://github.com/stac-utils/stac-fastapi-geoparquet",
-]
+dependencies = ["stac-fastapi-geoparquet", "stacrs"]

 [project.optional-dependencies]
 lambda = ["mangum==0.19.0"]
@@ -40,6 +38,10 @@ ignore_missing_imports = true
 [tool.ruff.lint]
 select = ["E", "F", "I"]

+[tool.uv.sources]
+stac-fastapi-geoparquet = { git = "https://github.com/stac-utils/stac-fastapi-geoparquet", branch = "main" }
+stacrs = { git = "https://github.com/stac-utils/stacrs", branch = "main" }
+
 [build-system]
 requires = ["setuptools"]
 build-backend = "setuptools.build_meta"
