
Conversation

@ceholden ceholden commented Feb 25, 2025

Description

This PR attempts to address some of the cold start performance issues described in #59.
TL;DR: our Lambda downloads ~300MB of DuckDB extensions as part of each cold start, adding overhead in time and data ingress cost, and putting extra load on DuckDB's upstream extension repos.

The approach I tried was to initialize our DuckDB client during the build process, which necessitated moving to a Docker container based Lambda setup instead of a ZIP (the extensions alone are >250MB and stacrs is ~60MB).
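
Concretely, the build-time step boils down to running the client once during the image build (this is what the RUN line in the Dockerfile diff does), so the extension download happens when the image is built rather than at cold start:

```python
# One-shot warm-up executed during `docker build`: constructing the client
# makes DuckDB download and install its extensions into the image layer.
import stacrs

stacrs.DuckdbClient()
```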

Note

I'm not sure we should merge this, or maybe we can merge to test and revert!

I noticed that our cold starts are on the same order or slightly faster. Spot-checking, I saw ~0.7s to 1.5s for Docker, while ZIP-based was always >1s: even though loading the ZIP is faster, we still have to wait on extension downloading & installing. I haven't been able to figure out why, but the warm start performance of the Docker-based Lambda was much worse (3-4x slower for /collections/naip and /collections/naip/items).

There are one or two nuggets that I think are worth including:

  • ensuring that when we uv pip install, we install based on the lockfile
  • bumping stacrs to 0.5.5, since the version we point to in the lockfile has been yanked
  • specifying the "platform" in our CDK code so the container is built for the platform our Lambda actually runs on (see the sketch below)
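
A rough sketch of the platform pinning in CDK (construct names and the asset path are illustrative, not necessarily what our stack uses):

```python
# Sketch only: pin the container build platform so the image matches the
# architecture the Lambda runs on, regardless of the machine doing the build.
from aws_cdk import Stack, aws_lambda as _lambda
from aws_cdk.aws_ecr_assets import Platform
from constructs import Construct


class ApiStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        _lambda.DockerImageFunction(
            self,
            "Api",
            code=_lambda.DockerImageCode.from_image_asset(
                "infrastructure/aws",  # directory containing the Dockerfile (illustrative)
                platform=Platform.LINUX_AMD64,
            ),
            # keep the function architecture consistent with the image platform
            architecture=_lambda.Architecture.X86_64,
        )
```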

I built/deployed this from my laptop, so it might be nice to deploy via CI to be 100% sure that my local deployment environment isn't causing the warm start performance degradation. I redeployed using stacrs==0.5.5 with a ZIP and saw the same performance as with stacrs==0.4.0, so it's something about the Docker-based Lambda.

@ceholden ceholden requested review from gadomski and hrodmn and removed request for hrodmn February 25, 2025 15:12
Contributor

hrodmn commented Feb 25, 2025

Thanks for testing this out @ceholden!

> The approach I tried was to initialize our DuckDB client during the build process, which necessitated moving to a Docker container based Lambda setup instead of a ZIP (the extensions alone are >250MB and stacrs is ~60MB).

> I noticed that our cold starts are on the same order or slightly faster. Spot-checking, I saw ~0.7s to 1.5s for Docker, while ZIP-based was always >1s: even though loading the ZIP is faster, we still have to wait on extension downloading & installing. I haven't been able to figure out why, but the warm start performance of the Docker-based Lambda was much worse (3-4x slower for /collections/naip and /collections/naip/items).

Roughly what is the cold start time for the zip-based deployment? 2s, 3s, 10s? Never mind, I just saw the 8s estimate in the original issue. If it is less than 5 seconds, is it a big deal? Maybe a problem for clients with a timeout that bumps up against the cold start time.

I haven't used this feature before but it's possible we could accomplish the same goal with layers. I don't know how performance would compare to the Docker image Lambda, though.

> There are one or two nuggets that I think are worth including:
>
>   • ensuring that when we uv pip install, we install based on the lockfile
>   • bumping stacrs to 0.5.5, since the version we point to in the lockfile has been yanked
>   • specifying the "platform" in our CDK code so the container is built for the platform our Lambda actually runs on

That all sounds good - could you implement those in a separate PR? They are great improvements regardless of the outcome of the zip vs ECR throwdown 🥊

> I built/deployed this from my laptop, so it might be nice to deploy via CI to be 100% sure that my local deployment environment isn't causing the warm start performance degradation. I redeployed using stacrs==0.5.5 with a ZIP and saw the same performance as with stacrs==0.4.0, so it's something about the Docker-based Lambda.

I just added a new deployment environment called test - we could easily enable deploying to a test stack by making environment an input parameter, so that we could pick the branch/environment during a workflow_dispatch. It would make sense to use the input parameter for workflow_dispatch events and default to development for release-triggered runs. That would be a great thing to add to the separate PR with the other improvements you made.

@ceholden
Author

> Roughly what is the cold start time for the zip-based deployment? 2s, 3s, 10s? If it is less than 5 seconds, is it a big deal? Maybe a problem for clients with a timeout that bumps up against the cold start time.

Cold start was ~8-10 seconds. That seems not so nice for anything with a UI in front of it, but we can mitigate it by keeping some provisioned concurrency so there's a "warm" pool of Lambdas. If we stashed the DuckDB dependencies in our own S3 bucket in the same region, we'd also achieve the other goals: 1) isolating our service from any outages at DuckDB, 2) being nice to DuckDB, and 3) maaaybe downloading a bit quicker since it'd be within-region.
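
For the S3 idea, DuckDB supports pointing at a custom extension repository, so the mirror could look roughly like this (the bucket URL is made up, and this assumes the mirror keeps the directory layout DuckDB expects and that we can pass the setting through to the client):

```python
# Hypothetical sketch: install extensions from our own same-region mirror
# instead of the upstream DuckDB extension repository.
import duckdb

con = duckdb.connect()
con.execute(
    "SET custom_extension_repository = "
    "'http://our-duckdb-extensions.s3.us-west-2.amazonaws.com'"  # made-up bucket
)
con.execute("INSTALL httpfs")  # fetched from the mirror, not duckdb.org
con.execute("LOAD httpfs")
```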

> I haven't used this feature before but it's possible we could accomplish the same goal with layers. I don't know how performance would compare to the Docker image Lambda, though.

Layers unfortunately don't get you around the 250 MB limit:

> 250 MB – The maximum size of the contents of a deployment package, including layers and custom runtimes (unzipped).

https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html#function-configuration-deployment-and-execution

I think of layers as more useful for organization, reducing rebuild times (e.g., only redeploy app code on top of a stable layer), or sharing among related functions (e.g., if 10 Lambdas needed rasterio then it'd be quicker to deploy and we'd maybe get some cache reuse if there were a dedicated layer).
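
For illustration, the shared-layer pattern in CDK would look roughly like this (names, runtime, and asset paths are made up; the 250 MB unzipped limit above still counts the layer contents):

```python
# Sketch of a shared dependencies layer reused by a thin application function.
from aws_cdk import Stack, aws_lambda as _lambda
from constructs import Construct


class SharedLayerExample(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        deps_layer = _lambda.LayerVersion(
            self,
            "DepsLayer",
            code=_lambda.Code.from_asset("layers/deps"),  # pre-built site-packages
            compatible_runtimes=[_lambda.Runtime.PYTHON_3_12],
        )
        _lambda.Function(
            self,
            "Handler",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="handler.handler",
            code=_lambda.Code.from_asset("app"),  # only the app code redeploys on change
            layers=[deps_layer],
        )
```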

> That all sounds good - could you implement those in a separate PR? They are great improvements regardless of the outcome of the zip vs ECR throwdown 🥊

💯 will do! I think there's maybe a question of uv export && uv pip install --target /asset versus uv sync && (and a bunch of copies) from Pete's question, but the "export to requirements" approach was recommended by the uv authors and it's also how it's done with Poetry, so it seems simpler to me to do the export as an intermediate step.

COPY --from=builder --chown=app:app /app/.venv /app/.venv

# install DuckDB extensions needed by DuckDB client at build time
RUN python -c "import stacrs; stacrs.DuckdbClient()"
Contributor

Since this is duplicated in the deploy dockerfile, can we refactor to something shared so we don't repeat ourselves? Not sure if that makes sense with how dockerfiles work, but just asking.
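
For example, both Dockerfiles could perhaps RUN a small shared script so the warm-up line lives in one place (the path and name here are hypothetical):

```python
# Hypothetical shared helper, e.g. scripts/install_duckdb_extensions.py,
# invoked from both Dockerfiles with `RUN python scripts/install_duckdb_extensions.py`.
import stacrs


def main() -> None:
    # Constructing the client triggers the DuckDB extension download/install,
    # baking the extensions into the image layer at build time.
    stacrs.DuckdbClient()


if __name__ == "__main__":
    main()
```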

COPY infrastructure/aws/lambda/handler.py ${LAMBDA_TASK_ROOT}/handler.py

# install DuckDB extensions needed by DuckDB client at build time
RUN python -c "import stacrs; stacrs.DuckdbClient()"
Contributor

@ceholden I'm going to take over stac-utils/rustac#655 and implement that this morning.

gadomski added a commit that referenced this pull request Mar 13, 2025
@ceholden I took your good work in #60 and reshaped it a bit to use the targeted extension fetching from stac-utils/rustac-py#81. Lambda sizes:

- Before (current **main**): 29.2 MB
- With the **uv** and other tweaks, but no DuckDB extensions: 26.2 MB
- With DuckDB extensions: 93.6 MB

Things are broken because I haven't wired up the server to actually use those extensions, but I _think_ this proves out using pre-fetched DuckDB extensions. The Lambda is here:
https://us-west-2.console.aws.amazon.com/lambda/home?region=us-west-2#/functions/stac-fastapi-geoparquet-labs-375-de-lambda8B5974B5-W1CWbXEHRA1Y?tab=code

> [!WARNING]
> When I tried to use the wheels directly, the size got too big. For now, I'm sticking with a build-in-the-container approach (which is slow, ~10 minutes on my machine), but I need to dig into why the build is smaller than the wheel.
@gadomski
Contributor

Closing as OBE on #75

@gadomski gadomski closed this Mar 13, 2025
@gadomski gadomski deleted the fix/fix-duckdb-dl branch April 22, 2025 23:43