
feat: add configurable DuckDB memory limit via DUCKDB_MEMORY_LIMIT env var #43

Open
CarstVaartjes wants to merge 1 commit into visualfabriq:master from CarstVaartjes:feat/duckdb-memory-limit

Conversation

@CarstVaartjes (Member)

Summary

  • Add DUCKDB_MEMORY_LIMIT environment variable support to cap DuckDB memory per query connection
  • When set (e.g. DUCKDB_MEMORY_LIMIT=2GB), DuckDB spills to temp storage instead of allocating unbounded memory
  • Prevents OOM on containers with limited memory (e.g. ECS tasks running multiple Gunicorn workers)

Context

DQE reader runs 3 Gunicorn workers in a 7.5GB ECS container. DuckDB's default behavior allocates memory without bounds, causing OOM on large shard aggregations. With DUCKDB_MEMORY_LIMIT=2GB, each worker caps at 2GB (3×2GB=6GB), leaving headroom for Python/OS.

Benchmark (345M-row KCI shard, high-cardinality query)

| Limit | Time | RSS delta | Result |
| --- | --- | --- | --- |
| None (default) | 7.93s | +2,713 MB | 256K rows |
| 2GB | 6.71s | +540 MB | 256K rows |
| 512MB | 3.17s | OOM | Failed |

No performance penalty: bounded memory is actually faster due to less GC pressure.

Changes

  • parquery/aggregate_duckdb.py: Read DUCKDB_MEMORY_LIMIT env var, pass as config to duckdb.connect()
  • parquery/__init__.py: Version bump to 2.0.7
  • RELEASE_NOTES.md: Added 2.0.7 entry
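The aggregate_duckdb.py change can be sketched roughly as follows. The helper name `duckdb_config` is hypothetical (not taken from the diff), but passing `memory_limit` in the `config` dict of `duckdb.connect()` is DuckDB's documented way to cap allocation at connection time:

```python
import os


def duckdb_config() -> dict:
    """Build a config dict for duckdb.connect() from the
    DUCKDB_MEMORY_LIMIT env var (e.g. "2GB"). Returns an empty
    dict when the variable is unset, preserving default behavior."""
    config = {}
    limit = os.environ.get("DUCKDB_MEMORY_LIMIT")
    if limit:
        # DuckDB accepts human-readable sizes like "2GB" or "512MB"
        config["memory_limit"] = limit
    return config


os.environ["DUCKDB_MEMORY_LIMIT"] = "2GB"
print(duckdb_config())  # {'memory_limit': '2GB'}

# In the reader, the connection would then be opened as:
#   con = duckdb.connect(database=":memory:", config=duckdb_config())
# so each Gunicorn worker's DuckDB instance spills to temp storage
# once it reaches the limit instead of growing unbounded.
```

Reading the variable at connection time (rather than at import time) means each query connection picks up the current environment, which keeps the limit adjustable per deployment without a code change.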

Test plan

  • All 76 existing tests pass
  • Verified with real 345M-row parquet file at various limits
  • Deploy to DQE reader with DUCKDB_MEMORY_LIMIT=2GB env var
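Since the limit is plain environment configuration, the deployment step amounts to setting one variable before the workers start; a minimal local sketch (the Gunicorn entrypoint shown is hypothetical):

```shell
# Cap each worker's DuckDB allocation at 2GB; with 3 workers this
# bounds DuckDB at ~6GB inside the 7.5GB ECS container.
export DUCKDB_MEMORY_LIMIT=2GB
gunicorn app:application --workers 3   # hypothetical entrypoint
```

In ECS the same variable would go into the task definition's `environment` block instead of a shell export.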

@CarstVaartjes CarstVaartjes requested a review from a team as a code owner March 28, 2026 10:36
