Add support for DuckDB #1

Merged. tylerbarker merged 5 commits into main from duckdb on Dec 29, 2025.

@tylerbarker (Owner)

Summary

Add first-class DuckDB support to SqlKit via the duckdbex driver.

DuckDB is unique among the supported databases: it is accessed through a direct NIF driver rather than an Ecto adapter. This PR provides complete integration, including direct connections, connection pooling, file-based SQL support, prepared statement caching, and streaming for large result sets.

Features

Direct Connection API

  • SqlKit.DuckDB.connect/2 and disconnect/1 for manual connection management
  • Works with both in-memory (:memory:) and file-based databases
  • Supports custom Duckdbex.Config for threads, memory limits, etc.

Connection Pool

  • SqlKit.DuckDB.Pool - NimblePool-based connection pool for production use
  • Proper lifecycle management via supervision tree
  • Database released cleanly on pool termination
  • Configurable pool size (default: 4 connections)

Prepared Statement Caching

  • Pool automatically caches prepared statements per connection
  • Repeated queries with same SQL skip the prepare step
  • Configurable via :cache option (default: true)
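The caching behaviour listed above can be pictured as a per-connection map from SQL text to a prepared statement. The following is a minimal sketch of that idea, not the pool's actual code: `StatementCacheSketch`, `fetch_or_prepare/4`, and the `prepare_fun` callback are hypothetical names standing in for the worker state and the driver's prepare call.

```elixir
defmodule StatementCacheSketch do
  @moduledoc false
  # Hypothetical illustration only; not SqlKit's pool worker implementation.

  # Returns {statement, updated_cache}. `prepare_fun` stands in for the
  # driver's prepare call and is invoked only on a cache miss (or always,
  # when caching is disabled with `cache: false`).
  def fetch_or_prepare(cache, sql, prepare_fun, opts \\ []) do
    if Keyword.get(opts, :cache, true) do
      case Map.fetch(cache, sql) do
        {:ok, stmt} ->
          # Cache hit: reuse the stored statement and skip the prepare step.
          {stmt, cache}

        :error ->
          # Cache miss: prepare once and remember it under the SQL string.
          stmt = prepare_fun.(sql)
          {stmt, Map.put(cache, sql, stmt)}
      end
    else
      # Caching disabled for this query: prepare every time, store nothing.
      {prepare_fun.(sql), cache}
    end
  end
end

# Usage with a stand-in prepare function that returns a fresh reference.
prepare = fn _sql -> make_ref() end
{stmt_a, cache} = StatementCacheSketch.fetch_or_prepare(%{}, "SELECT $1", prepare)
{stmt_b, _cache} = StatementCacheSketch.fetch_or_prepare(cache, "SELECT $1", prepare)
# stmt_a and stmt_b are the same reference because the second call hit the cache.
```

Keying on the raw SQL string means two queries that differ only in whitespace are cached separately, which is a simple trade-off that avoids parsing the SQL.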

Streaming Large Results

  • SqlKit.DuckDB.stream!/3 for direct connections
  • SqlKit.DuckDB.Pool.with_stream!/5 for pooled connections
  • File-based SQL modules get with_stream!/3
  • Memory-efficient processing via Duckdbex.fetch_chunk/1
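Chunked streaming of this kind is commonly built on `Stream.resource/3`. The sketch below shows the general shape, with assumptions: `ChunkStreamSketch` is a hypothetical module, and `fetch_chunk` is any zero-arity function that returns the next list of rows, or `[]` once the result is exhausted (assumed here to mirror how a chunk fetcher behaves). An `Agent` stands in for a real result reference so the example runs without a database.

```elixir
defmodule ChunkStreamSketch do
  @moduledoc false
  # Hypothetical illustration only; not SqlKit's streaming implementation.

  # Builds a lazy stream that emits one chunk (a list of rows) per element.
  # Nothing is fetched until the stream is consumed.
  def stream(fetch_chunk) when is_function(fetch_chunk, 0) do
    Stream.resource(
      fn -> fetch_chunk end,
      fn fetch ->
        case fetch.() do
          # Empty chunk: the result set is exhausted, stop the stream.
          [] -> {:halt, fetch}
          # Non-empty chunk: emit it as a single stream element.
          rows -> {[rows], fetch}
        end
      end,
      fn _fetch -> :ok end
    )
  end
end

# Stand-in data source: an Agent that hands out pre-built chunks one by one.
{:ok, agent} = Agent.start_link(fn -> [[[1, "a"], [2, "b"]], [[3, "c"]]] end)

fetch = fn ->
  Agent.get_and_update(agent, fn
    [] -> {[], []}
    [chunk | rest] -> {chunk, rest}
  end)
end

# Flatten chunks into individual rows as they arrive.
rows = fetch |> ChunkStreamSketch.stream() |> Enum.flat_map(& &1)
```

Because only one chunk is held at a time, memory use is bounded by the chunk size rather than by the full result set.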

File-Based SQL

  • New :backend option for use SqlKit macro
  • Syntax: backend: {:duckdb, pool: MyApp.AnalyticsPool}
  • Same API as Ecto-based modules (query_all!, query_one!, etc.)
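As a configuration sketch, a file-based SQL module using the new option might look like the following. Only the `backend: {:duckdb, pool: MyApp.AnalyticsPool}` syntax comes from this PR; the module name `MyApp.Analytics.SQL` is hypothetical, and the remaining `use SqlKit` options are left as a placeholder comment because they are not listed here.

```elixir
defmodule MyApp.Analytics.SQL do
  # Hypothetical module name. The :backend option replaces :repo for
  # DuckDB-backed modules; other options are elided, not shown in this PR.
  use SqlKit,
    # ...other options, as for an Ecto-backed module, minus :repo
    backend: {:duckdb, pool: MyApp.AnalyticsPool}
end
```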

Additional Features

  • Automatic hugeint to integer conversion
  • PostgreSQL-style $1, $2, ... parameter placeholders
  • Configurable checkout timeout (default: 5000ms)
  • Extensions loaded via SQL (INSTALL, LOAD)
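To illustrate what hugeint conversion involves, here is a minimal sketch under a stated assumption: that the driver hands back a 128-bit HUGEINT as an `{upper, lower}` tuple, with `upper` the signed high 64 bits and `lower` the unsigned low 64 bits. `HugeintSketch` is a hypothetical module and is not how SqlKit necessarily performs the conversion.

```elixir
defmodule HugeintSketch do
  @moduledoc false
  # Hypothetical illustration only. Assumes HUGEINT values arrive as
  # {upper, lower} tuples (signed high 64 bits, unsigned low 64 bits).

  @two_pow_64 18_446_744_073_709_551_616

  # Recombine the two halves into a single arbitrary-precision integer.
  def to_integer({upper, lower})
      when is_integer(upper) and is_integer(lower) and lower >= 0 and lower < @two_pow_64 do
    upper * @two_pow_64 + lower
  end
end

HugeintSketch.to_integer({0, 42})
HugeintSketch.to_integer({1, 0})
```

Elixir integers are arbitrary precision, so the recombined value needs no special type. A real converter also has to know which columns are HUGEINT, since a bare two-integer tuple is otherwise ambiguous.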

Breaking Changes

None. All existing functionality is unchanged.

Introduces first-class DuckDB support as an optional backend alongside
Ecto-based databases.

- Add SqlKit.DuckDB module with direct connection API
- Add SqlKit.DuckDB.Pool with NimblePool-based connection pooling
- Use Supervisor pattern for proper database resource cleanup
- Support full SqlKit API (query_all!, query_one!, :as option)
- Add nimble_pool ~> 1.1 dependency
- Add duckdbex ~> 0.3.19 as optional dependency
- Add 49 tests covering connections, pools, persistence, and types

Extends the `use SqlKit` macro to support DuckDB pools as a backend,
enabling file-based SQL with DuckDB alongside Ecto repos.

- Add `:backend` option as alternative to `:repo` in `use SqlKit`
- Support `backend: {:duckdb, pool: PoolName}` configuration
- Add compile-time validation with clear error messages
- Expand module aliases at compile time for proper resolution
- Generate backend-specific `get_backend/0` function
- Add DuckDB test SQL files (all_users, user_by_id, etc.)
- Add SqlKit.Test.DuckDBSQL module for file-based tests
- Add 18 tests for file-based SQL including persistence verification

Phase 4: Testing & Documentation for DuckDB support.

Documentation:
- Add DuckDB to supported databases table in README
- Add comprehensive DuckDB section with direct connection, pooled
  connection, file-based SQL, and extension loading examples
- Document pool options including :config for Duckdbex.Config
- Add DuckDB to parameter syntax table
- Update "Use SqlKit Options" to include :backend option
- Update CLAUDE.md with file-based SQL examples for DuckDB

Tests were already comprehensive from Phase 3 (65+ DuckDB tests covering
direct connections, pools, file-based SQL, persistence, and edge cases).

Phase 5: Polish & Performance for DuckDB support.

Prepared Statement Caching:
- Add prepared_cache to pool worker state for statement reuse
- New Pool.query!/4 and Pool.query/4 with automatic caching
- Cache uses SQL as key, stores prepared statement references
- Support :cache option (default: true) to disable per-query
- SqlKit functions use cached queries through pools by default

Streaming Support:
- SqlKit.DuckDB.stream!/3 and stream_with_columns!/3 for direct connections
- Pool.with_stream!/5 and with_stream_and_columns!/5 for pooled connections
- File-based SQL modules get with_stream!/3 and with_stream_and_columns!/3
- Uses Duckdbex.fetch_chunk/1 for lazy evaluation of large result sets
- Hugeint conversion applied to streamed chunks

Pool Tuning:
- Add :timeout option to all pool operations (default: 5000ms)
- Document lazy initialization strategy in Pool module
- Document prepared statement caching behavior

Tests:
- 22 new tests for Phase 5 features (87 total DuckDB tests)
- Tests for caching, streaming (direct/pool/file-based), and timeouts
- All 282 tests pass, mix check passes all quality checks
Copilot AI left a comment

Pull request overview

This PR adds comprehensive first-class DuckDB support to SqlKit, enabling both direct connections and pooled connections for the DuckDB analytical database. Unlike other supported databases, DuckDB is integrated via the direct duckdbex NIF driver rather than an Ecto adapter.

Key changes:

  • New DuckDB connection and pool modules with prepared statement caching
  • Backend abstraction layer allowing file-based SQL to work with both Ecto repos and DuckDB pools
  • Streaming API for memory-efficient processing of large result sets
  • Complete test suite with 100+ tests covering all DuckDB features

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lib/sql_kit/duckdb.ex | Core DuckDB module providing direct connection API, query execution, and streaming |
| lib/sql_kit/duckdb/pool.ex | NimblePool-based connection pool with prepared statement caching, streaming support, and proper resource lifecycle management |
| lib/sql_kit.ex | Extended use SqlKit macro to support :backend option for non-Ecto databases, added backend validation and conditional streaming functions |
| lib/sql_kit/query.ex | Added backend detection logic to route queries to appropriate drivers (DuckDB connections/pools vs Ecto repos) |
| test/sql_kit/duckdb_test.exs | Comprehensive test suite covering direct connections, pools, streaming, caching, and file-based SQL |
| test/support/test_sql_modules.ex | Added DuckDBSQL test module configuration |
| test/support/sql/test_duckdb/*.sql | Test SQL files for file-based SQL testing with DuckDB backend |
| mix.exs | Added nimble_pool as required dependency and duckdbex as optional dependency |
| mix.lock | Updated with new dependencies |
| README.md | Added comprehensive DuckDB documentation including setup, usage patterns, and key differences |
| CLAUDE.md | Technical documentation for DuckDB support |
| .github/workflows/ci.yml | Reordered SQL Server database creation step |

Comments suppressed due to low confidence (1)

mix.exs:70

  • The ex_check dependency should remain in the :dev environment only, not both :dev and :test. This dependency is used for running checks in development (like format, credo, dialyzer), not for running tests themselves. Including it in :test unnecessarily increases test dependencies.
      {:tds, "~> 2.3", optional: true},

@tylerbarker tylerbarker merged commit 27e7585 into main Dec 29, 2025
1 check passed
@tylerbarker tylerbarker deleted the duckdb branch December 29, 2025 04:19