25 changes: 25 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,30 @@
# Changelog

## Unreleased

### Added

- **DuckDB support** via `duckdbex` driver
- `SqlKit.DuckDB.connect/2` and `disconnect/1` for direct connections
- `SqlKit.DuckDB.Pool` - NimblePool-based connection pool with supervision
- File-based SQL support via `:backend` option (`backend: {:duckdb, pool: PoolName}`)
- Automatic hugeint to integer conversion
- PostgreSQL-style `$1, $2, ...` parameter placeholders

- **Prepared statement caching** for DuckDB pools
- Automatic caching of prepared statements per connection
- Configurable via `:cache` option (default: true)

- **Streaming support** for DuckDB large result sets
- `SqlKit.DuckDB.stream!/3` and `stream_with_columns!/3` for direct connections
- `SqlKit.DuckDB.Pool.with_stream!/5` and `with_stream_and_columns!/6` for pools
- `with_stream!/3` and `with_stream_and_columns!/4` for file-based SQL modules

- **Pool tuning options**
- `:timeout` option for checkout operations (default: 5000ms)
- Lazy connection initialization
- Documented pool behavior and configuration

## 0.1.0

- Initial release
263 changes: 263 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,263 @@
# SqlKit

An Elixir library for executing raw SQL with automatic result transformation.

## Overview

SqlKit provides two ways to execute raw SQL with results automatically transformed into maps or structs:

1. **Direct SQL execution** - Standalone functions for executing SQL strings with any Ecto repo
2. **File-based SQL** - Macro-based approach for compile-time embedded SQL files

SQL files are embedded at compile time for production (stored in module attributes), while in dev/test they are read from disk for rapid iteration. Supports Postgrex, MyXQL, Exqlite, Tds, Ch (ClickHouse), and DuckDB.

## Project Structure

```
lib/
sql_kit.ex # Main module with `use SqlKit` macro + standalone query functions
sql_kit/
config.ex # Runtime config (root_sql_dir, load_sql)
helpers.ex # Compile-time helpers (file_atom)
exceptions.ex # NoResultsError, MultipleResultsError
query.ex # Core query execution logic (shared by both APIs)
duckdb.ex # DuckDB connection management (conditional on duckdbex)
duckdb/
pool.ex # NimblePool-based connection pool for DuckDB
test/
sql_kit_test.exs # Main tests covering all databases and both APIs
sql_kit/
helpers_test.exs # Helpers module tests
duckdb_test.exs # DuckDB-specific tests
support/
repos.ex # Test Ecto repos (Postgres, MySQL, SQLite, TDS, ClickHouse)
sql/ # Test SQL files per database
data_case.ex # Test case template
test_setup.ex # Database setup/teardown
test_sql_modules.ex # Test SqlKit modules
```

## Key Technical Decisions

- **Two APIs**: Standalone functions for direct SQL execution + macro-generated functions for file-based SQL
- **Shared execution logic**: Both APIs delegate to `SqlKit.Query` for consistent behavior
- **Compile-time embedding**: SQL files are read once at compile time and stored as module attributes with `persist: true`
- **Runtime file reads in dev/test**: Allows editing SQL without recompilation via `:load_sql` config
- **Direct driver support**: Pattern matches on result structs with `columns` and `rows` fields
- **Atom safety**: Use `String.to_existing_atom/1` for column names (requires struct fields to pre-exist)
- **Configurable**: `otp_app`, `repo`/`backend`, `dirname`, and `root_sql_dir` are configurable
- **Backend abstraction**: File-based SQL supports both Ecto repos (`:repo`) and DuckDB pools (`:backend`)

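The compile-time embedding decision above can be sketched as a small macro. This is an illustrative sketch of the general technique, not SqlKit's actual internals; `@external_resource` is the standard Elixir mechanism for recompiling a module when an external file changes:

```elixir
defmodule EmbedSketch do
  # Hypothetical sketch of compile-time SQL embedding (not SqlKit's real code).
  defmacro embed_sql(path) do
    quote bind_quoted: [path: path] do
      # Recompile this module whenever the SQL file changes on disk.
      @external_resource path

      # Read once at compile time; `persist: true` keeps the attribute
      # available in the compiled .beam at runtime.
      Module.register_attribute(__MODULE__, :embedded_sql, persist: true)
      @embedded_sql {path, File.read!(path)}
    end
  end
end
```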
## Core API

### Standalone Functions (SqlKit module)

```elixir
# Execute SQL strings directly with any Ecto repo
SqlKit.query_all!(MyApp.Repo, "SELECT * FROM users WHERE age > $1", [21])
# => [%{id: 1, name: "Alice", age: 30}, ...]

SqlKit.query_one!(MyApp.Repo, "SELECT * FROM users WHERE id = $1", [1])
# => %{id: 1, name: "Alice"}

SqlKit.query_all!(MyApp.Repo, "SELECT * FROM users", [], as: User)
# => [%User{id: 1, name: "Alice"}, ...]

# Non-bang variants
SqlKit.query_all(repo, sql, params, opts) # => {:ok, results} | {:error, reason}
SqlKit.query_one(repo, sql, params, opts) # => {:ok, result | nil} | {:error, reason}

# Aliases for query_one
SqlKit.query!(repo, sql, params, opts)
SqlKit.query(repo, sql, params, opts)
```

### File-Based Functions (generated by `use SqlKit`)

```elixir
# With Ecto repo
defmodule MyApp.Reports.SQL do
use SqlKit,
otp_app: :my_app,
repo: MyApp.Repo,
dirname: "reports",
files: ["stats.sql", "activity.sql"]
end

# With DuckDB pool (use :backend instead of :repo)
defmodule MyApp.Analytics.SQL do
use SqlKit,
otp_app: :my_app,
backend: {:duckdb, pool: MyApp.AnalyticsPool},
dirname: "analytics",
files: ["daily_summary.sql"]
end

# Usage (same API for both)
MyApp.Reports.SQL.query!("stats.sql", [id]) # single row (alias for query_one!)
MyApp.Reports.SQL.query_one!("stats.sql", [id]) # single row
MyApp.Reports.SQL.query_all!("activity.sql", [id], as: Activity) # all rows as structs
MyApp.Reports.SQL.load!("stats.sql") # just get SQL string

# Non-bang variants return {:ok, result} | {:error, reason}
MyApp.Reports.SQL.query("stats.sql", [id])
MyApp.Reports.SQL.query_one("stats.sql", [id])
MyApp.Reports.SQL.query_all("activity.sql", [id])
MyApp.Reports.SQL.load("stats.sql")
```
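The SQL files themselves are plain text with positional placeholders. A hypothetical `priv/repo/sql/reports/stats.sql` (the path, table, and columns here are illustrative, not from the test suite) might look like:

```sql
-- reports/stats.sql: per-user report statistics (illustrative schema)
SELECT user_id,
       count(*) AS report_count,
       max(inserted_at) AS last_report
FROM reports
WHERE user_id = $1
GROUP BY user_id;
```

Placeholder syntax follows the underlying driver: `$1, $2, ...` for Postgrex and DuckDB, `?` for MyXQL and Exqlite.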

### Utility Functions

```elixir
# Transform raw columns/rows into maps or structs (used internally, also public)
SqlKit.transform_rows(["id", "name"], [[1, "Alice"]], as: User)
# => [%User{id: 1, name: "Alice"}]

# Extract columns and rows from driver result
SqlKit.extract_result(result)
# => {["id", "name"], [[1, "Alice"]]}
```

## Supported Databases

| Database | Ecto Adapter | Driver |
|------------|---------------------------|----------|
| PostgreSQL | Ecto.Adapters.Postgres | Postgrex |
| MySQL | Ecto.Adapters.MyXQL | MyXQL |
| SQLite | Ecto.Adapters.SQLite3 | Exqlite |
| SQL Server | Ecto.Adapters.Tds | Tds |
| ClickHouse | Ecto.Adapters.ClickHouse | Ch |
| DuckDB | N/A (direct driver) | Duckdbex |

### DuckDB Support

DuckDB is unique among the supported databases: it is accessed through the Duckdbex NIF driver directly rather than through an Ecto adapter. SqlKit provides first-class support:

```elixir
# Direct connection (BYO)
{:ok, conn} = SqlKit.DuckDB.connect(":memory:")
SqlKit.query_all!(conn, "SELECT * FROM users", [])
SqlKit.DuckDB.disconnect(conn)

# Pooled connection (recommended for production)
# Add to supervision tree:
{SqlKit.DuckDB.Pool, name: MyPool, database: "analytics.duckdb", pool_size: 4}

# With custom Duckdbex config:
{SqlKit.DuckDB.Pool, name: MyPool, database: "analytics.duckdb", pool_size: 4,
config: %Duckdbex.Config{threads: 4}}

# Then use the pool:
{:ok, pool} = SqlKit.DuckDB.Pool.start_link(name: MyPool, database: ":memory:")
SqlKit.query_all!(pool, "SELECT * FROM events", [])

# File-based SQL with DuckDB (use :backend instead of :repo)
defmodule MyApp.Analytics.SQL do
use SqlKit,
otp_app: :my_app,
backend: {:duckdb, pool: MyApp.AnalyticsPool},
dirname: "analytics",
files: ["daily_summary.sql"]
end

MyApp.Analytics.SQL.query_all!("daily_summary.sql", [~D[2024-01-01]])
```

Key differences from Ecto-based databases:
- Uses PostgreSQL-style `$1, $2, ...` parameter placeholders
- In-memory databases: pass the `":memory:"` string as the path
- Pool uses NimblePool (connections share one database instance)
- Hugeint values auto-converted to Elixir integers
- Extensions loaded via SQL: `INSTALL 'parquet'; LOAD 'parquet';`
- File-based SQL uses `:backend` option instead of `:repo`
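Putting a few of these differences together, here is a hedged sketch of a direct-connection session; the `events.parquet` path and the `id` column are hypothetical:

```elixir
{:ok, conn} = SqlKit.DuckDB.connect(":memory:")

# Extensions are loaded via plain SQL, not driver options.
SqlKit.query_all!(conn, "INSTALL 'parquet';", [])
SqlKit.query_all!(conn, "LOAD 'parquet';", [])

# PostgreSQL-style $1 placeholder; hugeint results arrive as Elixir integers.
SqlKit.query_all!(
  conn,
  "SELECT count(*) AS n FROM read_parquet('events.parquet') WHERE id > $1",
  [100]
)

SqlKit.DuckDB.disconnect(conn)
```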

### DuckDB Pool Features

**Prepared Statement Caching**: Pool queries automatically cache prepared statements per connection. Repeated queries with the same SQL skip the prepare step.

```elixir
# Caching is enabled by default
SqlKit.query_all!(pool, "SELECT * FROM events WHERE id = $1", [1])
SqlKit.query_all!(pool, "SELECT * FROM events WHERE id = $1", [2]) # uses cached statement

# Disable caching for specific queries
SqlKit.DuckDB.Pool.query!(pool, sql, params, cache: false)
```

**Streaming Large Results**: For memory-efficient processing of large result sets:

```elixir
# Direct connection streaming
conn
|> SqlKit.DuckDB.stream!("SELECT * FROM large_table", [])
|> Stream.flat_map(& &1)
|> Enum.take(100)

# With column names
{columns, stream} = SqlKit.DuckDB.stream_with_columns!(conn, sql, [])

# Pool streaming (callback-based to manage connection lifecycle)
SqlKit.DuckDB.Pool.with_stream!(pool, "SELECT * FROM events", [], fn stream ->
stream |> Stream.flat_map(& &1) |> Enum.count()
end)

# File-based SQL streaming (DuckDB backends only)
MyApp.Analytics.SQL.with_stream!("large_query.sql", [], fn stream ->
stream |> Stream.flat_map(& &1) |> Enum.take(1000)
end)
```

**Pool Timeout**: All pool operations accept a `:timeout` option (default: 5000ms):

```elixir
SqlKit.DuckDB.Pool.query!(pool, sql, params, timeout: 10_000)
SqlKit.DuckDB.Pool.checkout!(pool, fn conn -> ... end, timeout: 10_000)
```

## Configuration

Users configure in their app's config:

```elixir
# config/config.exs
config :my_app, SqlKit,
root_sql_dir: "priv/repo/sql" # default

# config/dev.exs and config/test.exs
config :my_app, SqlKit,
load_sql: :dynamic # read from disk at runtime

# config/prod.exs (or rely on default)
config :my_app, SqlKit,
load_sql: :compiled # use compile-time embedded SQL (default)
```

## Commands

```bash
mix check # Run all checks (format, compile, dialyzer, credo, sobelow, test)
mix test # Run tests (requires all databases running via Docker)
mix format # Format code
mix credo # Linting
mix dialyzer # Type checking
mix sobelow # Security analysis
```

## Dependencies

Runtime:
- ecto_sql ~> 3.0
- nimble_pool ~> 1.1
- postgrex ~> 0.19 (optional)
- myxql ~> 0.7 (optional)
- ecto_sqlite3 ~> 0.18 (optional)
- tds ~> 2.3 (optional)
- ecto_ch ~> 0.7 (optional)
- duckdbex ~> 0.3.19 (optional)

Dev/Test:
- ex_check ~> 0.16
- credo ~> 1.7
- dialyxir ~> 1.4
- sobelow ~> 0.14
- styler ~> 1.10
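
In a consuming app, the optional drivers are declared alongside the library itself. A hypothetical `mix.exs` fragment, assuming the package is published under the name `sql_kit` (the name is an assumption, not confirmed by this document):

```elixir
defp deps do
  [
    {:sql_kit, "~> 0.1"},   # package name assumed for illustration
    {:ecto_sql, "~> 3.0"},
    # Add only the drivers you need; all are optional in SqlKit itself.
    {:postgrex, "~> 0.19"},
    {:duckdbex, "~> 0.3.19"}
  ]
end
```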