An immutable database that follows the Snowflake architecture, designed for scalable, replayable systems beyond analytics.
Arkilian treats SQLite files as immutable 8-128MB micro-partitions stored in object storage (S3/GCS), with a stateless query federation layer providing Snowflake-like compute/storage separation.
Unlike monolithic SQLite distribution attempts (rqlite, LiteFS), Arkilian embraces SQLite's single-writer, single-node nature and partitions at the dataset level: no VFS complexity, no WAL coordination issues, no storage duplication.
Target workloads:
- Multi-tenant SaaS event stores
- Time-series analytics
- Edge-to-cloud sync with strong eventual consistency
What you get:
- SQLite's embedded simplicity + Snowflake's elasticity
- No forks of SQLite core
- No distributed transactions
- No consensus overhead on writes
- 1.0× replication factor (vs 3-5× in LiteFS)
```
┌───────────────────────────────────────────────────────────────────┐
│                        Client Applications                        │
│                     (HTTP/gRPC API Consumers)                     │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │
        ┌─────────────────────────┼─────────────────────────┐
        │                         │                         │
        ▼                         ▼                         ▼
┌───────────────┐         ┌───────────────┐         ┌───────────────┐
│   arkilian-   │         │   arkilian-   │         │   arkilian-   │
│    ingest     │         │     query     │         │    compact    │
│   (Writers)   │         │   (Readers)   │         │ (Background)  │
└───────┬───────┘         └───────┬───────┘         └───────┬───────┘
        │                         │                         │
        └─────────────────────────┼─────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                         Manifest Catalog                          │
│                           (manifest.db)                           │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                      Object Storage (S3/GCS)                      │
│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐   │
│   │ events_20260205 │  │ events_20260205 │  │ events_20260206 │   │
│   │ _001.sqlite     │  │ _001.meta.json  │  │ _001.sqlite     │   │
│   └─────────────────┘  └─────────────────┘  └─────────────────┘   │
└───────────────────────────────────────────────────────────────────┘
```
- Immutability: Once written, SQLite files never change. Compaction produces new files.
- Partition Autonomy: Each SQLite file is self-contained with its own indexes.
- Metadata-Driven Pruning: Bloom filters + min/max stats eliminate >99% of I/O.
- Write Isolation: Writers operate on disjoint partition keys (no distributed locking).
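
The immutability principle can be illustrated with a planning step that only reads manifest entries and proposes merges, leaving existing files untouched. This is a minimal sketch, not Arkilian's compactor: the `partition` type, the `planCompaction` function, and the idea of using a minimum-size threshold to pick merge candidates are all assumptions made for illustration.

```go
package main

import "fmt"

// partition is a hypothetical manifest entry; field names are illustrative.
type partition struct {
	ID        string
	Key       string // partition key, e.g. "20260205"
	SizeBytes int64
}

// planCompaction groups partitions smaller than minSize by partition key and
// returns, per key, the IDs a compactor could merge into one NEW file.
// Inputs are only read; nothing here rewrites an existing partition.
func planCompaction(parts []partition, minSize int64) map[string][]string {
	plan := make(map[string][]string)
	for _, p := range parts {
		if p.SizeBytes < minSize {
			plan[p.Key] = append(plan[p.Key], p.ID)
		}
	}
	// A single small partition has nothing to merge with, so drop it.
	for key, ids := range plan {
		if len(ids) < 2 {
			delete(plan, key)
		}
	}
	return plan
}

func main() {
	const minSize = 8 << 20 // 8 MiB threshold (assumed)
	parts := []partition{
		{"events_20260205_001", "20260205", 2 << 20},
		{"events_20260205_002", "20260205", 3 << 20},
		{"events_20260205_003", "20260205", 64 << 20},
		{"events_20260206_001", "20260206", 1 << 20},
	}
	for key, ids := range planCompaction(parts, minSize) {
		fmt.Println(key, ids)
	}
}
```

Because the plan groups candidates by partition key, merges never cross keys, which matches the write-isolation principle of working on disjoint keys.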
Prerequisites:
- Go 1.21+

```bash
go build ./...
```

Run benchmark:

```bash
go test -v -bench=BenchmarkProd -run=TestProd -benchtime=2x -timeout=300s ./test/benchmark/
```

```bash
# Run all services with a single command
./arkilian --data-dir /data/arkilian
# Or run specific services
./arkilian --mode ingest --data-dir /data/arkilian
./arkilian --mode query --data-dir /data/arkilian
./arkilian --mode compact --data-dir /data/arkilian
```

```bash
curl -X POST http://localhost:8080/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "partition_key": "20260206",
    "rows": [
      {
        "tenant_id": "acme",
        "user_id": 12345,
        "event_time": 1707235200,
        "event_type": "page_view",
        "payload": {"page": "/home"}
      }
    ]
  }'
```

Response:
```json
{
  "partition_id": "events:20260206:abc123",
  "row_count": 1,
  "size_bytes": 4096,
  "request_id": "req-xyz789"
}
```

```bash
curl -X POST http://localhost:8081/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT user_id, COUNT(*) FROM events WHERE tenant_id = '\''acme'\'' GROUP BY user_id LIMIT 10"
  }'
```

Response:
```json
{
  "columns": ["user_id", "count"],
  "rows": [
    [12345, 42],
    [67890, 37]
  ],
  "stats": {
    "partitions_scanned": 5,
    "partitions_pruned": 115,
    "execution_time_ms": 234
  },
  "request_id": "req-abc123"
}
```

| Workload Type | Partition Key | Target Size | Use Case |
|---|---|---|---|
| Time-series | `YYYYMMDD` | 16-64 MB | Event logs, metrics |
| Multi-tenant | `tenant_id` | 8-32 MB | SaaS applications |
| High-cardinality | `user_id % N` | 4-16 MB | User analytics |
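
The three partition-key strategies in the table can be sketched as small helper functions. The function names are hypothetical, and formatting the date in UTC is an assumption, since the table does not say which time zone applies.

```go
package main

import (
	"fmt"
	"time"
)

// timeSeriesKey maps a Unix timestamp (seconds) to a YYYYMMDD key.
// Using UTC is an assumption made for this sketch.
func timeSeriesKey(eventTime int64) string {
	return time.Unix(eventTime, 0).UTC().Format("20060102")
}

// tenantKey uses the tenant ID itself as the partition key.
func tenantKey(tenantID string) string {
	return tenantID
}

// bucketKey spreads a high-cardinality ID over n buckets (user_id % N).
// n must be greater than zero.
func bucketKey(userID, n uint64) string {
	return fmt.Sprintf("%d", userID%n)
}

func main() {
	fmt.Println(timeSeriesKey(1770336000)) // 2026-02-06 00:00:00 UTC
	fmt.Println(tenantKey("acme"))
	fmt.Println(bucketKey(12345, 16))
}
```

A smaller `n` yields fewer, larger partitions per write; picking it so that partitions land in the table's target size range is left to the operator.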
Arkilian uses 2-phase partition pruning:
- Phase 1 (Min/Max): Query manifest catalog using column statistics
- Phase 2 (Bloom Filter): Apply bloom filters to eliminate false positives
Example: For a query on 100K partitions with tenant_id = 'acme':
- Phase 1: 100K → 500 candidates (min/max pruning)
- Phase 2: 500 → 5 partitions (bloom filter, 99% reduction)
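
The two phases can be sketched in a few lines of Go. This is an illustrative sketch under stated assumptions, not Arkilian's implementation: the `partitionMeta` struct, the toy 256-bit Bloom filter with FNV-based double hashing, and the `prune` function are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const bloomBits = 256

// bloom is a tiny fixed-size Bloom filter (illustrative, not tuned).
type bloom struct {
	bits [bloomBits]bool
}

// positions derives three bit positions from one 64-bit FNV-1a hash
// using double hashing (h1 + i*h2).
func positions(s string) [3]uint32 {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)|1
	var p [3]uint32
	for i := range p {
		p[i] = (h1 + uint32(i)*h2) % bloomBits
	}
	return p
}

func (b *bloom) add(s string) {
	for _, p := range positions(s) {
		b.bits[p] = true
	}
}

// mayContain returns false only if s was definitely never added.
func (b *bloom) mayContain(s string) bool {
	for _, p := range positions(s) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

// partitionMeta is a hypothetical manifest row for one partition.
type partitionMeta struct {
	id                   string
	minTenant, maxTenant string
	tenants              bloom
}

// newPartition builds metadata from tenant IDs; it expects at least one.
func newPartition(id string, tenants ...string) partitionMeta {
	p := partitionMeta{id: id, minTenant: tenants[0], maxTenant: tenants[0]}
	for _, t := range tenants {
		if t < p.minTenant {
			p.minTenant = t
		}
		if t > p.maxTenant {
			p.maxTenant = t
		}
		p.tenants.add(t)
	}
	return p
}

// prune returns the IDs of partitions that may hold rows for tenant.
func prune(parts []partitionMeta, tenant string) []string {
	var keep []string
	for i := range parts {
		p := &parts[i]
		// Phase 1: min/max range check.
		if tenant < p.minTenant || tenant > p.maxTenant {
			continue
		}
		// Phase 2: the Bloom filter drops range matches that lack the value.
		if !p.tenants.mayContain(tenant) {
			continue
		}
		keep = append(keep, p.id)
	}
	return keep
}

func main() {
	parts := []partitionMeta{
		newPartition("p1", "aaa", "zzz"),   // range covers "acme", value absent
		newPartition("p2", "acme", "beta"), // contains "acme"
		newPartition("p3", "mmm", "zzz"),   // range excludes "acme"
	}
	fmt.Println(prune(parts, "acme"))
}
```

A Bloom filter never produces false negatives, so a partition that contains the tenant always survives both phases; it can still let through an occasional partition that does not, which only costs an extra scan.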
| Metric | Target |
|---|---|
| Ingest throughput | 50K rows/sec/node |
| P95 query latency | <500ms (point queries) |
| Partition pruning ratio | 99.5% |
| Storage overhead | <5% |
| Recovery time | <10s |
```bash
# Run all tests
go test ./...
# Run benchmarks
go test -bench=. ./test/benchmark/...
# Run integration tests
go test ./test/integration/...
```

The unified `arkilian` binary supports configuration via command-line flags, environment variables, and config files.
Command-line flags:
| Flag | Default | Description |
|---|---|---|
| `--config` | - | Path to YAML/JSON config file |
| `--data-dir` | `./data/arkilian` | Base directory for all data files |
| `--mode` | `all` | Service mode: all, ingest, query, compact |
| `--http-ingest` | `:8080` | HTTP address for ingest service |
| `--http-query` | `:8081` | HTTP address for query service |
| `--http-compact` | `:8082` | HTTP address for compaction service |
| `--grpc-addr` | `:9090` | gRPC server address |
Environment variables:
| Environment Variable | Default | Description |
|---|---|---|
| `ARKILIAN_MODE` | `all` | Service mode |
| `ARKILIAN_DATA_DIR` | `./data/arkilian` | Base data directory |
| `ARKILIAN_HTTP_INGEST_ADDR` | `:8080` | Ingest HTTP address |
| `ARKILIAN_HTTP_QUERY_ADDR` | `:8081` | Query HTTP address |
| `ARKILIAN_HTTP_COMPACT_ADDR` | `:8082` | Compact HTTP address |
| `ARKILIAN_GRPC_ADDR` | `:9090` | gRPC server address |
| `ARKILIAN_STORAGE_TYPE` | `local` | Storage type (local, s3) |
| `ARKILIAN_STORAGE_PATH` | `{data-dir}/storage` | Local storage path |
| `ARKILIAN_S3_BUCKET` | - | S3 bucket for partitions |
| `ARKILIAN_S3_REGION` | `us-east-1` | AWS region |
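
As an illustration of how a service might resolve these variables against their documented defaults, here is a minimal sketch. The `settings` struct, the `load` and `valueOr` helpers, and the choice to treat an empty value as unset are assumptions; the real binary may resolve configuration differently, and flag or config-file precedence is not modeled.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// lookupFunc matches the signature of os.LookupEnv so callers can swap it.
type lookupFunc func(key string) (string, bool)

// settings holds a subset of the documented variables (hypothetical struct).
type settings struct {
	Mode        string
	DataDir     string
	IngestAddr  string
	QueryAddr   string
	StorageType string
	StoragePath string
}

// valueOr returns the variable's value, or def when it is unset or empty.
func valueOr(lookup lookupFunc, key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// load applies the defaults listed in the table above.
func load(lookup lookupFunc) settings {
	dataDir := valueOr(lookup, "ARKILIAN_DATA_DIR", "./data/arkilian")
	return settings{
		Mode:        valueOr(lookup, "ARKILIAN_MODE", "all"),
		DataDir:     dataDir,
		IngestAddr:  valueOr(lookup, "ARKILIAN_HTTP_INGEST_ADDR", ":8080"),
		QueryAddr:   valueOr(lookup, "ARKILIAN_HTTP_QUERY_ADDR", ":8081"),
		StorageType: valueOr(lookup, "ARKILIAN_STORAGE_TYPE", "local"),
		// The documented default is {data-dir}/storage.
		StoragePath: valueOr(lookup, "ARKILIAN_STORAGE_PATH", filepath.Join(dataDir, "storage")),
	}
}

func main() {
	fmt.Printf("%+v\n", load(os.LookupEnv))
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the resolution logic testable without touching the process environment.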
Example config file (config.yaml):
```yaml
mode: all
data_dir: /data/arkilian
http:
  ingest_addr: ":8080"
  query_addr: ":8081"
  compact_addr: ":8082"
grpc:
  addr: ":9090"
  enabled: true
query:
  concurrency: 10
  pool_size: 100
compaction:
  check_interval: 5m
  min_partition_size: 8388608 # 8MB
  ttl_days: 7
storage:
  type: local
  path: /data/arkilian/storage
```

| Environment Variable | Default | Description |
|---|---|---|
| `ARKILIAN_STORAGE_PATH` | `/data/arkilian` | Local storage path |
| `ARKILIAN_MANIFEST_PATH` | `/data/manifest.db` | Manifest database path |
| `ARKILIAN_S3_BUCKET` | - | S3 bucket for partitions |
| `ARKILIAN_S3_REGION` | `us-east-1` | AWS region |
| `ARKILIAN_HTTP_PORT` | `8080` | HTTP server port |
| `ARKILIAN_GRPC_PORT` | `9090` | gRPC server port |
- No distributed ACID transactions across partitions
- Eventual consistency (not strong consistency)
- Usage Guide - Complete usage documentation with real-world scenarios
- Blueprint - Original RFC and architecture design
MIT
Inspired by:
- Snowflake - Micro-partition design
- DuckDB - Embeddable analytical database
- Litestream - WAL replication
- Apache Iceberg - Metadata layer for immutable files
Arkilian includes a multi-stage Dockerfile that builds minimal, secure images for all services.
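```bash
# Build the image
docker build -t arkilian:latest .
# Run with the image's default entrypoint, publishing port 8080
docker run -p 8080:8080 arkilian:latest
# Run the query service
docker run -p 8081:8081 --entrypoint /usr/local/bin/arkilian-query arkilian:latest
# Run the compaction service
docker run --entrypoint /usr/local/bin/arkilian-compact arkilian:latest
```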
The project uses GitHub Actions for Continuous Integration. The workflow at `.github/workflows/ci.yaml` automatically:
- Verifies dependencies.
- Runs static analysis (`go vet`).
- Executes the full test suite with race detection.
- Verifies the Docker build.