Commit aeb28ac

Merge pull request #18 from christopherkarani/2614
Investigate WAL compaction gains
2 parents ead7a08 + b36d881 commit aeb28ac

94 files changed: +14790 -393 lines changed

README.md

Lines changed: 106 additions & 2 deletions
@@ -20,7 +20,7 @@
 <p align="center">
 <img src="https://img.shields.io/badge/Swift-6.2-orange.svg" alt="Swift 6.2">
 <img src="https://img.shields.io/badge/platforms-iOS%2026%20%7C%20macOS%2026-blue.svg" alt="Platforms">
-<img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License">
+<img src="https://img.shields.io/badge/license-Apache_2.0-green.svg" alt="License">
 </p>

 ---
@@ -104,16 +104,120 @@ Cold Open → First Query: 17ms
 Hybrid Search @ 10K docs: 105ms
 ```

+### Core Benchmark Baselines (as of February 17, 2026)
+
+These are reproducible XCTest benchmark baselines captured from the current Wax benchmark harness.
+
+#### Ingest throughput (`testIngestHybridBatchedPerformance`)
+
+| Workload | Time | Throughput |
+|:---|---:|---:|
+| smoke (200 docs) | `0.103s` | `~1941.7 docs/s` |
+| standard (1000 docs) | `0.309s` | `~3236.2 docs/s` |
+| stress (5000 docs) | `2.864s` | `~1745.8 docs/s` |
+| 10k | `7.756s` | `~1289.3 docs/s` |
+
+#### Search latency
+
+| Workload | Time | Throughput |
+|:---|---:|---:|
+| warm CPU smoke | `0.0015s` | `~666.7 ops/s` |
+| warm CPU standard | `0.0033s` | `~303.0 ops/s` |
+| warm CPU stress | `0.0072s` | `~138.9 ops/s` |
+| 10k CPU hybrid iteration | `0.103s` | `~9.7 ops/s` |
+
+#### Recall latency (`testMemoryOrchestratorRecallPerformance`)
+
+| Workload | Time |
+|:---|---:|
+| smoke | `0.103s` |
+| standard | `0.101s` |
+
+Stress recall is currently harness-blocked (`signal 11`) and treated as a known benchmark issue.
+
+#### FastRAG builder
+
+| Mode | Time |
+|:---|---:|
+| fast mode | `0.102s` |
+| dense cached | `0.102s` |
+
+For benchmark commands, profiling traces, and methodology, see:
+- `/Users/chriskarani/CodingProjects/Wax/Tasks/hot-path-specialization-investigation.md`
+
 *No, that's not a typo. GPU vector search really is sub-millisecond.*

 ---

+## WAL Compaction and Storage Health (2026-02)
+
+Wax now includes a WAL/storage health track focused on commit latency tails, long-run file growth, and recovery behavior:
+
+- No-op index compaction guards to avoid unnecessary index rewrites.
+- Single-pass WAL replay with guarded replay snapshot fast path.
+- Proactive WAL-pressure commits for targeted workloads (guarded rollout).
+- Scheduled `rewriteLiveSet` maintenance with dead-payload thresholds, validation, and rollback.
+
+### Measured outcomes
+
+- Repeated unchanged index compaction growth improved from `+61,768,464` bytes over 8 runs (`~7.72MB/run`) to bounded drift (test-gated).
+- Commit latency improved in most matrix workloads in recent runs (examples: `medium_hybrid` p95 `-13.9%`, `large_text_10k` p95 `-8.0%`, `sustained_write_text` p95 `-5.7%`).
+- Reopen/recovery p95 is generally flat-to-improved across the matrix.
+- `sustained_write_hybrid` remains workload-sensitive, so proactive/scheduled maintenance stays guarded by default.
+
+### Safe rollout defaults
+
+- Proactive pressure commits are tuned for targeted workloads and validated with percentile guardrails.
+- Replay snapshot open-path optimization is additive and guarded.
+- Scheduled live-set rewrite is configurable and runs deferred from the `flush()` hot path.
+- Rewrite candidates are automatically validated and rolled back on verification failure.
+
+### Configure scheduled live-set rewrite
+
+```swift
+import Wax
+
+var config = OrchestratorConfig.default
+config.liveSetRewriteSchedule = LiveSetRewriteSchedule(
+    enabled: true,
+    checkEveryFlushes: 32,
+    minDeadPayloadBytes: 64 * 1024 * 1024,
+    minDeadPayloadFraction: 0.25,
+    minimumCompactionGainBytes: 0,
+    minimumIdleMs: 15_000,
+    minIntervalMs: 5 * 60_000,
+    verifyDeep: false
+)
+```
+
+### Reproduce benchmark matrix
+
+```bash
+WAX_BENCHMARK_WAL_COMPACTION=1 \
+WAX_BENCHMARK_WAL_OUTPUT=/tmp/wal-matrix.json \
+swift test --filter WALCompactionBenchmarks.testWALCompactionWorkloadMatrix
+```
+
+```bash
+WAX_BENCHMARK_WAL_GUARDRAILS=1 \
+swift test --filter WALCompactionBenchmarks.testProactivePressureCommitGuardrails
+```
+
+```bash
+WAX_BENCHMARK_WAL_REOPEN_GUARDRAILS=1 \
+swift test --filter WALCompactionBenchmarks.testReplayStateSnapshotGuardrails
+```
+
+See `/Users/chriskarani/CodingProjects/Wax/Tasks/wal-compaction-investigation.md` and `/Users/chriskarani/CodingProjects/Wax/Tasks/wal-compaction-baseline.json` for methodology and full baseline artifacts.
+
+---
+
 ## Quick Start

 ### 1. Add to Package.swift

 ```swift
-.package(url: "https://github.com/christopherkarani/Wax.git", from: "0.1.1")
+.package(url: "https://github.com/christopherkarani/Wax.git", from: "0.1.6")
 ```

 ### 2. Choose Your Memory Type
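
Note on the benchmark tables added to README.md: the throughput columns appear to follow directly from the time column (documents divided by seconds for ingest, one operation divided by seconds for search). The sketch below recomputes two of the ingest rows and one search row under that assumption. The helper names `throughput` and `roundedToTenth` are illustrative and are not part of Wax.

```swift
import Foundation

/// Items per second for `count` items processed in `seconds`.
func throughput(count: Int, seconds: Double) -> Double {
    precondition(seconds > 0, "elapsed time must be positive")
    return Double(count) / seconds
}

/// Rounds to one decimal place, the precision used in the README tables.
func roundedToTenth(_ value: Double) -> Double {
    (value * 10).rounded() / 10
}

// Rows copied from the README diff: (label, item count, elapsed seconds).
let rows: [(name: String, count: Int, seconds: Double)] = [
    ("ingest smoke (200 docs)", 200, 0.103),
    ("ingest 10k", 10_000, 7.756),
    ("search warm CPU stress", 1, 0.0072),
]

for row in rows {
    let rate = roundedToTenth(throughput(count: row.count, seconds: row.seconds))
    print("\(row.name): ~\(rate) per second")
}
```

Recomputing the column this way is a quick consistency check when the baselines are refreshed: a throughput value that no longer matches its time value points to a stale row.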

SHOW_HN_POST.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# Show HN Post
+
+**Title:** `Show HN: Wax -- On-device multimodal RAG for iOS/macOS with Metal GPU search`
+
+**URL:** `https://github.com/christopherkarani/Wax`
+
+---
+
+Hey HN,
+
+I built Wax, an open-source Swift framework for on-device Retrieval-Augmented Generation. It indexes text, photos, and videos into a single portable file and searches them with sub-millisecond latency -- with no server, no API calls, and no data leaving the device.
+
+**Why I built this:** Every RAG solution I found required either a cloud vector database (Pinecone, Weaviate) or a local server process (ChromaDB, Qdrant). I wanted something that works like SQLite -- import the library, open a file, query it. Except for multimodal content with hybrid search.
+
+**What it does:**
+
+- **Single-file storage (`.mv2s`)** -- Everything lives in one crash-safe binary file: embeddings, BM25 index, metadata, compressed payloads. You can sync it via iCloud, email it, or commit it to git. Dual-header atomic writes with generation counters mean you can kill -9 mid-write and never corrupt the database.
+
+- **Metal GPU vector search** -- Vectors live directly in Apple Silicon unified memory (`MTLBuffer`). Zero CPU-GPU copy. Adaptive SIMD4/SIMD8 kernels based on embedding dimensions. GPU-side bitonic sort for top-K. Result: **sub-millisecond search on 10K+ vectors** (vs ~100ms on CPU). Falls back to USearch HNSW on non-Metal hardware.
+
+- **Hybrid search with query-adaptive fusion** -- Four parallel search lanes (BM25, vector, timeline, structured memory) fused with Reciprocal Rank Fusion. A lightweight rule-based classifier detects query intent (factual -> boost BM25, temporal -> boost timeline, semantic -> boost vector). Deterministic tie-breaking means identical queries always produce identical results.
+
+- **Photo RAG** -- Indexes your photo library with OCR, captions, GPS binning (~1km resolution), and per-region embeddings. Query "find that receipt from the restaurant" and it searches OCR text, image similarity, and location simultaneously. Fully offline -- iCloud-only photos get metadata-only indexing (marked as degraded, never silently downloaded).
+
+- **Video RAG** -- Segments videos into configurable time windows, extracts keyframe embeddings, and maps transcripts to segments. Results include timecodes so you can jump to the exact moment. Capture-time semantics: "videos from last week" filters by recording date, not segment position.
+
+- **Deterministic context assembly** -- `FastRAGContextBuilder` produces identical output for identical input under strict token budgets. Three-tier surrogate compression (full/gist/micro) adapts based on memory age and importance. Uses bundled cl100k_base BPE tokenization -- no network, no nondeterminism.
+
+- **Bring your own model** -- Wax ships no ML models by default (optional built-in MiniLM via Swift package trait). You provide embedders, OCR, captions, and transcripts via protocols. Each provider declares `onDeviceOnly` or `networkOptional`, validated at init.
+
+**Technical details:**
+
+- 22K lines of Swift 6.2 (strict concurrency), 496 lines of Metal shaders
+- Every orchestrator is a Swift actor -- thread safety proven at compile time
+- Custom binary codec (little-endian, deterministic serialization, SHA256 checksums)
+- Two-phase indexing: stage to WAL, commit atomically
+- 91 test files covering integration, property-based, and stress scenarios
+- iOS 26+ / macOS 26+
+
+**Quick start:**
+
+```swift
+import Wax
+
+let brain = try await MemoryOrchestrator(
+    at: URL(fileURLWithPath: "brain.mv2s")
+)
+
+// Remember
+try await brain.remember(
+    "User prefers dark mode and gets headaches from bright screens",
+    metadata: ["source": "onboarding"]
+)
+
+// Recall with RAG
+let context = try await brain.recall(query: "user preferences")
+for item in context.items {
+    print("[\(item.kind)] \(item.text)")
+}
+```
+
+For more control, the low-level API exposes the full storage engine:
+
+```swift
+import Wax
+import WaxCore
+
+let store = try await Wax.create(at: fileURL)
+let session = try await WaxSession(wax: store, mode: .readWrite())
+
+let content = Data("Meeting notes from Q4 planning...".utf8)
+try await session.put(content, options: FrameMetaSubset(
+    kind: "note.meeting",
+    searchText: "Meeting notes from Q4 planning...",
+    metadata: Metadata(["date": "2026-01-15"])
+))
+try await session.commit()
+
+let response = try await session.search(
+    SearchRequest(query: "Q4 planning decisions", topK: 5)
+)
+```
+
+**What it's not:**
+- Not a cloud service. No telemetry. No vendor lock-in.
+- Not an LLM. Wax retrieves context for your LLM of choice.
+- Not Python. This is native Swift, optimized for Apple Silicon.
+
+Feedback welcome. The framework is early but the core architecture (storage format, search pipeline, concurrency model) is stable.
+
+GitHub: https://github.com/christopherkarani/Wax
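
Note on the "Hybrid search with query-adaptive fusion" bullet in the post above: Reciprocal Rank Fusion scores each result by summing `weight / (k + rank)` across the ranked lists it appears in. The following is a minimal sketch of weighted RRF with deterministic tie-breaking, not Wax's actual implementation. The constant `k = 60`, the per-lane weights, the integer IDs, and the tie-break on ascending ID are all assumptions made for illustration.

```swift
/// Weighted Reciprocal Rank Fusion over ranked ID lists.
/// Each lane contributes weight / (k + rank), with rank starting at 1.
/// Ties on score are broken by ascending ID so the output order is stable.
func reciprocalRankFusion(
    lanes: [(weight: Double, ranking: [Int])],
    k: Double = 60  // conventional RRF constant; an assumption here
) -> [(id: Int, score: Double)] {
    var scores: [Int: Double] = [:]
    for lane in lanes {
        for (index, id) in lane.ranking.enumerated() {
            scores[id, default: 0] += lane.weight / (k + Double(index + 1))
        }
    }
    return scores
        .map { (id: $0.key, score: $0.value) }
        .sorted { lhs, rhs in
            if lhs.score != rhs.score { return lhs.score > rhs.score }
            return lhs.id < rhs.id
        }
}

// Two hypothetical lanes with equal weight.
let fused = reciprocalRankFusion(lanes: [
    (weight: 1.0, ranking: [3, 1, 2]),  // e.g. a lexical lane
    (weight: 1.0, ranking: [1, 3, 4]),  // e.g. a vector lane
])
print(fused.map { $0.id })
```

Boosting a lane for a detected query intent would, in this sketch, amount to raising that lane's `weight`. The explicit ID tie-break matters because dictionary iteration order in Swift is not stable across runs.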

Sources/Wax/Adapters/CLAUDE.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+<claude-mem-context>
+# Recent Activity
+
+<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
+
+*No recent activity*
+</claude-mem-context>

Sources/Wax/Embeddings/CLAUDE.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+<claude-mem-context>
+# Recent Activity
+
+<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
+
+*No recent activity*
+</claude-mem-context>

Sources/Wax/Embeddings/EmbeddingMemoizer.swift

Lines changed: 7 additions & 0 deletions
@@ -160,6 +160,13 @@ actor EmbeddingMemoizer {
     }
 }

+extension EmbeddingMemoizer {
+    static func fromConfig(capacity: Int, enabled: Bool = true) -> EmbeddingMemoizer? {
+        guard enabled, capacity > 0 else { return nil }
+        return EmbeddingMemoizer(capacity: capacity)
+    }
+}
+
 enum EmbeddingKey {
     static func make(text: String, identity: EmbeddingIdentity?, dimensions: Int, normalized: Bool) -> UInt64 {
         var hasher = FNV1a64()
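
Note on `EmbeddingKey.make` in the hunk above: the body of `FNV1a64` is not shown in this diff. For readers unfamiliar with the hash, the following is a sketch of the standard 64-bit FNV-1a algorithm (XOR each byte into the state, then multiply by the FNV prime with wrapping arithmetic). The type name `FNV1a64Sketch` and the helper `fnv1a64` are hypothetical, and Wax's own type may differ in API and in how it mixes the non-text key fields.

```swift
/// Standard 64-bit FNV-1a. Not necessarily identical to Wax's `FNV1a64`.
struct FNV1a64Sketch {
    /// Starts at the 64-bit FNV offset basis.
    var state: UInt64 = 0xcbf2_9ce4_8422_2325

    /// Folds bytes into the state: XOR first, then wrapping multiply by the FNV prime.
    mutating func append<S: Sequence>(_ bytes: S) where S.Element == UInt8 {
        for byte in bytes {
            state ^= UInt64(byte)
            state = state &* 0x0000_0100_0000_01b3
        }
    }
}

/// Convenience wrapper that hashes the UTF-8 bytes of a string.
func fnv1a64(_ text: String) -> UInt64 {
    var hasher = FNV1a64Sketch()
    hasher.append(text.utf8)
    return hasher.state
}

print(String(fnv1a64("a"), radix: 16))
```

FNV-1a is cheap and deterministic across processes, which suits a memoization key, but it is not collision-resistant, so it should not be treated as a security boundary.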

Sources/Wax/Ingest/CLAUDE.md

Lines changed: 9 additions & 0 deletions
@@ -33,3 +33,12 @@ All PDF code is wrapped in `#if canImport(PDFKit)`. This means:
 - `PDFKit` (Apple platforms only)
 - `Foundation`
 - `MemoryOrchestrator` (from parent Wax module)
+
+
+<claude-mem-context>
+# Recent Activity
+
+<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
+
+*No recent activity*
+</claude-mem-context>

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+import Foundation
+
+/// Errors that can occur while ingesting a local text file.
+public enum FileIngestError: Error, Sendable, Equatable {
+    case fileNotFound(url: URL)
+    case loadFailed(url: URL)
+    case unsupportedTextEncoding(url: URL)
+    case emptyContent(url: URL)
+}
+
+extension FileIngestError: LocalizedError {
+    public var errorDescription: String? {
+        switch self {
+        case let .fileNotFound(url):
+            return "File not found: \(url.path)"
+        case let .loadFailed(url):
+            return "File could not be read: \(url.path)"
+        case let .unsupportedTextEncoding(url):
+            return "File is not UTF-8 text: \(url.path)"
+        case let .emptyContent(url):
+            return "File has no text content: \(url.path)"
+        }
+    }
+}
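
Note on the `FileIngestError` cases above: the code that throws them is not part of the visible diff. The sketch below shows one plausible mapping from Foundation failures to the four cases. The function `loadText(at:)` is hypothetical, the check order is an assumption, and the enum is repeated only so the example compiles on its own.

```swift
import Foundation

// Mirror of the enum added in this commit, repeated for self-containment.
enum FileIngestError: Error, Equatable {
    case fileNotFound(url: URL)
    case loadFailed(url: URL)
    case unsupportedTextEncoding(url: URL)
    case emptyContent(url: URL)
}

/// Hypothetical loader: reads a local file and maps each failure to one case.
func loadText(at url: URL) throws -> String {
    guard FileManager.default.fileExists(atPath: url.path) else {
        throw FileIngestError.fileNotFound(url: url)
    }
    let data: Data
    do {
        data = try Data(contentsOf: url)
    } catch {
        throw FileIngestError.loadFailed(url: url)
    }
    // String(data:encoding:) returns nil when the bytes are not valid UTF-8.
    guard let text = String(data: data, encoding: .utf8) else {
        throw FileIngestError.unsupportedTextEncoding(url: url)
    }
    guard !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else {
        throw FileIngestError.emptyContent(url: url)
    }
    return text
}

// Usage: a path that does not exist surfaces as `.fileNotFound`.
let missingURL = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("wax-sketch-\(UUID().uuidString).txt")
do {
    _ = try loadText(at: missingURL)
} catch {
    print("ingest failed: \(error)")
}
```

Because the enum is `Equatable`, callers and tests can compare a thrown error against an expected case directly instead of matching on message strings.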

Sources/Wax/Ingest/TextChunker.swift

Lines changed: 20 additions & 2 deletions
@@ -22,7 +22,17 @@ public enum TextChunker {
         let cappedTarget = max(1, targetTokens)
         let cappedOverlap = max(0, overlapTokens)

-        guard let counter = try? await TokenCounter.shared() else { return [text] }
+        let counter: TokenCounter
+        do {
+            counter = try await TokenCounter.shared()
+        } catch {
+            WaxDiagnostics.logSwallowed(
+                error,
+                context: "text chunker token counter init",
+                fallback: "character-preserving unsplit text"
+            )
+            return [text]
+        }
         let tokens = await counter.encode(text)
         if tokens.count <= cappedTarget {
             return [text]
@@ -58,7 +68,15 @@ public enum TextChunker {

         return AsyncStream { continuation in
             Task {
-                guard let counter = try? await TokenCounter.shared() else {
+                let counter: TokenCounter
+                do {
+                    counter = try await TokenCounter.shared()
+                } catch {
+                    WaxDiagnostics.logSwallowed(
+                        error,
+                        context: "text chunker stream token counter init",
+                        fallback: "stream original unsplit text"
+                    )
                     continuation.yield(text)
                     continuation.finish()
                     return
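
Note on the `TextChunker` hunks above: the diff shows the clamping of `targetTokens` and `overlapTokens` and the early return for short inputs, but not the windowing loop itself. The sketch below shows one common way to split a token array into overlapping windows. The function `chunkWindows` is hypothetical, it operates on plain `[Int]` tokens rather than Wax's `TokenCounter`, and the stride rule (`target - overlap`, at least 1) is an assumption.

```swift
/// Splits `tokens` into windows of at most `target` tokens, where consecutive
/// windows share `overlap` tokens. Inputs at or under the target come back whole,
/// mirroring the early return visible in the diff.
func chunkWindows(tokens: [Int], target: Int, overlap: Int) -> [[Int]] {
    let cappedTarget = max(1, target)
    let cappedOverlap = max(0, overlap)
    guard tokens.count > cappedTarget else { return [tokens] }

    // Advance by at least one token so the loop always terminates,
    // even when the overlap is as large as the target.
    let stride = max(1, cappedTarget - cappedOverlap)
    var windows: [[Int]] = []
    var start = 0
    while start < tokens.count {
        let end = min(start + cappedTarget, tokens.count)
        windows.append(Array(tokens[start..<end]))
        if end == tokens.count { break }
        start += stride
    }
    return windows
}

// Ten tokens, windows of four, one token shared between neighbours.
let windows = chunkWindows(tokens: Array(0..<10), target: 4, overlap: 1)
print(windows)
```

The overlap keeps a sentence that straddles a boundary retrievable from both neighbouring chunks, at the cost of indexing some tokens twice.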

Sources/Wax/Maintenance/CLAUDE.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
<claude-mem-context>
2+
# Recent Activity
3+
4+
<!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
5+
6+
*No recent activity*
7+
</claude-mem-context>
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+import Foundation
+
+public struct LiveSetRewriteOptions: Sendable, Equatable {
+    /// Allow replacing an existing destination file.
+    public var overwriteDestination: Bool
+
+    /// Replace payload bytes for non-live frames (deleted/superseded) with empty payloads.
+    public var dropNonLivePayloads: Bool
+
+    /// Run `Wax.verify(deep:)` on the rewritten file before returning.
+    public var verifyDeep: Bool
+
+    public init(
+        overwriteDestination: Bool = false,
+        dropNonLivePayloads: Bool = true,
+        verifyDeep: Bool = false
+    ) {
+        self.overwriteDestination = overwriteDestination
+        self.dropNonLivePayloads = dropNonLivePayloads
+        self.verifyDeep = verifyDeep
+    }
+}
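
Note on `dropNonLivePayloads` above: the doc comment says payload bytes of deleted or superseded frames are replaced with empty payloads. The sketch below illustrates that idea on an in-memory array and reports how many bytes such a pass would reclaim. `FrameSketch` and `rewriteSketch` are hypothetical stand-ins, not Wax types, and the real rewrite works on the on-disk file format, which this diff does not show.

```swift
/// Hypothetical in-memory stand-in for a stored frame.
struct FrameSketch: Equatable {
    var id: Int
    var isLive: Bool
    var payload: [UInt8]
}

/// Returns a copy of `frames` in which non-live frames keep their slot but
/// lose their payload bytes, plus the number of bytes reclaimed.
func rewriteSketch(
    _ frames: [FrameSketch],
    dropNonLivePayloads: Bool
) -> (frames: [FrameSketch], reclaimedBytes: Int) {
    var reclaimed = 0
    let rewritten = frames.map { frame -> FrameSketch in
        guard dropNonLivePayloads, !frame.isLive else { return frame }
        reclaimed += frame.payload.count
        var copy = frame
        copy.payload = []
        return copy
    }
    return (rewritten, reclaimed)
}

let sampleFrames = [
    FrameSketch(id: 1, isLive: true, payload: [1, 2, 3]),
    FrameSketch(id: 2, isLive: false, payload: [4, 5, 6, 7, 8]),
]
let result = rewriteSketch(sampleFrames, dropNonLivePayloads: true)
print("reclaimed \(result.reclaimedBytes) bytes")
```

Keeping the frame entry while emptying its payload preserves frame identity and ordering, which is presumably why the option drops payloads rather than whole frames.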
