0.3.0

BYK released this 26 Feb 15:51


545bb85

What's changed

New eval infrastructure

Self-contained session eval harness (eval/session_eval.ts): loads session transcripts from JSON files, distills them on the fly, seeds the DB so the recall tool works during evaluation, and compares the result against OpenCode's actual compaction behavior (a summary of early messages plus an 80K-token tail window)
20 questions across two real coding sessions (113K and 353K tokens)
Token tracking with cost-per-correct-answer metrics
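
As a rough illustration of the baseline described above (early messages summarized, the most recent 80K tokens kept verbatim), the split could be sketched as follows. The `Message` shape and the `splitForCompaction` helper are hypothetical names for this sketch and are not taken from OpenCode or from eval/session_eval.ts; per-message token counts are assumed to be precomputed.

```typescript
// Hypothetical message shape; the real harness may store transcripts differently.
interface Message {
  role: "user" | "assistant";
  text: string;
  tokens: number; // assumed to be precomputed by a tokenizer
}

// Split a transcript into the early part (to be summarized) and the tail
// (kept verbatim), where the tail holds the newest messages that fit the budget.
function splitForCompaction(
  messages: Message[],
  tailBudget: number = 80_000,
): { early: Message[]; tail: Message[] } {
  let used = 0;
  let start = messages.length;
  // Walk backwards from the newest message while the budget allows.
  while (start > 0 && used + messages[start - 1].tokens <= tailBudget) {
    used += messages[start - 1].tokens;
    start -= 1;
  }
  return { early: messages.slice(0, start), tail: messages.slice(start) };
}

const demo: Message[] = [
  { role: "user", text: "early detail", tokens: 50_000 },
  { role: "assistant", text: "mid detail", tokens: 40_000 },
  { role: "user", text: "late detail", tokens: 30_000 },
];
const { early, tail } = splitForCompaction(demo);
console.log(early.length, tail.length);
```

Anything that lands in `early` survives only through its summary, which is where the early/mid-session questions in the eval put pressure on the baseline.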

Results (Claude Sonnet 4)

Mode	Score	Cost
Default (compaction + 80K tail)	10/20 (50%)	$8.14
Lore (distillation + recall)	17/20 (85%)	$1.87

Lore's 35-percentage-point accuracy advantage comes entirely from early- and mid-session details that fall outside the tail window; the two modes tie on late-session details. Cost per correct answer: $0.11 for Lore vs $0.81 for the default (7.4x cheaper).
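
The derived figures follow directly from the table. A minimal sketch of the arithmetic, assuming a hypothetical `EvalResult` shape (the harness's real types and field names may differ):

```typescript
// Hypothetical result record; field names are illustrative only.
interface EvalResult {
  mode: string;
  correct: number;
  total: number;
  costUsd: number;
}

// Accuracy in percent. Multiplying before dividing keeps these values exact.
function accuracyPct(r: EvalResult): number {
  return (r.correct * 100) / r.total;
}

// Total spend divided by the number of correct answers.
function costPerCorrect(r: EvalResult): number {
  // Avoid dividing by zero when a run gets nothing right.
  return r.correct === 0 ? Infinity : r.costUsd / r.correct;
}

// Numbers copied from the results table.
const baseline: EvalResult = { mode: "Default", correct: 10, total: 20, costUsd: 8.14 };
const lore: EvalResult = { mode: "Lore", correct: 17, total: 20, costUsd: 1.87 };

const gapPp = accuracyPct(lore) - accuracyPct(baseline);
const ratio = costPerCorrect(baseline) / costPerCorrect(lore);

console.log(`accuracy gap: ${gapPp}pp`);
console.log(`cost per correct: $${costPerCorrect(lore).toFixed(2)} vs $${costPerCorrect(baseline).toFixed(2)}`);
console.log(`ratio: ${ratio.toFixed(1)}x`);
```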

Bug fixes

agents-file: sort category headings alphabetically in AGENTS.md export
ltm test: fix monotonic ID test failing on fast CI (Date.now() collision)
src/index.ts: session error handler now skips eval/child sessions
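
For context on the Date.now() collision: two calls in the same millisecond see the same timestamp, so an ID built from the timestamp alone cannot be guaranteed to increase. The sketch below shows one common way to stay monotonic in that case. It is an illustration under assumed names (`createMonotonicId` is hypothetical), not the project's ID scheme, and the fix in this release was made to the test.

```typescript
// Hypothetical generator, not the project's implementation.
// The clock is injectable so a test can freeze time and force a collision.
function createMonotonicId(now: () => number = Date.now): () => string {
  let lastMs = 0;
  let seq = 0;
  return () => {
    const ms = now();
    if (ms > lastMs) {
      // New millisecond: adopt it and reset the counter.
      lastMs = ms;
      seq = 0;
    } else {
      // Same millisecond (or the clock stepped back): bump the counter instead.
      seq += 1;
    }
    // Zero-padding keeps string comparison consistent with generation order.
    return `${String(lastMs).padStart(15, "0")}-${String(seq).padStart(6, "0")}`;
  };
}

// Simulate a fast CI machine: every call lands on the same millisecond.
const nextId = createMonotonicId(() => 1_000);
const first = nextId();
const second = nextId();
console.log(first < second);
```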

Eval harness improvements

All eval files import DISTILLATION_SYSTEM from src/prompt (DRY)
backfill.ts: --wipe flag to clear old distillations before re-distilling
coding_eval.ts: token tracking, stronger eval session isolation
Removed LongMemEval harness and old result files
