Skip to content

Migrate storage formats: SQLite → JSONL/Markdown#110

Merged
priyanshujain merged 27 commits intomasterfrom
refactor-file-formats
Mar 20, 2026
Merged

Migrate storage formats: SQLite → JSONL/Markdown#110
priyanshujain merged 27 commits intomasterfrom
refactor-file-formats

Conversation

@priyanshujain
Copy link
Copy Markdown
Collaborator

Summary

Migrates 5 storage layers from SQLite to file-based formats that better match their access patterns:

  • Audit log → JSONL — append-only, zero read consumers, fire-and-forget
  • Usage records → JSONL — append-only with date/model aggregation queries
  • Search history → JSONL — split from websearch SQLite (cache tables stay)
  • User memories → Markdown- [id] content bullets in category files, optimized for LLM consumption
  • Conversation history → JSONL — per-session files with sessions.jsonl index

Each phase replaces the SQLite store.DB dependency with direct file I/O (*os.File + json.Encoder for JSONL, atomic rename for Markdown). No migration needed — alpha stage, clean replacement.

Unchanged (stay SQLite): contacts, Gmail, WhatsApp, iMessage, Apple Notes, web search cache, OAuth tokens, scheduler, jobs queue.

Key changes

  • 48 files changed, ~1556 insertions, ~1735 deletions (net -179 lines)
  • service/memory/store.go: Store struct with sync.RWMutex, .counter file for globally unique IDs, atomic writes via tmp+rename
  • service/history/store.go: Store struct with sync.Mutex, sessions.jsonl index + per-session JSONL files
  • API change: UpsertConversation returns error (not int64, error); callers use sessionID directly
  • config/paths.go: added AuditJSONLPath(), UsageJSONLPath(), UserMemoryDir(), HistoryDir()
  • Removed memory, history, usage from server.migrateDBs()

Test plan

  • go build ./... — compiles clean
  • go test ./... — all 40 packages pass (0 failures)
  • Phase 1: TestLog, TestTruncation, TestNilSafe, TestClose + 3 bonus tests
  • Phase 2: TestRecord, TestQueryDailyGrouping, TestQueryMonthlyGrouping, TestQueryFilterByModel, TestQueryFilterByDateRange, TestRecorderIntegration
  • Phase 3: TestPutSearchHistory, TestLoadSearchHistory, TestCountSearchHistory, TestCacheTablesUnchanged
  • Phase 4: TestAdd, TestGet, TestUpdate, TestDelete, TestList, TestListByCategory, TestSearch, TestCount, TestIDGloballyUnique, TestAtomicWrite + reconcile & format tests
  • Phase 5: TestUpsertConversation, TestSaveMessage, TestLoadRecentSession (3 variants), TestLoadSessionMessages, TestEndSession, TestCountConversations, TestLastCaptureTime, TestCapture, TestCaptureIdempotent, TestLoadRecentUserMessages
  • Integration tests with real LLM providers (OpenAI, Gemini, Gemini Vertex) pass
  • Manual smoke: obk chat → tool use → verify ~/.obk/audit/audit.jsonl
  • Manual smoke: obk chatobk usage daily
  • Manual smoke: obk memory add/list/delete
  • Manual smoke: obk chat → exit → reopen → session restores

Replace SQLite-backed audit logger with append-only JSONL file writer
for better crash safety and debuggability.
Switch all audit logger callers from AuditDBPath to AuditJSONLPath
and update registry test to verify JSONL output.
Replace SQLite-backed usage store with JSONL file operations.
Record() appends one line per API call, Query() aggregates on read.
Switch all usage callers from SQLite DB to UsageJSONLPath.
Remove usage from server migrateDBs since it's no longer SQL.
Move putSearchHistory to append JSONL file. Add LoadSearchHistory and
countSearchHistory. Remove search_history table from SQL schemas.
Update CLI history command, web_deps, and tests to use JSONL search
history instead of SQLite search_history table.
Replace SQLite-backed memory store with Markdown files per category.
Each bullet has [id] prefix for addressability. Atomic writes via
tmp+rename. Global counter in .counter file.
Update all memory tests to use NewStore(t.TempDir()) instead of
SQLite testDB. All 35 tests pass including LLM integration tests.
Replace SQLite-based memory operations with the new
memory.NewStore(dir) API in all CLI memory subcommands.
Remove memory from migrateDBs, update handler_memory.go to use
memory.NewStore, and update tests to use EnsureDir instead of Migrate.
Replace store.Open + memory.Migrate pattern with memory.NewStore
in extractMemories, userMemoriesPrompt, and all related tests.
Replace createMemoryDB and GivenMemories to use memory.NewStore
instead of SQLite-based memory.Migrate and memory.Add.
Replace SQLite-based history storage with JSONL files:
- sessions.jsonl index with per-session JSONL message files
- Store struct with method-based API using sessionID directly
- EnsureDir replaces Migrate for directory setup
Capture now takes *Store instead of *store.DB. All tests
rewritten to use testStore(t) and the new method-based API.
Replace SQLite-based history API with JSONL Store methods in chat,
capture, status, and their tests.
Replace SQLite history queries with Store.LoadRecentUserMessages in
memory extract handler and CLI command. Remove history from migrateDBs.
Replace SQLite-based history operations with JSONL Store methods in
session manager, session tests, and integration tests.
Add TestCacheTablesUnchanged verifying search_history table was removed
from SQLite schema. Add TestAtomicWrite verifying concurrent memory
writes don't corrupt data.
Validate sessionID matches ^[a-zA-Z0-9_-]+$ and use filepath.Base
as defense-in-depth. Reject slashes, dots, spaces, and empty strings.
Extend bullet format from `- [id] content` to `- [id|source] content`.
The regex is backward-compatible with the old format (source defaults
to empty). Add() was accepting source/sourceRef but silently dropping
them.
Only update sessions.jsonl on UpsertConversation and EndSession.
SaveMessage now only writes to the per-session file. Prevents
index bloat (N lines per N messages for the same session).
LoadSessionMessages with a limit now returns the tail of the
conversation (most recent messages) rather than the head.
Add slog.Warn calls when JSON unmarshal fails in history store, usage
store, and websearch cache. Makes corruption visible instead of
silently dropping data.
Use sort.Slice for O(n log n) sorting of session entries in
LoadRecentUserMessages.
Verify that a corrupted .counter file doesn't crash the store —
it recovers by resetting the counter, and both old and new memories
remain retrievable.
All other migrated packages use EnsureDir. Update the usage package
and all 3 callers to match.
@priyanshujain priyanshujain merged commit f8b7566 into master Mar 20, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant