
Commit c83d567

Authored by phernandez, github-actions[bot], and claude
fix: enable WAL mode and add Windows-specific SQLite optimizations (#316)
Signed-off-by: phernandez <[email protected]>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Paul Hernandez <[email protected]>
Co-authored-by: Claude <[email protected]>
1 parent ace6a0f commit c83d567


4 files changed

+847
-3
lines changed

Lines changed: 210 additions & 0 deletions

---
title: 'SPEC-14: Cloud Git Versioning & GitHub Backup'
type: spec
permalink: specs/spec-14-cloud-git-versioning
tags:
- git
- github
- backup
- versioning
- cloud
related:
- specs/spec-9-multi-project-bisync
- specs/spec-9-follow-ups-conflict-sync-and-observability
status: deferred
---

# SPEC-14: Cloud Git Versioning & GitHub Backup

**Status: DEFERRED** - Postponed until multi-user/teams feature development. Using S3 versioning (SPEC-9.1) for v1 instead.

## Why Deferred

**Original goals can be met with simpler solutions:**
- Version history → **S3 bucket versioning** (automatic, zero config)
- Offsite backup → **Tigris global replication** (built-in)
- Restore capability → **S3 version restore** (`bm cloud restore --version-id`)
- Collaboration → **Deferred to teams/multi-user feature** (not v1 requirement)

**Complexity vs value trade-off:**
- Git integration adds: committer service, puller service, webhooks, LFS, merge conflicts
- Risk: Loop detection between Git ↔ rclone bisync ↔ local edits
- S3 versioning gives 80% of value with 5% of complexity

**When to revisit:**
- Teams/multi-user features (PR-based collaboration workflow)
- User requests for commit messages and branch-based workflows
- Need for fine-grained audit trail beyond S3 object metadata

---

## Original Specification (for reference)

## Why
Early-access users want **transparent version history**, easy **offsite backup**, and a familiar **restore/branching** workflow. Git/GitHub integration would provide:
- Auditable history of every change (who/when/why)
- Branches/PRs for review and collaboration
- Offsite private backup under the user's control
- Escape hatch: users can always `git clone` their knowledge base

**Note:** These goals are now addressed via S3 versioning (SPEC-9.1) for the single-user use case.

## Goals
- **Transparent**: Users keep using Basic Memory; Git runs behind the scenes.
- **Private**: Push to a **private GitHub repo** owned by the user (or the tenant org).
- **Reliable**: No data loss, deterministic mapping of filesystem ↔ Git.
- **Composable**: Plays nicely with SPEC‑9 bisync and upcoming conflict features (SPEC‑9 Follow‑Ups).

**Non‑Goals (for v1):**
- Fine‑grained per‑file encryption in Git history (can be layered later).
- Large media optimization beyond Git LFS defaults.

## User Stories
1. *As a user*, I connect my GitHub account and choose a private backup repo.
2. *As a user*, every change I make in the cloud (or via bisync) is **committed** and **pushed** automatically.
3. *As a user*, I can **restore** a file/folder/project to a prior version.
4. *As a power user*, I can **git pull/push** directly to collaborate outside the app.
5. *As an admin*, I can enforce repo ownership (tenant org) and least‑privilege scopes.

## Scope
- **In scope:** Full repo backup of `/app/data/` (all projects) with optional selective subpaths.
- **Out of scope (v1):** Partial shallow mirrors; encrypted Git; cross‑provider SCM (GitLab/Bitbucket).

## Architecture
### Topology
- **Authoritative working tree**: `/app/data/` (bucket mount) remains the source of truth (SPEC‑9).
- **Bare repo** lives alongside: `/app/git/${tenant}/knowledge.git` (server‑side).
- **Mirror remote**: `github.com/<owner>/<repo>.git` (private).

```mermaid
flowchart LR
  A[/Users & Agents/] -->|writes/edits| B["/app/data/"]
  B -->|file events| C[Committer Service]
  C -->|git commit| D[(Bare Repo)]
  D -->|push| E[(GitHub Private Repo)]
  E -->|"webhook (push)"| F[Puller Service]
  F -->|git pull/merge| D
  D -->|checkout/merge| B
```

### Services
- **Committer Service** (daemon):
  - Watches `/app/data/` for changes (inotify/poll)
  - Batches changes (debounce, e.g. 2–5s)
  - Writes `.bmmeta` (if present) into a commit message trailer (see Follow‑Ups)
  - Commits with `git add -A && git commit -m "chore(sync): <summary>" -m "BM-Meta: <json>"` (the second `-m` becomes its own paragraph, so `BM-Meta` lands in the trailer block)
  - Periodic `git push` to GitHub mirror (configurable interval)
- **Puller Service** (webhook target):
  - Receives GitHub webhook (push) → `git fetch`
  - **Fast‑forward** merges to `main` only; reject non‑FF unless policy allows
  - Applies changes back to `/app/data/` via clean checkout
  - Emits sync events for Basic Memory indexers

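The committer loop described above can be sketched in Python as follows. This is a minimal illustration and does not reproduce Basic Memory's implementation. `CommitBatcher` and `commit_argv` are hypothetical names. The 3 s quiet period is one arbitrary point in the spec's 2–5 s range, and the summary text stands in for the spec's unspecified `<summary>`. The clock is injectable so the debounce can be tested without sleeping.

```python
import json
import time
from typing import Callable, Optional


class CommitBatcher:
    """Trailing-edge debounce: collect changed paths and report them as ready
    once no new event has arrived for `quiet_period` seconds.
    Hypothetical helper for illustration only."""

    def __init__(self, quiet_period: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.quiet_period = quiet_period
        self.clock = clock
        self.pending: set[str] = set()
        self.last_event: Optional[float] = None

    def record(self, path: str) -> None:
        # Every new event restarts the quiet period.
        self.pending.add(path)
        self.last_event = self.clock()

    def ready(self) -> bool:
        return (self.last_event is not None
                and self.clock() - self.last_event >= self.quiet_period)

    def flush(self) -> list[str]:
        # Hand back the batch (sorted for deterministic output) and reset.
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_event = None
        return batch


def commit_argv(batch: list[str], meta: Optional[dict] = None) -> list[list[str]]:
    """Build the git commands for one batch. Git joins multiple -m values as
    separate paragraphs, so the BM-Meta line ends up in the last paragraph,
    which is where git looks for trailers."""
    # Placeholder summary; the spec leaves <summary> unspecified.
    summary = f"chore(sync): {len(batch)} file(s) changed"
    commit = ["git", "commit", "-m", summary]
    if meta:
        compact = json.dumps(meta, sort_keys=True, separators=(",", ":"))
        commit += ["-m", f"BM-Meta: {compact}"]
    return [["git", "add", "-A"], commit]
```

A watcher callback would call `record()` for each file event. A timer would call `flush()` once `ready()` is true and pass the batch to `commit_argv()`. Running those commands against the server-side bare repo would also need git's `--git-dir` and `--work-tree` options.
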
### Auth & Security
- **GitHub App** (recommended): minimal permissions (`contents:read/write`, `metadata:read`) plus a subscription to `push` webhook events.
- Tenant‑scoped installation; repo created in the user's account or the tenant org.
- Tokens stored in KMS/secret manager; rotated automatically.
- Optional policy: allow only **FF merges** on `main`; non‑FF requires PR.

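The FF-only policy reduces to an ancestry check. If local `main` is an ancestor of the incoming head, moving `main` forward loses nothing. The sketch below runs over an in-memory parent map. In a real puller, `git merge-base --is-ancestor <a> <b>` would answer the same question. The function names and returned labels are hypothetical.

```python
from typing import Mapping, Sequence

Parents = Mapping[str, Sequence[str]]


def is_ancestor(candidate: str, head: str, parents: Parents) -> bool:
    """True if `candidate` is reachable from `head` via parent links
    (a commit counts as its own ancestor)."""
    stack, seen = [head], set()
    while stack:
        commit = stack.pop()
        if commit == candidate:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def merge_decision(local: str, incoming: str, parents: Parents,
                   allow_non_ff: bool = False) -> str:
    """Apply the policy: fast-forward when possible, otherwise merge only
    if non-FF merges are explicitly allowed."""
    if is_ancestor(incoming, local, parents):
        return "up-to-date"      # nothing new upstream
    if is_ancestor(local, incoming, parents):
        return "fast-forward"    # local main is strictly behind
    return "merge" if allow_non_ff else "require-pr"  # histories diverged
```

With `allowNonFF` set to `false` (the sample config's value), a diverged history is never merged on the server. It is routed to a PR instead.
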
### Repo Layout
- **Monorepo** (default): one repo per tenant mirrors `/app/data/` with subfolders per project.
- Optional multi‑repo mode (later): one repo per project.

### File Handling
- Honor `.gitignore` generated from `.bmignore.rclone` + BM defaults (cache, temp, state).
- **Git LFS** for large binaries (images, media) — auto‑track by extension/size threshold.
- Normalize newlines and Unicode (aligns with Follow‑Ups).

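The two file-handling rules can be sketched as small generators for `.gitattributes` and `.gitignore`. The following are hypothetical assumptions:

- the extension list and the BM default ignore patterns (the spec only names "cache, temp, state");
- `.bmignore.rclone` holds one pattern per line, possibly prefixed with rclone's `- ` exclude marker;
- a file at or above the threshold is tracked.

Rclone filter globs and gitignore globs do not have identical semantics, so a production converter would need more care. `.gitattributes` matches patterns rather than sizes, so an oversized file without a tracked extension gets an exact-path entry.

```python
from pathlib import PurePosixPath

# Hypothetical defaults; the spec names the categories but not the patterns.
BM_DEFAULT_IGNORES = [".cache/", "*.tmp", ".bm-state/"]
LFS_EXTENSIONS = [".gif", ".jpg", ".mp4", ".pdf", ".png"]
SIZE_THRESHOLD = 10 * 1024 * 1024  # 10485760, as in the sample config
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def lfs_pattern_for_path(path: str) -> str:
    # Leading "/" anchors the pattern at the repo root; spaces are written as
    # [[:space:]] because whitespace separates a pattern from its attributes.
    return "/" + path.replace(" ", "[[:space:]]")


def gitattributes_lines(files: dict[str, int],
                        threshold: int = SIZE_THRESHOLD) -> list[str]:
    """Extension rules first, then exact paths for large files that no
    extension rule already covers."""
    lines = [f"*{ext} {LFS_ATTRS}" for ext in LFS_EXTENSIONS]
    for path, size in sorted(files.items()):
        if size >= threshold and PurePosixPath(path).suffix not in LFS_EXTENSIONS:
            lines.append(f"{lfs_pattern_for_path(path)} {LFS_ATTRS}")
    return lines


def render_gitignore(bmignore_text: str) -> str:
    """Merge .bmignore.rclone entries with BM defaults, dropping comments,
    blanks and duplicates while keeping first-seen order."""
    patterns: list[str] = []
    for raw in bmignore_text.splitlines() + BM_DEFAULT_IGNORES:
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("+ "):
            continue  # skip blanks, comments and rclone include rules (not translated here)
        if line.startswith("- "):
            line = line[2:].strip()  # assumed rclone exclude prefix
        if line not in patterns:
            patterns.append(line)
    return "\n".join(patterns) + "\n"
```
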
### Conflict Model
- **Primary concurrency**: SPEC‑9 Follow‑Ups (`.bmmeta`, conflict copies) stays the first line of defense.
- **Git merges** are a **secondary** mechanism:
  - Server only auto‑merges **text** conflicts when trivial (FF or clean 3‑way).
  - Otherwise, create `name (conflict from <branch>, <ts>).md` and surface via events.

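The conflict-copy name in the bullet above can be derived deterministically. The spec leaves the `<ts>` format open, so this sketch assumes a compact UTC timestamp. That format avoids `:`, which Windows does not allow in file names. Replacing `/` in branch names is another assumption, so that a branch like `feature/x` does not turn into a subdirectory. `conflict_copy_path` is a hypothetical name.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def conflict_copy_path(path: str, branch: str, when: datetime) -> str:
    """Return `<dir>/<stem> (conflict from <branch>, <ts>)<suffix>` next to the original."""
    p = PurePosixPath(path)
    # Compact UTC timestamp with no ":" so the name is valid on Windows too.
    ts = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # A "/" in a branch name would otherwise create a subdirectory.
    safe_branch = branch.replace("/", "-")
    return str(p.with_name(f"{p.stem} (conflict from {safe_branch}, {ts}){p.suffix}"))
```

Passing a timezone-aware `datetime` keeps the result independent of the server's local zone.
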
### Data Flow vs Bisync
- Bisync (rclone) continues to run between the local sync dir ↔ bucket.
- Git sits **cloud‑side** between bucket and GitHub.
- On **pull** from GitHub → files written to `/app/data/` → picked up by the indexers and eventually propagated back to users by bisync.

## CLI & UX
New commands (cloud mode):
- `bm cloud git connect` — Launch GitHub App installation; create private repo; store installation id.
- `bm cloud git status` — Show connected repo, last push time, last webhook delivery, pending commits.
- `bm cloud git push` — Manual push (rarely needed).
- `bm cloud git pull` — Manual pull/FF (admin only by default).
- `bm cloud snapshot -m "message"` — Create a tagged point‑in‑time snapshot (git tag).
- `bm restore <path> --to <commit|tag>` — Restore file/folder/project to prior version.

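Here is one plausible mapping of `bm restore` onto plain Git, assuming the command runs against the server-side repository. `git restore --source=<ref> --worktree -- <path>` (Git 2.23 or later) rewrites only the working tree. The committer then records the restore as a new commit instead of rewriting history, which preserves the audit trail the acceptance criteria ask for. The Phase 2 dry-run maps to `git diff --stat <ref> -- <path>`, which compares the working tree with `<ref>`. `restore_commands` is a hypothetical helper and not part of the `bm` CLI.

```python
def restore_commands(path: str, ref: str, dry_run: bool = False) -> list[str]:
    """Build the git argv for `bm restore <path> --to <ref>` (hypothetical mapping)."""
    if not ref or ref.startswith("-"):
        # A ref starting with "-" would be parsed as an option; "--" already protects the path.
        raise ValueError(f"invalid ref: {ref!r}")
    if dry_run:
        # Show what restoring would change: working tree vs. <ref>, limited to <path>.
        return ["git", "diff", "--stat", ref, "--", path]
    # Write <ref>'s version into the working tree only (not the index);
    # the committer then picks it up as an ordinary change.
    return ["git", "restore", f"--source={ref}", "--worktree", "--", path]
```
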
Settings:
- `bm config set git.autoPushInterval=5s`
- `bm config set git.lfs.sizeThreshold=10MB`
- `bm config set git.allowNonFF=false`

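The CLI settings use human-readable units (`5s`, `10MB`). The sample config, by contrast, stores `sizeThreshold` as `10485760`, which is 10 × 1024². Below is a minimal sketch of how `bm config set` assignments could be folded into that JSON shape. It assumes binary size units and keeps `autoPushInterval` as the validated string the sample config uses. `parse_duration`, `parse_size`, and `apply_setting` are hypothetical helpers.

```python
import re

DURATION_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
SIZE_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}  # binary units (assumed)


def parse_duration(text: str) -> float:
    """'5s' -> 5.0 seconds."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text.strip())
    if not m:
        raise ValueError(f"bad duration: {text!r}")
    return float(m.group(1)) * DURATION_UNITS[m.group(2)]


def parse_size(text: str) -> int:
    """'10MB' -> 10485760 bytes."""
    m = re.fullmatch(r"(\d+)(B|KB|MB|GB)", text.strip().upper())
    if not m:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * SIZE_UNITS[m.group(2)]


def apply_setting(config: dict, assignment: str) -> dict:
    """Apply 'git.lfs.sizeThreshold=10MB'-style input to a nested config dict."""
    key, sep, raw = assignment.partition("=")
    if not sep or not key or not raw:
        raise ValueError(f"expected key=value, got {assignment!r}")
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    if leaf == "sizeThreshold":
        value: object = parse_size(raw)
    elif leaf == "autoPushInterval":
        parse_duration(raw)  # validate only; the sample config stores the string
        value = raw
    elif raw.lower() in ("true", "false"):
        value = raw.lower() == "true"
    else:
        value = raw
    node[leaf] = value
    return config
```

Applying the three settings above to an empty dict yields the `git` object from the sample config, minus `enabled` and `repo`.
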
## Migration & Backfill
- On connect, if the repo is empty: initial commit of the entire `/app/data/`.
- If the repo has content: require a **one‑time import** path (clone to staging, reconcile, choose direction).

## Edge Cases
- Massive deletes: gated by SPEC‑9 `max_delete` **and** Git pre‑push hook checks.
- Case changes and rename detection: rely on git rename heuristics + Follow‑Ups move hints.
- Secrets: ignore common secret patterns by default; allow a custom deny list.

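A pre-push hook is one place to enforce the mass-delete gate. Git runs the `pre-push` hook with one stdin line per ref being pushed, in the form `<local ref> <local sha> <remote ref> <remote sha>`, and aborts the push if the hook exits non-zero. The sketch counts `D` rows from `git diff --name-status`. Git detects renames by default, so a rename shows up as an `R` row and does not count as a delete. `MAX_DELETE = 50` is a placeholder, because the spec reuses SPEC-9's `max_delete` without restating its value or units.

```python
import subprocess
import sys

MAX_DELETE = 50  # placeholder; the real limit would come from SPEC-9's max_delete


def is_zero_sha(sha: str) -> bool:
    # All-zero IDs mark a new remote branch or a branch deletion.
    return set(sha) == {"0"}


def parse_push_lines(stdin_text: str) -> list[tuple[str, ...]]:
    """Split pre-push stdin into (local_ref, local_sha, remote_ref, remote_sha)."""
    return [tuple(line.split()) for line in stdin_text.splitlines() if line.strip()]


def count_deletions(name_status: str) -> int:
    """Count 'D' rows in `git diff --name-status` output."""
    return sum(1 for line in name_status.splitlines() if line.split("\t", 1)[0] == "D")


def main() -> int:
    for _local_ref, local_sha, remote_ref, remote_sha in parse_push_lines(sys.stdin.read()):
        if is_zero_sha(local_sha) or is_zero_sha(remote_sha):
            continue  # nothing to diff against; a stricter hook could inspect these too
        # check=True: if git fails (e.g. remote_sha unknown locally), the hook
        # exits with an error and the push is blocked (fail closed).
        out = subprocess.run(
            ["git", "diff", "--name-status", remote_sha, local_sha],
            check=True, capture_output=True, text=True,
        ).stdout
        deleted = count_deletions(out)
        if deleted > MAX_DELETE:
            print(f"pre-push: {deleted} deletions to {remote_ref} exceed "
                  f"max_delete={MAX_DELETE}", file=sys.stderr)
            return 1
    return 0
```

To install it, place the file as an executable `pre-push` in the repository's `hooks/` directory with a Python shebang, and end it with `sys.exit(main())`.
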
## Telemetry & Observability
- Emit `git_commit`, `git_push`, `git_pull`, `git_conflict` events with correlation IDs.
- `bm sync --report` extended with Git stats (commit count, delta bytes, push latency).

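The telemetry events share a correlation ID so that, for example, a commit can be matched to the push that shipped it. Below is a minimal sketch of that event shape. The field names, the random UUID, and the commit/push pairing are assumptions for illustration; the spec names only the four event types.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any

EVENT_TYPES = {"git_commit", "git_push", "git_pull", "git_conflict"}


@dataclass
class GitEvent:
    type: str
    correlation_id: str
    attrs: dict[str, Any] = field(default_factory=dict)
    ts: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        # Reject anything outside the four event types the spec lists.
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def sync_cycle_events(sha: str, files: int, push_latency_ms: float) -> list[GitEvent]:
    """One correlation ID ties the commit to the push that published it."""
    cid = str(uuid.uuid4())
    return [
        GitEvent("git_commit", cid, {"sha": sha, "files": files}),
        GitEvent("git_push", cid, {"sha": sha, "latency_ms": push_latency_ms}),
    ]
```

Attributes like these (file counts, push latency) are the kind of data `bm sync --report` could aggregate into its Git stats.
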
## Phased Plan
### Phase 0 — Prototype (1 sprint)
- Server: bare repo init + simple committer (batch every 10s) + manual GitHub token.
- CLI: `bm cloud git connect --token <PAT>` (dev‑only)
- Success: edits in `/app/data/` appear in GitHub within 30s.

### Phase 1 — GitHub App & Webhooks (1–2 sprints)
- Switch to GitHub App installs; create private repo; store installation id.
- Committer hardened (debounce 2–5s, backoff, retries).
- Puller service with webhook → FF merge → checkout to `/app/data/`.
- LFS auto‑track + `.gitignore` generation.
- CLI surfaces status + logs.

### Phase 2 — Restore & Snapshots (1 sprint)
- `bm restore` for file/folder/project with dry‑run.
- `bm cloud snapshot` tags + list/inspect.
- Policy: PR‑only non‑FF, admin override.

### Phase 3 — Selective & Multi‑Repo (nice‑to‑have)
- Include/exclude projects; optional per‑project repos.
- Advanced policies (branch protections, required reviews).

## Acceptance Criteria
- Changes to `/app/data/` are committed and pushed automatically within a configurable interval (default ≤5s).
- A pull triggered by a GitHub webhook results in updated files in `/app/data/` (FF‑only by default).
- LFS is configured and functioning; large files don't bloat history.
- `bm cloud git status` shows the connected repo and last push/pull times.
- `bm restore` restores a file/folder to a prior commit with a clear audit trail.
- The end‑to‑end flow works alongside SPEC‑9 bisync without loops or data loss.

## Risks & Mitigations
- **Loop risk (Git ↔ Bisync)**: Writes to `/app/data/` → bisync → local → user edits → back again. *Mitigation*: Debounce, commit squashing, idempotent `.bmmeta` versioning, and watch exclusion windows during pull.
- **Repo bloat**: Lots of binary churn. *Mitigation*: default LFS, size threshold, optional media‑only repo later.
- **Security**: Token leakage. *Mitigation*: GitHub App with short‑lived tokens, KMS storage, scoped permissions.
- **Merge complexity**: Non‑trivial conflicts. *Mitigation*: prefer FF; otherwise conflict copies + events; require PR for non‑FF.

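One of the loop mitigations listed above, a watch exclusion window during pull, can be sketched as a guard the committer checks before recording a file event. The sketch assumes the window covers the whole pull plus a short grace period (2 s here, chosen arbitrarily) to absorb late inotify/poll notifications. `PullExclusionWindow` is a hypothetical name.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


class PullExclusionWindow:
    """Tells the committer to ignore file events caused by the puller's own
    checkout: during a pull and for `grace` seconds afterwards."""

    def __init__(self, grace: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace = grace
        self.clock = clock
        self._active = 0                     # number of pulls in progress
        self._quiet_until = float("-inf")    # end of the post-pull grace period

    @contextmanager
    def pulling(self) -> Iterator[None]:
        self._active += 1
        try:
            yield
        finally:
            self._active -= 1
            self._quiet_until = max(self._quiet_until, self.clock() + self.grace)

    def should_ignore(self) -> bool:
        return self._active > 0 or self.clock() < self._quiet_until
```

Purely time-based suppression can also swallow a genuine user edit that lands inside the window. Pairing it with the idempotent `.bmmeta` versioning from the same mitigation list, or comparing file hashes against what the pull wrote, narrows that gap.
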
## Open Questions
- Do we default to **monorepo** per tenant, or offer project‑per‑repo at connect time?
- Should `restore` write to a branch and open a PR, or directly modify `main`?
- How do we expose Git history in the UI (timeline view) without users dropping to the CLI?

## Appendix: Sample Config
```json
{
  "git": {
    "enabled": true,
    "repo": "https://github.com/<owner>/<repo>.git",
    "autoPushInterval": "5s",
    "allowNonFF": false,
    "lfs": { "sizeThreshold": 10485760 }
  }
}
```
