Commit a677a3b

Author: Marvin Zhang
feat: Add Local-First Collector Architecture specification
1 parent b5dc00d commit a677a3b

2 files changed: +258 -13 lines changed
specs/016-automatic-historical-sync/README.md

Lines changed: 16 additions & 13 deletions
@@ -201,22 +201,25 @@ if cfg.Sync.BackgroundSync {
201201

202202
## Open Questions
203203

204-
1. **Initial sync limit**: Should we limit first sync to N days? (Proposed: 90 days)
205-
2. **Progress display**: Spinner vs progress bar vs silent?
206-
3. **Error handling**: Continue watcher if historical sync fails?
204+
1. **Workspace selection**: Add `--workspaces` filter now, or defer to 017?
205+
2. **Initial sync limit**: Is 90 days reasonable, or should the first sync be unlimited with a progress indicator?
206+
3. **Progress display**: Spinner vs progress bar vs silent?
207+
4. **Error handling**: Continue watcher if historical sync fails?
207208

208209
## Notes
209210

210-
### Prior Art
211+
### Industry Research Summary
211212

212-
- **Dropbox/iCloud**: Sync is always on, no manual steps
213-
- **Docker Desktop**: Background processes auto-start
214-
- **VSCode Settings Sync**: Just enable and it works
213+
We reviewed Prometheus, OpenTelemetry Collector, Fluent Bit, Vector, and Grafana Loki. Key finding: **the tools surveyed buffer locally first, then export selectively**.
215214

216-
### Mental Model Shift
215+
| Tool | Pattern |
216+
|------|---------|
217+
| **Fluent Bit** | Memory + filesystem buffer → routing by tags |
218+
| **OpenTelemetry** | Receivers → Processors → fan-out to Exporters |
219+
| **Vector** | Sources → Transforms → multiple Sinks |
217220

218-
| Old (Wrong) | New (Right) |
219-
|-------------|-------------|
220-
| "Backfill" = manual import | "Sync" = automatic, continuous |
221-
| Two separate operations | One unified concept |
222-
| User runs command | System handles everything |
221+
This informed the design of [017-local-first-architecture](../017-local-first-architecture/README.md).
222+
223+
### Scope Decision
224+
225+
This spec focuses on **UX improvement** (auto-sync on startup). The full local-first architecture with multiple remotes and workspace routing is detailed in **017-local-first-architecture**.
Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
1+
---
2+
status: planned
3+
created: '2025-12-05'
4+
tags:
5+
- architecture
6+
- collector
7+
- sync
8+
- storage
9+
priority: medium
10+
created_at: '2025-12-05T05:44:09.643Z'
11+
depends_on:
12+
- 016-automatic-historical-sync
13+
updated_at: '2025-12-05T05:44:09.651Z'
14+
---
15+
16+
# Local-First Collector Architecture
17+
18+
> **Status**: 🗓️ Planned · **Priority**: Medium · **Created**: 2025-12-05 · **Depends on**: 016-automatic-historical-sync
19+
20+
## Problem Statement
21+
22+
The current architecture sends events directly to the remote and buffers locally only on failure. This creates four issues:
23+
24+
1. **Privacy**: All data goes to the remote by default, with no user control
25+
2. **Offline**: Collection fails when network is unavailable
26+
3. **Single destination**: Can't route work projects to a team server and personal projects to a private one
27+
4. **No local querying**: Can't explore data before sharing
28+
29+
## Design
30+
31+
### Core Principle: Collect Local, Export Selective
32+
33+
Inspired by Fluent Bit, Vector, and OpenTelemetry patterns:
34+
```
┌─────────────────────────────────────────────────────────────┐
│                      DEVLOG COLLECTOR                       │
├─────────────────────────────────────────────────────────────┤
│  LAYER 1: COLLECT (always on, all workspaces)               │
│    Agent Logs → Parser → Local SQLite                       │
│    • Works offline                                          │
│    • All data captured                                      │
│    • No remote dependency                                   │
├─────────────────────────────────────────────────────────────┤
│  LAYER 2: EXPORT (selective, multi-destination)             │
│    Local SQLite → Remote(s) based on routing rules          │
│    • Workspace pattern matching                             │
│    • Multiple remotes                                       │
│    • Auto-sync or manual                                    │
└─────────────────────────────────────────────────────────────┘
```
52+
53+
### Data Flow
54+
```
Agent Logs             Local Store               Remote(s)
     │                      │                        │
     ▼                      ▼                        ▼
┌─────────┐            ┌──────────┐             ┌─────────┐
│ Copilot │──┐         │          │      ┌─────▶│  Team   │
│  Logs   │  │         │  SQLite  │      │      │ Server  │
└─────────┘  │  Parse  │  events  │ Export      └─────────┘
             ├────────▶│   .db    │──────┤
┌─────────┐  │         │          │      │      ┌─────────┐
│ Claude  │──┤         │ Cursors  │      └─────▶│Personal │
│  Logs   │  │         │   .db    │             │ Server  │
└─────────┘  │         └──────────┘             └─────────┘
             │              │
┌─────────┐  │              │ Query locally
│ Cursor  │──┘              ▼
│  Logs   │            ┌──────────┐
└─────────┘            │  devlog  │
                       │  query   │
                       └──────────┘
```
76+
77+
### Configuration Schema
78+
```yaml
# ~/.devlog/config.yaml

collect:
  enabled: true
  agents: [copilot, claude, cursor]

storage:
  path: ~/.devlog/data/
  events_db: events.db
  cursors_db: cursors.db
  retention_days: 365

export:
  remotes:
    # Team remote - auto-sync work projects
    team:
      url: https://devlog.company.com
      api_key: ${DEVLOG_TEAM_API_KEY}
      auto_sync: true
      sync_interval: 30s
      workspaces:
        include:
          - "~/work/**"
          - "~/company/**"
        exclude:
          - "**/personal/**"
          - "**/scratch/**"

    # Personal remote - manual export only
    personal:
      url: https://my-devlog.io
      api_key: ${DEVLOG_PERSONAL_API_KEY}
      auto_sync: false
      workspaces:
        include:
          - "~/personal/**"
          - "~/side-projects/**"

# Default: collect everything, export nothing (opt-in)
# Or: collect everything, export to default remote (opt-out)
export_default: none  # or "team"
```
122+
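The include/exclude rules above could be evaluated roughly as follows. This is a minimal sketch, not the planned implementation: the `Remote` struct, `globToRegexp`, and `Route` are hypothetical names, and it assumes `**` matches across path separators while `*` and `?` stay within one segment. Go's `path/filepath.Match` does not support `**`, so the sketch translates each pattern into a regular expression instead. The `export_default` fallback is not modeled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Remote holds only the routing fields of one export.remotes entry (hypothetical type).
type Remote struct {
	Name    string
	Include []string
	Exclude []string
}

// globToRegexp turns a workspace pattern into an anchored regexp.
// Assumption: "**" crosses "/" boundaries, "*" and "?" do not.
func globToRegexp(pattern, home string) (*regexp.Regexp, error) {
	if strings.HasPrefix(pattern, "~/") {
		pattern = home + pattern[1:]
	}
	rs := []rune(pattern)
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(rs); i++ {
		switch c := rs[i]; {
		case c == '*' && i+1 < len(rs) && rs[i+1] == '*':
			b.WriteString(".*")
			i++ // consume the second '*'
		case c == '*':
			b.WriteString("[^/]*")
		case c == '?':
			b.WriteString("[^/]")
		default:
			b.WriteString(regexp.QuoteMeta(string(c)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchAny reports whether path matches at least one pattern.
func matchAny(patterns []string, path, home string) bool {
	for _, p := range patterns {
		re, err := globToRegexp(p, home)
		if err == nil && re.MatchString(path) {
			return true
		}
	}
	return false
}

// Route returns the remotes whose include list matches the workspace
// and whose exclude list does not.
func Route(remotes []Remote, workspace, home string) []string {
	var out []string
	for _, r := range remotes {
		if matchAny(r.Include, workspace, home) && !matchAny(r.Exclude, workspace, home) {
			out = append(out, r.Name)
		}
	}
	return out
}

func main() {
	remotes := []Remote{
		{Name: "team", Include: []string{"~/work/**", "~/company/**"}, Exclude: []string{"**/personal/**", "**/scratch/**"}},
		{Name: "personal", Include: []string{"~/personal/**", "~/side-projects/**"}},
	}
	home := "/home/dev" // illustrative home directory
	for _, ws := range []string{"/home/dev/work/api", "/home/dev/work/scratch/tmp", "/home/dev/personal/blog"} {
		fmt.Println(ws, "->", Route(remotes, ws, home))
	}
}
```

Under these assumptions a pattern such as `~/work/**` matches anything below `~/work/` but not the directory `~/work` itself. Whether that edge case should match is a design choice the spec leaves open.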
123+
### Two Cursor Types
124+
```go
import "time"

// Collection cursor: "What have I parsed from log files?"
type CollectCursor struct {
	AgentName      string    // "github-copilot"
	SourcePath     string    // "/path/to/chatSessions/abc.json"
	LastByteOffset int64     // Resume position in file
	LastEventTime  time.Time // Latest event timestamp seen
}

// Export cursor: "What have I sent to remote X?"
type ExportCursor struct {
	RemoteName     string    // "team"
	LastEventID    string    // Last event ID sent
	LastExportTime time.Time // When last export happened
	Status         string    // "synced", "pending", "error"
}
```
142+
143+
This separation allows:
144+
- Collection runs independently of export
145+
- Add new remote later → backfill just that remote
146+
- Remote down → collection continues, export retries
147+
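The three properties above can be illustrated with an in-memory stand-in for the local store. This is a sketch under stated assumptions: `Store`, `Sender`, and `ExportOnce` are hypothetical names, `ExportCursor` is trimmed to the fields the example needs, and the real store would be SQLite rather than a slice.

```go
package main

import (
	"errors"
	"fmt"
)

// errRemoteDown simulates an unreachable remote in the example.
var errRemoteDown = errors.New("remote unreachable")

type Event struct {
	ID        string
	Workspace string
}

// Store is an append-only, ordered stand-in for the local events table.
type Store struct{ events []Event }

func (s *Store) Append(e Event) { s.events = append(s.events, e) }

// After returns the events stored after lastID; "" means from the beginning.
// An unknown ID yields no events in this sketch.
func (s *Store) After(lastID string) []Event {
	if lastID == "" {
		return s.events
	}
	for i, e := range s.events {
		if e.ID == lastID {
			return s.events[i+1:]
		}
	}
	return nil
}

// ExportCursor is trimmed to the fields used here.
type ExportCursor struct {
	RemoteName  string
	LastEventID string
	Status      string
}

// Sender delivers one batch to a remote (hypothetical signature).
type Sender func(batch []Event) error

// ExportOnce sends everything after the cursor and advances the cursor
// only when the send succeeds, so a failed batch is retried on the next run.
func ExportOnce(s *Store, c *ExportCursor, send Sender) {
	pending := s.After(c.LastEventID)
	if len(pending) == 0 {
		c.Status = "synced"
		return
	}
	if err := send(pending); err != nil {
		c.Status = "error" // cursor is left unchanged
		return
	}
	c.LastEventID = pending[len(pending)-1].ID
	c.Status = "synced"
}

func main() {
	store := &Store{}
	store.Append(Event{ID: "e1"})
	store.Append(Event{ID: "e2"})

	team := &ExportCursor{RemoteName: "team", Status: "pending"}
	ExportOnce(store, team, func([]Event) error { return errRemoteDown })
	fmt.Println("team after outage:", team.Status, team.LastEventID)

	store.Append(Event{ID: "e3"}) // collection continues while the remote is down
	ExportOnce(store, team, func(b []Event) error { return nil })
	fmt.Println("team after recovery:", team.Status, team.LastEventID)

	// A remote added later starts from an empty cursor and backfills everything.
	personal := &ExportCursor{RemoteName: "personal", Status: "pending"}
	ExportOnce(store, personal, func(b []Event) error {
		fmt.Println("personal backfill size:", len(b))
		return nil
	})
}
```

Because each remote owns its cursor, backfilling a new remote does not touch the position of any existing one.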
148+
### CLI Commands
149+
```bash
# Start collector (collection always runs)
devlog start

# Check local data
devlog query --workspace ~/work/myproject --last 7d
devlog stats --local

# Export commands
devlog export --remote team              # Manual export to team
devlog export --remote personal --all    # Export all matching events
devlog export --status                   # Show export status per remote

# Remote management
devlog remote add personal https://my-devlog.io
devlog remote list
devlog remote test team                  # Test connectivity
```
168+
169+
## Plan
170+
171+
### Phase 1: Refactor Storage Layer
172+
173+
- [ ] Create `internal/storage/` package
174+
- [ ] Move buffer.go logic to storage layer
175+
- [ ] Add events table with workspace metadata
176+
- [ ] Add collect_cursors table
177+
- [ ] Add export_cursors table (per-remote)
178+
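One possible shape for the three tables in Phase 1 is sketched below. All column names are assumptions derived from the `CollectCursor` and `ExportCursor` fields; the spec does not define a schema. `Migrate` uses only `database/sql` from the standard library, and opening an actual SQLite database would additionally require a driver, which is not shown.

```go
package main

import (
	"database/sql"
	"fmt"
)

// schema lists hypothetical DDL statements for the local store.
var schema = []string{
	`CREATE TABLE IF NOT EXISTS events (
		id         TEXT PRIMARY KEY,
		agent      TEXT NOT NULL,
		workspace  TEXT NOT NULL,
		event_type TEXT NOT NULL,
		timestamp  TEXT NOT NULL,
		payload    TEXT NOT NULL
	)`,
	`CREATE INDEX IF NOT EXISTS idx_events_workspace_time
		ON events (workspace, timestamp)`,
	`CREATE TABLE IF NOT EXISTS collect_cursors (
		agent_name       TEXT NOT NULL,
		source_path      TEXT NOT NULL,
		last_byte_offset INTEGER NOT NULL DEFAULT 0,
		last_event_time  TEXT,
		PRIMARY KEY (agent_name, source_path)
	)`,
	`CREATE TABLE IF NOT EXISTS export_cursors (
		remote_name      TEXT PRIMARY KEY,
		last_event_id    TEXT,
		last_export_time TEXT,
		status           TEXT NOT NULL DEFAULT 'pending'
	)`,
}

// Migrate applies every statement inside one transaction.
func Migrate(db *sql.DB) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // returns an ignored error once Commit has succeeded
	for _, stmt := range schema {
		if _, err := tx.Exec(stmt); err != nil {
			return fmt.Errorf("apply schema: %w", err)
		}
	}
	return tx.Commit()
}

func main() {
	// No driver is imported here, so the example only reports what would run.
	fmt.Printf("%d schema statements ready\n", len(schema))
}
```

Keying `export_cursors` by remote name gives each remote its own row, which is what "per-remote" in the checklist suggests.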
179+
### Phase 2: Decouple Collection from Export
180+
181+
- [ ] Collection writes to local SQLite only
182+
- [ ] Remove direct client.SendEvent from collection path
183+
- [ ] Add workspace path extraction to events
184+
- [ ] Update BackfillManager to use new storage
185+
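On the collection side, resuming from `LastByteOffset` might look like the sketch below. It assumes a newline-delimited log file, which may not hold for every agent format, and it does not handle truncated or rotated files. `ReadNew`, `appendTo`, and `newTempLog` are hypothetical helpers, and the cursor is trimmed to two fields.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// CollectCursor is trimmed to the fields used here.
type CollectCursor struct {
	SourcePath     string
	LastByteOffset int64
}

// ReadNew returns the complete lines appended since the cursor position
// and advances the cursor past them. A trailing partial line is not
// consumed, so it is picked up once the writer finishes it.
func ReadNew(c *CollectCursor) ([]string, error) {
	f, err := os.Open(c.SourcePath)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	if _, err := f.Seek(c.LastByteOffset, io.SeekStart); err != nil {
		return nil, err
	}
	r := bufio.NewReader(f)
	var lines []string
	for {
		line, err := r.ReadString('\n')
		if err == io.EOF {
			return lines, nil // any partial line stays unread
		}
		if err != nil {
			return lines, err
		}
		c.LastByteOffset += int64(len(line))
		lines = append(lines, line[:len(line)-1])
	}
}

// appendTo appends s to an existing file.
func appendTo(path, s string) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(s)
	return err
}

// newTempLog creates an empty temporary log file and returns its path.
func newTempLog() (string, error) {
	f, err := os.CreateTemp("", "devlog-*.log")
	if err != nil {
		return "", err
	}
	return f.Name(), f.Close()
}

func main() {
	path, err := newTempLog()
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	cur := &CollectCursor{SourcePath: path}
	if err := appendTo(path, "event-1\nevent-2\npart"); err != nil {
		panic(err)
	}
	lines, _ := ReadNew(cur)
	fmt.Println(lines, cur.LastByteOffset)

	if err := appendTo(path, "ial\n"); err != nil {
		panic(err)
	}
	lines, _ = ReadNew(cur)
	fmt.Println(lines, cur.LastByteOffset)
}
```

Advancing the offset only past complete lines means a crash between two reads can at worst re-read a line, never skip one.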
186+
### Phase 3: Export Manager
187+
188+
- [ ] Create `internal/export/` package
189+
- [ ] Implement ExportManager with per-remote cursors
190+
- [ ] Add workspace pattern matching (glob)
191+
- [ ] Add background export goroutine
192+
- [ ] Implement retry with exponential backoff
193+
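The retry item in Phase 3 could be approached along these lines. This is a minimal sketch with hypothetical names (`BackoffDelay`, `Retry`, `demo`); it doubles the delay per attempt up to a cap and omits jitter, which a production version would likely add.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// BackoffDelay returns base doubled `attempt` times, capped at max.
func BackoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

// Retry calls op until it succeeds, the attempts run out, or ctx is cancelled.
func Retry(ctx context.Context, attempts int, base, max time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(BackoffDelay(i, base, max)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// demo simulates a remote that rejects two batches before accepting the third.
func demo() (int, error) {
	calls := 0
	err := Retry(context.Background(), 5, time.Millisecond, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("remote unavailable")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println("calls:", calls, "err:", err)
}
```

Passing a context lets the background export goroutine stop waiting promptly when the collector shuts down.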
194+
### Phase 4: Multi-Remote Configuration
195+
196+
- [ ] Extend config schema for multiple remotes
197+
- [ ] Add remote management CLI commands
198+
- [ ] Add `devlog export` CLI commands
199+
- [ ] Add export status/progress reporting
200+
201+
### Phase 5: Local Query Support
202+
203+
- [ ] Add `devlog query` command
204+
- [ ] Add `devlog stats --local` command
205+
- [ ] Simple filtering by workspace, time range, event type
206+
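The simple filters in Phase 5 might be sketched as below, operating on an in-memory slice for illustration; the real command would presumably translate the same filter into a query against the local database. `Event`, `Filter`, `Query`, and `ParseLast` are hypothetical names, and `"chat"` and `"edit"` are made-up event types. One detail worth noting: Go's `time.ParseDuration` has no day unit, so a value like `7d` needs its own handling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

type Event struct {
	Workspace string
	Type      string
	Time      time.Time
}

// ParseLast parses values such as "7d", "12h", or "30m".
// A trailing "d" is handled here because time.ParseDuration lacks a day unit.
func ParseLast(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil || days < 0 {
			return 0, fmt.Errorf("invalid day count %q", s)
		}
		return time.Duration(days) * 24 * time.Hour, nil
	}
	return time.ParseDuration(s)
}

// Filter selects events; zero-valued fields match everything.
type Filter struct {
	Workspace string    // workspace path or any directory above it
	Type      string    // exact event type
	Since     time.Time // earliest event time
}

// Match reports whether e passes every non-empty criterion.
func (f Filter) Match(e Event) bool {
	if f.Workspace != "" && e.Workspace != f.Workspace &&
		!strings.HasPrefix(e.Workspace, f.Workspace+"/") {
		return false
	}
	if f.Type != "" && e.Type != f.Type {
		return false
	}
	if !f.Since.IsZero() && e.Time.Before(f.Since) {
		return false
	}
	return true
}

// Query returns the events that match f, preserving order.
func Query(events []Event, f Filter) []Event {
	var out []Event
	for _, e := range events {
		if f.Match(e) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	now := time.Now()
	events := []Event{
		{Workspace: "/home/dev/work/myproject", Type: "chat", Time: now.Add(-2 * 24 * time.Hour)},
		{Workspace: "/home/dev/work/myproject", Type: "chat", Time: now.Add(-30 * 24 * time.Hour)},
		{Workspace: "/home/dev/personal/blog", Type: "chat", Time: now.Add(-time.Hour)},
	}
	last, err := ParseLast("7d")
	if err != nil {
		panic(err)
	}
	recent := Query(events, Filter{Workspace: "/home/dev/work/myproject", Since: now.Add(-last)})
	fmt.Println("matching events:", len(recent))
}
```

Comparing against `f.Workspace+"/"` keeps a filter on `/w/app` from also matching a sibling such as `/w/app2`.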
207+
## Test
208+
209+
- [ ] Collection works with no network (airplane mode)
210+
- [ ] Events stored locally with correct workspace metadata
211+
- [ ] Export sends only matching workspaces per remote
212+
- [ ] Add remote later → can backfill historical data
213+
- [ ] Remote down → collection continues, export retries
214+
- [ ] Multiple remotes receive correct filtered data
215+
- [ ] `devlog query` returns local data correctly
216+
217+
## Notes
218+
219+
### Industry Patterns Applied
220+
221+
| Pattern | Source | How We Apply |
222+
|---------|--------|--------------|
223+
| Memory + disk buffer | Fluent Bit | SQLite as durable store |
224+
| Fan-out to sinks | OTel, Vector | Multiple remotes |
225+
| Tag-based routing | Fluent Bit | Workspace pattern matching |
226+
| Cursor/checkpoint | All | Per-source + per-destination cursors |
227+
| Backpressure handling | Fluent Bit | Local buffer absorbs spikes |
228+
229+
### Migration Path
230+
231+
From current architecture:
232+
1. Spec 016 adds auto-sync (keep current single-remote)
233+
2. This spec adds local-first + multi-remote
234+
3. Existing users: seamless upgrade (local DB created automatically)
235+
4. New users: opt-in to remotes (privacy by default)
236+
237+
### Open Questions
238+
239+
1. **Default behavior**: Collect all + export none (privacy) vs export to default remote?
240+
2. **Query language**: Simple filters or full SQL access to local DB?
241+
3. **Storage limits**: Auto-prune after N days, or let user manage?
242+
4. **Encryption**: Encrypt local SQLite at rest?
