Skip to content

Commit b5dc00d

Browse files
author
Marvin Zhang
committed
feat: Add README for Automatic Historical Data Synchronization with unified sync model
1 parent 8525838 commit b5dc00d

File tree

1 file changed

+222
-0
lines changed

1 file changed

+222
-0
lines changed
Lines changed: 222 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,222 @@
1+
---
2+
status: planned
3+
created: '2025-12-05'
4+
tags:
5+
- ux
6+
- workflow
7+
- collector
8+
- sync
9+
priority: high
10+
created_at: '2025-12-05T04:55:09.494Z'
11+
---
12+
13+
# Automatic Historical Data Synchronization
14+
15+
> **Status**: 📅 Planned · **Priority**: High · **Created**: 2025-12-05
16+
17+
## Problem Statement
18+
19+
**Current UX is counter-intuitive:**
20+
- User runs `devlog-collector start` expecting it to "just work"
21+
- But historical data isn't captured - only new events are watched
22+
- User must discover and run `devlog-collector backfill run --agent copilot --days 30`
23+
- This is a manual, separate operation users don't expect
24+
25+
**What users expect (cloud-native mental model):**
26+
- Install collector → Start collector → All data syncs automatically
27+
- Like Dropbox, iCloud, or any modern sync service
28+
- Historical + real-time data handled seamlessly as one concept
29+
30+
## Design
31+
32+
### Core Philosophy: Unified Sync Model
33+
34+
Replace the two separate concepts (backfill + watch) with a single **sync** concept:
35+
36+
```
37+
┌─────────────────────────────────────────────────────────────┐
38+
│ UNIFIED SYNC │
39+
├─────────────────────────────────────────────────────────────┤
40+
│ 1. On startup: Process all unsynced historical data │
41+
│ 2. Continue: Watch for new events in real-time │
42+
│ 3. Track: Remember what's been synced (cursor/watermark) │
43+
└─────────────────────────────────────────────────────────────┘
44+
```
45+
46+
### State Management
47+
48+
Each log source maintains a **sync cursor** (stored in SQLite):
49+
50+
```go
51+
type SyncState struct {
52+
AgentName string // "github-copilot"
53+
SourcePath string // "/path/to/chatSessions"
54+
LastSyncedAt time.Time // Timestamp of last synced event
55+
LastByteOffset int64 // Position in file (for resumption)
56+
LastEventHash string // Deduplication marker
57+
Status string // "synced", "syncing", "pending"
58+
}
59+
```
60+
61+
### Startup Flow (New Design)
62+
63+
```
64+
devlog-collector start
65+
66+
├── 1. Discover all agent log sources
67+
68+
├── 2. For each source:
69+
│ ├── Load sync cursor from state DB
70+
│ ├── If cursor exists: Process only events after cursor
71+
│ └── If no cursor: Process all events (first sync)
72+
73+
├── 3. Start real-time watcher
74+
│ └── Update cursor after each event batch
75+
76+
└── 4. Background: Periodically check for new log files
77+
└── New workspace opened → process its history too
78+
```
79+
80+
### Configuration
81+
82+
```yaml
83+
sync:
84+
# How far back to sync on first run (safety limit)
85+
initial_sync_days: 90
86+
87+
# Whether to sync history at all (power users can disable)
88+
historical_sync: true
89+
90+
# Batch size for initial sync (avoid overwhelming backend)
91+
sync_batch_size: 100
92+
93+
# Sync in background vs blocking startup
94+
background_sync: true
95+
```
96+
97+
### CLI Changes
98+
99+
**Before (confusing):**
100+
```bash
101+
devlog-collector start # Only watches new events
102+
devlog-collector backfill run # Separate manual step
103+
```
104+
105+
**After (intuitive):**
106+
```bash
107+
devlog-collector start # Syncs everything automatically
108+
devlog-collector start --no-history # Skip history (power users)
109+
devlog-collector sync --status # Check sync progress
110+
```
111+
112+
**Deprecate but keep for compatibility:**
113+
```bash
114+
devlog-collector backfill run # → Prints: "Deprecated. Use 'start' instead."
115+
```
116+
117+
## Plan
118+
119+
### Phase 1: Merge Backfill into Start Command
120+
121+
- [ ] Add sync cursor management to BackfillManager
122+
- [ ] Integrate historical sync into `start` command flow
123+
- [ ] Process history before starting watcher (or in parallel)
124+
- [ ] Add `--no-history` flag for power users
125+
126+
### Phase 2: Continuous Cursor Tracking
127+
128+
- [ ] Update cursor after each event batch (watcher and initial sync)
129+
- [ ] Handle file rotation (new chat sessions)
130+
- [ ] Detect new workspace directories dynamically
131+
132+
### Phase 3: Progress & Status UX
133+
134+
- [ ] Add `sync --status` subcommand
135+
- [ ] Show sync progress on startup (non-blocking spinner/bar)
136+
- [ ] Add sync state to `status` command output
137+
138+
### Phase 4: Cleanup & Documentation
139+
140+
- [ ] Deprecate `backfill` command (show migration message)
141+
- [ ] Update README and help text
142+
- [ ] Test full user journey from fresh install
143+
144+
## Test
145+
146+
- [ ] Fresh install: `start` syncs all history automatically
147+
- [ ] Second start: Only syncs new events (cursor works)
148+
- [ ] Interrupted sync: Resumes correctly
149+
- [ ] New workspace opened: Its history syncs without restart
150+
- [ ] `--no-history` skips historical sync
151+
- [ ] Progress shows accurate % during initial sync
152+
153+
## Implementation Notes
154+
155+
### Minimal Changes Required
156+
157+
The good news: Most infrastructure already exists!
158+
159+
1. **BackfillManager** already handles historical parsing with state tracking
160+
2. **Watcher** already handles real-time events
161+
3. **StateStore** already persists progress
162+
163+
The main work is **orchestration** - calling backfill before/alongside watcher.
164+
165+
### Key Code Changes
166+
167+
```go
168+
// cmd/devlog/main.go - start command
169+
170+
// Before watcher.Start():
171+
if !skipHistory {
172+
log.Info("Syncing historical data...")
173+
bm, _ := backfill.NewBackfillManager(...)
174+
175+
for agent, logs := range discovered {
176+
for _, logPath := range logs {
177+
// Process only unsynced events (cursor-based)
178+
bm.SyncToPresent(ctx, agent, logPath)
179+
}
180+
}
181+
log.Info("Historical sync complete")
182+
}
183+
184+
// Then start watcher as normal
185+
watcher.Start()
186+
```
187+
188+
### Background Sync Option
189+
190+
For large histories, consider syncing in background:
191+
192+
```go
193+
if cfg.Sync.BackgroundSync {
194+
go func() {
195+
bm.SyncToPresent(ctx, ...)
196+
}()
197+
} else {
198+
bm.SyncToPresent(ctx, ...) // Block until done
199+
}
200+
```
201+
202+
## Open Questions
203+
204+
1. **Initial sync limit**: Should we limit first sync to N days? (Proposed: 90 days)
205+
2. **Progress display**: Spinner vs progress bar vs silent?
206+
3. **Error handling**: Continue watcher if historical sync fails?
207+
208+
## Notes
209+
210+
### Prior Art
211+
212+
- **Dropbox/iCloud**: Sync is always on, no manual steps
213+
- **Docker Desktop**: Background processes auto-start
214+
- **VSCode Settings Sync**: Just enable and it works
215+
216+
### Mental Model Shift
217+
218+
| Old (Wrong) | New (Right) |
219+
|-------------|-------------|
220+
| "Backfill" = manual import | "Sync" = automatic, continuous |
221+
| Two separate operations | One unified concept |
222+
| User runs command | System handles everything |

0 commit comments

Comments
 (0)