|
| 1 | +# Fix Collector Backfill Parsing Errors |
| 2 | + |
| 3 | +**Status**: In Progress |
| 4 | +**Created**: 2025-10-31 |
| 5 | +**Spec**: `20251031/004-collector-parsing-errors` |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +The Go collector's backfill fails to parse GitHub Copilot chat session log files: a one-day backfill run produced 447,397 parsing errors and processed zero events. The SQL timestamp scanning issue has been resolved, but the event parsing logic still hits errors that prevent backfill from succeeding. |
| 10 | + |
| 11 | +## Objectives |
| 12 | + |
| 13 | +1. Identify root cause of 447K parsing errors in Copilot log backfill |
| 14 | +2. Fix event parsing logic to correctly handle Copilot chat session format |
| 15 | +3. Add verbose error logging for debugging |
| 16 | +4. Successfully backfill historical Copilot activity |
| 17 | + |
| 18 | +## Current Behavior |
| 19 | + |
| 20 | +**Command**: `./bin/devlog-collector backfill run --days 1` |
| 21 | + |
| 22 | +**Results**: |
| 23 | +- Events processed: 0 |
| 24 | +- Errors: 447,397 |
| 25 | +- Data processed: 18.02 MB (but not successfully parsed) |
| 26 | +- 11 log files discovered but not processed |
| 27 | +- No error messages logged to stderr (silent failures) |
| 28 | + |
| 29 | +**Log Files**: |
| 30 | +- Location: `~/Library/Application Support/Code - Insiders/User/workspaceStorage/.../chatSessions/` |
| 31 | +- Format: JSON chat session files (version 3) |
| 32 | +- Size range: 511 bytes to 941 KB |
| 33 | +- 11 files total |
| 34 | + |
| 35 | +**Sample Log Structure**: |
| 36 | +```json |
| 37 | +{ |
| 38 | + "version": 3, |
| 39 | + "requesterUsername": "tikazyq", |
| 40 | + "requesterAvatarIconUri": { "$mid": 1, ... }, |
| 41 | + ... |
| 42 | +} |
| 43 | +``` |
| 44 | + |
| 45 | +## Design |
| 46 | + |
| 47 | +### Fixed Issues ✅ |
| 48 | + |
| 49 | +1. **SQL Timestamp Scanning** - Fixed `started_at` column scanning from int64 to `time.Time` |
| 50 | + - File: `packages/collector-go/internal/backfill/state.go` |
| 51 | + - Changes: Added `sql.NullInt64` for `startedAt` in both `Load()` and `ListByAgent()` methods |
| 52 | + |
| 53 | +2. **DefaultRegistry Arguments** - Added missing `hierarchyCache` and `logger` parameters |
| 54 | + - File: `packages/collector-go/cmd/collector/main.go` |
| 55 | + - Changes: Initialize `HierarchyCache` and pass to `DefaultRegistry()` calls |
| 56 | + |
| 57 | +### Root Cause Analysis |
| 58 | + |
| 59 | +The Copilot adapter (`packages/collector-go/internal/adapters/copilot_adapter.go`) likely expects: |
| 60 | +- Line-delimited JSON logs (NDJSON format) |
| 61 | +- Different schema than chat session format |
| 62 | +- Specific event structure that doesn't match chat sessions |
| 63 | + |
| 64 | +The chat session files are full session objects, not individual log events. |
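If the adapter does read input line by line, every line of a multi-line session file would fail to parse on its own, which would be consistent with an error count far larger than the number of files. One way to test that hypothesis is a small format check run before parsing. The sketch below is an assumption-laden illustration: `LogFormat` and `DetectFormat` are hypothetical names, not part of the collector, and the only schema detail used is the top-level `version` field shown in the sample above.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
)

// LogFormat is a hypothetical label for the layout of a log file.
type LogFormat string

const (
	FormatChatSession LogFormat = "chat-session" // one JSON object per file
	FormatNDJSON      LogFormat = "ndjson"       // one JSON value per line
	FormatUnknown     LogFormat = "unknown"
)

// DetectFormat guesses the layout of raw file contents.
// It first tries to decode the whole file as a single object carrying a
// "version" field, then falls back to checking that every non-empty line
// is valid JSON on its own.
func DetectFormat(data []byte) LogFormat {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 {
		return FormatUnknown
	}

	// Whole-file check: a chat session is one JSON object with "version".
	var session struct {
		Version *int `json:"version"`
	}
	if err := json.Unmarshal(trimmed, &session); err == nil && session.Version != nil {
		return FormatChatSession
	}

	// Line-by-line check for NDJSON. The buffer limit is raised because
	// the default 64 KB token size is too small for long log lines.
	scanner := bufio.NewScanner(bytes.NewReader(trimmed))
	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	lines := 0
	for scanner.Scan() {
		line := bytes.TrimSpace(scanner.Bytes())
		if len(line) == 0 {
			continue
		}
		if !json.Valid(line) {
			return FormatUnknown
		}
		lines++
	}
	if scanner.Err() != nil || lines == 0 {
		return FormatUnknown
	}
	return FormatNDJSON
}
```

Running a check like this over the 11 discovered files would show whether they are all session objects, which in turn informs the Phase 2 choice between adapting the parser and switching log sources.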
| 65 | + |
| 66 | +## Implementation Plan |
| 67 | + |
| 68 | +### Phase 1: Investigation (High Priority) |
| 69 | +- [ ] Add verbose error logging to backfill processor |
| 70 | +- [ ] Capture and log first 10 parsing errors with sample data |
| 71 | +- [ ] Examine `copilot_adapter.go` to understand expected format |
| 72 | +- [ ] Compare expected vs actual log file format |
| 73 | +- [ ] Determine if chat sessions are the correct log source |
| 74 | + |
| 75 | +### Phase 2: Fix Parsing Logic |
| 76 | +- [ ] Update parser to handle chat session format (if correct source) |
| 77 | +- [ ] Or identify and use correct Copilot log files (if wrong source) |
| 78 | +- [ ] Add format detection/validation |
| 79 | +- [ ] Handle both session-level and event-level data |
| 80 | + |
| 81 | +### Phase 3: Testing |
| 82 | +- [ ] Test with sample chat session files |
| 83 | +- [ ] Verify successful event extraction |
| 84 | +- [ ] Test backfill with various date ranges |
| 85 | +- [ ] Validate data sent to backend |
| 86 | +- [ ] Test state persistence |
| 87 | + |
| 88 | +## Files to Investigate |
| 89 | + |
| 90 | +``` |
| 91 | +packages/collector-go/ |
| 92 | +├── internal/ |
| 93 | +│ ├── adapters/ |
| 94 | +│ │ ├── copilot_adapter.go # Parsing logic |
| 95 | +│ │ ├── claude_adapter.go |
| 96 | +│ │ └── cursor_adapter.go |
| 97 | +│ ├── backfill/ |
| 98 | +│ │ ├── backfill.go # Error handling |
| 99 | +│ │ └── state.go # ✅ Fixed |
| 100 | +│ └── watcher/ |
| 101 | +│ └── discovery.go # Log file discovery |
| 102 | +└── cmd/collector/main.go # ✅ Fixed |
| 103 | +``` |
| 104 | + |
| 105 | +## Success Criteria |
| 106 | + |
| 107 | +- [ ] Zero parsing errors on valid log files |
| 108 | +- [ ] Successfully extract events from Copilot chat sessions |
| 109 | +- [ ] Error messages logged with actionable details |
| 110 | +- [ ] Events successfully sent to backend |
| 111 | +- [ ] Backfill state properly tracked |
| 112 | +- [ ] Throughput > 0 events/sec |
| 113 | + |
| 114 | +## Testing Commands |
| 115 | + |
| 116 | +```bash |
| 117 | +# Clean state and test backfill |
| 118 | +rm -f ~/.devlog/buffer.db* |
| 119 | +cd packages/collector-go |
| 120 | +./bin/devlog-collector backfill run --days 1 |
| 121 | + |
| 122 | +# Check backfill status |
| 123 | +./bin/devlog-collector backfill status |
| 124 | + |
| 125 | +# Build collector |
| 126 | +./build.sh |
| 127 | + |
| 128 | +# Verbose mode (when implemented) |
| 129 | +./bin/devlog-collector backfill run --days 1 --verbose |
| 130 | +``` |
| 131 | + |
| 132 | +## References |
| 133 | + |
| 134 | +- Fixed SQL scanning issue in `state.go` (Lines 95-136) |
| 135 | +- Fixed DefaultRegistry calls in `main.go` (Lines 97, 327) |
| 136 | +- Chat session log location: `~/Library/Application Support/Code - Insiders/User/workspaceStorage/.../chatSessions/` |