feat(metrics): SQLite Metrics Store Foundation (Phase 1 & 2) #159
Merged
Phase 1.1: Add dependencies

- Add better-sqlite3 (v12.5.0) for metrics persistence
- Add @types/better-sqlite3 for TypeScript support
- Native SQLite with pre-built binaries
- ~9MB uncompressed (~3MB compressed)

Sets foundation for event-driven metrics store.
Phase 1.2-1.3: Core metrics infrastructure

- Created MetricsStore class with CRUD operations
- Implemented SQLite schema with WAL mode for concurrency
- Added Zod schemas for snapshot query validation
- Comprehensive test coverage (25 tests, all passing)

Features:
- recordSnapshot(): Store index/update snapshots
- getSnapshots(): Query with filters (time, repo, trigger)
- getLatestSnapshot(): Retrieve most recent snapshot
- pruneOldSnapshots(): Retention policy enforcement
- Kero logger integration (optional)

Database optimizations:
- WAL mode for concurrent reads/writes
- Denormalized fields for fast queries
- Indexes on timestamp, repository, trigger

Next: Event bus integration for automatic persistence
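The query and retention semantics listed above can be illustrated with a small in-memory model. This is only a sketch: the real MetricsStore is SQLite-backed, and the `InMemoryMetricsStore` class, the `SnapshotRecord` fields, the `"index" | "update"` trigger values, and the method signatures here are assumptions made for illustration.

```typescript
// Hypothetical snapshot shape; the real schema may carry more fields.
type Trigger = "index" | "update";

interface SnapshotRecord {
  id: number;
  timestamp: number; // epoch milliseconds
  repository: string;
  trigger: Trigger;
}

interface SnapshotFilter {
  since?: number;
  until?: number;
  repository?: string;
  trigger?: Trigger;
}

class InMemoryMetricsStore {
  private rows: SnapshotRecord[] = [];
  private nextId = 1;

  recordSnapshot(repository: string, trigger: Trigger, timestamp: number): SnapshotRecord {
    const row: SnapshotRecord = { id: this.nextId++, timestamp, repository, trigger };
    this.rows.push(row);
    return row;
  }

  // Returns matching snapshots, newest first. Every filter field is optional.
  getSnapshots(filter: SnapshotFilter = {}): SnapshotRecord[] {
    return this.rows
      .filter((r) => filter.since === undefined || r.timestamp >= filter.since)
      .filter((r) => filter.until === undefined || r.timestamp <= filter.until)
      .filter((r) => filter.repository === undefined || r.repository === filter.repository)
      .filter((r) => filter.trigger === undefined || r.trigger === filter.trigger)
      .sort((a, b) => b.timestamp - a.timestamp || b.id - a.id);
  }

  getLatestSnapshot(repository?: string): SnapshotRecord | undefined {
    return this.getSnapshots(repository === undefined ? {} : { repository })[0];
  }

  // Removes snapshots older than (now - retentionMs) and returns the number removed.
  pruneOldSnapshots(retentionMs: number, now: number): number {
    const cutoff = now - retentionMs;
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => r.timestamp >= cutoff);
    return before - this.rows.length;
  }
}
```

Passing the timestamp and the current time explicitly keeps the sketch deterministic, which is the same concern the later `customTimestamp` test fix addresses.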
Phase 1.4: Event-driven metrics persistence

- Updated IndexUpdatedEvent to include DetailedIndexStats & isIncremental flag
- Added optional eventBus parameter to RepositoryIndexer constructor
- Emit index.updated events after index() and update() complete
- Fire-and-forget pattern (waitForHandlers: false) to avoid blocking
- Fixed event bus test to include required stats field

Event payload includes:
- type: 'code' | 'github'
- documentsCount, duration, path
- stats: Full DetailedIndexStats snapshot
- isIncremental: Whether this was an update vs full index

This enables automatic snapshot recording via MetricsStore listeners.

Next: CLI integration for MetricsStore
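The fire-and-forget pattern mentioned above can be sketched as follows. `TinyEventBus` is a hypothetical stand-in, not the project's event bus; only the `waitForHandlers` option name comes from the commit message, and its default value here is an assumption.

```typescript
type Handler<T> = (payload: T) => void | Promise<void>;

interface EmitOptions {
  // Assumption: defaults to true; the commit passes false explicitly.
  waitForHandlers?: boolean;
}

class TinyEventBus<T> {
  private handlers: Handler<T>[] = [];
  // Handler failures are collected here instead of being thrown at the emitter.
  readonly errors: unknown[] = [];

  on(handler: Handler<T>): void {
    this.handlers.push(handler);
  }

  emit(payload: T, options: EmitOptions = {}): Promise<void> {
    // Each handler runs in its own microtask, so a throwing handler
    // can neither interrupt the emitter nor the other handlers.
    const runs = this.handlers.map((handler) =>
      Promise.resolve()
        .then(() => handler(payload))
        .catch((err: unknown) => {
          this.errors.push(err);
        }),
    );
    const allDone = Promise.all(runs).then(() => undefined);
    // Fire-and-forget: return immediately without waiting for handlers.
    return options.waitForHandlers === false ? Promise.resolve() : allDone;
  }
}
```

With `waitForHandlers: false`, the caller continues as soon as `emit` returns, and a failing metrics listener only shows up in `errors`.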
Phase 1 Complete! Foundation + Event Bus

CLI Integration:
- Wired up MetricsStore in dev index and dev update commands
- Created event bus for each command invocation
- Subscribed MetricsStore to index.updated events
- Automatic snapshot recording on every index/update
- Proper error logging (non-blocking, metrics are non-critical)
- Proper cleanup (close() on completion)

Metrics Database:
- Stored in ~/.dev-agent/indexes/<repo>/metrics.db
- SQLite with WAL mode for concurrency
- Automatic persistence via event-driven architecture

Phase 1 Deliverables (ALL COMPLETE):
✅ better-sqlite3 dependency added
✅ MetricsStore class with CRUD operations
✅ SQLite schema with indexes and WAL mode
✅ Comprehensive tests (25 tests, all passing)
✅ Event bus integration in RepositoryIndexer
✅ CLI commands automatically record metrics
✅ Fire-and-forget pattern for non-blocking persistence
✅ Proper error handling with logging

Next: Phase 2 - code_metadata table and hotspot detection
Phase 2.1: Code Metadata Schema & Store Methods

Database Schema:
- Added code_metadata table with foreign key to snapshots
- Stores per-file metrics: commit_count, author_count, LOC, functions, imports
- Includes calculated risk_score for hotspot detection
- Indexes for efficient querying (by snapshot, risk, file)
- CASCADE DELETE when snapshots are removed

Types & Schemas:
- Added CodeMetadata interface with Zod schema
- Added CodeMetadataQuery for filtering/sorting
- Added Hotspot interface for analysis results
- Exported all new types from metrics module

MetricsStore Methods:
- appendCodeMetadata() - Bulk insert with transaction
- getCodeMetadata() - Query with filtering and sorting
- getCodeMetadataForFile() - File history across snapshots
- getCodeMetadataCount() - Count records per snapshot
- calculateRiskScore() - Risk formula: (commits * LOC) / authors

Risk Score Formula:
- High commits = frequently changed (more bugs)
- High LOC = more complex (harder to maintain)
- Low authors = knowledge concentrated (bus factor)

Next: Analytics module and CLI integration
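As a worked instance of the formula (commits * LOC) / authors, here is a minimal standalone sketch. The function signature and the zero-author guard are assumptions; the commit message only states the formula itself.

```typescript
// Hypothetical standalone version of the stated formula.
function calculateRiskScore(commitCount: number, linesOfCode: number, authorCount: number): number {
  // Assumption: treat a zero or missing author count as one author
  // so the division is always defined.
  const authors = Math.max(authorCount, 1);
  return (commitCount * linesOfCode) / authors;
}

// A file with 120 commits, 800 LOC and a single author:
const singleOwner = calculateRiskScore(120, 800, 1); // 96000
// The same file shared by four authors scores a quarter of that:
const sharedOwner = calculateRiskScore(120, 800, 4); // 24000
```

The score grows with churn and size and shrinks as ownership spreads, which matches the three bullet points above.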
Replaced judgmental "risk scores" with observable, factual metrics. Developers get data; they make decisions.

Analytics API (BREAKING):
- Removed: getHotspots()
- Added: getFileMetrics(), getMostActive(), getLargestFiles(), getConcentratedOwnership()
- Classifications: activity (very-high to minimal), size (very-large to tiny), ownership (single to shared)
- Updated: getSnapshotSummary() now categorizes by activity/size/ownership

CLI Commands:
- dev metrics activity # Most active files by commits
- dev metrics size # Largest files by LOC
- dev metrics ownership # Knowledge silos

Visualization:

File: src/auth/session.ts
📊 Activity: ████████░░ Very High (120 commits)
📏 Size: ██████░░░░ Medium (800 LOC, 15 functions)
👥 Ownership: ██░░░░░░░░ Single (1 author)
📅 2024-12-10

Tests:
- 17 tests, all passing
- Renamed fixtures from "high-risk" to "very-active"
- Coverage for all new analytics functions

Next: Collect file metadata during indexing
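A rough sketch of how a factual classification and a ten-cell ASCII bar like the one above could be produced. The thresholds in `classifyActivity` and the ratio-based `renderBar` helper are illustrative assumptions; the PR does not state the actual cut-offs or how bar length is derived.

```typescript
type ActivityLevel = "very-high" | "high" | "medium" | "low" | "minimal";

// Assumption: these thresholds are made up for illustration.
function classifyActivity(commits: number): ActivityLevel {
  if (commits >= 100) return "very-high";
  if (commits >= 50) return "high";
  if (commits >= 20) return "medium";
  if (commits >= 5) return "low";
  return "minimal";
}

// Renders a fixed-width bar; ratio is clamped to the range [0, 1].
function renderBar(ratio: number, width = 10): string {
  const safe = Number.isFinite(ratio) ? ratio : 0;
  const clamped = Math.min(Math.max(safe, 0), 1);
  const filled = Math.round(clamped * width);
  return "█".repeat(filled) + "░".repeat(width - filled);
}

const activityLine = `📊 Activity: ${renderBar(0.8)} ${classifyActivity(120)} (120 commits)`;
```

Keeping the label and the bar as separate pure functions makes each easy to test without touching the database.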
Phase 2 Complete! Code Metadata Collection + Factual Analytics

🎯 What's New:

1. Code Metadata Collection
- Built buildCodeMetadata() collector utility
- Combines scanner results + git history automatically
- Collects: LOC, functions, imports, commits, authors
- Automatic collection during index/update operations
- Stored in SQLite code_metadata table

2. Factual Analytics (Replaced Risk Scoring)
- getMostActive() - files by commit count
- getLargestFiles() - files by LOC + function count
- getConcentratedOwnership() - files by author count
- Multi-dimensional ASCII bar visualizations
- Factual labels: very-high/high/medium/low/minimal

3. CLI Commands
- dev metrics activity # Most active files
- dev metrics size # Largest files
- dev metrics ownership # Knowledge concentration

4. Logger Integration
- Added optional logger to IndexerConfig
- RepositoryIndexer warns on metadata failures
- Non-blocking (continues indexing on errors)
- Helpful for debugging git/filesystem issues

5. Event Architecture
- Added codeMetadata field to IndexUpdatedEvent
- RepositoryIndexer emits metadata after scanning
- CLI handlers store metadata in SQLite automatically
- Graceful handling when metadata unavailable

6. Test Improvements
- Fixed flaky timestamp ordering in MetricsStore
- Added customTimestamp param to recordSnapshot()
- All 42 metrics tests passing (store + analytics)
- 1857 total tests passing

7. Lint Cleanup
- Fixed Number.parseInt radix warnings
- Removed unused biome-ignore suppressions
- 100% clean lint across all packages

📊 Visualization Example:

File: src/auth/session.ts
📊 Activity: ████████░░ Very High (120 commits)
📏 Size: ██████░░░░ Medium (800 LOC, 15 functions)
👥 Ownership: ██░░░░░░░░ Single (1 author)
📅 Last Changed: 2 days ago

🔄 Data Flow:
1. dev index/update → RepositoryIndexer
2. Indexer scans → builds code metadata
3. Emits index.updated event with metadata
4. CLI handler stores in SQLite code_metadata table
5. CLI commands query + visualize metrics

✅ All Quality Checks Passing:
- Build: ✅ Successful
- Tests: ✅ 1857/1857 passing
- Lint: ✅ 100% clean
- TypeCheck: ✅ No errors

Next: Phase 3 - Trends table (optional)
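Step 2 of the data flow, combining scanner results with git history, might look roughly like the sketch below. `mergeCodeMetadata` and all three interfaces are hypothetical and are not the signature of the PR's buildCodeMetadata(); representing missing git history as `null` is also an assumption about what "graceful handling" means.

```typescript
// Hypothetical per-file scanner output.
interface ScannedFile {
  filePath: string;
  linesOfCode: number;
  functionCount: number;
  importCount: number;
}

// Hypothetical per-file git statistics.
interface GitFileStats {
  commitCount: number;
  authorCount: number;
}

interface CodeMetadataRow extends ScannedFile {
  commitCount: number | null;
  authorCount: number | null;
}

// Joins scanner output with git statistics by file path.
// Files without git history (for example, new untracked files) keep null counts.
function mergeCodeMetadata(
  scanned: ScannedFile[],
  gitStats: Map<string, GitFileStats>,
): CodeMetadataRow[] {
  return scanned.map((file) => {
    const git = gitStats.get(file.filePath);
    return {
      ...file,
      commitCount: git?.commitCount ?? null,
      authorCount: git?.authorCount ?? null,
    };
  });
}

const mergedRows = mergeCodeMetadata(
  [
    { filePath: "src/auth/session.ts", linesOfCode: 800, functionCount: 15, importCount: 4 },
    { filePath: "src/new-file.ts", linesOfCode: 12, functionCount: 1, importCount: 0 },
  ],
  new Map([["src/auth/session.ts", { commitCount: 120, authorCount: 1 }]]),
);
```

The resulting rows are what a handler would then pass to a bulk insert such as appendCodeMetadata().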
Closes #158
Part of Epic #145 (Dashboard & Visualization)
📊 Overview
Implements SQLite-based metrics store with event-driven architecture to enable time-series analytics and dashboard visualizations. This provides the data infrastructure for tracking codebase evolution, identifying hotspots, and displaying trends.
See issue #158 for complete details.
✨ What's New
Phase 1: Foundation + Event Bus
Phase 2: Code Metadata + Analytics
- getMostActive() - by commit count
- getLargestFiles() - by LOC + function count
- getConcentratedOwnership() - by author count
- dev metrics activity - Most active files
- dev metrics size - Largest files
- dev metrics ownership - Knowledge silos

📦 What's Included
Files Created:
- packages/core/src/metrics/ (complete module)
  - schema.ts - SQLite schema
  - store.ts - MetricsStore class
  - collector.ts - Code metadata builder
  - analytics.ts - Factual metrics
  - types.ts - Type definitions
- packages/cli/src/commands/metrics.ts - CLI commands

Files Modified:
🎯 Example Output
$ dev metrics activity

📊 Metrics for /Users/dev/my-repo
Captured at: 12/12/2024, 7:00:00 PM

Most Active Files (by commits)

File: packages/core/src/indexer/index.ts
📊 Activity: ████████░░ Very High (145 commits)
📏 Size: ████████░░ Large (901 LOC, 45 functions)
👥 Ownership: ████░░░░░░ Distributed (3 authors)
📅 2024-12-10

✅ Quality Metrics
🏗️ Architecture
The event-driven design ensures that a metrics failure never crashes indexing.
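One way to keep a metrics listener non-critical is to wrap it so that any exception is logged rather than propagated. The `nonCritical` helper below is a hypothetical sketch, not code from this PR; it assumes a synchronous handler, which fits better-sqlite3's synchronous API.

```typescript
// Wraps a handler so that a thrown error is reported to logError
// and never reaches the code that triggered the event.
function nonCritical<T>(
  handler: (payload: T) => void,
  logError: (err: unknown) => void,
): (payload: T) => void {
  return (payload: T): void => {
    try {
      handler(payload);
    } catch (err) {
      logError(err);
    }
  };
}

// Usage: a listener that always fails, standing in for a broken metrics write.
const loggedErrors: unknown[] = [];
const recordSnapshotSafely = nonCritical<{ documentsCount: number }>(
  () => {
    throw new Error("disk full");
  },
  (err) => {
    loggedErrors.push(err);
  },
);

// The call returns normally; the failure is only recorded in loggedErrors.
recordSnapshotSafely({ documentsCount: 10 });
```

Because the wrapper swallows the error at the listener boundary, the indexing path needs no knowledge of the metrics store at all.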
🚀 Performance
📋 Next Steps
Phase 3 (Trends Table) - Deferred until dashboard UI work:
🔗 Related