Skip to content

Conversation

@prosdev
Copy link
Collaborator

@prosdev prosdev commented Dec 13, 2025

Closes #158

Part of Epic #145 (Dashboard & Visualization)

📊 Overview

Implements SQLite-based metrics store with event-driven architecture to enable time-series analytics and dashboard visualizations. This provides the data infrastructure for tracking codebase evolution, identifying hotspots, and displaying trends.

See issue #158 for complete details.

✨ What's New

Phase 1: Foundation + Event Bus

  • MetricsStore class with SQLite (better-sqlite3)
  • Automatic snapshot persistence on every index/update
  • Event-driven architecture (indexer emits, metrics listens)
  • WAL mode for concurrency and crash recovery
  • Zod validation for all queries
  • 25 tests passing

Phase 2: Code Metadata + Analytics

  • Per-file metrics: LOC, commits, authors, functions, imports
  • Factual analytics (replaced subjective risk scores):
    • getMostActive() - by commit count
    • getLargestFiles() - by LOC + function count
    • getConcentratedOwnership() - by author count
  • CLI commands:
    • dev metrics activity - Most active files
    • dev metrics size - Largest files
    • dev metrics ownership - Knowledge silos
  • Multi-dimensional ASCII bar visualizations
  • 17 tests passing

📦 What's Included

Files Created:

  • packages/core/src/metrics/ (complete module)
    • schema.ts - SQLite schema
    • store.ts - MetricsStore class
    • collector.ts - Code metadata builder
    • analytics.ts - Factual metrics
    • types.ts - Type definitions
  • packages/cli/src/commands/metrics.ts - CLI commands

Files Modified:

  • Event types (added stats + codeMetadata to IndexUpdatedEvent)
  • Indexer (emits events, builds metadata)
  • CLI commands (event handlers)

🎯 Example Output

$ dev metrics activity

📊 Metrics for /Users/dev/my-repo
   Captured at: 12/12/2024, 7:00:00 PM

Most Active Files (by commits)

File: packages/core/src/indexer/index.ts
📊 Activity:   ████████░░  Very High (145 commits)
📏 Size:       ████████░░  Large (901 LOC, 45 functions)
👥 Ownership:  ████░░░░░░  Distributed (3 authors)
📅 2024-12-10

✅ Quality Metrics

  • 42 tests passing (25 store + 17 analytics)
  • 100% lint clean (Biome)
  • TypeCheck passing (strict mode)
  • Zod validation on all boundaries
  • Logger integration for observability

🏗️ Architecture

Event-driven design ensures metrics never crashes indexing:

RepositoryIndexer
  ├─ Scans files → builds metadata
  ├─ Emits index.updated event
  └─ Returns stats (indexing complete)
  
CLI Event Handler
  ├─ Listens to index.updated
  ├─ Stores snapshot in SQLite
  └─ Logs errors (doesn't throw)

🚀 Performance

  • Metrics append: <10ms
  • Query latency: <100ms
  • Indexing overhead: <2%
  • Memory bounded (no leaks)

📋 Next Steps

Phase 3 (Trends Table) - Deferred until dashboard UI work:

  • Pre-computed aggregations for fast queries
  • Daily/weekly/monthly trends
  • Will be done in a separate PR when building the web dashboard

🔗 Related

Phase 1.1: Add dependencies
- Add better-sqlite3 (v12.5.0) for metrics persistence
- Add @types/better-sqlite3 for TypeScript support
- Native SQLite with pre-built binaries
- ~9MB uncompressed (~3MB compressed)

Sets foundation for event-driven metrics store.
Phase 1.2-1.3: Core metrics infrastructure
- Created MetricsStore class with CRUD operations
- Implemented SQLite schema with WAL mode for concurrency
- Added Zod schemas for snapshot query validation
- Comprehensive test coverage (25 tests, all passing)

Features:
- recordSnapshot(): Store index/update snapshots
- getSnapshots(): Query with filters (time, repo, trigger)
- getLatestSnapshot(): Retrieve most recent snapshot
- pruneOldSnapshots(): Retention policy enforcement
- Kero logger integration (optional)

Database optimizations:
- WAL mode for concurrent reads/writes
- Denormalized fields for fast queries
- Indexes on timestamp, repository, trigger

Next: Event bus integration for automatic persistence
Phase 1.4: Event-driven metrics persistence
- Updated IndexUpdatedEvent to include DetailedIndexStats & isIncremental flag
- Added optional eventBus parameter to RepositoryIndexer constructor
- Emit index.updated events after index() and update() complete
- Fire-and-forget pattern (waitForHandlers: false) to avoid blocking
- Fixed event bus test to include required stats field

Event payload includes:
- type: 'code' | 'github'
- documentsCount, duration, path
- stats: Full DetailedIndexStats snapshot
- isIncremental: Whether this was an update vs full index

This enables automatic snapshot recording via MetricsStore listeners.

Next: CLI integration for MetricsStore
Phase 1 Complete! Foundation + Event Bus

CLI Integration:
- Wired up MetricsStore in dev index and dev update commands
- Created event bus for each command invocation
- Subscribed MetricsStore to index.updated events
- Automatic snapshot recording on every index/update
- Proper error logging (non-blocking, metrics are non-critical)
- Proper cleanup (close() on completion)

Metrics Database:
- Stored in ~/.dev-agent/indexes/<repo>/metrics.db
- SQLite with WAL mode for concurrency
- Automatic persistence via event-driven architecture

Phase 1 Deliverables (ALL COMPLETE):
✅ better-sqlite3 dependency added
✅ MetricsStore class with CRUD operations
✅ SQLite schema with indexes and WAL mode
✅ Comprehensive tests (25 tests, all passing)
✅ Event bus integration in RepositoryIndexer
✅ CLI commands automatically record metrics
✅ Fire-and-forget pattern for non-blocking persistence
✅ Proper error handling with logging

Next: Phase 2 - code_metadata table and hotspot detection
Phase 2.1: Code Metadata Schema & Store Methods

Database Schema:
- Added code_metadata table with foreign key to snapshots
- Stores per-file metrics: commit_count, author_count, LOC, functions, imports
- Includes calculated risk_score for hotspot detection
- Indexes for efficient querying (by snapshot, risk, file)
- CASCADE DELETE when snapshots are removed

Types & Schemas:
- Added CodeMetadata interface with Zod schema
- Added CodeMetadataQuery for filtering/sorting
- Added Hotspot interface for analysis results
- Exported all new types from metrics module

MetricsStore Methods:
- appendCodeMetadata() - Bulk insert with transaction
- getCodeMetadata() - Query with filtering and sorting
- getCodeMetadataForFile() - File history across snapshots
- getCodeMetadataCount() - Count records per snapshot
- calculateRiskScore() - Risk formula: (commits * LOC) / authors

Risk Score Formula:
- High commits = frequently changed (more bugs)
- High LOC = more complex (harder to maintain)
- Low authors = knowledge concentrated (bus factor)

Next: Analytics module and CLI integration
Replaced judgmental "risk scores" with observable, factual metrics.
Developers get data; they make decisions.

Analytics API (BREAKING):
- Removed: getHotspots()
- Added: getFileMetrics(), getMostActive(), getLargestFiles(),
  getConcentratedOwnership()
- Classifications: activity (very-high to minimal),
  size (very-large to tiny), ownership (single to shared)
- Updated: getSnapshotSummary() now categorizes by
  activity/size/ownership

CLI Commands:
- dev metrics activity    # Most active files by commits
- dev metrics size        # Largest files by LOC
- dev metrics ownership   # Knowledge silos

Visualization:
File: src/auth/session.ts
📊 Activity:   ████████░░  Very High (120 commits)
📏 Size:       ██████░░░░  Medium (800 LOC, 15 functions)
👥 Ownership:  ██░░░░░░░░  Single (1 author)
📅 2024-12-10

Tests:
- 17 tests, all passing
- Renamed fixtures from "high-risk" to "very-active"
- Coverage for all new analytics functions

Next: Collect file metadata during indexing
Phase 2 Complete! Code Metadata Collection + Factual Analytics

🎯 What's New:

1. Code Metadata Collection
   - Built buildCodeMetadata() collector utility
   - Combines scanner results + git history automatically
   - Collects: LOC, functions, imports, commits, authors
   - Automatic collection during index/update operations
   - Stored in SQLite code_metadata table

2. Factual Analytics (Replaced Risk Scoring)
   - getMostActive() - files by commit count
   - getLargestFiles() - files by LOC + function count
   - getConcentratedOwnership() - files by author count
   - Multi-dimensional ASCII bar visualizations
   - Factual labels: very-high/high/medium/low/minimal

3. CLI Commands
   - dev metrics activity    # Most active files
   - dev metrics size        # Largest files
   - dev metrics ownership   # Knowledge concentration

4. Logger Integration
   - Added optional logger to IndexerConfig
   - RepositoryIndexer warns on metadata failures
   - Non-blocking (continues indexing on errors)
   - Helpful for debugging git/filesystem issues

5. Event Architecture
   - Added codeMetadata field to IndexUpdatedEvent
   - RepositoryIndexer emits metadata after scanning
   - CLI handlers store metadata in SQLite automatically
   - Graceful handling when metadata unavailable

6. Test Improvements
   - Fixed flaky timestamp ordering in MetricsStore
   - Added customTimestamp param to recordSnapshot()
   - All 42 metrics tests passing (store + analytics)
   - 1857 total tests passing

7. Lint Cleanup
   - Fixed Number.parseInt radix warnings
   - Removed unused biome-ignore suppressions
   - 100% clean lint across all packages

📊 Visualization Example:
File: src/auth/session.ts
📊 Activity:   ████████░░  Very High (120 commits)
📏 Size:       ██████░░░░  Medium (800 LOC, 15 functions)
👥 Ownership:  ██░░░░░░░░  Single (1 author)
📅 Last Changed: 2 days ago

🔄 Data Flow:
1. dev index/update → RepositoryIndexer
2. Indexer scans → builds code metadata
3. Emits index.updated event with metadata
4. CLI handler stores in SQLite code_metadata table
5. CLI commands query + visualize metrics

✅ All Quality Checks Passing:
- Build: ✅ Successful
- Tests: ✅ 1857/1857 passing
- Lint: ✅ 100% clean
- TypeCheck: ✅ No errors

Next: Phase 3 - Trends table (optional)
@prosdev prosdev merged commit 54efb97 into main Dec 13, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Story: SQLite Metrics Store Foundation (Phase 1 & 2)

1 participant