224 changes: 223 additions & 1 deletion PLAN.md
@@ -240,6 +240,40 @@ Git history is valuable context that LLMs can't easily access. We add intelligen

---

## Current: Performance & Reliability (v0.6.x - v0.7.x)

> Critical high-impact improvements for production readiness and user experience.

**Epic:** #104 (Progress: 6/9 complete)

### Completed Improvements ✅

| Feature | Status | Version | Impact |
|---------|--------|---------|--------|
| Index size reporting | ✅ Done | v0.4.3 | Track disk usage growth |
| Adaptive concurrency | ✅ Done | v0.6.0 | Auto-detect optimal batch size by CPU/memory |
| Incremental indexing | ✅ Done | v0.5.1 | <30s updates for single file changes (#122) |
| Progress indicators | ✅ Done | v0.1.0 | Real-time feedback for long operations |
| Error handling | ✅ Done | v0.3.0 | Graceful degradation |
| Basic validation | ✅ Done | v0.2.0 | Git repo and path checks |
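
The adaptive concurrency row above describes choosing a batch size from CPU and memory. A minimal sketch of that idea follows; `pickConcurrency`, both interfaces, the "leave one core free" rule, and the "use half of free memory" rule are assumptions made for illustration, not dev-agent's actual heuristics.

```typescript
// Hypothetical inputs. In Node.js these could be read from
// os.cpus().length and os.freemem().
interface SystemSnapshot {
  cpuCount: number;
  freeMemoryBytes: number;
}

// Assumed tuning knobs, not values taken from dev-agent.
interface ConcurrencyOptions {
  estimatedBytesPerTask: number; // rough memory cost of one in-flight batch
  maxConcurrency: number; // hard upper bound
}

function pickConcurrency(sys: SystemSnapshot, opts: ConcurrencyOptions): number {
  // Leave one core free for the event loop and other processes.
  const byCpu = Math.max(1, sys.cpuCount - 1);
  // Spend at most half of the currently free memory on in-flight work.
  const byMemory = Math.max(
    1,
    Math.floor(sys.freeMemoryBytes / 2 / opts.estimatedBytesPerTask),
  );
  // The tightest of the three limits wins.
  return Math.min(byCpu, byMemory, opts.maxConcurrency);
}
```

Keeping the function pure (it takes a snapshot instead of reading system state itself) makes the heuristic easy to unit test on any machine.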

### Remaining Work 🔄

| Issue | Priority | Impact | Status |
|-------|----------|--------|--------|
| #152 - MCP lazy initialization | P0 | Reduce startup from 2-5s to <500ms | 🔲 Todo |
| #153 - GitHub history in planner | P0 | Add commit context to AI plans | 🔲 Todo |
| #154 - Memory monitoring | P1 | Prevent leaks, maintain <500MB usage | 🔲 Todo |

**Success Metrics:**
- ✅ Large repo indexing: <5min for 50k files
- ✅ Incremental updates: <30s for single file changes
- 🔲 MCP server startup: <500ms (currently 2-5s)
- 🔲 Memory usage: <500MB steady state
- 🔲 Planner quality: Include git history context
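
One common way to approach the startup target in #152 is to defer expensive setup until the first request that needs it. The sketch below shows that pattern in general terms; `lazy`, `getIndexer`, and `initCount` are hypothetical names, and nothing here reflects how the MCP server is actually structured.

```typescript
// Wrap an async factory so it runs at most once, on first use.
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = factory().catch((err) => {
        // Forget a failed attempt so a later call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical usage: the factory stands in for loading an index or a model.
let initCount = 0;
const getIndexer = lazy(async () => {
  initCount += 1;
  return { ready: true };
});
```

Concurrent callers share the same in-flight promise, so the expensive work is not duplicated even if several tool calls arrive before initialization finishes.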

---

## Next: Extended Git Intelligence (v0.5.0)

> Building on git history with deeper insights.
@@ -277,7 +311,195 @@ Git history is valuable context that LLMs can't easily access. We add intelligen

---

## Future: Extended Intelligence (v0.6+)
## Next: Dashboard & Visualization (v0.7.1)

> Making codebase insights visible and accessible.

**Epic:** #145

### Philosophy

Dev-agent provides rich context about codebases, but it's currently text-only. A dashboard makes insights:
- **Visible** - See language breakdown, component types, health status at a glance
- **Interactive** - Explore relationships, drill into packages
- **Actionable** - Identify areas needing attention

### Goals

1. **Enhanced CLI** (`dev dashboard`) - Terminal-based stats with rich formatting
2. **Web Dashboard** - Next.js app with real-time insights
3. **Data Infrastructure** - Aggregate stats during indexing for efficient display

### Components

| Component | Status | Priority |
|-----------|--------|----------|
| **CLI Enhancements** | | |
| Language breakdown display | 🔲 Todo | 🔴 High |
| Component type statistics | 🔲 Todo | 🔴 High |
| Package-level stats (monorepo) | 🔲 Todo | 🔴 High |
| Rich formatting (tables, colors) | 🔲 Todo | 🔴 High |
| **Core Data Collection** | | |
| Track language metrics in indexer | 🔲 Todo | 🔴 High |
| Aggregate component type counts | 🔲 Todo | 🔴 High |
| Package-level aggregation | 🔲 Todo | 🟡 Medium |
| Change frequency tracking | 🔲 Todo | 🟡 Medium |
| **Web Dashboard** | | |
| Next.js app setup (`apps/dashboard/`) | 🔲 Todo | 🔴 High |
| Tremor component library | 🔲 Todo | 🔴 High |
| API routes (stats, health) | 🔲 Todo | 🔴 High |
| Real-time stats display | 🔲 Todo | 🔴 High |
| Language distribution charts | 🔲 Todo | 🟡 Medium |
| Component type visualizations | 🔲 Todo | 🟡 Medium |
| Health status indicators | 🔲 Todo | 🟡 Medium |
| Vector index metrics (simple) | 🔲 Todo | 🟡 Medium |
| Basic package list (monorepo) | 🔲 Todo | 🟡 Medium |

### Architecture

```
apps/
└── dashboard/                   # Next.js 16 + React 19 + Tremor
    ├── app/
    │   ├── page.tsx             # Main dashboard
    │   └── api/
    │       └── stats/           # Next.js API routes
    └── components/
        └── tremor/              # Tremor dashboard components

packages/core/
└── src/
    └── indexer/
        └── stats-aggregator.ts  # New: Collect detailed stats
```
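
As an illustration of the `api/stats/` entry in the tree above, a Next.js App Router route handler is a `route.ts` file that exports functions named after HTTP methods and returns a standard `Response`. The sketch below assumes a file at `apps/dashboard/app/api/stats/route.ts`; the `StatsPayload` shape and the `loadStats` placeholder are hypothetical.

```typescript
// Hypothetical file: apps/dashboard/app/api/stats/route.ts

// Assumed payload shape, not the real IndexStats type.
interface StatsPayload {
  totalFiles: number;
  byLanguage: Record<string, number>;
}

// Placeholder: a real handler would read the stats produced by the indexer.
function loadStats(): StatsPayload {
  return { totalFiles: 3, byLanguage: { typescript: 2, markdown: 1 } };
}

// Handles GET /api/stats and returns the stats as JSON.
export function GET(): Response {
  return new Response(JSON.stringify(loadStats()), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```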

### Implementation Phases

**Phase 1: Data Foundation**
- Enhance IndexStats with language/component breakdowns
- Aggregate stats during indexing (minimal overhead)
- Foundation for all visualizations
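
A rough sketch of what aggregating during indexing could look like: counters are updated once per indexed document, so the extra cost is constant per document. `SketchStatsAggregator`, `IndexedDocument`, and `DetailedStats` are hypothetical names; this is not the `StatsAggregator` exported from `packages/core`.

```typescript
// Hypothetical document shape; the real indexer types may differ.
interface IndexedDocument {
  path: string;
  language: string;
  componentType: string; // e.g. "function", "class"
}

interface DetailedStats {
  totalDocuments: number;
  byLanguage: Record<string, number>;
  byComponentType: Record<string, number>;
}

// Increment a counter, creating it on first use.
function bump(counts: Record<string, number>, key: string): void {
  counts[key] = (counts[key] ?? 0) + 1;
}

class SketchStatsAggregator {
  private total = 0;
  private readonly byLanguage: Record<string, number> = {};
  private readonly byComponentType: Record<string, number> = {};

  // Called once per document while indexing.
  add(doc: IndexedDocument): void {
    this.total += 1;
    bump(this.byLanguage, doc.language);
    bump(this.byComponentType, doc.componentType);
  }

  // Returns copies so callers cannot mutate internal counters.
  snapshot(): DetailedStats {
    return {
      totalDocuments: this.total,
      byLanguage: { ...this.byLanguage },
      byComponentType: { ...this.byComponentType },
    };
  }
}
```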

**Phase 2: CLI Enhancements**
- Rich terminal output with tables and colors
- Package-level breakdown for monorepos
- Immediate user value
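
For the tabular part of the terminal output, a dependency-free sketch is shown below. `renderTable` is a hypothetical helper; colors are left out, and a real implementation might use a terminal formatting library instead.

```typescript
// Render rows as a left-aligned, space-padded text table.
function renderTable(headers: string[], rows: string[][]): string {
  // Each column is as wide as its longest cell (header included).
  const widths = headers.map((h, col) =>
    Math.max(h.length, ...rows.map((r) => (r[col] ?? "").length)),
  );
  const line = (cells: string[]): string =>
    cells
      .map((c, i) => c.padEnd(widths[i]))
      .join("  ")
      .trimEnd();
  const separator = widths.map((w) => "-".repeat(w)).join("  ");
  return [line(headers), separator, ...rows.map((r) => line(r))].join("\n");
}
```

Called with `["Language", "Files"]` as headers and one row per language, this yields a header line, a dashed separator, and one aligned line per row.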

**Phase 3: Web Dashboard**
- Next.js 16 app in `apps/dashboard/`
- Tremor component setup
- Basic stats display with charts

**Phase 4: Advanced Features**
- Interactive exploration
- Package explorer (monorepo support)
- Real-time updates

---

## Next: Advanced LanceDB Visualizations (v0.7.2)

> Making vector embeddings visible and explorable.

### Philosophy

LanceDB stores 384-dimensional embeddings for semantic search, but these are invisible to users. Advanced visualizations reveal:
- **Where code lives** in semantic space (2D projections)
- **What's related** beyond imports (similarity networks)
- **How embeddings evolve** over time (drift tracking)
- **Search quality** insights (what works, what doesn't)

### Goals

1. **Semantic Code Map** - 2D/3D projection of vector space
2. **Similarity Explorer** - Interactive component relationship graph
3. **Search Quality Dashboard** - Analyze search performance
4. **Embedding Health** - Coverage and quality metrics per directory

### Components

| Component | Description | Priority |
|-----------|-------------|----------|
| **Semantic Code Map** | | |
| t-SNE/UMAP projection to 2D | Visualize embedding space | 🔴 High |
| Interactive scatter plot | Click to see code snippet | 🔴 High |
| Color by language/type | Visual code categorization | 🟡 Medium |
| Cluster detection | Auto-identify code groups | 🟡 Medium |
| **Similarity Network** | | |
| Component relationship graph | Force-directed layout | 🔴 High |
| Semantic similarity edges | Show hidden relationships | 🔴 High |
| Interactive exploration | Zoom, pan, filter | 🟡 Medium |
| Duplication detection | High similarity alerts | 🟡 Medium |
| **Search Quality** | | |
| Search metrics dashboard | Track performance over time | 🔴 High |
| Query similarity heatmap | Understand search patterns | 🟡 Medium |
| "Dead zone" detection | Queries with poor results | 🟡 Medium |
| Recommendation engine | Suggest better queries | 🟢 Low |
| **Embedding Health** | | |
| Coverage heatmap by directory | Identify blind spots | 🔴 High |
| Quality scoring per file | Flag low-quality embeddings | 🟡 Medium |
| Drift tracking over time | Monitor embedding changes | 🟡 Medium |
| Re-index recommendations | Suggest what needs updating | 🟢 Low |

### Architecture

```
Dashboard UI
Advanced Viz Components (D3.js, Plotly, or similar)
New API Routes
├─ GET /api/embeddings/projection (t-SNE/UMAP data)
├─ GET /api/embeddings/similarity (network graph)
├─ GET /api/embeddings/quality (coverage metrics)
└─ GET /api/embeddings/search-history (query analysis)
LanceDB + Vector Analysis
└─ Dimensionality reduction, similarity queries, metrics
```

### Dependencies

**New:**
- `umap-js` or `tsne-js` - Dimensionality reduction
- `d3` or `@visx/visx` - Advanced visualizations
- `react-force-graph` - Network graphs (or `sigma.js`)
- `@tensorflow/tfjs` (optional) - Advanced vector operations

### Implementation Phases

**Phase 1: Semantic Code Map**
- Implement t-SNE/UMAP projection
- Create 2D scatter plot visualization
- Add basic interactivity (hover, click)
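
To make the projection step concrete without pulling in a library, the sketch below reduces embeddings to 2D with PCA computed by power iteration. Note that PCA is a simpler, linear stand-in chosen here for brevity; it is not t-SNE or UMAP and will not preserve local neighborhoods the way those methods aim to. All function names are hypothetical.

```typescript
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function normalize(v: Vec): Vec {
  const n = Math.sqrt(dot(v, v)) || 1; // avoid dividing by zero
  return v.map((x) => x / n);
}

// Subtract the per-dimension mean so the data is centered at the origin.
function center(rows: Vec[]): Vec[] {
  const dim = rows[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const r of rows) for (let i = 0; i < dim; i++) mean[i] += r[i] / rows.length;
  return rows.map((r) => r.map((x, i) => x - mean[i]));
}

// Dominant direction of variance via power iteration on X^T X,
// computed row by row so the covariance matrix is never materialized.
function topComponent(rows: Vec[], iterations = 100): Vec {
  let v = normalize(rows[0].map((_, i) => 1 / (i + 1))); // deterministic start
  for (let it = 0; it < iterations; it++) {
    const next = new Array<number>(v.length).fill(0);
    for (const r of rows) {
      const p = dot(r, v);
      for (let i = 0; i < v.length; i++) next[i] += p * r[i];
    }
    v = normalize(next);
  }
  return v;
}

function projectTo2D(embeddings: Vec[]): [number, number][] {
  if (embeddings.length === 0) return [];
  const centered = center(embeddings);
  const pc1 = topComponent(centered);
  // Remove the first component, then find the next strongest direction.
  const deflated = centered.map((r) => {
    const p = dot(r, pc1);
    return r.map((x, i) => x - p * pc1[i]);
  });
  const pc2 = topComponent(deflated);
  return centered.map((r): [number, number] => [dot(r, pc1), dot(r, pc2)]);
}
```

The resulting coordinate pairs can be fed directly to a scatter plot. Swapping in a UMAP or t-SNE library later would only change the body of `projectTo2D`.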

**Phase 2: Similarity Network**
- Build component similarity graph
- Implement force-directed layout
- Add filtering and exploration
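
The similarity graph can be thought of as nodes (components) joined by an edge whenever their embeddings are close enough. A minimal brute-force sketch using cosine similarity follows; the types, the function names, and the threshold are assumptions. The pairwise loop is O(n²), so a real implementation would more likely rely on the vector store's nearest-neighbor search.

```typescript
interface EmbeddedComponent {
  id: string;
  vector: number[];
}

interface SimilarityEdge {
  source: string;
  target: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // zero vectors are similar to nothing
}

// Compare every unordered pair once and keep those at or above the threshold.
function buildSimilarityEdges(
  components: EmbeddedComponent[],
  threshold: number,
): SimilarityEdge[] {
  const edges: SimilarityEdge[] = [];
  for (let i = 0; i < components.length; i++) {
    for (let j = i + 1; j < components.length; j++) {
      const score = cosineSimilarity(components[i].vector, components[j].vector);
      if (score >= threshold) {
        edges.push({ source: components[i].id, target: components[j].id, score });
      }
    }
  }
  return edges;
}
```

Raising the threshold close to 1 turns the same function into a crude duplication detector, since only near-identical embeddings survive.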

**Phase 3: Search Quality**
- Track search queries and results
- Build metrics dashboard
- Implement quality scoring
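
One possible way to surface "dead zones" from tracked queries is to group the log by normalized query text and flag groups whose average top score is low. The log entry shape, the normalization, and the threshold below are all assumptions for illustration.

```typescript
// Hypothetical log entry recorded for each search.
interface SearchLogEntry {
  query: string;
  topScore: number; // similarity score of the best result
  resultCount: number;
}

interface DeadZone {
  query: string;
  occurrences: number;
  averageTopScore: number;
}

function findDeadZones(log: SearchLogEntry[], minScore: number): DeadZone[] {
  const byQuery = new Map<string, { count: number; scoreSum: number }>();
  for (const entry of log) {
    // Treat queries that differ only in case or outer whitespace as the same.
    const key = entry.query.trim().toLowerCase();
    const agg = byQuery.get(key) ?? { count: 0, scoreSum: 0 };
    agg.count += 1;
    // A search with no results counts as a score of 0.
    agg.scoreSum += entry.resultCount === 0 ? 0 : entry.topScore;
    byQuery.set(key, agg);
  }
  const zones: DeadZone[] = [];
  byQuery.forEach((agg, query) => {
    const averageTopScore = agg.scoreSum / agg.count;
    if (averageTopScore < minScore) {
      zones.push({ query, occurrences: agg.count, averageTopScore });
    }
  });
  // Most frequently failing queries first.
  return zones.sort((x, y) => y.occurrences - x.occurrences);
}
```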

**Phase 4: Embedding Health**
- Coverage analysis by directory
- Quality scoring per file
- Drift detection system
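
Coverage by directory can be read as the share of files in each directory that have an embedding. The sketch below assumes a flat list of file records with POSIX-style paths; `FileRecord`, `DirectoryCoverage`, and both functions are hypothetical.

```typescript
interface FileRecord {
  path: string; // POSIX-style relative path, e.g. "src/indexer/index.ts"
  embedded: boolean; // whether the file has at least one embedding
}

interface DirectoryCoverage {
  directory: string;
  total: number;
  embedded: number;
  coverage: number; // embedded / total, between 0 and 1
}

function directoryOf(path: string): string {
  const idx = path.lastIndexOf("/");
  return idx === -1 ? "." : path.slice(0, idx);
}

function coverageByDirectory(files: FileRecord[]): DirectoryCoverage[] {
  const totals: Record<string, { total: number; embedded: number }> = {};
  for (const file of files) {
    const dir = directoryOf(file.path);
    const entry = totals[dir] ?? (totals[dir] = { total: 0, embedded: 0 });
    entry.total += 1;
    if (file.embedded) entry.embedded += 1;
  }
  // Sort by directory name for stable output.
  return Object.keys(totals)
    .sort()
    .map((directory) => {
      const { total, embedded } = totals[directory];
      return { directory, total, embedded, coverage: embedded / total };
    });
}
```

Directories with low coverage are the "blind spots" a heatmap would highlight, and they are natural candidates for re-index recommendations.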

### Success Metrics

- Developers can visually explore codebase semantics
- Identify code duplication without running analysis tools
- Understand which areas need re-indexing
- Improve search query formulation based on insights

---

## Future: Extended Intelligence (v0.8+)

### Multi-Language Support

8 changes: 8 additions & 0 deletions packages/core/src/indexer.ts
@@ -0,0 +1,8 @@
/**
 * Repository Indexer module exports
 */

export { RepositoryIndexer } from './indexer/index';
export { StatsAggregator } from './indexer/stats-aggregator';
export * from './indexer/types';
export * from './indexer/utils';