
[Monorepo Phase 3] Organize into clean package structure (9 packages) #204

@bashandbone

Phase 3: Monorepo Structure Organization

Parent Epic: #116
Depends On: #118 (DI Integration)
Target: Week 3 (5-7 days)
Risk Level: Low

Organize CodeWeaver into a clean monorepo structure with 9 independently buildable packages. This is now straightforward because the Phase 2 DI work broke the circular dependencies.

Goals

  • Organize code into packages/ with uv workspace
  • All packages build independently
  • Remaining violations: ~30-40 (down from 164)
  • Clean dependency graph
  • Documentation and migration guides
  • Zero functional changes

Why This Phase is Now Easy

Thanks to Phase 2 (DI Integration):

  • ✅ Circular dependencies broken (~75% of dependency violations eliminated)
  • ✅ Services don't import across packages
  • ✅ Clean dependency flow established
  • ✅ Just need to organize files into packages

Original estimate: 3-4 weeks to refactor dependencies
New reality: 5-7 days to organize structure (dependencies already fixed!)

Target Monorepo Structure

packages/
  codeweaver-core/
    - Core types, exceptions
    - DI infrastructure
    - search_types (moved in Phase 1)
    - No third-party dependencies (stdlib only)

  codeweaver-tokenizers/  ✅ (Extracted in Phase 1)
    - Tokenizer implementations
    - Tree-sitter integrations
    
  codeweaver-daemon/  ✅ (Extracted in Phase 1)
    - Background daemon logic
    - Process management

  codeweaver-utils/
    - Common utilities
    - git, logging, procs
    - Depends: core

  codeweaver-semantic/
    - Semantic chunking
    - AST analysis
    - Depends: core, utils, tokenizers

  codeweaver-telemetry/
    - Telemetry client (DI-enabled)
    - Analytics
    - Depends: core

  codeweaver-providers/
    - All provider implementations
    - Embedding, vector store, reranking
    - Provider factories (DI)
    - Depends: core, telemetry

  codeweaver-engine/
    - Indexer, search services
    - Config, registry (simplified)
    - Depends: core, utils, semantic, providers

  codeweaver/
    - CLI, server, MCP
    - agent_api orchestration
    - Depends: engine, all other packages

Implementation Checklist

Part A: Package Structure Setup (Days 1-2)

Create package directories:

  • Create packages/ directory structure
  • Create pyproject.toml for each package
  • Set up uv workspace configuration
  • Define inter-package dependencies

uv workspace configuration:

# Root pyproject.toml
[tool.uv.workspace]
members = [
    "packages/codeweaver-core",
    "packages/codeweaver-tokenizers",
    "packages/codeweaver-daemon",
    "packages/codeweaver-utils",
    "packages/codeweaver-semantic",
    "packages/codeweaver-telemetry",
    "packages/codeweaver-providers",
    "packages/codeweaver-engine",
    "packages/codeweaver",
]

Part B: Move Code to Packages (Days 3-4)

Priority 1: Foundation packages (already extracted)

  • codeweaver-tokenizers ✅ (Phase 1)
  • codeweaver-daemon ✅ (Phase 1)
  • codeweaver-core
    • Move DI infrastructure
    • Include search_types
    • Core exceptions

Priority 2: Utility packages

  • codeweaver-utils

    • Move common/utils (except registry)
    • Update imports
    • Validate independence
  • codeweaver-telemetry

    • Move common/telemetry
    • DI-enabled client
    • Update imports

Priority 3: Semantic and providers

  • codeweaver-semantic

    • Move semantic package
    • Update imports
    • Validate against utils
  • codeweaver-providers

    • Move all provider implementations
    • Include provider factories
    • Update imports

Priority 4: Engine and app

  • codeweaver-engine

    • Move engine, config
    • Simplified registry (thin layer)
    • Services using DI
  • codeweaver (main package)

    • CLI, server, MCP
    • agent_api
    • Orchestration layer

Part C: Build System (Days 5-6)

Package build configuration:

  • Configure build backend for each package
  • Set version management strategy
  • Define dependency ranges
  • Set up development dependencies

Example package pyproject.toml:

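[project]
name = "codeweaver-providers"
version = "0.2.0a0"  # PEP 440 form of "0.2.0-alpha"
dependencies = [
    # A bare ">=0.2.0" would exclude the 0.2.0 alpha pre-releases
    "codeweaver-core >=0.2.0a0",
    "codeweaver-telemetry >=0.2.0a0",
]

# Resolve sibling packages from the workspace during development
[tool.uv.sources]
codeweaver-core = { workspace = true }
codeweaver-telemetry = { workspace = true }

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"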

Build validation:

  • Build each package independently
  • Verify import paths
  • Test inter-package dependencies
  • Validate circular dependency elimination

Part D: Testing & Validation (Day 7)

Dependency validation:

  • Run: python scripts/validate_proposed_structure.py
  • Target: < 50 violations (down from 164)
  • Verify no circular dependencies between packages
  • Check dependency graph is acyclic
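The repository's `scripts/validate_proposed_structure.py` is the authoritative check. As a rough illustration of what an import-level "violation" means, the hypothetical sketch below parses a module with the standard-library `ast` module and reports absolute imports of sibling packages that the importing package is not allowed to depend on. It assumes post-migration import names such as `codeweaver_core`, uses an abbreviated allowed map covering four packages, and ignores relative and dynamic imports.

```python
import ast

# Abbreviated allowed-dependency map keyed by import name (hypothetical subset).
ALLOWED_IMPORTS: dict[str, set[str]] = {
    "codeweaver_core": set(),
    "codeweaver_utils": {"codeweaver_core"},
    "codeweaver_telemetry": {"codeweaver_core"},
    "codeweaver_providers": {"codeweaver_core", "codeweaver_telemetry"},
}


def import_violations(package: str, source: str) -> list[tuple[int, str]]:
    """Return sorted (line, imported package) pairs for disallowed sibling imports."""
    allowed = ALLOWED_IMPORTS[package] | {package}  # a package may import itself
    violations: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            targets = [node.module]
        else:
            continue  # not an import, or a relative import
        for target in targets:
            top = target.split(".")[0]
            # Only imports of known workspace packages can be violations.
            if top in ALLOWED_IMPORTS and top not in allowed:
                violations.append((node.lineno, top))
    return sorted(violations)
```

Imports of the standard library and third-party packages are never reported, because only names in `ALLOWED_IMPORTS` count as workspace packages.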

Build testing:

  • Build all packages in dependency order
  • Run tests for each package
  • Integration test full system
  • Performance validation

Expected results:

# Should succeed for all packages (subshells keep each cd local)
(cd packages/codeweaver-core && uv build)
(cd packages/codeweaver-utils && uv build)
(cd packages/codeweaver-providers && uv build)
# ... etc
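Building "in dependency order" amounts to a topological sort of the package graph. The hypothetical script below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+), which raises `graphlib.CycleError` if a cycle slips back in. The dependency map is transcribed from the target structure; tokenizers and daemon are assumed to depend only on core, as the dependency graph suggests. Each build runs `uv build` with `cwd` set to the package directory; with `dry_run=True` nothing is executed.

```python
import subprocess
from graphlib import TopologicalSorter
from pathlib import Path

# Package -> workspace packages it depends on (from the target structure).
DEPENDS_ON: dict[str, set[str]] = {
    "codeweaver-core": set(),
    "codeweaver-tokenizers": {"codeweaver-core"},  # assumed from the graph
    "codeweaver-daemon": {"codeweaver-core"},  # assumed from the graph
    "codeweaver-utils": {"codeweaver-core"},
    "codeweaver-telemetry": {"codeweaver-core"},
    "codeweaver-semantic": {"codeweaver-core", "codeweaver-utils", "codeweaver-tokenizers"},
    "codeweaver-providers": {"codeweaver-core", "codeweaver-telemetry"},
    "codeweaver-engine": {
        "codeweaver-core", "codeweaver-utils", "codeweaver-semantic", "codeweaver-providers",
    },
}
DEPENDS_ON["codeweaver"] = set(DEPENDS_ON)  # the app depends on every other package


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return packages with dependencies first; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def build_all(root: Path, dry_run: bool = True) -> list[Path]:
    """Build every package with `uv build`, dependencies first.

    Returns the package directories in build order. With dry_run=True
    (the default) the commands are not executed.
    """
    planned = [root / "packages" / name for name in build_order(DEPENDS_ON)]
    if not dry_run:
        for package_dir in planned:
            # check=True stops at the first failing build.
            subprocess.run(["uv", "build"], cwd=package_dir, check=True)
    return planned
```

Because core is the only package with no dependencies and `codeweaver` depends on everything, core is always built first and the app last; the order among packages in between may vary.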

Package Dependency Graph

Clean dependency flow (acyclic):

codeweaver-core (foundation)
  ↑
  ├── codeweaver-tokenizers
  ├── codeweaver-daemon
  ├── codeweaver-utils
  ├── codeweaver-telemetry
  ↑
  ├── codeweaver-semantic (depends on: core, utils, tokenizers)
  ├── codeweaver-providers (depends on: core, telemetry)
  ↑
  ├── codeweaver-engine (depends on: core, utils, semantic, providers)
  ↑
  └── codeweaver (depends on: ALL)

No circular dependencies between packages!

Remaining Violations (~30-40)

What remains after the DI work eliminated ~75% of the violations:

Type movements (10-15 violations)

  • Some types still in wrong packages
  • Easy to fix: just move files
  • No logic changes needed

Utility dependencies (9 violations)

  • core → utils (core imports from utils, which itself depends on core, inverting the layering)
  • Solution: Move the utilities core needs into the core package

Semantic utilities (4 violations)

  • semantic → utils
  • Solution: Move or minimize shared utilities

Minor couplings (10-15 violations)

  • Various small import adjustments
  • Lazy imports where needed
  • Protocol usage for abstract dependencies
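The last two techniques can be sketched in plain Python. All class and module names here are hypothetical: a structural `typing.Protocol`, imagined as living in codeweaver-core, lets engine code accept any embedding provider without importing codeweaver-providers, and an import placed inside a function (a lazy import) defers a dependency until first use. To keep the sketch self-contained, `json` stands in for the lazily imported module.

```python
from typing import Protocol, runtime_checkable


# Hypothetically in codeweaver-core: the abstract dependency.
@runtime_checkable
class EmbeddingProvider(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


# Hypothetically in codeweaver-engine: depends only on the protocol,
# never on a concrete class from codeweaver-providers.
class ChunkIndexer:
    def __init__(self, provider: EmbeddingProvider) -> None:
        self.provider = provider

    def index(self, chunks: list[str]) -> int:
        vectors = self.provider.embed(chunks)
        return len(vectors)

    def export(self) -> str:
        # Lazy import: resolved on first call, not when the package is imported.
        # (json stands in for an optional or cross-package dependency.)
        import json

        return json.dumps({"provider": type(self.provider).__name__})


# Hypothetically in codeweaver-providers: satisfies the protocol structurally,
# without inheriting from or importing anything in codeweaver-engine.
class LengthEmbedder:
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]
```

`isinstance(LengthEmbedder(), EmbeddingProvider)` is true only because of `@runtime_checkable`, and that check verifies method presence, not signatures; static type checkers do the full structural check.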

Acceptance Criteria

  • All 9 packages created with pyproject.toml
  • Code organized into packages
  • uv workspace builds successfully
  • Dependency violations < 50 (down from 164)
  • Zero circular dependencies between packages
  • All packages build independently
  • All tests pass
  • Type checking passes
  • Performance within 5% of baseline
  • Documentation complete

Migration Guide

Document for users:

  • Import path changes
  • Package installation instructions
  • Development setup with workspace
  • Contribution guidelines per package

Example migration:

# Before (monolith)
from codeweaver.engine.indexer import Indexer
from codeweaver.providers.embedding.fastembed import FastembedProvider

# After (monorepo)
from codeweaver_engine.indexer import Indexer
from codeweaver_providers.embedding.fastembed import FastembedProvider
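For downstream code, these import changes are mostly mechanical prefix substitutions. A hypothetical helper could apply them with a regular expression. The mapping lists only the two prefixes from the example, so a real migration would need an entry per moved module. Attribute references such as `codeweaver.engine.indexer.Indexer` after a plain `import`, and string-based dynamic imports, are not rewritten and still need manual review.

```python
import re

# Old module prefix -> new top-level package (only the two from the example).
IMPORT_MAP = {
    "codeweaver.engine": "codeweaver_engine",
    "codeweaver.providers": "codeweaver_providers",
}

# Matches "from <prefix>" or "import <prefix>" followed by a word boundary,
# so "codeweaver.engine.indexer" matches but "codeweaver.engineering" does not.
_PATTERN = re.compile(
    r"\b(from|import)\s+(" + "|".join(re.escape(old) for old in IMPORT_MAP) + r")\b"
)


def rewrite_imports(source: str) -> str:
    """Rewrite import statements whose module starts with a mapped prefix."""
    return _PATTERN.sub(lambda m: f"{m.group(1)} {IMPORT_MAP[m.group(2)]}", source)
```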

Benefits After This Phase

For developers:

  • ✅ Work on individual packages without needing the full codebase
  • ✅ Clear package boundaries and responsibilities
  • ✅ Independent versioning possible
  • ✅ Faster builds (only rebuild changed packages)

For users:

  • ✅ Install only needed packages
  • ✅ Lighter dependencies for specific use cases
  • ✅ Clear module structure

For maintainers:

  • ✅ Easier to review (package-scoped changes)
  • ✅ Independent package releases
  • ✅ Clear ownership boundaries

Success Metrics

Structural:

  • 9 packages building independently
  • < 50 dependency violations
  • 0 circular dependencies between packages
  • Clean dependency graph

Quality:

  • All tests pass
  • Type checking passes
  • Documentation complete
  • Migration guide available

Next Steps After This Phase

Phase 4 (#119): pydantic-ai Integration

  • Add agent DI support
  • Integrate data providers
  • Build on clean package structure

Phase 5 (#120): Advanced Features

  • Health checks, telemetry
  • Plugin system
  • Performance optimization

Phase 6 (#121): Cleanup

  • Deprecate old patterns
  • Eliminate/simplify registry
  • Final documentation

Connection to Integrated Strategy

This phase implements Week 3 of the integrated strategy:

  • Organize into packages → Natural structure emerges
  • Build system → All packages independent
  • Final validation → Proof of clean architecture

Why This is Low Risk Now

Thanks to Phases 1-2:

  • ✅ Circular dependencies already broken (DI work)
  • ✅ Services already don't import across boundaries
  • ✅ Just moving files, not changing logic
  • ✅ Can validate continuously during organization

Original risk: HIGH (dependencies blocking)
Current risk: LOW (just file organization)

Reference

  • Planning: INTEGRATED_DI_MONOREPO_STRATEGY.md (Week 3)
  • Visualization: DI_IMPACT_VISUALIZATION.md
  • Original analysis: REFINED_MONOREPO_ANALYSIS.md
  • Validation script: scripts/validate_proposed_structure.py
