Architecture

Overview

mdite is a comprehensive documentation toolkit built as a modular system with clear separation of concerns. The architecture follows a layered approach, separating CLI concerns from core business logic and shared utilities.

Core Philosophy: mdite treats documentation as a connected system (graph), not isolated files. This graph foundation enables all current and future features: validation, dependency analysis, search, output, and more.

Core Components

CLI Layer

Purpose: Parse command-line arguments and options, coordinate user interaction

Location: src/cli.ts, src/commands/

Key files:

cli.ts - Main CLI setup with Commander.js, signal handlers, global options
commands/lint.ts - Validation command (structural integrity)
commands/deps.ts - Dependency analysis command
commands/config.ts - Configuration management commands
commands/init.ts - Initialize configuration
Future: commands/query.ts, commands/cat.ts, commands/toc.ts

Responsibilities:

Parse CLI arguments and options (including Unix-friendly flags)
Load and merge configuration
Initialize logger with appropriate verbosity and output modes
Handle Unix signals (SIGINT, SIGTERM, SIGPIPE) gracefully
Execute commands and handle errors with proper exit codes
Format output for user consumption (respecting stdout/stderr separation)

Core Layer

Purpose: Business logic and orchestration of documentation system operations

Location: src/core/

Key files:

doc-linter.ts - Main orchestrator that coordinates all operations
graph-analyzer.ts - Graph foundation: Dependency graph building and traversal (enables all features)
link-validator.ts - Link and anchor validation
markdown-cache.ts - Performance optimization: Centralized cache for markdown parsing and derived data
config-manager.ts - Multi-layer configuration management
remark-engine.ts - Content linting with remark plugins
reporter.ts - Result formatting and output
Future: Query engine, content output processor, TOC generator

Responsibilities:

Build documentation dependency graph (foundation for all features)
Validate links (files and anchors)
Detect orphaned files
Analyze dependencies and relationships
Run content linting with remark
Cache markdown parsing and derived data (eliminates redundant operations)
Aggregate and return results
Future: Search/query operations, content output, TOC generation

Type Layer

Purpose: Define data structures and schemas

Location: src/types/

Key files:

config.ts - Configuration schemas and types (Zod-based)
graph.ts - Dependency graph data structure
results.ts - Lint results and error types
errors.ts - Lint error message format
exit-codes.ts - Standard Unix exit codes enum

Responsibilities:

Define type-safe configuration schemas
Provide runtime validation with Zod
Structure lint results and errors
Define standard exit codes for Unix compatibility
Ensure type safety across the codebase

Utility Layer

Purpose: Shared utilities and helpers

Location: src/utils/

Key files:

logger.ts - Unix-friendly logging with TTY detection, stdout/stderr separation, quiet/verbose modes
errors.ts - Custom error classes with exit codes and context
error-handler.ts - Error handling middleware and utilities
fs.ts - File system utilities (find markdown files, check existence)
paths.ts - Path resolution for config files (user/project)
slug.ts - GitHub-style heading slugification
reporter.ts - Format lint results for text/JSON output with stream separation
dependency-reporter.ts - Format dependency analysis output (tree, list, JSON)

Responsibilities:

Provide Unix-friendly logging (TTY detection, color control, quiet/verbose modes)
Separate data (stdout) from messages (stderr) for pipe compatibility
Handle errors with proper context and exit codes
Manage file system operations
Format output for different consumers

Data Flow

Current Operations (lint, deps)

1. User runs CLI command
   ↓
2. CLI parses arguments (Commander.js)
   ↓
3. ConfigManager loads and merges config
   (Defaults → User Config → Project Config → CLI Options)
   ↓
4. GraphAnalyzer builds dependency graph from entrypoint
   (Foundation step - used by every graph-based command)
   ↓
5. Command-specific operations:

   lint:
   ├─ GraphAnalyzer detects orphaned files
   ├─ LinkValidator validates all links (files + anchors)
   └─ RemarkEngine runs content linting

   deps:
   ├─ Extract dependencies for target file
   └─ Format as tree, list, or JSON
   ↓
6. Results aggregation
   ↓
7. Reporter formats results (text, JSON, tree, list)
   ├─ Data to stdout (pipeable)
   └─ Messages to stderr (suppressible with --quiet)
   ↓
8. CLI sets appropriate exit code:
   ├─ 0 = Success
   ├─ 1 = Validation errors
   ├─ 2 = Usage errors
   └─ 130 = Interrupted

Future Operations (query, cat, toc)

All will leverage the same graph foundation built by GraphAnalyzer, with command-specific processing layers.

Configuration System

The configuration system uses a layered approach with clear priority:

Priority (highest to lowest):

CLI Options - Flags passed on command line (--entrypoint, --format, etc.)
Project Config - .mditerc, mdite.config.js, or package.json#mdite
User Config - ~/.config/mdite/config.json (personal defaults)
Defaults - Built-in defaults from src/types/config.ts

Each layer is merged into the next, with higher priority layers overriding lower ones.
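A minimal sketch of the layered merge, assuming each layer is a plain partial object and a shallow merge is enough. The `Config` shape, the `README.md` default, and the `mergeConfigLayers` name are hypothetical; mdite's real schemas live in src/types/config.ts and are validated with Zod.

```typescript
// Hypothetical config shape for illustration only.
interface Config {
  entrypoint: string;
  format: 'text' | 'json';
  depth: number; // Infinity = unlimited
}

const DEFAULTS: Config = { entrypoint: 'README.md', format: 'text', depth: Infinity };

// Drop keys whose value is undefined, so an unset CLI flag does not
// overwrite a value coming from a lower-priority layer.
function defined(layer: Partial<Config>): Partial<Config> {
  return Object.fromEntries(
    Object.entries(layer).filter(([, value]) => value !== undefined),
  ) as Partial<Config>;
}

// Layers are spread lowest-priority first, so later spreads win.
function mergeConfigLayers(
  user: Partial<Config>,
  project: Partial<Config>,
  cli: Partial<Config>,
): Config {
  return { ...DEFAULTS, ...defined(user), ...defined(project), ...defined(cli) };
}
```

Filtering out `undefined` before spreading matters: without it, a CLI parser that reports an omitted `--format` as `undefined` would erase the project's setting.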

Graph Building Algorithm

The dependency graph is built using depth-first traversal:

1. Start with entrypoint file (e.g., README.md) at depth 0
2. Parse markdown to extract links
3. For each relative .md link:
   - Resolve absolute path
   - Skip if already visited (cycle detection)
   - Add edge to graph with current depth
   - Recursively visit target file at depth + 1 (if within maxDepth)
4. Return complete graph of reachable files

Depth Tracking: Each node in the graph tracks its depth from the entrypoint:

Entrypoint is at depth 0
Direct links from entrypoint are at depth 1
Links from those files are at depth 2, etc.
Files beyond maxDepth are not included in the graph

Depth Limiting: The --depth parameter (or depth config option) controls how far traversal goes:

unlimited (default) - Traverse all reachable files (maxDepth = Infinity)
0 - Only the entrypoint file (no links followed)
1 - Entrypoint + direct links only
2 - Entrypoint + direct links + links from those files
etc.

Use Cases for Depth Limiting:

Progressive validation: Start with core docs (depth 1-2), expand gradually
Performance: Limit scope for faster validation on large doc sets
Focused validation: Validate only immediate dependencies of key files
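A minimal sketch of the depth-first traversal described above, assuming links have already been extracted into an in-memory map from each file to the relative `.md` paths it links to (a stand-in for markdown parsing and path resolution). `buildGraph`, `LinkMap`, and `DocGraph` are hypothetical names, not mdite's API. Edges are recorded even when the target was already visited, and a file at `maxDepth` does not have its links followed.

```typescript
// Hypothetical pre-extracted links: file path -> .md targets it links to.
type LinkMap = Record<string, string[]>;

interface DocGraph {
  depth: Map<string, number>; // node -> depth at which it was first reached
  edges: Array<[from: string, to: string]>;
}

function buildGraph(entry: string, links: LinkMap, maxDepth = Infinity): DocGraph {
  const graph: DocGraph = { depth: new Map(), edges: [] };

  const visit = (file: string, depth: number): void => {
    graph.depth.set(file, depth); // mark visited first (cycle detection)
    if (depth >= maxDepth) return; // at the limit: do not follow links
    for (const target of links[file] ?? []) {
      graph.edges.push([file, target]);
      if (!graph.depth.has(target)) visit(target, depth + 1);
    }
  };

  visit(entry, 0);
  return graph;
}
```

A depth-first pass records the depth of the first path that reaches a file, which is not always the shortest. Under a depth limit, a file reachable within the limit by a shorter path can be missed if a longer path reaches an intermediate file first; a breadth-first traversal assigns minimum depths and avoids this.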

Orphan Detection: After the graph is built, every markdown file in the directory that is NOT in the graph is reported as an orphan. Files beyond maxDepth are therefore also reported as orphans, since they are not reachable within the specified depth limit.
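Orphan detection reduces to a set difference between the markdown files found on disk and the nodes of the graph. A minimal sketch, assuming both inputs already use the same normalized path form (`findOrphans` is a hypothetical name):

```typescript
// Files that exist on disk but were never reached from the entrypoint,
// including files that are only reachable beyond maxDepth.
function findOrphans(allFiles: string[], reachable: Set<string>): string[] {
  return allFiles.filter((file) => !reachable.has(file)).sort();
}
```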

Link Validation

Link validation handles three types of links:

1. Anchor-only links (`#heading`)

Extract all headings from current file
Convert to GitHub-style slugs
Check if anchor matches any heading

2. File links (`./other.md`)

Resolve relative path
Check if file exists
Report error if not found

3. File + anchor links (`./other.md#section`)

First validate file exists
Then extract headings from target file
Check if anchor matches any heading
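The three cases can be sketched against an in-memory index of files and their heading texts. The slug function is a simplified, ASCII-only approximation of GitHub-style slugs (lowercase, strip punctuation, each space to a hyphen, `-1`/`-2` suffixes for duplicates), not mdite's slug.ts. `DocIndex`, `validateLink`, and the message strings are hypothetical, and `resolveRelative` is a POSIX-only stand-in for `path.resolve`.

```typescript
// Hypothetical index: file path -> heading texts in that file.
type DocIndex = Map<string, string[]>;

// Simplified GitHub-style slug (ASCII only); `seen` tracks duplicates per file.
function slugify(text: string, seen: Map<string, number>): string {
  const base = text.toLowerCase().replace(/[^a-z0-9 _-]/g, '').replace(/ /g, '-');
  const count = seen.get(base) ?? 0;
  seen.set(base, count + 1);
  return count === 0 ? base : `${base}-${count}`;
}

function anchorsOf(index: DocIndex, file: string): Set<string> {
  const seen = new Map<string, number>();
  return new Set((index.get(file) ?? []).map((heading) => slugify(heading, seen)));
}

// Resolve `target` relative to the directory containing `from`.
function resolveRelative(from: string, target: string): string {
  const parts = from.split('/').slice(0, -1);
  for (const segment of target.split('/')) {
    if (segment === '..') parts.pop();
    else if (segment !== '.' && segment !== '') parts.push(segment);
  }
  return parts.join('/');
}

// Returns an error message, or null when the link is valid.
function validateLink(index: DocIndex, from: string, link: string): string | null {
  const hashAt = link.indexOf('#');
  const filePart = hashAt === -1 ? link : link.slice(0, hashAt);
  const anchor = hashAt === -1 ? null : link.slice(hashAt + 1);

  // Case 1 (#heading) targets the current file; cases 2 and 3 resolve a path.
  const file = filePart === '' ? from : resolveRelative(from, filePart);
  if (!index.has(file)) return `Dead link: ${link}`;
  if (anchor !== null && !anchorsOf(index, file).has(anchor)) return `Dead anchor: ${link}`;
  return null;
}
```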

Unix CLI Integration Patterns

mdite follows Unix philosophy and conventions for CLI tool design.

Stdout/Stderr Separation

Pattern: Separate data from messages for pipe-friendly operation

stdout (file descriptor 1):
  - Validation results (errors, warnings)
  - JSON output
  - Data intended for further processing

stderr (file descriptor 2):
  - Informational messages (progress, summaries)
  - Headers and separators
  - Success/failure notifications

Implementation:

logger.log() → stdout (always shown)
logger.info(), logger.success(), logger.header() → stderr (suppressed in --quiet)
logger.error() → stderr (always shown)
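The routing rules above can be sketched with the two streams injected as plain write functions, which keeps the logic testable without real file descriptors. `StreamLogger` and its constructor parameters are hypothetical; mdite's logger.ts may be structured differently.

```typescript
type Write = (line: string) => void;

class StreamLogger {
  private readonly out: Write;
  private readonly err: Write;
  private readonly quiet: boolean;

  constructor(out: Write, err: Write, quiet = false) {
    this.out = out; // stdout: data intended for pipes
    this.err = err; // stderr: human-facing messages
    this.quiet = quiet;
  }

  log(data: string): void {
    this.out(data); // data always reaches stdout, even with --quiet
  }

  info(message: string): void {
    if (!this.quiet) this.err(`ℹ ${message}`); // suppressed by --quiet
  }

  error(message: string): void {
    this.err(`✗ ${message}`); // never suppressed
  }
}
```

In a CLI, the writers would be bound to the real streams, for example `(s) => process.stdout.write(s + '\n')` and `(s) => process.stderr.write(s + '\n')`.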

Benefits:

# Pipe data without progress messages
mdite lint --format json | jq '.'

# Discard everything on stderr, keep only the validation results on stdout
mdite lint 2>/dev/null

# Grep errors without interference
mdite lint | grep "Dead link"

TTY Detection

Pattern: Auto-detect terminal capabilities and adjust output

function shouldUseColors(): boolean {
  // NO_COLOR disables colors when set to any non-empty value
  const noColor = process.env.NO_COLOR;
  if (noColor !== undefined && noColor !== '') return false;
  if ('FORCE_COLOR' in process.env) return true;
  if (process.env.CI === 'true') return false;
  return process.stdout.isTTY ?? false;
}

Environment Variables:

NO_COLOR - Disable colors when set to a non-empty value (per no-color.org)
FORCE_COLOR - Force colors even when not a TTY
CI=true - Auto-disable colors in CI environments

CLI Flags:

--colors - Override detection, force colors
--no-colors - Override detection, disable colors
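A sketch of how an explicit flag could take precedence over environment detection, with the environment and TTY state passed in rather than read from `process`, so the decision is a pure function. `resolveColors` and its `flag` parameter (`true` for --colors, `false` for --no-colors, `undefined` when neither is given) are assumptions; NO_COLOR counts as set only when non-empty, per no-color.org.

```typescript
type Env = Record<string, string | undefined>;

function resolveColors(flag: boolean | undefined, env: Env, isTTY: boolean): boolean {
  if (flag !== undefined) return flag; // --colors / --no-colors beat detection
  if (env.NO_COLOR !== undefined && env.NO_COLOR !== '') return false;
  if (env.FORCE_COLOR !== undefined) return true;
  if (env.CI === 'true') return false;
  return isTTY; // fall back to terminal detection
}
```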

Exit Codes

Pattern: Use standard Unix exit codes for different scenarios

enum ExitCode {
  SUCCESS = 0, // No errors
  ERROR = 1, // Validation/operational errors
  USAGE_ERROR = 2, // Invalid arguments/options
  INTERRUPTED = 130, // SIGINT (128 + 2); also used for SIGTERM
}

Usage:

# Success check
mdite lint && echo "Success"

# Failure check
mdite lint || echo "Failed"

# Capture exit code
mdite lint
echo $?  # 0, 1, 2, or 130

Signal Handling

Pattern: Handle Unix signals gracefully

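import { ExitCode } from './types/exit-codes.js';

// Ctrl+C: report on stderr and exit with 130 (128 + SIGINT's number, 2)
process.on('SIGINT', () => {
  console.error('\nInterrupted');
  process.exit(ExitCode.INTERRUPTED);
});

// Termination request (e.g. `kill <pid>`): reuses the interrupted exit code
process.on('SIGTERM', () => {
  console.error('\nTerminated');
  process.exit(ExitCode.INTERRUPTED);
});

// Reader closed the pipe early (e.g. `mdite lint | head`): exit quietly
process.on('SIGPIPE', () => {
  process.exit(ExitCode.SUCCESS);
});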

Benefits:

Clean Ctrl+C handling
Proper exit codes for signal termination
SIGPIPE handling for broken pipes (e.g., mdite lint | head)

Quiet Mode

Pattern: Suppress informational output for scripting

class Logger {
  private quiet: boolean;

  constructor(quiet = false) {
    this.quiet = quiet;
  }

  info(message: string): void {
    if (this.quiet) return; // Suppressed
    console.error(`ℹ ${message}`);
  }

  error(message: string): void {
    // Always shown, never suppressed
    console.error(`✗ ${message}`);
  }
}

Usage:

# Scripting - only errors
mdite lint --quiet

# CI/CD - clean output
mdite lint --quiet --format json

Error Handling

All custom errors extend the DocLintError base class, which provides:

code - Machine-readable error code
exitCode - CLI exit code (0 = success, 1+ = failure)
context - Additional metadata for debugging
cause - Original error (for error wrapping)

Error hierarchy:

DocLintError (base)
├── ConfigNotFoundError
├── InvalidConfigError
├── FileNotFoundError
├── DirectoryNotFoundError
├── FileReadError
├── FileWriteError
├── ValidationError
├── SchemaValidationError
├── GraphBuildError
├── DeadLinkError
├── DeadAnchorError
├── MarkdownParseError
├── FrontmatterParseError
├── InvalidArgumentError
├── MissingArgumentError
├── OperationCancelledError
└── TimeoutError
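A minimal sketch of the base class and two subclasses, assuming the four fields listed above. The constructor signature, the error code strings, and the `ExitCodes` constant (used instead of the enum so the snippet stands alone) are illustrative assumptions, not mdite's definitions. It assumes an ES2022 target/lib for the standard `Error` `cause` option.

```typescript
// Stand-in for the ExitCode enum in src/types/exit-codes.ts.
const ExitCodes = { SUCCESS: 0, ERROR: 1, USAGE_ERROR: 2, INTERRUPTED: 130 } as const;

interface DocLintErrorOptions {
  code: string;
  exitCode?: number;
  context?: Record<string, unknown>;
  cause?: unknown;
}

class DocLintError extends Error {
  readonly code: string;
  readonly exitCode: number;
  readonly context: Record<string, unknown>;

  constructor(message: string, options: DocLintErrorOptions) {
    super(message, { cause: options.cause }); // keeps the wrapped error
    this.name = new.target.name; // e.g. "FileNotFoundError"
    this.code = options.code;
    this.exitCode = options.exitCode ?? ExitCodes.ERROR;
    this.context = options.context ?? {};
  }
}

class FileNotFoundError extends DocLintError {
  constructor(path: string, cause?: unknown) {
    super(`File not found: ${path}`, { code: 'FILE_NOT_FOUND', context: { path }, cause });
  }
}

class InvalidArgumentError extends DocLintError {
  constructor(arg: string) {
    super(`Invalid argument: ${arg}`, {
      code: 'INVALID_ARGUMENT',
      exitCode: ExitCodes.USAGE_ERROR, // usage problems map to exit code 2
      context: { arg },
    });
  }
}
```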

Extension Points

Adding a New Rule

Choose the module that will host the rule (e.g., link-validator.ts)
Add the rule name to the RuntimeConfig.rules type
Set its default severity in DEFAULT_CONFIG
Implement the rule-checking logic in that module
Add tests for the new rule
Update documentation
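As a hypothetical illustration of these steps (the `no-empty-heading` rule, `Severity`, `LintIssue`, and `DEFAULT_RULES` are invented for this sketch and are not part of mdite), a rule pairs a default severity with a check function that returns issues:

```typescript
type Severity = 'error' | 'warn' | 'off';

interface LintIssue {
  rule: string;
  severity: Exclude<Severity, 'off'>;
  file: string;
  message: string;
}

// Default severity for the new rule (the DEFAULT_CONFIG step).
const DEFAULT_RULES: { 'no-empty-heading': Severity } = { 'no-empty-heading': 'warn' };

// Rule logic: report headings whose text is empty after trimming.
function checkNoEmptyHeading(file: string, headings: string[], severity: Severity): LintIssue[] {
  if (severity === 'off') return []; // disabled rules report nothing
  const level: Exclude<Severity, 'off'> = severity; // narrowed to 'error' | 'warn'
  return headings
    .filter((text) => text.trim() === '')
    .map((): LintIssue => ({ rule: 'no-empty-heading', severity: level, file, message: 'Empty heading' }));
}
```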

Adding a New Command

Create a command file in src/commands/ (e.g., commands/check.ts)
Register the command in src/cli.ts:

import { checkCommand } from './commands/check.js';
program.addCommand(checkCommand());

Add integration tests in tests/integration/
Update README with command documentation

Adding a New Output Format

Update RuntimeConfig.format type in src/types/config.ts
Implement formatter in src/utils/reporter.ts
Add tests for new format
Update CLI help text
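One way to keep formats pluggable is a lookup table from format name to formatter function, so adding a format means widening the format type and adding one entry. This is a hypothetical sketch; `LintResult`, `formatters`, and `render` are not mdite's actual names.

```typescript
interface LintResult {
  file: string;
  line: number;
  message: string;
}

type Format = 'text' | 'json';
type Formatter = (results: LintResult[]) => string;

// Record<Format, ...> forces an entry for every member of Format.
const formatters: Record<Format, Formatter> = {
  text: (rs) => rs.map((r) => `${r.file}:${r.line} ${r.message}`).join('\n'),
  json: (rs) => JSON.stringify(rs, null, 2),
};

function render(format: Format, results: LintResult[]): string {
  return formatters[format](results);
}
```

Because `formatters` is typed `Record<Format, Formatter>`, widening `Format` (for example with a hypothetical `'csv'`) makes the compiler report the missing entry until a formatter is added.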

Adding Configuration Validation

Update appropriate schema in src/types/config.ts:
- UserConfigSchema for user config
- ProjectConfigSchema for project config
- RuntimeConfigSchema for final runtime config
Zod will automatically validate at runtime
Add tests for invalid configurations

Testing Strategy

Unit Tests (`tests/unit/`)

Test individual modules in isolation
Mock dependencies
Fast execution
High coverage

Integration Tests (`tests/integration/`)

Test full workflows (CLI, commands)
Use real file system (temp directories)
Test error scenarios
Slower but more comprehensive

Test Infrastructure (`tests/`)

setup.ts - Helper functions for test setup
utils.ts - Test utilities (fixtures, assertions)
mocks/ - Mock objects (logger, etc.)
fixtures/ - Sample markdown files for testing

Examples Directory

Location: examples/

Purpose: Runnable examples and smoke tests

Structure

examples/
├── 01-04: Core Examples (Phase 1)
├── 05-06: Real-World + Config Variations (Phase 2)
└── 07: Edge Cases (Phase 3)

Usage

Examples serve three purposes:

User Documentation - Show how mdite works
Manual Testing - Quick smoke tests during development
Regression Testing - Verify behavior across releases

Difference from tests/fixtures/

Aspect	tests/fixtures/	examples/
Purpose	Automated unit tests	Manual demos + smoke tests
Audience	Developers (internal)	Users + Developers
Execution	Via Vitest	Via CLI
Documentation	Minimal	Comprehensive
Scope	Focused test cases	Realistic scenarios

Running Examples

# Individual example
cd examples/01-valid-docs && mdite lint

# Full smoke test suite
cd examples && ./run-all-examples.sh

See examples/README.md for details.

Performance Considerations

Centralized Markdown Cache

The MarkdownCache class eliminates redundant parsing operations:

Content caching: File content read once, reused across operations
AST caching: Markdown parsed once per file (not 2-3 times)
Derived data caching: Headings and links extracted once and cached
Shared processor: Single unified processor instance for all parsing
Automatic cleanup: Cache cleared between operations
Memory efficient: ~6MB for 100 files, ~60MB for 1000 files

Impact: 2-3x overall speedup by reducing parse operations by 60-70%
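The memoization pattern behind such a cache can be sketched as below. The `read` and `parse` callbacks stand in for file I/O and the unified/remark processor, the flat node list stands in for an mdast tree, and everything is synchronous for brevity; `MarkdownCacheSketch` and its method names are hypothetical. Content, AST, and headings are each computed at most once per file until `clear()` is called.

```typescript
// Placeholder AST node; a real cache would hold an mdast Root.
type MdNode = { type: 'heading' | 'paragraph'; text: string };

class MarkdownCacheSketch {
  private readonly content = new Map<string, string>();
  private readonly asts = new Map<string, MdNode[]>();
  private readonly headings = new Map<string, string[]>();
  private readonly read: (file: string) => string;
  private readonly parse: (source: string) => MdNode[];

  constructor(read: (file: string) => string, parse: (source: string) => MdNode[]) {
    this.read = read;
    this.parse = parse;
  }

  getContent(file: string): string {
    let text = this.content.get(file);
    if (text === undefined) {
      text = this.read(file); // file read at most once
      this.content.set(file, text);
    }
    return text;
  }

  getAst(file: string): MdNode[] {
    let ast = this.asts.get(file);
    if (ast === undefined) {
      ast = this.parse(this.getContent(file)); // parsed at most once
      this.asts.set(file, ast);
    }
    return ast;
  }

  getHeadings(file: string): string[] {
    let list = this.headings.get(file);
    if (list === undefined) {
      // Derived data is computed from the cached AST, then cached itself.
      list = this.getAst(file).filter((n) => n.type === 'heading').map((n) => n.text);
      this.headings.set(file, list);
    }
    return list;
  }

  clear(): void {
    this.content.clear();
    this.asts.clear();
    this.headings.clear();
  }
}
```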

Graph Building

Uses cycle detection to prevent infinite loops
Visits each file only once
Shared cache eliminates redundant parsing during traversal
Depth limiting optimization skips link extraction at maximum depth

Link Validation

Parallel file validation with controlled concurrency (default: 10 concurrent operations)
Shared cache eliminates redundant parsing during validation
Promise pool prevents resource exhaustion on large documentation sets
Skips external links (http/https)
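Controlled concurrency can be implemented with a small worker pool: a fixed number of runners repeatedly claim the next unprocessed index until the input is exhausted. A sketch, assuming a hypothetical `mapWithConcurrency` helper whose default limit of 10 matches the figure above; mdite may use a library or a different structure.

```typescript
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  worker: (item: T) => Promise<R>,
  limit = 10,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next index. JavaScript runs this synchronously
  // between awaits, so two runners never claim the same index.
  const runner = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T);
    }
  };

  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, runner));
  return results; // same order as `items`
}
```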

File System Operations

Uses fs/promises for async I/O
Skips hidden directories and node_modules
Minimal file reads via content caching
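The skip rules can be expressed as a pure predicate over relative POSIX paths, which a directory walker (or an ignore-pattern list) would apply. A hypothetical sketch; `isCandidateMarkdown` is not mdite's API, and fs.ts may express this differently.

```typescript
// True for .md files that are not inside a hidden directory or node_modules.
function isCandidateMarkdown(relativePath: string): boolean {
  const segments = relativePath.split('/');
  const dirs = segments.slice(0, -1);
  const fileName = segments[segments.length - 1] ?? '';
  const skipped = dirs.some(
    (d) => d === 'node_modules' || (d.startsWith('.') && d !== '.' && d !== '..'),
  );
  return !skipped && fileName.toLowerCase().endsWith('.md');
}
```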

Dependencies

Core Dependencies

unified - Markdown parsing and processing
remark-parse - Markdown AST parser
remark-lint - Markdown linting rules
commander - CLI argument parsing
cosmiconfig - Configuration file loading
zod - Runtime schema validation
chalk - Terminal colors
globby - File pattern matching

Development Dependencies

TypeScript - Type safety
Vitest - Testing framework
ESLint - Code linting
Prettier - Code formatting

Design Principles

Separation of Concerns: Clear boundaries between CLI, core logic, and utilities
Type Safety: Comprehensive TypeScript types with runtime validation
Testability: All components are independently testable
Extensibility: Easy to add new rules, commands, and formats
Error Handling: Rich error context with user-friendly messages
Configuration: Flexible multi-layer configuration system
Performance: Async operations with minimal file I/O
Unix Philosophy: Pipe-friendly, proper exit codes, stdout/stderr separation
- Data to stdout, messages to stderr
- TTY detection for automatic color control
- Standard exit codes (0/1/2/130)
- Graceful signal handling (SIGINT, SIGTERM, SIGPIPE)
- Quiet mode for scripting
- Respects NO_COLOR and FORCE_COLOR environment variables

Code Organization

src/
├── cli.ts              # CLI entry point with signal handlers
├── index.ts            # Main executable
├── commands/           # CLI commands
│   ├── lint.ts         # Validation command
│   ├── deps.ts         # Dependency analysis
│   ├── config.ts       # Config management
│   └── init.ts         # Config initialization
├── core/               # Business logic
│   ├── doc-linter.ts   # Main orchestrator
│   ├── graph-analyzer.ts # Graph foundation
│   ├── link-validator.ts # Link validation
│   ├── markdown-cache.ts # Centralized parsing cache
│   ├── config-manager.ts # Config loading
│   ├── remark-engine.ts  # Content linting
│   └── reporter.ts       # Result formatting
├── types/              # Type definitions
│   ├── config.ts       # Config schemas (Zod)
│   ├── graph.ts        # Graph structure
│   ├── results.ts      # Lint results
│   ├── errors.ts       # Error types
│   └── exit-codes.ts   # Unix exit codes enum
└── utils/              # Shared utilities
    ├── logger.ts         # Unix-friendly logging
    ├── errors.ts         # Custom error classes
    ├── error-handler.ts  # Error handling
    ├── fs.ts             # File system ops
    ├── paths.ts          # Path resolution
    ├── slug.ts           # Heading slugification
    ├── reporter.ts       # Output formatting
    └── dependency-reporter.ts  # Dependency formatting

Future Enhancements

Potential areas for expansion:

mdite query: Search across documentation system
- Full-text search across connected docs
- Pattern matching on file names
- Metadata/frontmatter queries
mdite cat: Output documentation content
- Pipe to shell tools
- Order by dependency graph
- Filter and concatenate
mdite toc: Generate table of contents from graph
mdite stats: Documentation metrics and analysis
External link validation: Check HTTP/HTTPS URLs (with caching)
Watch mode: Monitor files and re-lint on changes
Plugin System: Allow external plugins for custom rules
Fix Mode: Automatically fix certain issues
LSP Server: Language server protocol for editor integration
Custom Reporters: Allow custom output formatters
Configuration Presets: Shareable configuration packages
