106 changes: 106 additions & 0 deletions PERFORMANCE.md
@@ -0,0 +1,106 @@
# Performance Guide

This guide provides recommendations for optimal mdxify performance in CI/CD and pre-commit scenarios.

## Quick Start: Fastest Invocation

### 1. Use the Python API (Recommended for CI/CD)

For best performance in automated workflows, use the programmatic API:

```python
# scripts/generate_api_ref.py
from mdxify import generate_docs

result = generate_docs(
    "prefect",
    output_dir="docs/v3/api-ref/python",
    exclude=["prefect.agent"],
    anchor_name="Python SDK Reference",
    include_inheritance=True,
    repo_url="https://github.com/PrefectHQ/prefect",
)

print(f"✓ Generated {result['modules_processed']} modules in {result['time_elapsed']:.3f}s")
if result['modules_failed']:
    print(f"✗ Failed: {result['modules_failed']} modules")
```

This avoids all CLI startup overhead and is the fastest option.
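
In CI it is often useful to fail the job when any module fails to generate. The following is a minimal sketch that relies only on the `modules_failed` key read in the example above. `exit_code_for` is a hypothetical helper and is not part of mdxify.

```python
def exit_code_for(result: dict) -> int:
    """Return a non-zero exit code when any module failed to generate."""
    # `modules_failed` is the key read in the example above.
    # A missing or zero value is treated as "no failures".
    return 1 if result.get("modules_failed") else 0


# Stand-in for the dict returned by generate_docs(); the values are illustrative only.
sample = {"modules_processed": 3, "modules_failed": 0, "time_elapsed": 0.5}
print(exit_code_for(sample))
```

In a real script, pass the return value of `generate_docs(...)` to the helper and hand the result to `sys.exit`.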

### 2. Use `uv run` for CLI (Good Performance)

If you need the CLI, use `uv run` directly:

```bash
uv run mdxify \
--all \
--root-module prefect \
--output-dir docs/v3/api-ref/python \
--exclude prefect.agent
```

### 3. Use `uvx` Without --refresh-package (Acceptable Performance)

For one-off runs with uvx:

```bash
uvx mdxify \
--all \
--root-module prefect \
--output-dir docs/v3/api-ref/python
```

**Note:** Avoid `--refresh-package` unless necessary. It adds ~2s overhead.

## Performance Comparison

Based on benchmarking with Prefect (290 modules):

| Method | Time | Notes |
|--------|------|-------|
| Python API | ~0.6-1.0s | Core generation only, no overhead |
| `uv run` | ~0.7-1.5s | Minimal CLI overhead |
| `uvx` (no refresh) | ~1.0-2.0s | Some environment resolution |
| `uvx --refresh-package` | ~3.0-5.0s | Full package refresh |
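
To compare the methods at a glance, the ranges in the table can be reduced to midpoints and expressed as overhead relative to the Python API. This is a rough sketch using only the numbers in the table; the midpoint is a convenience summary, not a measured statistic.

```python
# Time ranges in seconds, copied from the table above.
ranges = {
    "Python API": (0.6, 1.0),
    "uv run": (0.7, 1.5),
    "uvx (no refresh)": (1.0, 2.0),
    "uvx --refresh-package": (3.0, 5.0),
}


def midpoint(bounds: tuple[float, float]) -> float:
    """Return the middle of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2


# Overhead of each method relative to the Python API midpoint.
baseline = midpoint(ranges["Python API"])
overhead = {name: midpoint(bounds) - baseline for name, bounds in ranges.items()}

for name, extra in overhead.items():
    print(f"{name}: +{extra:.1f}s over the Python API")
```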

## Pre-commit Hook Example

For pre-commit/pre-push hooks, use the Python API:

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: generate-api-docs
        name: Generate API Documentation
        entry: python scripts/generate_api_ref.py
        language: python
        additional_dependencies: [mdxify]
        pass_filenames: false
        stages: [push]
```

## Tips for Large Codebases

1. **Use parallel processing**: mdxify automatically uses 8 workers for parallel processing
2. **Exclude unnecessary modules**: Use `--exclude` to skip internal/test modules
3. **Consider incremental updates**: For development, generate only changed modules
4. **Pin mdxify version**: Avoid version resolution overhead by pinning: `mdxify==0.x.x`
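
For the incremental-update tip, one possible approach is to translate changed file paths into dotted module names and regenerate only those. The sketch below covers just the path-to-module step. `module_name_for` and the `src` layout are assumptions for illustration and are not part of mdxify.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def module_name_for(path: str, src_root: str = "src") -> str | None:
    """Map a changed file path to a dotted module name.

    Returns None for files that are not Python sources under src_root.
    """
    p = PurePosixPath(path)
    if p.suffix != ".py":
        return None
    try:
        rel = p.relative_to(src_root)
    except ValueError:
        # The file lives outside the source root.
        return None
    parts = list(rel.with_suffix("").parts)
    # A package's __init__.py maps to the package name itself.
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts) or None


print(module_name_for("src/prefect/flows.py"))
```

The resulting names could then be supplied through the `modules` option listed in the README. Collecting the changed paths (for example from `git diff --name-only`) is left out of this sketch.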

## Troubleshooting Slow Performance

If mdxify seems slow:

1. **Check for --refresh-package**: Remove it if not needed
2. **Verify Python environment**: Ensure mdxify is installed in the active environment
3. **Profile imports**: Heavy user code imports can slow down parsing
4. **Use verbose mode**: Add `-v` to see per-module timing
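
To check whether a heavy import is the bottleneck, you can time it directly. This is a minimal sketch using only the standard library; `time_import` is a hypothetical helper, and `json` stands in for whichever module you suspect.

```python
import importlib
import time


def time_import(module_name: str) -> float:
    """Return the seconds taken to import module_name in this process."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# A module that is already imported is served from sys.modules almost
# instantly, so run this in a fresh interpreter for a meaningful number.
print(f"json: {time_import('json'):.4f}s")
```

For a per-module breakdown, CPython's `python -X importtime -c "import yourpackage"` prints cumulative import times to stderr.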

## Future Improvements

We're working on:
- Incremental generation (only rebuild changed modules)
- Caching of parsed module data
- Further lazy loading optimizations
20 changes: 20 additions & 0 deletions README.md
@@ -17,6 +17,8 @@ pip install mdxify

## Usage

### CLI Usage

Generate documentation for all modules in a package:

```bash
@@ -35,6 +37,24 @@ Exclude internal modules from documentation:
mdxify --all --root-module mypackage --exclude mypackage.internal --exclude mypackage.tests
```

### Programmatic API (Recommended for CI/CD)

For best performance in automated workflows:

```python
from mdxify import generate_docs

result = generate_docs(
    "mypackage",
    output_dir="docs/python-sdk",
    exclude=["mypackage.internal", "mypackage.tests"],
)

print(f"Generated {result['modules_processed']} modules in {result['time_elapsed']:.3f}s")
```

See [PERFORMANCE.md](PERFORMANCE.md) for detailed performance optimization tips.

### Options

- `modules`: Specific modules to document
48 changes: 48 additions & 0 deletions docs/mdxify-api.mdx
@@ -0,0 +1,48 @@
---
title: api
sidebarTitle: api
---

# `mdxify.api`


Programmatic API for mdxify.

This module provides a Python API for generating MDX documentation without CLI overhead.
This is the recommended approach for CI/CD and pre-commit scenarios where performance matters.


## Functions

### `generate_docs` <sup><a href="https://github.com/zzstoatzz/mdxify/blob/main/src/mdxify/api.py#L10" target="_blank"><Icon icon="github" style="width: 14px; height: 14px;" /></a></sup>

```python
generate_docs(root_module: str, output_dir: str | Path = 'docs/python-sdk') -> dict
```


Generate MDX documentation for a Python package.

This is the programmatic API for mdxify, designed for optimal performance
when called from Python scripts (e.g., in CI/CD pipelines).

**Args:**
- `root_module`: The root module to document (e.g., 'prefect')
- `output_dir`: Output directory for MDX files
- `exclude`: List of module patterns to exclude
- `anchor_name`: Navigation anchor name in docs.json
- `repo_url`: GitHub repository URL for source links
- `branch`: Git branch for source links
- `include_internal`: Include internal/private modules
- `include_inheritance`: Include inherited methods in docs
- `skip_empty_parents`: Skip parent modules with only boilerplate
- `verbose`: Enable verbose output

**Returns:**
- Dictionary with generation statistics:
  - `modules_processed`: Number of modules processed
  - `modules_failed`: Number of modules that failed
  - `time_elapsed`: Total time in seconds
  - `files_created`: Number of new files created
  - `files_updated`: Number of existing files updated
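
The returned dictionary can be given a typed shape for callers that want static checking. This is a sketch only: `generate_docs` is documented as returning a plain `dict`, and the field types below are assumptions inferred from the descriptions above. `GenerationStats` and `summarize` are hypothetical names, not part of mdxify.

```python
from typing import TypedDict


class GenerationStats(TypedDict):
    """Assumed shape of the statistics dict, based on the documented keys."""

    modules_processed: int
    modules_failed: int
    time_elapsed: float
    files_created: int
    files_updated: int


def summarize(stats: GenerationStats) -> str:
    """Format the statistics as a single log line."""
    return (
        f"{stats['modules_processed']} processed, {stats['modules_failed']} failed, "
        f"{stats['files_created']} created, {stats['files_updated']} updated "
        f"in {stats['time_elapsed']:.3f}s"
    )


# Illustrative values only, not measured results.
example: GenerationStats = {
    "modules_processed": 290,
    "modules_failed": 0,
    "time_elapsed": 0.8,
    "files_created": 5,
    "files_updated": 285,
}
print(summarize(example))
```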

99 changes: 99 additions & 0 deletions repros/20.py
@@ -0,0 +1,99 @@
#!/usr/bin/env python
import os
import statistics
import subprocess
import time
from pathlib import Path

# Ensure we're running from mdxify root
os.chdir(Path(__file__).parent.parent)

def run_benchmark(command: list[str], runs: int = 5) -> dict:
    """Run a command multiple times and measure performance."""
    times = []
    for i in range(runs):
        start = time.perf_counter()
        result = subprocess.run(command, capture_output=True, text=True)
        end = time.perf_counter()
        elapsed = end - start
        times.append(elapsed)
        print(f"  Run {i+1}: {elapsed:.3f}s")
        if result.returncode != 0:
            print(f"  Error: {result.stderr}")

    return {
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0,
        "times": times
    }

def main():
    print("=== mdxify Performance Benchmark ===\n")

    # Test on Prefect (matching the issue)
    prefect_path = Path("sandbox/prefect")
    if prefect_path.exists():
        print("Testing on Prefect codebase (290 modules as per issue #20)\n")

        print("1. Testing with uvx (simulating Prefect's current usage):")
        cmd_uvx = [
            "uvx", "--with-editable", ".", "--refresh-package", "mdxify",
            "mdxify", "--all", "--root-module", "prefect",
            "--output-dir", str(prefect_path / "docs/v3/api-ref/python"),
            "--anchor-name", "Python SDK Reference",
            "--exclude", "prefect.agent",
            "--include-inheritance",
            "--repo-url", "https://github.com/PrefectHQ/prefect"
        ]
        print("Command: uvx ... mdxify --all --root-module prefect ...")
        results_uvx = run_benchmark(cmd_uvx, runs=3)
        print(f"  Average: {results_uvx['mean']:.3f}s ± {results_uvx['stdev']:.3f}s\n")

        print("2. Testing with uvx without --refresh-package:")
        cmd_uvx_no_refresh = [
            "uvx", "--with-editable", ".",
            "mdxify", "--all", "--root-module", "prefect",
            "--output-dir", str(prefect_path / "docs/v3/api-ref/python"),
            "--anchor-name", "Python SDK Reference",
            "--exclude", "prefect.agent",
            "--include-inheritance",
            "--repo-url", "https://github.com/PrefectHQ/prefect"
        ]
        print("Command: uvx --with-editable . mdxify ...")
        results_no_refresh = run_benchmark(cmd_uvx_no_refresh, runs=3)
        print(f"  Average: {results_no_refresh['mean']:.3f}s ± {results_no_refresh['stdev']:.3f}s\n")

        print("3. Testing with uv run (direct execution):")
        cmd_uv = [
            "uv", "run", "mdxify", "--all", "--root-module", "prefect",
            "--output-dir", str(prefect_path / "docs/v3/api-ref/python"),
            "--anchor-name", "Python SDK Reference",
            "--exclude", "prefect.agent",
            "--include-inheritance",
            "--repo-url", "https://github.com/PrefectHQ/prefect"
        ]
        print("Command: uv run mdxify ...")
        results_uv = run_benchmark(cmd_uv, runs=3)
        print(f"  Average: {results_uv['mean']:.3f}s ± {results_uv['stdev']:.3f}s\n")

        print("4. Testing import time only:")
        import_test = [
            "uv", "run", "python", "-c",
            "import time; s=time.perf_counter(); from mdxify.cli import app; print(f'Import time: {time.perf_counter()-s:.3f}s')"
        ]
        print("Testing CLI import time...")
        subprocess.run(import_test)

        print("\n=== Summary ===")
        print(f"uvx with --refresh-package: {results_uvx['mean']:.3f}s")
        print(f"uvx without refresh: {results_no_refresh['mean']:.3f}s")
        print(f"uv run (direct): {results_uv['mean']:.3f}s")
        print(f"Overhead from --refresh-package: {results_uvx['mean'] - results_no_refresh['mean']:.3f}s")
        print(f"Overhead from uvx vs uv run: {results_no_refresh['mean'] - results_uv['mean']:.3f}s")
    else:
        print("Prefect test directory not found. Please ensure sandbox/prefect exists.")

if __name__ == "__main__":
    main()
2 changes: 2 additions & 0 deletions src/mdxify/__init__.py
@@ -1,5 +1,6 @@
"""mdxify - Generate MDX API documentation from Python modules."""

from .api import generate_docs
from .cli import main
from .discovery import find_all_modules, get_module_source_file, should_include_module
from .formatter import escape_mdx_content, format_docstring_with_griffe
@@ -12,6 +13,7 @@
from .parser import extract_docstring, extract_function_signature, parse_module_fast, parse_modules_with_inheritance, ClassRegistry

__all__ = [
"generate_docs",
"main",
"find_all_modules",
"get_module_source_file",