AGENTS.md

This file provides guidance for AI coding agents working on the sphinx-codelinks repository.

Project Overview

sphinx-codelinks is a Sphinx extension that provides fast source code traceability for Sphinx-Needs. It enables:

Code analysis: Scan source code files (C++, Python, C#, Rust, YAML) for special comment markers
Automatic documentation generation: Create Sphinx-Needs items from discovered code markers
Source tracing: Link documentation to exact source code locations with line numbers
Multiple languages: Support for various programming languages via tree-sitter parsers
CLI interface: Command-line tools for analyzing code and generating RST documentation

The project integrates with Sphinx-Needs to provide seamless source code traceability in technical documentation.

Repository Structure

pyproject.toml          # Project configuration and dependencies
tox.ini                 # Tox test environment configuration
README.md               # Project README
LICENSE                 # MIT License

src/sphinx_codelinks/   # Main source code
├── __init__.py         # Package init with Sphinx setup() entry point
├── cmd.py              # CLI commands using Typer
├── config.py           # Configuration models using Pydantic
├── logger.py           # Logging utilities
├── needextend_write.py # Write RST files with Sphinx-Needs directives
├── analyse/            # Code analysis module
│   ├── analyse.py      # Main analysis orchestration
│   ├── models.py       # Pydantic models for analysis results
│   ├── oneline_parser.py # One-line comment parser
│   ├── projects.py     # Project-specific analyzers (C++, Python, etc.)
│   └── utils.py        # Analysis utilities
├── source_discover/    # Source file discovery
│   ├── config.py       # Discovery configuration
│   └── source_discover.py # File discovery logic
└── sphinx_extension/   # Sphinx extension components
    ├── source_tracing.py # Main Sphinx extension setup
    ├── html_wrapper.py  # HTML output wrapper for traced source
    ├── debug.py         # Debug utilities
    ├── ub_sct.css       # CSS for source tracing UI
    └── directives/      # Custom Sphinx directives

tests/                  # Test suite
├── __init__.py
├── conftest.py         # Pytest fixtures and configuration
├── test_analyse.py     # Analysis tests
├── test_*.py           # Various test modules
├── __snapshots__/      # Syrupy snapshot test fixtures
└── data/               # Test data and fixtures

docs/                   # Documentation source (RST)
├── conf.py             # Sphinx configuration
├── source/
│   ├── index.rst       # Documentation index
│   ├── basics/         # Basic usage documentation
│   ├── components/     # Component documentation
│   └── development/    # Development documentation

Development Commands

All commands should be run via tox for consistency. The project uses tox-uv for faster environment creation.

Testing

# Run default test environment
tox

# Run tests for specific Python/Sphinx combination
tox -e py312-sphinx8

# Run a specific test file
tox -e py312-sphinx8 -- tests/test_analyse.py

# Run a specific test function
tox -e py312-sphinx8 -- tests/test_analyse.py::test_function_name

# Run with coverage
tox -e py312-sphinx8 -- --cov=sphinx_codelinks

# Update snapshot test fixtures
tox -e py312-sphinx8 -- --snapshot-update

Documentation

# Build docs (clean)
tox -e docs-clean

# Build docs (incremental, after clean build)
tox -e docs-update

# Build with different builder (e.g., linkcheck)
BUILDER=linkcheck tox -e docs-clean

# Live rebuild with browser auto-reload
tox -e docs-live

Code Quality

# Type checking with mypy
tox -e mypy

# Linting with ruff (check only)
tox -e ruff-check

# Auto-format with ruff
tox -e ruff-fmt

# Run pre-commit hooks on all files
pre-commit run --all-files

Code Style Guidelines

Formatter/Linter: Ruff (configured in pyproject.toml)
Type Checking: Mypy with strict settings (configured in pyproject.toml)
Markdown: Follow markdownlint rules for consistent and well-formatted Markdown files
Pre-commit: Use pre-commit hooks for consistent code style

Best Practices

Type annotations: Use complete type annotations for all function signatures. Use Pydantic models for configuration and data structures.
Docstrings: Use Sphinx-style docstrings (:param:, :return:, :raises:). Types are not required in docstrings as they should be in type hints.
Markdown formatting: Write clear, well-structured Markdown that adheres to markdownlint rules. Use proper headings, lists, and code blocks.
Immutability: Prefer immutable data structures where possible. Use frozen Pydantic models for configuration.
Pure functions: Where possible, write pure functions without side effects.
Error handling: Raise descriptive exceptions with helpful error messages. Use custom exception types where appropriate.
Testing: Write tests for all new functionality. Use syrupy for snapshot testing of complex outputs.
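
The immutability, pure-function, and error-handling points above might combine roughly as follows. This is a minimal sketch: it uses a frozen stdlib dataclass as a dependency-free stand-in for a frozen Pydantic model, and `MarkerConfig`, `MarkerConfigError`, and `with_marker` are hypothetical names, not part of sphinx-codelinks.

```python
from dataclasses import dataclass, replace


class MarkerConfigError(ValueError):
    """Hypothetical custom exception for invalid marker settings."""


@dataclass(frozen=True)
class MarkerConfig:
    """Stand-in for a frozen Pydantic model (hypothetical fields)."""

    markers: tuple[str, ...] = ("@req",)


def with_marker(config: MarkerConfig, marker: str) -> MarkerConfig:
    """Return a new config with ``marker`` added; the input is left untouched.

    :param config: The existing configuration.
    :param marker: The marker to add, e.g. ``@test``.
    :return: A new configuration including the marker.
    :raises MarkerConfigError: If the marker does not start with ``@``.
    """
    if not marker.startswith("@"):
        raise MarkerConfigError(
            f"Invalid marker {marker!r}: markers must start with '@'"
        )
    # replace() builds a fresh instance instead of mutating the frozen one.
    return replace(config, markers=(*config.markers, marker))
```

Because the function returns a new object and never mutates its input, callers can share one configuration instance safely.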

Docstring Example

def discover_source_files(
    root_dir: Path,
    include_patterns: list[str],
    exclude_patterns: list[str],
    *,
    respect_gitignore: bool = True,
) -> list[Path]:
    """Discover source files matching the given patterns.

    :param root_dir: The root directory to search from.
    :param include_patterns: Glob patterns for files to include.
    :param exclude_patterns: Glob patterns for files to exclude.
    :param respect_gitignore: Whether to respect .gitignore rules.
    :return: List of discovered file paths.
    :raises ValueError: If root_dir does not exist.
    """
    ...

Testing Guidelines

Test Structure

Tests use pytest with fixtures from conftest.py
Snapshot testing uses syrupy for complex output comparisons
Test data is in tests/data/ directory
Sphinx integration tests use actual Sphinx projects in tests/doc_test/

Writing Tests

For code analysis tests, create test data in tests/data/ with source files
Use syrupy for comparing complex analysis outputs (JSON, doctrees, etc.)
For Sphinx integration, create minimal projects in tests/doc_test/
Use parametrized tests for testing multiple language parsers

Test Best Practices

Test coverage: Write tests for all new functionality and bug fixes
Isolation: Each test should be independent and not rely on state from other tests
Descriptive names: Test function names should describe what is being tested
Snapshot testing: Use snapshot.assert_match() for complex output comparisons
Parametrization: Use @pytest.mark.parametrize for multiple test scenarios
Fixtures: Define reusable fixtures in conftest.py

Example Test Pattern

from pathlib import Path

# analyse_file is a placeholder for the analysis entry point under test;
# import it from the module being exercised.


def test_analyse_cpp_file(snapshot, tmp_path: Path):
    """Test C++ file analysis produces correct output."""
    # Arrange
    source_file = tmp_path / "test.cpp"
    source_file.write_text("""
    // @req{REQ-001}
    void function() {}
    """)

    # Act
    result = analyse_file(source_file)

    # Assert
    assert result == snapshot

Commit Message Format

Use this format:

<EMOJI> <KEYWORD>: Summarize in 72 chars or less (#<PR>)

Optional detailed explanation.

Keywords:

✨ NEW: – New feature
🐛 FIX: – Bug fix
👌 IMPROVE: – Improvement (no breaking changes)
‼️ BREAKING: – Breaking change
📚 DOCS: – Documentation
🔧 MAINTAIN: – Maintenance changes only (typos, etc.)
🧪 TEST: – Tests or CI changes only
♻️ REFACTOR: – Refactoring
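
A commit subject in this format could be checked mechanically. The sketch below is one possible validator, not a tool shipped with the repository. It assumes that the `(#<PR>)` suffix is optional and that the 72-character limit applies to the summary text only; `check_subject` is a hypothetical helper name.

```python
import re

# Keyword -> emoji, as listed above (written as escapes to keep code points exact).
KEYWORD_EMOJI = {
    "NEW": "\u2728",
    "FIX": "\U0001F41B",
    "IMPROVE": "\U0001F44C",
    "BREAKING": "\u203C\uFE0F",
    "DOCS": "\U0001F4DA",
    "MAINTAIN": "\U0001F527",
    "TEST": "\U0001F9EA",
    "REFACTOR": "\u267B\uFE0F",
}

# "<EMOJI> <KEYWORD>: <summary>" with an optional trailing " (#<PR>)".
SUBJECT_RE = re.compile(
    r"^(?P<emoji>\S+) (?P<keyword>[A-Z]+): (?P<summary>.+?)(?: \(#(?P<pr>\d+)\))?$"
)


def check_subject(subject: str) -> list[str]:
    """Return the problems found in a commit subject line (empty if none)."""
    match = SUBJECT_RE.match(subject)
    if match is None:
        return ["subject does not match '<EMOJI> <KEYWORD>: <summary> (#<PR>)'"]
    problems = []
    keyword = match["keyword"]
    if keyword not in KEYWORD_EMOJI:
        problems.append(f"unknown keyword {keyword!r}")
    elif match["emoji"] != KEYWORD_EMOJI[keyword]:
        problems.append(f"emoji does not match keyword {keyword!r}")
    # Assumption: the 72-character limit applies to the summary text only.
    if len(match["summary"]) > 72:
        problems.append("summary is longer than 72 characters")
    return problems
```

An empty list means the subject line passed every check.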

PR Title and Description Format

Use the commit message format for both the PR title and description. In the title, the KEYWORD may be omitted, leaving only the EMOJI.

Pull Request Requirements

When submitting changes:

Description: Include a meaningful description or link explaining the change
Tests: Include test cases for new functionality or bug fixes
Documentation: Update docs if behavior changes or new features are added
Changelog: Update relevant changelog or release notes
Code Quality: Ensure pre-commit run --all-files passes

Architecture Overview

Analysis Pipeline

The code analysis follows a multi-stage pipeline:

Source Files → Discovery → Parsing → Analysis → Results (JSON) → RST Generation

Discovery (source_discover/): Scan directories for source files matching patterns
Parsing (analyse/oneline_parser.py): Use tree-sitter to parse source code into an AST
Analysis (analyse/analyse.py, analyse/projects.py): Extract markers and metadata
Output (needextend_write.py): Generate RST with Sphinx-Needs directives
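
To make the stage boundaries concrete, here is a toy end-to-end sketch of the pipeline. It is an illustration under loud assumptions: a regular expression stands in for tree-sitter, only `//` comments in `.cpp` files are handled, the JSON field names are invented, and the "RST" stage emits a placeholder comment line rather than the real Sphinx-Needs directives. All function names are hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path

# Simplified marker syntax; the real parser is tree-sitter based.
MARKER_RE = re.compile(r"//\s*@(?P<kind>\w+)\{(?P<id>[^}]+)\}")


def discover(root: Path, suffixes: set[str]) -> list[Path]:
    """Discovery: collect files under ``root`` with a matching suffix."""
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes)


def analyse(path: Path) -> list[dict]:
    """Parsing + analysis: extract markers together with their line numbers."""
    results = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        for match in MARKER_RE.finditer(line):
            results.append(
                {"file": path.name, "line": lineno, "kind": match["kind"], "id": match["id"]}
            )
    return results


def to_rst(markers: list[dict]) -> str:
    """Output: one placeholder RST comment per marker (not the real format)."""
    return "\n".join(f".. {m['kind']} {m['id']} at {m['file']}:{m['line']}" for m in markers)


def run_pipeline(root: Path) -> tuple[str, str]:
    """Chain the stages and return the JSON results and the generated RST."""
    markers = [m for path in discover(root, {".cpp"}) for m in analyse(path)]
    return json.dumps(markers, indent=2), to_rst(markers)


def demo() -> tuple[str, str]:
    """Run the pipeline on a throwaway directory containing one C++ file."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.cpp").write_text("// @req{REQ-001}\nvoid f() {}\n")
        return run_pipeline(Path(tmp))
```

Keeping JSON as the hand-off between analysis and RST generation means each half can be run and tested separately, which matches the split between the `analyse` and `write` CLI commands.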

Sphinx Integration Flow

The Sphinx extension hooks into multiple build events to provide source tracing:

flowchart TB
    subgraph init["Initialization (config-inited)"]
        setup["setup() in __init__.py"]
        load_toml["load_config_from_toml()"]
        sn_options["update_sn_extra_options()"]
        sn_types["update_sn_types()"]
        check_config["check_sphinx_configuration()"]
    end

    subgraph prepare["Build Preparation"]
        builder_init["builder_inited: Copy CSS assets"]
        env_prepare["env-before-read-docs: prepare_env()"]
    end

    subgraph generate["Page Generation (html-collect-pages)"]
        gen_pages["generate_code_page()"]
        html_wrap["html_wrapper()"]
    end

    subgraph context["Page Context (html-page-context)"]
        add_css["add_custom_css()"]
    end

    subgraph finish["Build Finished"]
        warnings["emit_warnings()"]
        timing["debug.process_timing()"]
    end

    setup --> load_toml --> sn_options --> sn_types --> check_config
    check_config --> builder_init --> env_prepare
    env_prepare --> gen_pages --> html_wrap
    html_wrap --> add_css --> warnings --> timing

    style load_toml fill:#e1f5fe
    style gen_pages fill:#e1f5fe
    style html_wrap fill:#e1f5fe

Event Handlers

The extension connects to these Sphinx events (in execution order):

| Event | Handler | Purpose |
| --- | --- | --- |
| `config-inited` | `load_config_from_toml()` | Load configuration from TOML file if specified |
| `config-inited` | `update_sn_extra_options()` | Register sphinx-needs extra options (project, file, directory, URLs) |
| `config-inited` | `update_sn_types()` | Add `srctrace` need type to sphinx-needs |
| `config-inited` | `check_sphinx_configuration()` | Validate configuration and raise errors |
| `builder-inited` | `builder_inited()` | Copy CSS assets to output directory |
| `env-before-read-docs` | `prepare_env()` | Initialize timing measurements and debug filters |
| `html-collect-pages` | `generate_code_page()` | Generate HTML pages for traced source files |
| `html-page-context` | `add_custom_css()` | Inject custom CSS for source tracing UI |
| `build-finished` | `emit_warnings()` | Emit collected warnings from analysis |
| `build-finished` | `debug.process_timing()` | Output timing measurements if enabled |

Key Integration Points

sphinx-needs Dependency: The extension requires sphinx-needs and checks for its presence in setup(). It adds extra options (project, file, directory, URL fields) and a custom need type (srctrace).
TOML Configuration: Configuration can be loaded from a TOML file specified in conf.py via src_trace_config_from_toml. The TOML is parsed and values are set on the Sphinx config object.
Source Page Generation: The generate_code_page() function yields tuples of (pagename, context, template) for each traced source file, allowing Sphinx to generate standalone HTML pages with syntax-highlighted source code and line-number anchors.
CSS Injection: Custom CSS (ub_sct.css) is copied to _static/source_tracing/ and added only to pages that contain traced source code.
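
As a rough illustration of the source page generation point, a handler for `html-collect-pages` can be written as a generator of `(pagename, context, template)` tuples. The sketch below is not the project's `generate_code_page()`: the page name prefix, the `L<n>` anchor scheme, and the context keys are assumptions, and syntax highlighting is omitted.

```python
import html
from collections.abc import Iterator
from typing import Any


def generate_code_pages(
    traced_sources: dict[str, str],
) -> Iterator[tuple[str, dict[str, Any], str]]:
    """Yield ``(pagename, context, template)`` for each traced source file.

    :param traced_sources: Mapping of relative file path to file content.
    :return: Iterator of page tuples in the shape ``html-collect-pages`` expects.
    """
    for rel_path, content in sorted(traced_sources.items()):
        # One anchored, escaped line per source line, so links can target "#L<n>".
        lines = [
            f'<span id="L{lineno}">{html.escape(line)}</span>'
            for lineno, line in enumerate(content.splitlines(), start=1)
        ]
        context = {"title": rel_path, "body": "<pre>" + "\n".join(lines) + "</pre>"}
        # "page.html" is the generic Sphinx page template; the real handler may differ.
        yield f"_source_tracing/{rel_path}", context, "page.html"
```

Escaping each line before wrapping it keeps characters such as `<` in the source from being interpreted as HTML.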

Key Components

Configuration (`config.py`)

Pydantic models define all configuration options:

AnalyseConfig: Main analysis configuration with source paths, patterns, markers
Uses Pydantic v2 with validation and serialization
Configuration loaded from TOML files

Source Discovery (`source_discover/`)

discover_source_files(): Find source files matching include/exclude patterns
Respects .gitignore rules using gitignore-parser
Returns filtered list of files to analyze
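
The include/exclude filtering can be approximated with the standard library alone. This sketch deliberately uses a different name, `discover_by_patterns`, because it is a simplified stand-in: it ignores `.gitignore` handling entirely and relies on `fnmatch`, whose `*` also matches path separators.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def discover_by_patterns(
    root_dir: Path,
    include_patterns: list[str],
    exclude_patterns: list[str],
) -> list[Path]:
    """Return files under ``root_dir`` matching an include and no exclude pattern.

    :param root_dir: The root directory to search from.
    :param include_patterns: Patterns matched against the POSIX relative path.
    :param exclude_patterns: Patterns that remove otherwise included files.
    :return: Sorted list of discovered file paths.
    :raises ValueError: If root_dir does not exist.
    """
    if not root_dir.is_dir():
        raise ValueError(f"Root directory does not exist: {root_dir}")
    found = []
    for path in root_dir.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root_dir).as_posix()
        included = any(fnmatch(rel, pattern) for pattern in include_patterns)
        excluded = any(fnmatch(rel, pattern) for pattern in exclude_patterns)
        if included and not excluded:
            found.append(path)
    return sorted(found)


def demo() -> list[str]:
    """Build a small throwaway tree and list the files that survive filtering."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "build").mkdir()
        (root / "src" / "a.cpp").write_text("// @req{REQ-001}\n")
        (root / "src" / "notes.txt").write_text("not source\n")
        (root / "build" / "gen.cpp").write_text("// generated\n")
        files = discover_by_patterns(root, ["*.cpp"], ["build/*"])
        return [p.relative_to(root).as_posix() for p in files]
```

Exclusion is applied after inclusion, so a file under `build/` is dropped even though it matches `*.cpp`.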

Code Analysis (`analyse/`)

analyse.py: Main orchestrator that coordinates analysis across all source files
projects.py: Language-specific analyzers (C++, Python, C#, Rust, YAML)
oneline_parser.py: Tree-sitter-based parser for extracting comment markers
models.py: Pydantic models for analysis results (markers, line ranges, etc.)
utils.py: Helper functions for path handling, marker extraction

Tree-sitter Integration

Uses tree-sitter parsers for each supported language
Extracts comments from AST nodes
Parses special marker syntax (e.g., @req{ID}, @test{ID})
Maintains line number information for source tracing
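
The marker syntax and line tracking can be illustrated without tree-sitter. The sketch below is a naive line scanner, not the project's parser: it handles only line comments, and the `Marker` type, the language keys, and `extract_markers` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Line-comment prefixes per language; block comments are ignored in this sketch.
COMMENT_PREFIX = {"cpp": "//", "csharp": "//", "rust": "//", "python": "#", "yaml": "#"}
MARKER_RE = re.compile(r"@(?P<kind>\w+)\{(?P<id>[^}]+)\}")


@dataclass(frozen=True)
class Marker:
    kind: str
    need_id: str
    line: int  # 1-based line number in the source file


def extract_markers(source: str, language: str) -> list[Marker]:
    """Find ``@kind{ID}`` markers inside line comments, keeping line numbers."""
    prefix = COMMENT_PREFIX[language]
    markers = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        # Naive split: a prefix inside a string literal would be misread,
        # which is one reason a real AST-based parser is preferable.
        _, found, comment = text.partition(prefix)
        if not found:
            continue
        for match in MARKER_RE.finditer(comment):
            markers.append(Marker(match["kind"], match["id"], lineno))
    return markers
```

Markers that appear outside a comment are ignored, which mirrors the idea of extracting markers from comment nodes only.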

Sphinx Extension (`sphinx_extension/`)

source_tracing.py: Main extension setup with setup() function
html_wrapper.py: Wraps source code blocks with tracing metadata
debug.py: Debug utilities for development
Hooks into Sphinx build events to inject source tracing information

CLI Interface

The CLI uses Typer for command definitions:

codelinks analyse <config>: Analyze source code and output JSON
codelinks write <format> <input> --outpath <file>: Generate RST from JSON

Key Files

pyproject.toml - Project configuration, dependencies, and tool settings
src/sphinx_codelinks/__init__.py - Package entry point with setup() for Sphinx
src/sphinx_codelinks/cmd.py - CLI commands and argument parsing
src/sphinx_codelinks/config.py - Pydantic configuration models
src/sphinx_codelinks/analyse/analyse.py - Main analysis orchestration
src/sphinx_codelinks/analyse/projects.py - Language-specific analyzers
src/sphinx_codelinks/analyse/oneline_parser.py - Tree-sitter comment parser
src/sphinx_codelinks/sphinx_extension/source_tracing.py - Sphinx extension setup
tests/conftest.py - Pytest fixtures and test configuration

Debugging

Use --pdb with pytest to drop into debugger on failures: tox -e py312-sphinx8 -- --pdb
Use -v for verbose test output: tox -e py312-sphinx8 -- -v
Build docs with -T flag for full tracebacks: tox -e docs-clean -- -T
Set logging level in tests: tox -e py312-sphinx8 -- --log-cli-level=DEBUG
Use debug.py module functions for development debugging

Common Patterns

Adding Support for a New Language

Add tree-sitter parser dependency to pyproject.toml (e.g., tree-sitter-java)

Create language-specific analyzer in analyse/projects.py:

class JavaAnalyzer(BaseAnalyzer):
    language = "java"
    parser_language = "java"

    def get_comment_nodes(self, tree):
        # Return comment nodes from tree
        ...

Register analyzer in LANGUAGE_ANALYZERS dict in projects.py
Add test files in tests/data/<language>/
Add tests in tests/test_analyse.py
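
The registration step can be pictured as a small registry lookup. The sketch below is self-contained and makes strong assumptions: `BaseAnalyzer` here is a stand-in whose interface may differ from the real class in projects.py, `get_comment_lines` and `get_analyzer` are hypothetical, and a plain string scan replaces the tree-sitter query.

```python
from abc import ABC, abstractmethod


class BaseAnalyzer(ABC):
    """Stand-in base class; the real interface in projects.py may differ."""

    language: str

    @abstractmethod
    def get_comment_lines(self, source: str) -> list[tuple[int, str]]:
        """Return ``(line_number, comment_text)`` pairs for ``source``."""


class JavaAnalyzer(BaseAnalyzer):
    language = "java"

    def get_comment_lines(self, source: str) -> list[tuple[int, str]]:
        # Line comments only; a tree-sitter query would replace this scan.
        return [
            (lineno, line.split("//", 1)[1].strip())
            for lineno, line in enumerate(source.splitlines(), start=1)
            if "//" in line
        ]


# Registry keyed by language name, mirroring the LANGUAGE_ANALYZERS step above.
LANGUAGE_ANALYZERS: dict[str, type[BaseAnalyzer]] = {"java": JavaAnalyzer}


def get_analyzer(language: str) -> BaseAnalyzer:
    """Instantiate the analyzer for ``language`` or fail with a clear message."""
    try:
        return LANGUAGE_ANALYZERS[language]()
    except KeyError:
        supported = ", ".join(sorted(LANGUAGE_ANALYZERS))
        raise ValueError(
            f"Unsupported language {language!r}; supported: {supported}"
        ) from None
```

A registry of this kind keeps language support additive: a new language needs one class and one dictionary entry, with no changes to the calling code.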

Adding a New Marker Type

Update marker regex patterns in config.py or analyzer
Update models.py if new fields are needed
Update parsing logic in oneline_parser.py
Update RST generation in needextend_write.py if needed
Add tests with new marker examples

Adding a CLI Command

Add command function in cmd.py using Typer decorators:

@app.command()
def new_command(arg: str = typer.Argument(..., help="Description")):
    """Command description."""
    # Implementation

Add tests in tests/test_cmd.py
Update documentation in docs/source/components/cli.rst

Adding Configuration Options

Add field to AnalyseConfig or relevant Pydantic model in config.py
Add validation if needed using Pydantic validators
Update TOML configuration examples in docs/ and tests/data/configs/
Add tests for new configuration option
Document in docs/source/components/configuration.rst
