# Documentation Infrastructure: Schema-Driven Validation with Pydantic & Automated Enforcement
## 🎯 Vision
Create a unified, schema-driven documentation pipeline where:
- JSON Schemas (`docs/schemas/`) define the truth
- Pydantic models are generated from the schemas for type-safe Python validation
- Scripts (`scripts/`) use the Pydantic models for consistent validation
- Pre-commit hooks enforce compliance on every commit
- The CI pipeline validates on every PR
```
                       Schema-Driven Validation Pipeline

  docs/schemas/                    scripts/                     Enforcement
  ┌──────────────────┐            ┌──────────────────┐          ┌────────────┐
  │ docstring-       │ generates  │ Pydantic Models  │   used   │ Pre-commit │
  │ schema.json      ├───────────▶│ (type-safe)      ├─────────▶│ Hooks      │
  ├──────────────────┤            ├──────────────────┤          ├────────────┤
  │ vitepress-       │            │ validate_*.py    │          │ CI/CD      │
  │ mapping.json     │            │ generate_*.py    │          │ Pipeline   │
  ├──────────────────┤            │ check_*.py       │          └────────────┘
  │ default-         │            └──────────────────┘
  │ mapping.json     │
  └──────────────────┘
```
## 📋 Current Assets

### JSON Schemas (`docs/schemas/`)
| File | Purpose | Lines |
|---|---|---|
| `docstring-schema.json` | Defines COCO/BBOB docstring structure (AlgorithmMetadata, Args, Attributes, etc.) | 722 |
| `vitepress-mapping-schema.json` | Maps docstring sections → VitePress rendering rules | 372 |
| `default-mapping.json` | Default configuration + transformation rules | 503 |
### Scripts (`scripts/`) - Currently Independent
| Script | Purpose | Uses Schema? |
|---|---|---|
| `validate_optimizer_docs.py` | COCO/BBOB compliance | ❌ Regex-based |
| `check_google_docstring_inline_descriptions.py` | Inline format | ❌ Regex-based |
| `batch_update_docstrings.py` | Generate templates | ❌ Hardcoded |
| `generate_docs.py` | VitePress generation | ⚠️ Partial |
| `fix_docstring_indentation.py` | Fix indentation | ❌ Regex-based |
| `fix_multiline_returns.py` | Fix Returns format | ❌ Regex-based |
**Problem:** Scripts duplicate validation logic instead of using the schema as the single source of truth.
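To make the duplication concrete (an illustrative sketch, not the actual script contents): each regex-based script currently re-encodes constraints that `docstring-schema.json` already expresses, such as the acronym pattern.

```python
# Illustrative only: the kind of rule currently copied into individual scripts.
import re

ACRONYM_RE = re.compile(r"^[A-Z][A-Z0-9-]*$")  # same constraint the JSON Schema already defines


def check_acronym(value: str) -> bool:
    """Ad-hoc check that should instead come from the schema-derived Pydantic model."""
    return bool(ACRONYM_RE.fullmatch(value))
```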
## 🎯 Target Architecture

### 1. Pydantic Models from JSON Schema

Generate type-safe Python models from `docstring-schema.json`:
```python
# opt/docstring_models.py (generated from schema)
from pydantic import BaseModel, Field
from typing import Literal


class AlgorithmMetadata(BaseModel):
    algorithm_name: str
    acronym: str = Field(pattern=r"^[A-Z][A-Z0-9-]*$")
    year_introduced: int = Field(ge=1900, le=2100)
    authors: str
    algorithm_class: Literal[
        "Swarm Intelligence", "Evolutionary", "Gradient-Based",
        "Classical", "Metaheuristic", "Physics-Inspired",
        "Probabilistic", "Social-Inspired", "Constrained", "Multi-Objective"
    ]
    complexity: str  # LaTeX notation
    properties: list[str]
    implementation: str = "Python 3.10+"
    coco_compatible: bool


class COCOBBOBSettings(BaseModel):
    search_space: str
    evaluation_budget: str
    default_dimensions: list[int]
    performance_metrics: list[str]


class DocstringSchema(BaseModel):
    """Root model - single source of truth for validation."""

    summary: str = Field(max_length=80)
    algorithm_metadata: AlgorithmMetadata
    coco_bbob_benchmark_settings: COCOBBOBSettings
    args: dict[str, "ArgDefinition"]
    attributes: dict[str, "AttributeDefinition"]
    # ... etc.
```
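As a quick illustration of what the generated models give us, a minimal sketch (the field values are invented for the example; `AlgorithmMetadata` is the model defined above): invalid metadata is rejected with a precise error instead of slipping past a regex.

```python
from pydantic import ValidationError

# Hypothetical example data; the lowercase acronym violates the ^[A-Z][A-Z0-9-]*$ pattern.
bad_metadata = {
    "algorithm_name": "Particle Swarm Optimization",
    "acronym": "pso",
    "year_introduced": 1995,
    "authors": "Kennedy & Eberhart",
    "algorithm_class": "Swarm Intelligence",
    "complexity": r"O(n \cdot d)",
    "properties": ["population-based", "stochastic"],
    "coco_compatible": True,
}

try:
    AlgorithmMetadata.model_validate(bad_metadata)
except ValidationError as e:
    first = e.errors()[0]
    print(first["loc"], first["msg"])  # e.g. ('acronym',) plus a pattern-mismatch message
```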
### 2. Scripts Use Pydantic Models

```python
# scripts/validate_optimizer_docs.py
from opt.docstring_models import DocstringSchema
from pydantic import ValidationError


def validate_docstring(parsed_docstring: dict) -> list[str]:
    """Validate using Pydantic model (schema-driven)."""
    try:
        DocstringSchema.model_validate(parsed_docstring)
        return []
    except ValidationError as e:
        return [str(err) for err in e.errors()]
```
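A sketch of how a caller might consume this (the `parsed` dict is assumed to come from the `DocstringParser` described under Implementation Tasks):

```python
# Sketch: fail the run if any schema violations are reported.
errors = validate_docstring(parsed)
if errors:
    for err in errors:
        print(f"docstring validation error: {err}")
    raise SystemExit(1)
```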
### 3. Pre-Commit Hook Integration

```yaml
# .pre-commit-config.yaml
- repo: local
  hooks:
    - id: validate-docstrings-pydantic
      name: Validate docstrings against schema (Pydantic)
      entry: python scripts/validate_optimizer_docs.py
      language: python
      files: ^opt/(classical|constrained|...|swarm_intelligence)/.*\.py$
      additional_dependencies: [pydantic>=2.0]
```
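Once registered, the hook can be exercised locally with the standard pre-commit commands (hook id as defined above):

```bash
pre-commit install                                       # install the git hook once per clone
pre-commit run validate-docstrings-pydantic --all-files  # run the new validator over the whole repo
```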
### 4. CI Pipeline

```yaml
# .github/workflows/docs-validation.yml
jobs:
  validate-docstrings:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
      - name: Install dependencies
        run: uv sync
      - name: Validate all optimizer docstrings
        run: uv run python scripts/validate_optimizer_docs.py --all
      - name: Generate docs (dry-run)
        run: uv run python scripts/generate_docs.py --dry-run
```
## ✅ Acceptance Criteria

### Phase 1: Pydantic Model Generation
- Install `datamodel-code-generator` or write Pydantic models manually
- Generate `opt/docstring_models.py` from `docstring-schema.json`
- Add Pydantic as a project dependency (`pyproject.toml`)
- Validate models match schema definitions
### Phase 2: Script Integration
- Refactor `validate_optimizer_docs.py` to use Pydantic models
- Refactor `check_google_docstring_inline_descriptions.py` to use Pydantic
- Refactor `generate_docs.py` to use the VitePress mapping schema
- Add a `DocstringParser` class to convert raw docstring → dict → Pydantic model
### Phase 3: Pre-Commit Enforcement
- Update `.pre-commit-config.yaml` with Pydantic-based validators
- Consolidate overlapping hooks into a unified validator
- Add schema validation to pre-commit
### Phase 4: CI Pipeline
- Create `.github/workflows/docs-validation.yml`
- Add job: Validate all optimizer docstrings
- Add job: Generate docs (dry-run validation)
- Add job: Schema consistency check (see the sketch after this list)
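For the schema consistency check job referenced above, one possible approach (a sketch only; it assumes the `jsonschema` package is available as a dev dependency) is to verify that every file in `docs/schemas/` is itself a valid Draft-07 schema:

```python
# Hypothetical helper for a "schema consistency check" CI job.
import json
from pathlib import Path

from jsonschema import Draft7Validator  # assumed dev dependency


def check_schemas(schema_dir: str = "docs/schemas") -> int:
    """Return the number of schema files that fail Draft-07 meta-schema validation."""
    failures = 0
    for path in sorted(Path(schema_dir).glob("*.json")):
        schema = json.loads(path.read_text())
        try:
            Draft7Validator.check_schema(schema)  # raises SchemaError on a malformed schema
        except Exception as exc:
            print(f"{path}: {exc}")
            failures += 1
    return failures
```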
### Phase 5: Documentation
- Document Pydantic model usage in `scripts/README.md`
- Add schema update workflow documentation
- Document how to extend the schema for new fields
## 🔧 Implementation Tasks

### Task 1: Generate Pydantic Models
```bash
# Option A: Use datamodel-code-generator
uv add --dev datamodel-code-generator
datamodel-codegen --input docs/schemas/docstring-schema.json --output opt/docstring_models.py

# Option B: Manual implementation (more control)
# Create opt/docstring_models.py manually based on schema
```
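For the "validate models match schema definitions" step in Phase 1, a minimal sketch (the helper name is hypothetical; it assumes the generated `DocstringSchema` exists) that flags top-level properties required by the JSON Schema but missing from the model:

```python
# Hypothetical sync check between docs/schemas/docstring-schema.json and the generated model.
import json

from opt.docstring_models import DocstringSchema


def missing_model_fields(schema_path: str = "docs/schemas/docstring-schema.json") -> list[str]:
    """List top-level required schema properties that the Pydantic model does not declare."""
    with open(schema_path) as f:
        schema = json.load(f)
    required = set(schema.get("required", []))
    declared = set(DocstringSchema.model_fields)  # Pydantic v2: mapping of field name -> FieldInfo
    return sorted(required - declared)
```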
### Task 2: Create DocstringParser

```python
# scripts/docstring_parser.py
import ast
import re

from opt.docstring_models import DocstringSchema


class DocstringParser:
    """Parse Python docstring into validated Pydantic model."""

    def parse_file(self, filepath: str) -> DocstringSchema:
        """Extract and validate class docstring from file."""
        with open(filepath) as f:
            tree = ast.parse(f.read())
        # ... parse docstring sections
        return DocstringSchema.model_validate(parsed_dict)
```
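The section-parsing step above is deliberately elided. A rough sketch of how it could work, assuming Google-style section headers (the helpers below are illustrative, not existing functions):

```python
# Illustrative helpers for the "# ... parse docstring sections" step.
import ast


def _extract_class_docstring(tree: ast.Module) -> str | None:
    """Return the docstring of the first class defined in the module, if any."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            return ast.get_docstring(node)
    return None


def _split_sections(raw: str) -> dict:
    """Very rough split of a Google-style docstring into {section: body} (sketch only)."""
    sections: dict[str, list[str]] = {"summary": []}
    current = "summary"
    for line in raw.splitlines():
        header = line.strip().rstrip(":")
        if line.strip().endswith(":") and header in {"Args", "Attributes", "Returns", "References"}:
            current = header.lower()
            sections[current] = []
        else:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}
```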
### Task 3: Unified Pre-Commit Hook

```python
# scripts/unified_validator.py
"""Unified docstring validator using Pydantic schemas."""
from pydantic import ValidationError

from opt.docstring_models import DocstringSchema
from scripts.docstring_parser import DocstringParser


def main(files: list[str]) -> int:
    parser = DocstringParser()
    errors = []
    for file in files:
        try:
            parser.parse_file(file)  # Pydantic validation happens here
        except ValidationError as e:
            errors.extend(format_errors(file, e))
    if errors:
        print("\n".join(errors))
        return 1
    return 0
```
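pre-commit passes the matched filenames as command-line arguments, so the validator only needs a thin entry point (a sketch, assuming `main` above):

```python
if __name__ == "__main__":
    import sys

    raise SystemExit(main(sys.argv[1:]))
```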
## 📊 Progress Tracking

| Phase | Status | Tracking |
|---|---|---|
| Phase 1: Pydantic Models | 🚧 Not Started | |
| Phase 2: Script Integration | 🚧 Not Started | |
| Phase 3: Pre-Commit | ⚠️ Partial (hooks exist, need Pydantic) | |
| Phase 4: CI Pipeline | 🚧 Not Started | |
| Phase 5: Documentation | 🚧 Not Started | |
## Existing Work (Reference)
### 🔗 Related PRs
| PR | Description |
|---|---|
| #91 | Batch docstring update script |
| #94 | Pre-commit hooks for validation |
| #100-#112 | Category docstring updates (all 10 categories) |
### 📚 References
- Pydantic v2 Documentation
- datamodel-code-generator
- JSON Schema Draft-07
- Google Python Style Guide - Docstrings
- Pre-commit Hooks
## 🏷️ Priority
High - Foundation for consistent documentation across 120 algorithms.
## ⏱️ Estimated Effort
| Task | Time |
|---|---|
| Pydantic model generation | 2-3 hours |
| Script refactoring | 4-6 hours |
| Pre-commit integration | 1-2 hours |
| CI pipeline | 2-3 hours |
| Documentation | 2-3 hours |
| Total | 11-17 hours |