CI/Documentation Quality: Pre-commit Failures & Ruff Exception Reduction #68

@Anselmoo

Description

TL;DR

Pre-commit hooks are failing with ~24 pydocstyle errors and 600+ COCO/BBOB compliance violations across the codebase. In addition, pyproject.toml carries an oversized global Ruff ignore list (23+ codes) that masks code quality issues; the list must be reduced by 50% within two sprints.

CRITICAL: This issue blocks clean CI and requires complete remediation across 60-80 optimizer files, not just a subset.


Problem

Current state prevents clean CI runs and masks widespread documentation quality issues:

  1. ~24 pydocstyle violations (D107, D103, D417) across swarm_intelligence, constrained, multi_objective, probabilistic, and metaheuristic categories
  2. 600+ COCO/BBOB compliance violations across 60-80 files spanning all 10 optimizer categories
  3. 23+ global Ruff ignore codes in pyproject.toml with no per-file justifications
  4. D107 and D103 are globally ignored yet still failing in pre-commit pydocstyle checks (configuration conflict)

The batch script and docstring template need to be validated before the remaining 107+ files are processed. This issue first establishes 10 exemplar files spanning all categories, then scales remediation to complete coverage.


Solution

Phase 1: Validation (10 Exemplar Files)
Manually complete 10 high-priority optimizers with full COCO/BBOB compliance to serve as validation checkpoints and reference implementations.

Phase 2: Complete Remediation (60-80 Files)
Apply validated approach to ALL affected files for comprehensive CI/documentation quality restoration.


Files to Update

Phase 1: Validation Checkpoints (10 files)

  • opt/swarm_intelligence/particle_swarm.py (validates swarm intelligence category)
  • opt/gradient_based/adamw.py (validates gradient-based category)
  • opt/classical/simulated_annealing.py (validates classical category)
  • opt/evolutionary/genetic_algorithm.py (validates evolutionary category)
  • opt/metaheuristic/harmony_search.py (validates metaheuristic category)
  • opt/swarm_intelligence/ant_colony.py (validates swarm diversity)
  • opt/gradient_based/sgd_momentum.py (validates gradient diversity)
  • opt/classical/nelder_mead.py (validates classical diversity)
  • opt/evolutionary/differential_evolution.py (validates evolutionary diversity)
  • opt/physics_inspired/gravitational_search.py (validates physics-inspired category)

Phase 2: Complete Coverage (50-70 additional files)

Organized by failure type for systematic remediation:

Tier 1: pydocstyle + COCO/BBOB Failures (~24 files - HIGHEST PRIORITY)

Files with both D107/D103/D417 violations AND missing COCO/BBOB sections:

swarm_intelligence/ (15 files)

  • glowworm_swarm_optimization, wild_horse, orca_predator, sand_cat, african_buffalo_optimization, brown_bear, barnacles_mating, coati_optimizer, mountain_gazelle, artificial_hummingbird, mayfly_optimizer, honey_badger, black_widow, moth_search, dingo_optimizer

constrained/ (3 files)

  • barrier_method, penalty_method, sequential_quadratic_programming

multi_objective/ (1 file)

  • spea2

probabilistic/ (1 file)

  • parzen_tree_stimator

metaheuristic/ (1 file)

  • forensic_based

Tier 2: COCO/BBOB Only Failures (~40-50 files)

Files missing Algorithm Metadata, COCO/BBOB Benchmark Settings, seed=42 examples, or seed attribute documentation:

swarm_intelligence/ (39 files)

  • salp_swarm_algorithm, emperor_penguin, slime_mould, pelican_optimizer, marine_predators_algorithm, osprey_optimizer, firefly_algorithm, zebra_optimizer, ant_lion_optimizer, giant_trevally, chimp_optimization, artificial_fish_swarm_algorithm, grey_wolf_optimizer, golden_eagle, tunicate_swarm, cat_swarm_optimization, african_vultures_optimizer, dragonfly_algorithm, grasshopper_optimization, fennec_fox, harris_hawks_optimization, spotted_hyena, aquila_optimizer, starling_murmuration, seagull_optimization, moth_flame_optimization, bat_algorithm, dandelion_optimizer, manta_ray, cuckoo_search, flower_pollination, snow_geese, pathfinder, whale_optimization_algorithm, artificial_rabbits, reptile_search, bee_algorithm, artificial_gorilla_troops, squirrel_search

gradient_based/ (9 files)

  • adadelta, amsgrad, adagrad, stochastic_gradient_descent, rmsprop, adaptive_moment_estimation, nesterov_accelerated_gradient, nadam, adamax

classical/ (7 files)

  • bfgs, lbfgs, hill_climbing, trust_region, conjugate_gradient, powell, tabu_search

metaheuristic/ (12 files)

  • cross_entropy_method, sine_cosine_algorithm, arithmetic_optimization, variable_neighbourhood_search, eagle_strategy, colliding_bodies_optimization, stochastic_fractal_search, very_large_scale_neighborhood_search, stochastic_diffusion_search, variable_depth_search, shuffled_frog_leaping_algorithm, particle_filter

evolutionary/ (4 files)

  • cultural_algorithm, estimation_of_distribution_algorithm, imperialist_competitive_algorithm, cma_es

physics_inspired/ (3 files)

  • rime_optimizer, atom_search, equilibrium_optimizer

social_inspired/ (3 files)

  • teaching_learning, political_optimizer, soccer_league_optimizer

probabilistic/ (4 files)

  • adaptive_metropolis, linear_discriminant_analysis, sequential_monte_carlo, bayesian_optimizer

constrained/ (2 files)

  • augmented_lagrangian_method, successive_linear_programming

multi_objective/ (2 files)

  • nsga_ii, moead

Detailed Pre-commit Failure Analysis

1. pydocstyle Failures (~24 violations)

Common patterns:

  • D107: Missing docstring in __init__ methods (14 files)
  • D103: Missing docstring in public functions (6 files)
  • D417: Missing argument descriptions in docstrings (2 files)

Example violations:

opt/swarm_intelligence/glowworm_swarm_optimization.py:144 in private method `_compute_fitness`:
    D417: Missing argument descriptions in the docstring (argument(s) population are missing descriptions)
opt/swarm_intelligence/wild_horse.py:68 in public method `__init__`:
    D107: Missing docstring in __init__
opt/constrained/barrier_method.py:227 in public function `constraint`:
    D103: Missing docstring in public function
opt/multi_objective/spea2.py:430 in public function `f1`:
    D103: Missing docstring in public function
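
A minimal sketch of the Google-style docstrings that clear these codes; the names and signatures below are illustrative, not copied from the listed files:

import numpy as np


def constraint(x: np.ndarray) -> float:
    """Return the inequality constraint value g(x) for the demo problem (fixes D103)."""
    return float(x[0] + x[1] - 1.0)


class GlowwormSwarmSkeleton:
    """Skeleton class used only to illustrate the docstring fixes."""

    def __init__(self, n_agents: int = 25, seed: int | None = None) -> None:
        """Initialize the agent count and random seed (fixes D107)."""
        self.n_agents = n_agents
        self.seed = seed

    def _compute_fitness(self, population: np.ndarray) -> np.ndarray:
        """Evaluate the objective for each agent.

        Args:
            population: Candidate positions, shape (n_agents, dim); describing every
                argument here is what clears D417.

        Returns:
            Objective values, one per candidate.
        """
        return np.zeros(len(population))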

2. COCO/BBOB Compliance Failures (600+ violations)

The custom pre-commit hook validate-optimizer-docs found widespread missing sections across 60-80 optimizer files. Each file typically has 5-7 violations.

Required sections missing:

  • Algorithm Metadata: (author, year, DOI)
  • COCO/BBOB Benchmark Settings: (recommended hyperparameters)
  • Args/Attributes: (missing or incomplete)
  • Notes: (complexity analysis)
  • References: (original paper citations)
  • seed=42 in examples (BBOB reproducibility requirement)
  • seed attribute documentation (BBOB compliance)

Example violations:

opt/swarm_intelligence/salp_swarm_algorithm.py: Missing required section 'Algorithm Metadata:' in class docstring
opt/swarm_intelligence/emperor_penguin.py: Example section should include 'seed=42' for reproducibility
opt/evolutionary/cultural_algorithm.py: Args section should document 'seed' parameter for BBOB compliance
opt/gradient_based/adadelta.py: Missing required section 'COCO/BBOB Benchmark Settings:' in class docstring
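
For orientation, a hedged sketch of a class docstring shape that would satisfy the sections listed above; the class, metadata, hyperparameters, and complexity figures are placeholders, not taken from any real optimizer:

class ExampleOptimizer:
    """Example optimizer (placeholder description of the search strategy).

    Algorithm Metadata:
        Author: A. Author
        Year: 2020
        DOI: 10.0000/placeholder

    COCO/BBOB Benchmark Settings:
        population_size: 30
        max_iterations: 1000

    Args:
        population_size: Number of candidate solutions per iteration.
        seed: Random seed for reproducible BBOB runs.

    Attributes:
        seed: Seed used to initialize the random number generator (BBOB compliance).

    Notes:
        Time complexity is O(population_size * dim) per iteration (placeholder analysis).

    References:
        A. Author, "Original Paper Title," Journal of Placeholders, 2020.

    Example:
        >>> opt = ExampleOptimizer(seed=42)
        >>> opt.seed
        42
    """

    def __init__(self, population_size: int = 30, seed: int | None = None) -> None:
        """Store hyperparameters and the random seed."""
        self.population_size = population_size
        self.seed = seed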

3. Excessive Ruff Global Ignore List (23+ codes)

Current state (pyproject.toml excerpt):

[tool.ruff.lint]
select = ["ALL"]
ignore = [
    "PLR0913",   # Too many arguments - should use config objects
    "PLR1704",   # Redefining argument - needs refactor
    "N803",      # Argument name not lowercase
    "N806",      # Variable name not lowercase
    "E741",      # Ambiguous variable name
    "E501",      # Line too long
    "T201",      # Print statements - used in example scripts
    "COM812",    # Missing trailing comma
    "NPY002",    # Legacy numpy random calls - used extensively
    "D107",      # Missing docstring in __init__
    "D103",      # Missing docstring in public function
    "PLR2004",   # Magic value comparisons - common in optimization
    "PLR0912",   # Too many branches
    "PLR0915",   # Too many statements
    "C901",      # Too complex
    "SIM109",    # Use in tuple comparison
    "SIM110",    # Use all() generator
    "PLR1714",   # Consider merging comparisons
    "PERF401",   # Use list comprehension
    "RET504",    # Unnecessary assignment before return
    "S112",      # try-except-continue
    "BLE001",    # Blind exception
    "B007",      # Loop control variable not used
    "B023",      # Function doesn't bind loop variable
    "PLC0415",   # Import not at top level
    "F841",      # Unused variable
]

Problems:

  1. D107 and D103 are globally ignored by Ruff yet still fail the pre-commit pydocstyle hook, because pydocstyle does not read Ruff's ignore list (the two tools are configured independently)
  2. 23+ global exceptions mask code quality issues across 120 optimizer files
  3. No per-file justifications - unclear which exceptions are temporary vs. permanent
  4. Complexity codes (PLR0912, PLR0915, C901) are blanket-ignored instead of being documented per-algorithm

Goal: Reduce the global ignore list by 50% within two sprints, with measurable per-category targets.
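
One way to replace blanket ignores with documented exceptions is Ruff's per-file-ignores table; the entries below are illustrative placeholders, not a proposed final list:

[tool.ruff.lint]
# Keep the global list short; every remaining code carries a justification comment.
ignore = ["E501", "COM812"]

[tool.ruff.lint.per-file-ignores]
# Hypothetical example: a branch-heavy update step documented per algorithm and
# tracked for refactoring instead of being hidden behind a global ignore.
"opt/evolutionary/cma_es.py" = ["PLR0912", "C901"]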


Implementation Steps

Phase 1: Validation (10 Exemplar Files)

  1. Run batch script to generate FIXME templates
  2. For each file:
    • Research original paper for metadata (authors, year, DOI)
    • Document mathematical formulation with LaTeX equations
    • Add BBOB-recommended hyperparameters
    • Create working doctest with seed=42
    • Add complexity analysis
    • Document BBOB performance characteristics if literature available
  3. Verify all 10 files pass validation criteria (see the hook invocation after this list)
  4. Extract lessons learned and refine batch script
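
To check the exemplars in isolation before scaling up, the custom hook can be run against just those files (hook id taken from the failure analysis above; the pydocstyle hook can be invoked the same way if its id differs):

pre-commit run validate-optimizer-docs --files \
  opt/swarm_intelligence/particle_swarm.py opt/gradient_based/adamw.py \
  opt/classical/simulated_annealing.py opt/evolutionary/genetic_algorithm.py \
  opt/metaheuristic/harmony_search.py opt/swarm_intelligence/ant_colony.py \
  opt/gradient_based/sgd_momentum.py opt/classical/nelder_mead.py \
  opt/evolutionary/differential_evolution.py opt/physics_inspired/gravitational_search.py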

Phase 2: Systematic Remediation (All Files)

  1. Tier 1 Remediation (Weeks 1-2):

    • Fix all 24 files with pydocstyle + COCO/BBOB failures
    • Target: 100% of Tier 1 files compliant
  2. Tier 2 Remediation (Weeks 2-4):

    • Batch process 40-50 files with COCO/BBOB only failures
    • Use validated batch script approach from Phase 1
    • Target: 100% of Tier 2 files compliant
  3. Ruff Ignore Reduction (Weeks 2-4 - Parallel):

    • Remove D107, D103 from global ignores (sizing command sketched after this list)
    • Reduce PLR2004 (already migrated in 4 files, expand to 20 more)
    • Fix minor style issues (SIM109, SIM110, PLR1714, PERF401, RET504) across 15 files
    • Document permanent exceptions with per-algorithm justifications
    • Target: Global ignore list reduced from 23 → 12 codes (50% reduction)
  4. Automation & Prevention (Week 4):

    • Update batch docstring script with all COCO/BBOB sections
    • Add pre-commit hook examples to .github/copilot-instructions.md
    • Create docstring template checklist in .github/PULL_REQUEST_TEMPLATE.md
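
Before a code is dropped from the global list in step 3, the outstanding violation count can be sized with Ruff directly; --isolated bypasses the current pyproject.toml configuration (flags assume a recent Ruff release):

uv run ruff check opt/ --isolated --select D103,D107 --statistics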

Acceptance Criteria

Phase 1: Validation Checkpoints

  • All 10 files completed with full COCO/BBOB compliance
  • All files pass uv run ruff check opt/
  • All doctests execute successfully with seeds logged
  • No FIXME markers remain in any file
  • Metadata validation passes:
for path in swarm_intelligence/particle_swarm gradient_based/adamw classical/simulated_annealing \
            evolutionary/genetic_algorithm metaheuristic/harmony_search swarm_intelligence/ant_colony \
            gradient_based/sgd_momentum classical/nelder_mead evolutionary/differential_evolution \
            physics_inspired/gravitational_search; do
  module="opt.$(echo "$path" | tr / .)"
  uv run python -c "
import importlib, inspect
module = importlib.import_module('${module}')
# Inspect every class defined in the module so class names need not be hard-coded.
docs = [inspect.getdoc(obj) or '' for _, obj in inspect.getmembers(module, inspect.isclass)
        if obj.__module__ == module.__name__]
assert any('Algorithm Metadata' in d for d in docs), '${path}: missing Algorithm Metadata'
assert any('COCO/BBOB' in d for d in docs), '${path}: missing COCO/BBOB settings'
assert any('seed' in d.lower() for d in docs), '${path}: seed not documented'
print('✅ ${path} validated')
"
done

Phase 2: Complete Coverage

  • pre-commit run -a passes on main branch (CRITICAL; see the consolidated check after this list)
  • Zero pydocstyle D107/D103/D417 violations across all 120 optimizer files
  • Zero COCO/BBOB compliance violations across all 60-80 affected files
  • Global Ruff ignore list ≤ 12 codes (50% reduction from 23)
  • All remaining global ignores have documented justifications in pyproject.toml comments
  • Per-file ignores tracked in separate issue for phased removal
  • Batch docstring script updated and documented
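
A consolidated closing check could look like the following; the doctest invocation assumes pytest is available in the project environment:

pre-commit run -a
uv run ruff check opt/
uv run pytest --doctest-modules opt/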

Quality Metrics

  • 100% of Tier 1 files (24 files) fully compliant
  • 100% of Tier 2 files (40-50 files) fully compliant
  • 100% of validation files (10 files) serve as working reference implementations
  • Documentation coverage: 120/120 optimizer files with complete docstrings

Complexity

High - Requires systematic research and careful implementation across 60-80 diverse algorithms spanning 10 categories. Estimated 4-6 weeks for complete remediation.


Dependencies

Depends on: #3 (batch script generates initial templates)
Blocks: #5-#14 (validates approach before scaling to all categories), #96 (documentation dependency migration - needs clean CI first)
Related: #52 (documentation quality tracking), #83 (CI/documentation improvements)


Context: Manually Deleted Commits

Several commits were proposed but manually deleted by the maintainer:

  1. scripts/__init__.py + check_google_docstring_inline_descriptions.py changes
  2. docs/package-lock.json + .npmrc changes (legacy-peer-deps workaround)

Content from deleted commits preserved in issue comments for reference.
