Add comprehensive benchmark tests with quality validation#33

Merged

Anselmoo merged 2 commits into main from feature/comprehensive-benchmark-tests on Dec 21, 2025

Conversation

@Anselmoo (Owner)

Summary

This PR adds comprehensive benchmark tests that validate optimizer solutions against known optimal points, ensuring optimizers actually find correct solutions rather than just running without errors.

Changes

New Test Files

  • conftest.py: Pytest fixtures with 17 benchmark functions and their known optima
  • test_benchmarks.py: Quality tolerance tests for all optimizer categories
  • test_performance.py: Performance regression baselines and critical path tests

Key Features

Solution Quality Validation

  • Tests validate solutions against known optimal points (e.g., shifted_ackley optimal at [1.0, 0.5])
  • Critical failures flagged when solution distance exceeds tolerance (e.g., (1.2, 0.7) is flagged as failure for shifted_ackley)
  • Different tolerance levels for easy/medium/hard benchmark functions
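The distance check described in these bullets can be sketched as follows (a minimal helper under assumed names; the PR's actual conftest utilities may differ):

```python
import numpy as np

def assert_near_optimum(solution, optimum, tolerance):
    """Flag a critical failure when the Euclidean distance between the
    found solution and the known optimum exceeds the tolerance."""
    distance = float(np.linalg.norm(np.asarray(solution) - np.asarray(optimum)))
    assert distance <= tolerance, (
        f"CRITICAL: distance {distance:.4f} from optimum exceeds tolerance {tolerance}"
    )
    return distance
```

For shifted_ackley with optimum [1.0, 0.5] and tolerance 0.2, a solution of (1.2, 0.7) lies about 0.283 away and would be flagged, matching the example above.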

xfail Markers for Known Issues
The following optimizers are marked with @pytest.mark.xfail as they converge to local minima on multimodal functions like shifted_ackley:

  • BFGS - converges to local minimum
  • LBFGS - converges to local minimum
  • NelderMead - converges to local minimum
  • GreyWolfOptimizer - convergence issues

These tests still run but don't fail the suite, allowing tracking of potential improvements.
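The marker pattern might look like this sketch (the parametrization and test body are illustrative, not the PR's actual code):

```python
import pytest

# Optimizers listed in the PR as prone to local minima on shifted_ackley.
LOCAL_MINIMA_PRONE = ["BFGS", "LBFGS", "NelderMead", "GreyWolfOptimizer"]

@pytest.mark.parametrize("optimizer_name", LOCAL_MINIMA_PRONE)
@pytest.mark.xfail(
    reason="converges to a local minimum on multimodal functions",
    strict=False,  # strict=False: an unexpected pass (XPASS) does not fail the suite
)
def test_multimodal_quality(optimizer_name):
    # Run the optimizer on shifted_ackley and assert the distance to
    # [1.0, 0.5] is within tolerance; expected to fail for these optimizers.
    ...
```

With `strict=False`, the tests keep running on every CI pass, so an optimizer that starts converging shows up as XPASS rather than silently going untested.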

Test Coverage

  • 356 test cases total
  • Critical shifted_ackley benchmark tests
  • Sphere, rosenbrock, and other benchmark functions
  • Performance regression baselines
  • Statistical consistency tests
  • Reproducibility and bounds checking

Test Results

15 passed, 4 xfailed in 10.25s

The 4 xfailed tests are the known local-minima-prone optimizers which are documented above.

Benchmark Functions with Known Optima

| Function       | Optimal Point | Tolerance | Difficulty |
|----------------|---------------|-----------|------------|
| sphere         | [0, 0]        | 0.1       | easy       |
| shifted_ackley | [1.0, 0.5]    | 0.2       | medium     |
| rosenbrock     | [1, 1]        | 0.5       | hard       |
| himmelblau     | 4 optima      | 0.5       | medium     |
| ...            | ...           | ...       | ...        |
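The shifted_ackley row corresponds to the standard Ackley function evaluated at x minus a shift of [1.0, 0.5], which places the global minimum f = 0 at the shift. A sketch (the PR's conftest implementation may differ):

```python
import numpy as np

SHIFT = np.array([1.0, 0.5])

def shifted_ackley(x):
    """Ackley function shifted so the global minimum f(SHIFT) = 0."""
    z = np.asarray(x, dtype=float) - SHIFT
    n = z.size
    term1 = -20.0 * np.exp(-0.2 * np.sqrt(np.sum(z**2) / n))
    term2 = -np.exp(np.sum(np.cos(2.0 * np.pi * z)) / n)
    return term1 + term2 + 20.0 + np.e
```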

Usage

# Run all benchmark tests
uv run pytest opt/test/test_benchmarks.py -v

# Run only critical shifted_ackley tests
uv run pytest opt/test/test_benchmarks.py::TestShiftedAckleyBenchmark -v

# Deselect the xfail-marked tests
uv run pytest opt/test/test_benchmarks.py -m "not xfail"

- Add conftest.py with 17 benchmark functions and known optima
- Add test_benchmarks.py with quality tolerance tests for all optimizer categories
- Add test_performance.py with regression baselines and critical path tests

Test improvements:
- Validate solutions against known optimal points (shifted_ackley at [1.0, 0.5])
- Flag solutions deviating from optimum as critical failures (distance > 0.2)
- Add xfail markers for optimizers prone to local minima on multimodal functions
- Include Himmelblau multi-optima handling
- Add reproducibility and bounds checking tests

356 test cases covering optimizer quality validation
Copilot AI review requested due to automatic review settings December 21, 2025 08:02
@Anselmoo Anselmoo merged commit ae3aa00 into main Dec 21, 2025
2 of 5 checks passed
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds comprehensive benchmark tests that validate optimizer solutions against known optimal points, ensuring optimizers find correct solutions rather than just executing without errors. The tests include quality validation, performance regression detection, and critical path testing for the optimization library.

Key Changes

  • Introduces quality tolerance tests validating solutions against known optima (e.g., shifted_ackley optimal at [1.0, 0.5])
  • Adds performance regression baselines for tracking optimizer behavior changes
  • Implements xfail markers for known optimizer limitations (BFGS, LBFGS, NelderMead, GreyWolfOptimizer on multimodal functions)

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

File Description
opt/test/conftest.py Adds pytest fixtures for 17 benchmark functions with known optima and helper utilities for quality assessment
opt/test/test_benchmarks.py Implements quality tolerance tests across optimizer categories with varying difficulty levels
opt/test/test_performance.py Adds performance regression baselines and critical path tests for shifted_ackley, sphere, and rosenbrock functions
.github/workflows/python-publish.yaml Updates sigstore action from v3.1.0 to v3.2.0 for package signing

Attributes:
optimizer_class: The optimizer class.
function_name: Name of the benchmark function.
expected_fitness_upper: Upper bound on expected fitness (worse case).
Copilot AI Dec 21, 2025

Typo in the docstring: "worse case" should be "worst case".

Suggested change
expected_fitness_upper: Upper bound on expected fitness (worse case).
expected_fitness_upper: Upper bound on expected fitness (worst case).

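The `Attributes` docstring quoted above hints at a baseline record along these lines (a hypothetical reconstruction; the dataclass form and field types are assumptions):

```python
from dataclasses import dataclass

@dataclass
class PerformanceBaseline:
    """Regression baseline for one optimizer on one benchmark function."""
    optimizer_class: type          # the optimizer class under test
    function_name: str             # name of the benchmark function
    expected_fitness_upper: float  # upper bound on expected fitness (worst case)
```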
mean_fitness = np.mean(results)
std_fitness = np.std(results)
best_fitness = min(results)
max(results)
Copilot AI Dec 21, 2025

The result of max(results) is computed but never assigned, so the value is discarded. Remove this line, or assign it to worst_fitness if you intend to use it in future assertions.

Suggested change
max(results)

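Applied, the fix reads roughly like this (the results values are illustrative, not from the PR):

```python
import numpy as np

results = [0.02, 0.05, 0.01, 0.08]  # fitness values from repeated runs (illustrative)

mean_fitness = np.mean(results)
std_fitness = np.std(results)
best_fitness = min(results)
worst_fitness = max(results)  # assigned so it can back an assertion

# Example consistency assertion that actually uses the worst case:
assert worst_fitness - best_fitness < 1.0, "fitness spread across runs too large"
```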
dim=2,
max_iter=200,
)
solution, _fitness = optimizer.search()
Copilot AI Dec 21, 2025

The fitness variable is assigned but never used in these critical tests. Consider adding an assertion to check the fitness value is within acceptable bounds, similar to the pattern used in test_particle_swarm_critical (line 309), or use an underscore prefix to indicate it's intentionally unused.

dim=2,
max_iter=300,
)
solution, _fitness = optimizer.search()
Copilot AI Dec 21, 2025

The fitness variable is assigned but never used in this critical test. Consider adding an assertion to check the fitness value is within acceptable bounds, or use an underscore prefix to indicate it's intentionally unused.

Comment on lines +251 to +253
solution, _fitness = optimizer.search()

distance = np.linalg.norm(solution - self.OPTIMAL_POINT)
Copilot AI Dec 21, 2025

The fitness variable is assigned but never used in this test. Consider adding an assertion to check the fitness value or use an underscore prefix to indicate it's intentionally unused.

Suggested change
solution, _fitness = optimizer.search()
distance = np.linalg.norm(solution - self.OPTIMAL_POINT)
solution, fitness = optimizer.search()
distance = np.linalg.norm(solution - self.OPTIMAL_POINT)
assert np.isfinite(fitness), (
    f"{optimizer_class.__name__} returned non-finite fitness {fitness} "
    "on shifted_ackley for medium-performance benchmark."
)


OPTIMAL_POINT = np.array([1.0, 0.5])
CRITICAL_TOLERANCE = 0.2 # Distance > 0.2 is a critical failure
WARNING_TOLERANCE = 0.1 # Distance > 0.1 but <= 0.2 is a warning
Copilot AI Dec 21, 2025

The class attribute WARNING_TOLERANCE is defined but never used in any of the test methods. Consider removing this constant or implementing warning-level assertions if it was intended to provide additional validation between the critical tolerance and warning tolerance thresholds.

Suggested change
WARNING_TOLERANCE = 0.1 # Distance > 0.1 but <= 0.2 is a warning

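One way to act on the suggestion above is to classify a solution against both thresholds, emitting a warning in the intermediate band (a sketch, not the PR's code):

```python
import warnings
import numpy as np

CRITICAL_TOLERANCE = 0.2  # distance > 0.2 is a critical failure
WARNING_TOLERANCE = 0.1   # distance in (0.1, 0.2] is a warning

def classify_distance(solution, optimum):
    """Return 'ok', 'warning', or 'critical' for the distance to the optimum."""
    distance = float(np.linalg.norm(np.asarray(solution) - np.asarray(optimum)))
    if distance > CRITICAL_TOLERANCE:
        return "critical"
    if distance > WARNING_TOLERANCE:
        warnings.warn(f"solution is {distance:.3f} from optimum (> {WARNING_TOLERANCE})")
        return "warning"
    return "ok"
```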

OPTIMAL_POINT = np.array([0.0, 0.0])
TIGHT_TOLERANCE = 0.1
RELAXED_TOLERANCE = 0.5
Copilot AI Dec 21, 2025

The class attribute RELAXED_TOLERANCE is defined but never used in the test method. Consider removing this constant or adding a test method for medium/lower performance optimizers that uses this relaxed tolerance threshold.

Suggested change
RELAXED_TOLERANCE = 0.5

Comment on lines +145 to +189
VARIABLE_PERFORMANCE_OPTIMIZERS = [
ArtificialFishSwarm,
CatSwarmOptimization,
GlowwormSwarmOptimization,
SquirrelSearchAlgorithm,
CollidingBodiesOptimization,
EagleStrategy,
CulturalAlgorithm,
EstimationOfDistributionAlgorithm,
ImperialistCompetitiveAlgorithm,
ParticleFilter,
ShuffledFrogLeapingAlgorithm,
StochasticDiffusionSearch,
StochasticFractalSearch,
VariableDepthSearch,
VariableNeighborhoodSearch,
VeryLargeScaleNeighborhood,
]

# Gradient-based optimizers (may converge to local optima)
GRADIENT_OPTIMIZERS = [
AdaDelta,
ADAGrad,
AdaMax,
AdamW,
ADAMOptimization,
AMSGrad,
Nadam,
NesterovAcceleratedGradient,
RMSprop,
SGD,
SGDMomentum,
ConjugateGradient,
TrustRegion,
HillClimbing,
TabuSearch,
]

# Constrained/Probabilistic optimizers
SPECIALIZED_OPTIMIZERS = [
AugmentedLagrangian,
SuccessiveLinearProgramming,
LDAnalysis,
ParzenTreeEstimator,
]
Copilot AI Dec 21, 2025

The optimizer category lists VARIABLE_PERFORMANCE_OPTIMIZERS, GRADIENT_OPTIMIZERS, and SPECIALIZED_OPTIMIZERS are defined but not used in any test methods. Consider either removing these unused constants or adding test methods that utilize them. If they are intended for future use, add a comment indicating this.

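If the category lists are kept for future use, they could feed a parametrized smoke test along these lines (string stand-ins replace the real optimizer classes here; the test body is illustrative):

```python
import pytest

# Illustrative stand-ins for the real optimizer classes in the lists above.
GRADIENT_OPTIMIZERS = ["SGD", "ADAMOptimization"]
SPECIALIZED_OPTIMIZERS = ["AugmentedLagrangian"]

# Tag each parameter with its category so failures are easy to attribute.
ALL_CATEGORIES = (
    [pytest.param(o, id=f"gradient-{o}") for o in GRADIENT_OPTIMIZERS]
    + [pytest.param(o, id=f"specialized-{o}") for o in SPECIALIZED_OPTIMIZERS]
)

@pytest.mark.parametrize("optimizer", ALL_CATEGORIES)
def test_category_smoke(optimizer):
    # Run the optimizer on an easy benchmark (e.g. sphere) with a relaxed tolerance.
    ...
```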
Comment on lines +188 to +195
_solution, fitness = optimizer.search()

assert fitness <= baseline.expected_fitness_upper, (
f"REGRESSION: {baseline.optimizer_class.__name__} on rosenbrock: "
f"fitness {fitness:.4f} > {baseline.expected_fitness_upper:.4f}"
)


Copilot AI Dec 21, 2025

The solution variable is assigned but never used in this test. Either remove the underscore prefix from _solution and add an assertion about the solution's distance from the optimum (similar to test_sphere_regression), or remove the variable assignment entirely if only fitness checking is needed.

Suggested change
_solution, fitness = optimizer.search()
assert fitness <= baseline.expected_fitness_upper, (
    f"REGRESSION: {baseline.optimizer_class.__name__} on rosenbrock: "
    f"fitness {fitness:.4f} > {baseline.expected_fitness_upper:.4f}"
)
solution, fitness = optimizer.search()
assert fitness <= baseline.expected_fitness_upper, (
    f"REGRESSION: {baseline.optimizer_class.__name__} on rosenbrock: "
    f"fitness {fitness:.4f} > {baseline.expected_fitness_upper:.4f}"
)
distance = np.linalg.norm(solution - OPTIMAL_POINTS["rosenbrock"])
assert distance <= baseline.max_distance_from_optimum, (
    f"REGRESSION: {baseline.optimizer_class.__name__} distance {distance:.4f} "
    f"exceeds {baseline.max_distance_from_optimum:.4f}"
)

n_bats=30,
max_iter=200,
)
solution, _fitness = optimizer.search()
Copilot AI Dec 21, 2025

The fitness variable is assigned but never used in this test. Consider adding an assertion to check the fitness value or use an underscore prefix to indicate it's intentionally unused.
