Add comprehensive benchmark tests with quality validation #33
Conversation
- Add conftest.py with 17 benchmark functions and known optima
- Add test_benchmarks.py with quality tolerance tests for all optimizer categories
- Add test_performance.py with regression baselines and critical path tests

Test improvements:
- Validate solutions against known optimal points (shifted_ackley at [1.0, 0.5])
- Flag solutions deviating from optimum as critical failures (distance > 0.2)
- Add xfail markers for optimizers prone to local minima on multimodal functions
- Include Himmelblau multi-optima handling
- Add reproducibility and bounds checking tests

356 test cases covering optimizer quality validation
Pull request overview
This PR adds comprehensive benchmark tests that validate optimizer solutions against known optimal points, ensuring optimizers find correct solutions rather than just executing without errors. The tests include quality validation, performance regression detection, and critical path testing for the optimization library.
Key Changes
- Introduces quality tolerance tests validating solutions against known optima (e.g., shifted_ackley optimal at [1.0, 0.5])
- Adds performance regression baselines for tracking optimizer behavior changes
- Implements xfail markers for known optimizer limitations (BFGS, LBFGS, NelderMead, GreyWolfOptimizer on multimodal functions)
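The description also mentions Himmelblau multi-optima handling. As a rough sketch (not the PR's actual helper; the constant and function names below are hypothetical), such a check only needs the solution to be close to any one of Himmelblau's four known minima:

```python
import numpy as np

# Himmelblau's function has four global minima, all with value 0.
HIMMELBLAU_OPTIMA = np.array([
    [3.0, 2.0],
    [-2.805118, 3.131312],
    [-3.779310, -3.283186],
    [3.584428, -1.848126],
])

def distance_to_nearest_himmelblau_optimum(solution):
    """Distance from a candidate solution to the closest of the four known optima."""
    solution = np.asarray(solution, dtype=float)
    return float(np.min(np.linalg.norm(HIMMELBLAU_OPTIMA - solution, axis=1)))

# In a test: assert distance_to_nearest_himmelblau_optimum(solution) <= CRITICAL_TOLERANCE
```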
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| opt/test/conftest.py | Adds pytest fixtures for 17 benchmark functions with known optima and helper utilities for quality assessment |
| opt/test/test_benchmarks.py | Implements quality tolerance tests across optimizer categories with varying difficulty levels |
| opt/test/test_performance.py | Adds performance regression baselines and critical path tests for shifted_ackley, sphere, and rosenbrock functions |
| .github/workflows/python-publish.yaml | Updates sigstore action from v3.1.0 to v3.2.0 for package signing |
```python
    Attributes:
        optimizer_class: The optimizer class.
        function_name: Name of the benchmark function.
        expected_fitness_upper: Upper bound on expected fitness (worse case).
```
Typo in the docstring: "worse case" should be "worst case".
```diff
-        expected_fitness_upper: Upper bound on expected fitness (worse case).
+        expected_fitness_upper: Upper bound on expected fitness (worst case).
```
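For context, the attributes documented here, together with the `max_distance_from_optimum` field used later in the rosenbrock regression test, suggest a baseline record roughly like the sketch below; the class name `PerformanceBaseline` and the use of a dataclass are assumptions, not the PR's actual code:

```python
from dataclasses import dataclass

@dataclass
class PerformanceBaseline:
    """Hypothetical shape of a regression baseline, inferred from the docstring above."""

    optimizer_class: type             # The optimizer class.
    function_name: str                # Name of the benchmark function.
    expected_fitness_upper: float     # Upper bound on expected fitness (worst case).
    max_distance_from_optimum: float  # Max allowed distance from the known optimum.
```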
```python
    mean_fitness = np.mean(results)
    std_fitness = np.std(results)
    best_fitness = min(results)
    max(results)
```
The result of max(results) is computed but never assigned or used. Consider removing this line, or assigning it to worst_fitness if you intend to use it in future assertions.
```diff
-    max(results)
```
```python
        dim=2,
        max_iter=200,
    )
    solution, _fitness = optimizer.search()
```
The fitness variable is assigned but never used in these critical tests. Consider adding an assertion to check the fitness value is within acceptable bounds, similar to the pattern used in test_particle_swarm_critical (line 309), or use an underscore prefix to indicate it's intentionally unused.
```python
        dim=2,
        max_iter=300,
    )
    solution, _fitness = optimizer.search()
```
The fitness variable is assigned but never used in this critical test. Consider adding an assertion to check the fitness value is within acceptable bounds, or use an underscore prefix to indicate it's intentionally unused.
```python
    solution, _fitness = optimizer.search()

    distance = np.linalg.norm(solution - self.OPTIMAL_POINT)
```
The fitness variable is assigned but never used in this test. Consider adding an assertion to check the fitness value or use an underscore prefix to indicate it's intentionally unused.
```diff
-        solution, _fitness = optimizer.search()
+        solution, fitness = optimizer.search()
         distance = np.linalg.norm(solution - self.OPTIMAL_POINT)
+        assert np.isfinite(fitness), (
+            f"{optimizer_class.__name__} returned non-finite fitness {fitness} "
+            "on shifted_ackley for medium-performance benchmark."
+        )
```
```python
    OPTIMAL_POINT = np.array([1.0, 0.5])
    CRITICAL_TOLERANCE = 0.2  # Distance > 0.2 is a critical failure
    WARNING_TOLERANCE = 0.1  # Distance > 0.1 but <= 0.2 is a warning
```
The class attribute WARNING_TOLERANCE is defined but never used in any of the test methods. Consider removing this constant or implementing warning-level assertions if it was intended to provide additional validation between the critical tolerance and warning tolerance thresholds.
```diff
-    WARNING_TOLERANCE = 0.1  # Distance > 0.1 but <= 0.2 is a warning
```
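If the warning tier were kept instead of removed, a minimal sketch of a warning-level check is shown below; `check_distance` is a hypothetical helper, not part of the PR, and its default thresholds simply mirror the class constants:

```python
import warnings

def check_distance(distance, optimizer_name, warning_tolerance=0.1, critical_tolerance=0.2):
    """Warn in the 0.1-0.2 band, fail above 0.2 (mirroring WARNING_/CRITICAL_TOLERANCE)."""
    if warning_tolerance < distance <= critical_tolerance:
        warnings.warn(
            f"{optimizer_name} landed {distance:.3f} from the optimum "
            "(warning tier, not a critical failure)"
        )
    assert distance <= critical_tolerance, (
        f"{optimizer_name} distance {distance:.3f} exceeds {critical_tolerance}"
    )
```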
```python
    OPTIMAL_POINT = np.array([0.0, 0.0])
    TIGHT_TOLERANCE = 0.1
    RELAXED_TOLERANCE = 0.5
```
The class attribute RELAXED_TOLERANCE is defined but never used in the test method. Consider removing this constant or adding a test method for medium/lower performance optimizers that uses this relaxed tolerance threshold.
```diff
-    RELAXED_TOLERANCE = 0.5
```
```python
VARIABLE_PERFORMANCE_OPTIMIZERS = [
    ArtificialFishSwarm,
    CatSwarmOptimization,
    GlowwormSwarmOptimization,
    SquirrelSearchAlgorithm,
    CollidingBodiesOptimization,
    EagleStrategy,
    CulturalAlgorithm,
    EstimationOfDistributionAlgorithm,
    ImperialistCompetitiveAlgorithm,
    ParticleFilter,
    ShuffledFrogLeapingAlgorithm,
    StochasticDiffusionSearch,
    StochasticFractalSearch,
    VariableDepthSearch,
    VariableNeighborhoodSearch,
    VeryLargeScaleNeighborhood,
]

# Gradient-based optimizers (may converge to local optima)
GRADIENT_OPTIMIZERS = [
    AdaDelta,
    ADAGrad,
    AdaMax,
    AdamW,
    ADAMOptimization,
    AMSGrad,
    Nadam,
    NesterovAcceleratedGradient,
    RMSprop,
    SGD,
    SGDMomentum,
    ConjugateGradient,
    TrustRegion,
    HillClimbing,
    TabuSearch,
]

# Constrained/Probabilistic optimizers
SPECIALIZED_OPTIMIZERS = [
    AugmentedLagrangian,
    SuccessiveLinearProgramming,
    LDAnalysis,
    ParzenTreeEstimator,
]
```
The optimizer category lists VARIABLE_PERFORMANCE_OPTIMIZERS, GRADIENT_OPTIMIZERS, and SPECIALIZED_OPTIMIZERS are defined but not used in any test methods. Consider either removing these unused constants or adding test methods that utilize them. If they are intended for future use, add a comment indicating this.
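If these lists are meant to be exercised, one option is a parametrized smoke test along these lines; this is a sketch only, and the constructor keywords (`func`, `lower`, `upper`) and the `sphere_function` fixture name are assumptions about the suite's API rather than its actual code:

```python
import numpy as np
import pytest

@pytest.mark.parametrize("optimizer_class", GRADIENT_OPTIMIZERS)
def test_gradient_optimizers_on_sphere(optimizer_class, sphere_function):
    # Assumed constructor signature; dim/max_iter match values used elsewhere in these tests.
    optimizer = optimizer_class(
        func=sphere_function,
        lower=-5.0,
        upper=5.0,
        dim=2,
        max_iter=200,
    )
    solution, fitness = optimizer.search()
    assert np.isfinite(fitness)
    assert np.linalg.norm(np.asarray(solution)) <= 0.5  # sphere optimum is at the origin
```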
```python
    _solution, fitness = optimizer.search()

    assert fitness <= baseline.expected_fitness_upper, (
        f"REGRESSION: {baseline.optimizer_class.__name__} on rosenbrock: "
        f"fitness {fitness:.4f} > {baseline.expected_fitness_upper:.4f}"
    )
```
The solution variable is assigned but never used in this test. Either remove the underscore prefix from _solution and add an assertion about the solution's distance from the optimum (similar to test_sphere_regression), or remove the variable assignment entirely if only fitness checking is needed.
```diff
-        _solution, fitness = optimizer.search()
+        solution, fitness = optimizer.search()
         assert fitness <= baseline.expected_fitness_upper, (
             f"REGRESSION: {baseline.optimizer_class.__name__} on rosenbrock: "
             f"fitness {fitness:.4f} > {baseline.expected_fitness_upper:.4f}"
         )
+        distance = np.linalg.norm(solution - OPTIMAL_POINTS["rosenbrock"])
+        assert distance <= baseline.max_distance_from_optimum, (
+            f"REGRESSION: {baseline.optimizer_class.__name__} distance {distance:.4f} "
+            f"exceeds {baseline.max_distance_from_optimum:.4f}"
+        )
```
```python
        n_bats=30,
        max_iter=200,
    )
    solution, _fitness = optimizer.search()
```
The fitness variable is assigned but never used in this test. Consider adding an assertion to check the fitness value or use an underscore prefix to indicate it's intentionally unused.
Summary
This PR adds comprehensive benchmark tests that validate optimizer solutions against known optimal points, ensuring optimizers actually find correct solutions rather than just running without errors.
Changes
New Test Files
- `conftest.py`: Pytest fixtures with 17 benchmark functions and their known optima
- `test_benchmarks.py`: Quality tolerance tests for all optimizer categories
- `test_performance.py`: Performance regression baselines and critical path tests

Key Features
Solution Quality Validation

- Solutions are validated against known optimal points (e.g., `shifted_ackley` optimal at `[1.0, 0.5]`)
- Solutions deviating too far from the optimum are flagged as critical failures (e.g., a solution at `(1.2, 0.7)` is flagged as a failure for `shifted_ackley`)
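As a concrete sketch of what this validation amounts to, the snippet below defines a shifted Ackley benchmark with its optimum at `[1.0, 0.5]` and the distance-based failure check; this is an assumption about how the conftest.py benchmark might be built, not the PR's actual implementation:

```python
import numpy as np

SHIFT = np.array([1.0, 0.5])  # known optimum of the shifted benchmark

def shifted_ackley(x):
    """Standard 2-D Ackley function shifted so its global minimum sits at SHIFT."""
    z = np.asarray(x, dtype=float) - SHIFT
    return (
        -20.0 * np.exp(-0.2 * np.sqrt(np.mean(z**2)))
        - np.exp(np.mean(np.cos(2.0 * np.pi * z)))
        + 20.0
        + np.e
    )

def is_critical_failure(solution, optimum=SHIFT, tolerance=0.2):
    """Distance greater than 0.2 from the known optimum counts as a critical failure."""
    return np.linalg.norm(np.asarray(solution, dtype=float) - optimum) > tolerance

assert abs(shifted_ackley(SHIFT)) < 1e-12   # value ~0 at the optimum
assert is_critical_failure([1.2, 0.7])      # the example failure case above
```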
xfail Markers for Known Issues

The following optimizers are marked with `@pytest.mark.xfail` as they converge to local minima on multimodal functions like `shifted_ackley`:

- `BFGS` - converges to local minimum
- `LBFGS` - converges to local minimum
- `NelderMead` - converges to local minimum
- `GreyWolfOptimizer` - convergence issues

These tests still run but don't fail the suite, allowing tracking of potential improvements.
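A minimal sketch of how such markers are typically attached in a parametrized suite follows; the parameter names and the use of string identifiers are illustrative assumptions, with the `pytest.param(..., marks=pytest.mark.xfail(...))` pattern being the point:

```python
import pytest

LOCAL_MINIMUM_XFAIL = pytest.mark.xfail(
    reason="known to converge to a local minimum on shifted_ackley", strict=False
)

OPTIMIZER_CASES = [
    pytest.param("BFGS", marks=LOCAL_MINIMUM_XFAIL),
    pytest.param("LBFGS", marks=LOCAL_MINIMUM_XFAIL),
    pytest.param("NelderMead", marks=LOCAL_MINIMUM_XFAIL),
    pytest.param("GreyWolfOptimizer", marks=LOCAL_MINIMUM_XFAIL),
    # ... remaining optimizers without the marker ...
]

@pytest.mark.parametrize("optimizer_name", OPTIMIZER_CASES)
def test_shifted_ackley_quality(optimizer_name):
    ...  # resolve the optimizer by name, run it, assert distance to [1.0, 0.5] <= 0.2
```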
Test Coverage
- 356 test cases covering optimizer quality validation, including dedicated `shifted_ackley` benchmark tests

Test Results
The 4 xfailed tests are the known local-minima-prone optimizers which are documented above.
Benchmark Functions with Known Optima
| Function | Known optimum |
|---|---|
| sphere | [0, 0] |
| shifted_ackley | [1.0, 0.5] |
| rosenbrock | [1, 1] |

Usage