Better support for LLM feedback and handling of LLM ensembles.
- `config.py` supports configuration of n-model ensembles for evolution and, optionally, a separate ensemble for evaluation. The YAML format remains backwards compatible: settings can be made for all models under `llm:` or for a specific model under `llm:models`. There is also a new `evaluator_system_message` setting.
- `ensemble.py` supports n-model ensembles and has a new `generate_all_with_context()` function.
- `OpenAILLM` supports individual parameter configuration per model.
- `evaluator.py` uses the prompt sampler to generate LLM feedback prompts.
- `templates.py` contains default prompts for LLM feedback.
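To illustrate the ensemble behavior, here is a minimal sketch of an n-model ensemble with a `generate_all_with_context()` method. The class name, `weight` field, and `generate()` signature are assumptions for illustration, not the project's actual API:

```python
import random

class EnsembleSketch:
    """Hypothetical n-model ensemble (names and signatures are illustrative)."""

    def __init__(self, models, weights=None):
        self.models = models
        # Default to uniform weights if none are configured
        self.weights = weights or [1.0] * len(models)

    def sample_model(self):
        # Weighted random choice over the configured models
        return random.choices(self.models, weights=self.weights, k=1)[0]

    def generate_all_with_context(self, system_message, messages):
        # Query every model in the ensemble with the same conversation context
        return [m.generate(system_message, messages) for m in self.models]
```

The sampling path is used during evolution, while `generate_all_with_context()` returns one response per ensemble member.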
To try this with the function_minimization example, set `use_llm_feedback: true` in its `config.yaml`.
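A configuration along these lines should enable it; apart from the `llm:`, `llm:models`, and `use_llm_feedback` keys described above, the exact field names and values here are illustrative:

```yaml
llm:
  # Settings here apply to all models in the ensemble
  temperature: 0.7
  models:
    # Per-model settings override the shared ones
    - name: gpt-4o
      weight: 0.8
    - name: gpt-4o-mini
      weight: 0.2
evaluator:
  use_llm_feedback: true
```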
The LLM feedback then produces output such as:
```json
{
  "readability": 0.92,
  "maintainability": 0.88,
  "efficiency": 0.82,
  "reasoning": "The code is quite readable, with clear function and variable names, concise comments, and a docstring explaining the purpose and arguments of the main search function. There is some minor room for improvement, such as splitting up large inner loops or extracting repeated logic, but overall it is easy to follow. Maintainability is high due to modularization and descriptive naming, but could be slightly improved by reducing the nesting level and possibly moving the annealing routine to its own top-level function. Efficiency is good for a simple global optimization approach; vectorized numpy operations are used where appropriate, and the population-based simulated annealing is a reasonable trade-off between exploration and exploitation. However, the algorithm could be further optimized (e.g., by fully vectorizing more of the walker updates or parallelizing restarts), and the approach is not the most efficient for high-dimensional or more complex landscapes."
}
```
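Turning such a reply into metrics amounts to extracting the JSON object and prefixing the numeric scores with `llm_`, as seen in the metrics below. A hypothetical helper sketching that step (the function name is mine, not the project's):

```python
import json

def parse_llm_feedback(response_text):
    """Extract the JSON feedback object from an LLM reply and prefix its
    numeric scores with 'llm_' so they can be merged into program metrics.
    Illustrative sketch, not the project's actual parsing code."""
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end == -1:
        return {}  # no JSON object found in the reply
    feedback = json.loads(response_text[start:end + 1])
    # Keep only numeric fields; textual fields like "reasoning" are dropped
    return {
        f"llm_{key}": value
        for key, value in feedback.items()
        if isinstance(value, (int, float))
    }
```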
The evolution can then consider the additional values:
```
Evolution complete!
Best program metrics:
runs_successfully: 1.0000
value_score: 0.9997
distance_score: 0.9991
overall_score: 0.9905
standard_deviation_score: 0.9992
speed_score: 0.0610
reliability_score: 1.0000
combined_score: 0.9525
success_rate: 1.0000
llm_readability: 0.0904
llm_maintainability: 0.0816
llm_efficiency: 0.0764
```
Note: I have not evaluated the results yet.