
Conversation

@jvm123
Contributor

@jvm123 jvm123 commented Jun 2, 2025

This PR adds better support for LLM feedback and for handling LLM ensembles with an arbitrary number of models.

  • config.py supports configuration of n-model ensembles for evolution and, optionally, a separate ensemble for evaluation; the YAML format remains backwards compatible; settings can be made for all models under llm: or per model under llm: models; there is a new evaluator_system_message setting (a hedged config sketch follows this list)
  • ensemble.py supports n-model ensembles
  • OpenAILLM supports an individual parameter configuration per model
  • ensemble.py has a new generate_all_with_context() function
  • evaluator.py uses the prompt sampler to generate LLM feedback prompts
  • templates.py contains default prompts for LLM feedback
  • A new unit test confirms that all config.yaml files in the project load without raising an exception
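
For illustration, a rough sketch of what such a configuration could look like in config.yaml. Only llm:, models:, evaluator_system_message, and use_llm_feedback are taken from this PR; the remaining key names, values, and the exact nesting are assumptions, not the actual schema.

```yaml
# Hypothetical sketch -- nesting and key names other than llm, models,
# evaluator_system_message and use_llm_feedback are assumptions.
llm:
  temperature: 0.7            # shared setting, applies to every model below
  models:                     # n-model ensemble used for evolution
    - name: gpt-4o
      weight: 0.8
      temperature: 0.9        # per-model setting overrides the shared value
    - name: gpt-4o-mini
      weight: 0.2

evaluator_system_message: "Act as a strict code reviewer."

use_llm_feedback: true        # enables the llm_* feedback metrics shown below
```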

With the function_minimization example, set use_llm_feedback: true in its config.yaml.
The LLM feedback will provide output such as:

```json
{
    "readability": 0.92,
    "maintainability": 0.88,
    "efficiency": 0.82,
    "reasoning": "The code is quite readable, with clear function and variable names, concise comments, and a docstring explaining the purpose and arguments of the main search function. There is some minor room for improvement, such as splitting up large inner loops or extracting repeated logic, but overall it is easy to follow. Maintainability is high due to modularization and descriptive naming, but could be slightly improved by reducing the nesting level and possibly moving the annealing routine to its own top-level function. Efficiency is good for a simple global optimization approach; vectorized numpy operations are used where appropriate, and the population-based simulated annealing is a reasonable trade-off between exploration and exploitation. However, the algorithm could be further optimized (e.g., by fully vectorizing more of the walker updates or parallelizing restarts), and the approach is not the most efficient for high-dimensional or more complex landscapes."
}
```
The evolution can then consider the additional values:
```
Evolution complete!
Best program metrics:
  runs_successfully: 1.0000
  value_score: 0.9997
  distance_score: 0.9991
  overall_score: 0.9905
  standard_deviation_score: 0.9992
  speed_score: 0.0610
  reliability_score: 1.0000
  combined_score: 0.9525
  success_rate: 1.0000
  llm_readability: 0.0904
  llm_maintainability: 0.0816
  llm_efficiency: 0.0764
```
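
For readers wondering how the llm_* metrics relate to the feedback JSON above, the sketch below only illustrates the general idea of that merging step; merge_llm_feedback is a made-up name, not a function from evaluator.py. Numeric feedback fields are prefixed with llm_ and added to the metrics dict, while the free-text reasoning is skipped. Note that the values in the run above differ from the raw feedback (readability 0.92 vs. llm_readability 0.0904), so the real evaluator presumably weights or averages these scores rather than copying them verbatim.

```python
import json

def merge_llm_feedback(metrics: dict, feedback_json: str) -> dict:
    """Hypothetical helper: fold numeric LLM feedback scores into the
    program metrics under an llm_ prefix (not actual evaluator.py code)."""
    feedback = json.loads(feedback_json)
    for key, value in feedback.items():
        if isinstance(value, (int, float)):  # skip the free-text "reasoning"
            metrics[f"llm_{key}"] = float(value)
    return metrics

metrics = {"combined_score": 0.9525}
feedback = '{"readability": 0.92, "maintainability": 0.88, "efficiency": 0.82, "reasoning": "..."}'
print(merge_llm_feedback(metrics, feedback))
# {'combined_score': 0.9525, 'llm_readability': 0.92, 'llm_maintainability': 0.88, 'llm_efficiency': 0.82}
```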

Note: I have not evaluated the results yet.

@CLAassistant

CLAassistant commented Jun 2, 2025

CLA assistant check
All committers have signed the CLA.

@jvm123 jvm123 force-pushed the feat-n-model-ensemble branch from 98f3b21 to f84be60 on June 2, 2025 at 21:58
@jvm123
Contributor Author

jvm123 commented Jun 2, 2025

This resolves issue #41, "use_llm_feedback doesn't work".

@codelion
Member

codelion commented Jun 4, 2025

I had to fix an urgent issue with the formatting of floats that broke the examples. Can you please pull from main? I can then take a look at this PR; I will need some time to test the changes.

@codelion codelion merged commit 3fc9465 into algorithmicsuperintelligence:main Jun 4, 2025
3 checks passed
@jvm123 jvm123 deleted the feat-n-model-ensemble branch June 5, 2025 02:34
def __post_init__(self):
"""Post-initialization to set up model configurations"""
# Handle backward compatibility for primary_model(_weight) and secondary_model(_weight).
if (self.primary_model or self.primary_model_weight) and len(self.models) < 1:


primary_model and these parameters have default values, so this branch will always be hit, and the new way of configuring models does not work.
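
For context only (this is not the PR's code and not the actual fix in PR #56): a minimal sketch of the problem and one way around it. If the legacy fields carry non-empty defaults, a plain truthiness check on them is always satisfied; defaulting them to None (or comparing against the declared defaults) lets the backward-compatibility branch fire only when a user explicitly set them.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LLMConfig:
    # Hypothetical sketch, not the actual config.py: the legacy single-model
    # fields default to None, so the branch in __post_init__ only fires when
    # a user explicitly sets them.
    primary_model: Optional[str] = None
    primary_model_weight: Optional[float] = None
    models: list = field(default_factory=list)

    def __post_init__(self):
        # Fall back to the legacy primary_model settings only if no new-style
        # models were configured AND a legacy value was actually provided.
        if not self.models and (
            self.primary_model is not None or self.primary_model_weight is not None
        ):
            self.models = [
                {"name": self.primary_model, "weight": self.primary_model_weight or 1.0}
            ]

print(LLMConfig(models=[{"name": "gpt-4o", "weight": 1.0}]).models)
# -> [{'name': 'gpt-4o', 'weight': 1.0}]  (the legacy branch is not taken)
```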

@jvm123
Contributor Author

jvm123 commented Jun 6, 2025

Thanks for the review, @Weaverzhu. Can you confirm whether PR #56 fixes it?

@Weaverzhu

> Thanks for the review, @Weaverzhu. Can you confirm whether PR #56 fixes it?

I think it will fix the problem. I hope the fix will be applied soon.
