OptoPrimeMulti optimizer: allow multiple candidate generations and selections per step #20
Conversation
…r iterations per step to allow deriving different strategies for candidate generation and selection
@doxav it seems like you need to respond verbatim

@microsoft-github-policy-service agree

@doxav Thanks for contributing. We will start reviewing this soon.

@microsoft-github-policy-service agree
…ode to evaluate the implementation.
I ran the notebook -- the implementation LGTM. I modified the API design a bit to allow configurations to be passed through optimizer construction. Also updated the evaluation script -- running the optimizer on MMLU-Physics, MMLU-ML, and GPQA.
I ran a quick evaluation on MMLU ML and Physics (MMLU-Physics and MMLU-ML results shown as images). Looking at the implementation:

```python
def select_candidate(self, candidates: List[Dict]) -> Dict:  # Fixed type annotation
    """
    Select the best response based on the responses.

    Args:
        candidates (List[Dict]): List of candidate responses as dictionaries.

    Returns:
        Dict: The selected candidate or an empty dictionary if no candidates exist.
    """
    return candidates[-1] if candidates else {}  # Default to the last candidate
```

The issue with the design of …
|
`select_candidate` takes the last candidate by default, which has the lowest temperature (0), because `generate_candidates` starts at the highest temperature and ends at the lowest (0). I put in a simple method as a proof of concept, but it can be overridden or enriched.

On the other hand, `generate_candidates` is by default based on temperature/creativity variation on the same model; it could be replaced or enriched by using different models or roles. Tell me what you think and prefer.
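As a sketch of how the default selector could be overridden, here is a hypothetical majority-vote selector (the `suggestion` key and the method name are illustrative assumptions, not the PR's actual API):

```python
from typing import Dict, List

def select_candidate_by_vote(self, candidates: List[Dict]) -> Dict:
    """Hypothetical alternative selector: pick the candidate whose
    'suggestion' value occurs most often (majority vote), falling back
    to the last (lowest-temperature) matching candidate on ties."""
    if not candidates:
        return {}
    counts: Dict[str, int] = {}
    for cand in candidates:
        key = str(cand.get("suggestion", cand))
        counts[key] = counts.get(key, 0) + 1
    best_key = max(counts, key=counts.get)
    # Among candidates with the winning value, prefer the last one
    # (generated at the lowest temperature).
    for cand in reversed(candidates):
        if str(cand.get("suggestion", cand)) == best_key:
            return cand
    return candidates[-1]
```

Any selector with this signature could be swapped in without touching candidate generation.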
Sorry for the long wait. Coming back from holidays and starting the review. @doxav Here are some comments. I'd like to hear what you want to build first before I start modifying it.
I resolved 1 and 2 -- the function signature / …
(3.) Yes, I totally agree with your proposition: "If by default when selector is not provided, you want to return the answer with the lowest temperature, why not also call generate_candidates differently to generate one response only to save compute as well?"
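That compute-saving default could be sketched as follows (function and parameter names are hypothetical, not the PR's actual signatures):

```python
from typing import Callable, Dict, List, Optional

def generate_and_select(
    generate_candidates: Callable[[int], List[Dict]],
    selector: Optional[Callable[[List[Dict]], Dict]] = None,
    num_candidates: int = 4,
) -> Dict:
    """If no selector is provided, the default would pick the last
    (lowest-temperature) candidate anyway, so generating a single
    candidate saves the cost of the other LLM calls."""
    if selector is None:
        candidates = generate_candidates(1)  # one call instead of num_candidates
        return candidates[-1] if candidates else {}
    candidates = generate_candidates(num_candidates)
    return selector(candidates)
```

The branch makes the trade-off explicit: exploration (many candidates plus a selector) only costs extra compute when a selector is actually supplied.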
allenanie left a comment:
Merging -- the code is runnable and the notebook is runnable as well.
OptoPrime demonstrates excellent efficiency in leveraging enriched feedback for effective exploitation. However, introducing exploration capabilities could help mitigate issues such as plateaus and local optima.
This PR does not implement the full STOP algorithm but represents a first step toward exploration optimization. The `generate_candidates` and `select_candidates` functions are designed to be easily overridden or redefined for flexibility.

Xavier
PS:
1. I considered merging `call_llm` and `generate_candidate` for simplicity but decided against it to retain greater flexibility.
2. My current research focuses on collaborative LLM-human-in-the-loop feedback mechanisms, which I believe could be beneficial in both the step and backward parts of this optimization framework.