OptoPrimeMulti optimizer allow to have multiple candidates generation and selection per step #20

doxav · 2024-11-28T16:47:13Z

OptoPrime demonstrates excellent efficiency in leveraging enriched feedback for effective exploitation. However, introducing exploration capabilities could help mitigate issues such as plateaus and local optima.

This PR does not implement the full STOP algorithm but represents a first step toward exploration optimization. The generate_candidates and select_candidates functions are designed to be easily overridden or redefined for flexibility.

Xavier

PS:
- I considered merging call_llm and generate_candidate for simplicity but decided against it to retain greater flexibility.
- My current research focuses on collaborative LLM-human-in-the-loop feedback mechanisms, which I believe could be beneficial in both the step and backward parts of this optimization framework.

allenanie · 2024-12-01T06:44:25Z

@doxav it seems you need to reply with the exact text @microsoft-github-policy-service agree (including the "@" sign).

doxav · 2024-12-01T12:16:28Z

@microsoft-github-policy-service agree

chinganc · 2024-12-02T18:29:24Z

@doxav Thanks for contributing. We will start reviewing this soon.

doxav · 2024-12-06T15:40:12Z

@microsoft-github-policy-service agree

allenanie · 2024-12-24T02:22:06Z

I ran the notebook -- the implementation LGTM. I modified the API design a bit to allow configurations to be passed through optimizer construction. Also updated the evaluation script -- running the optimizer on MMLU-Physics, MMLU-ML, and GPQA.

allenanie · 2024-12-24T02:50:06Z

I ran a quick evaluation on MMLU ML and Physics:

MMLU-Physics:
[0.9411764705882353, 0.9411764705882353, 0.9509803921568627, 0.9509803921568627, 0.9411764705882353]

MMLU-ML:
[0.8660714285714286, 0.8839285714285714, 0.8928571428571429, 0.875, 0.8660714285714286]
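To make the two lists easier to compare, here is a minimal sketch that summarizes them. The numbers are copied from the lists above; what the five entries stand for (separate runs or successive steps) is not stated here, so the summary makes no assumption about it. The helper name `summarize` is only illustrative.

```python
from statistics import mean, pstdev

# Accuracies copied verbatim from the lists above.
mmlu_physics = [0.9411764705882353, 0.9411764705882353, 0.9509803921568627,
                0.9509803921568627, 0.9411764705882353]
mmlu_ml = [0.8660714285714286, 0.8839285714285714, 0.8928571428571429,
           0.875, 0.8660714285714286]


def summarize(scores):
    """Return (mean, population std, min, max) for a list of accuracies."""
    return mean(scores), pstdev(scores), min(scores), max(scores)


for name, scores in [("MMLU-Physics", mmlu_physics), ("MMLU-ML", mmlu_ml)]:
    avg, std, lo, hi = summarize(scores)
    print(f"{name}: mean={avg:.4f} std={std:.4f} range=[{lo:.4f}, {hi:.4f}]")
```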

Looking at the implementation, if selector=None, OptoPrimeMulti will just pick a high-temperature generation because:

    def select_candidate(self, candidates: List[Dict]) -> Dict:  # Fixed type annotation
        """
        Select the best response based on the responses.
        Args:
            candidates (List[Dict]): List of candidate responses as dictionaries.
        Returns:
            Dict: The selected candidate or an empty dictionary if no candidates exist.
        """
        return candidates[-1] if candidates else {}  # Default to the last candidate

The issue with the selector design is that it selects among multiple candidates for each parameter update, which makes it harder to design external validation or verification of the parameter value.

doxav · 2024-12-28T12:48:37Z

By default, select_candidate takes the last candidate, which is the lowest-temperature one (0), because generate_candidates starts at the highest temperature and ends at the lowest (0).
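The ordering argument can be sketched as follows. This is not the code from the PR: the linear spacing between temperatures, the helper `temperature_schedule`, and the stand-in `fake_llm` are assumptions made for illustration. Only the [1.3, 0.] range and the "take the last candidate" rule come from this thread.

```python
from typing import Callable, Dict, List, Optional


def temperature_schedule(num_responses: int,
                         temperature_range: Optional[List[float]] = None) -> List[float]:
    """Temperatures from highest to lowest (linear spacing is an assumption)."""
    if temperature_range is None:
        temperature_range = [1.3, 0.0]
    high, low = max(temperature_range), min(temperature_range)
    if num_responses <= 1:
        return [low]
    step = (high - low) / (num_responses - 1)
    temps = [high - i * step for i in range(num_responses)]
    temps[-1] = low  # avoid floating-point drift on the final entry
    return temps


def generate_candidates(call_llm: Callable[[str, float], str], prompt: str,
                        num_responses: int = 3) -> List[Dict]:
    """One candidate per temperature, highest temperature first."""
    return [{"temperature": t, "response": call_llm(prompt, t)}
            for t in temperature_schedule(num_responses)]


def select_candidate(candidates: List[Dict]) -> Dict:
    """Default rule quoted in this thread: take the last candidate."""
    return candidates[-1] if candidates else {}


def fake_llm(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"{prompt} @ T={temperature:.2f}"


candidates = generate_candidates(fake_llm, "suggestion", num_responses=3)
chosen = select_candidate(candidates)
print(chosen["temperature"])
```

Under these assumptions the last candidate is the one generated at temperature 0, so the default pick is the most deterministic generation.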

I put in a naive method as a proof of concept, but it can be overridden or enriched by:

- a scoring/ranking method, if a scoring method is given for the task; otherwise we could use an LLM by default to select the best candidate.
- an ensembling strategy like self-consistency: "Identify commonalities or alignments among those X answers to generate a more robust and consistent fifth answer."
- similarly to STOP: progressively test and override the generate_candidates and select_candidate strategies. This would require many more trials, and performance might be much lower and uncertain at the beginning (unless it is activated only when progress is at a plateau).
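As a rough illustration of the first two options, here is a sketch of two drop-in selectors. The names `select_by_score` and `select_by_majority`, the candidate dictionary layout, and the toy scoring function are all hypothetical. The majority vote is a simplified stand-in for self-consistency: it picks the most frequent response rather than asking an LLM to synthesize a new one.

```python
from collections import Counter
from typing import Callable, Dict, List


def select_by_score(candidates: List[Dict],
                    score_fn: Callable[[Dict], float]) -> Dict:
    """Pick the candidate with the highest task-specific score."""
    if not candidates:
        return {}
    return max(candidates, key=score_fn)


def select_by_majority(candidates: List[Dict], key: str = "response") -> Dict:
    """Pick the first candidate whose response is the most common one."""
    if not candidates:
        return {}
    counts = Counter(c[key] for c in candidates)
    winner, _ = counts.most_common(1)[0]
    return next(c for c in candidates if c[key] == winner)


candidates = [
    {"temperature": 1.3, "response": "x = 4"},
    {"temperature": 0.65, "response": "x = 2"},
    {"temperature": 0.0, "response": "x = 2"},
]

# Toy scoring function (an assumption): prefer lower temperatures.
best = select_by_score(candidates, lambda c: -c["temperature"])
voted = select_by_majority(candidates)
print(best["response"], voted["response"])
```

Either function could replace the default rule wherever a selector is accepted, provided the candidates carry the fields the selector reads.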

On the other hand, generate_candidates is by default based on varying the temperature (creativity) of a single model. It could be replaced or enriched by using different models or roles.

Tell me what you think and prefer.

chinganc · 2024-12-30T21:20:15Z

Sorry for the long wait. I'm back from the holidays and starting the review. @doxav Here are some comments. I'd like to hear what you want to build first before I start modifying it.

1. What is _default_value_rewrite trying to achieve? In the current implementation, I think it has no effect. It always returns the value of the input argument, which is either passed in manually or set by the default in the method's signature. I guess you want to achieve the effect of setting the default value in __init__? In that case, you need to set the default value in the method's signature to e.g. None (or some unused value) and detect it.
2. There are several uses of temperature_range: List[float] = [1.3, 0.] to set the default value in a method's signature. This can lead to buggy code because the list can be modified in place. Let's set it to None in the signature and add an if condition that sets it to [1.3, 0.] when None is seen.
3. If by default when selector is not provided, you want to return the answer with the lowest temperature, why not also call generate_candidates differently to generate one response only to save compute as well?
4. @allenanie The selector API makes sense to me. Perhaps we can write a selector class and have some common selectors implemented.
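The mutable-default concern in the second point is standard Python behavior and can be reproduced in isolation. In the sketch below, the function names `buggy` and `safe` are illustrative and the in-place `sort()` is just one example of a mutation; neither is taken from the PR.

```python
from typing import List, Optional


def buggy(temperature_range: List[float] = [1.3, 0.0]) -> List[float]:
    """The default list is created once and shared across all calls."""
    temperature_range.sort()  # in-place mutation also changes the default
    return temperature_range


def safe(temperature_range: Optional[List[float]] = None) -> List[float]:
    """None sentinel: a fresh default list is built on every call."""
    if temperature_range is None:
        temperature_range = [1.3, 0.0]
    temperature_range.sort()
    return temperature_range


buggy()
print(buggy.__defaults__[0])  # the stored default is now sorted in place

safe()
print(safe.__defaults__)      # still (None,): nothing shared to corrupt
```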

allenanie · 2024-12-31T06:16:14Z

@chinganc

1. I added _default_value_rewrite because I wanted to provide an option to set the configuration through __init__ while keeping the flexibility to change it during the function call. I just realized this code does not achieve what I wanted. Let me change this...
2. temperature_range: I don't remember if I did this or if it was in the original code...
3. I'll leave this to @doxav.
4. Sounds good!

allenanie · 2024-12-31T06:33:38Z

@chinganc

I resolved 1 and 2 -- the function signature / __init__ should allow the default overwriting behavior I wanted (by setting function signature default to be None, detect and then overwrite).
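A minimal sketch of that pattern, assuming a hypothetical class `ConfigurableOptimizer` with a hypothetical `resolve` method (the real class and its parameter names may differ): defaults are stored at construction time, every per-call argument defaults to None, and None means "fall back to the stored value".

```python
from typing import Dict, List, Optional


class ConfigurableOptimizer:
    """Hypothetical class showing __init__ defaults with per-call overrides."""

    def __init__(self, num_responses: int = 3,
                 temperature_range: Optional[List[float]] = None):
        self.num_responses = num_responses
        # Copy the list so later edits by the caller cannot change our config.
        self.temperature_range = (
            [1.3, 0.0] if temperature_range is None else list(temperature_range)
        )

    def resolve(self, num_responses: Optional[int] = None,
                temperature_range: Optional[List[float]] = None) -> Dict:
        """Return the effective settings for one call."""
        if num_responses is None:
            num_responses = self.num_responses
        if temperature_range is None:
            temperature_range = self.temperature_range
        return {"num_responses": num_responses,
                "temperature_range": list(temperature_range)}


opt = ConfigurableOptimizer(num_responses=5)
print(opt.resolve())                 # uses the constructor configuration
print(opt.resolve(num_responses=2))  # per-call override for one argument
```

One limitation of a None sentinel is that None itself can no longer be passed as a meaningful value, which is presumably why the review also mentions using some other unused value.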

doxav · 2025-01-02T21:39:10Z

(3.) Yes, totally agree with your proposition "If by default when selector is not provided, you want to return the answer with the lowest temperature, why not also call generate_candidates differently to generate one response only to save compute as well?"
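One way this shortcut might look, as a sketch only: the function `propose`, its parameters, and the fixed temperature tuple are assumptions for illustration rather than the merged code. When no selector is given, it makes a single call at the lowest temperature instead of generating every candidate and discarding all but the last.

```python
from typing import Callable, Dict, List, Optional, Sequence


def propose(call_llm: Callable[[str, float], str], prompt: str,
            selector: Optional[Callable[[List[Dict]], Dict]] = None,
            temperatures: Sequence[float] = (1.3, 0.65, 0.0)) -> Dict:
    """Generate several candidates only when a selector will compare them."""
    if selector is None:
        # No selector: one call at the lowest temperature is enough.
        t = min(temperatures)
        return {"temperature": t, "response": call_llm(prompt, t)}
    candidates = [{"temperature": t, "response": call_llm(prompt, t)}
                  for t in temperatures]
    return selector(candidates)


calls: List[float] = []


def counting_llm(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for an LLM that records each call."""
    calls.append(temperature)
    return f"{prompt} @ T={temperature}"


single = propose(counting_llm, "suggestion")
calls_without_selector = len(calls)

picked = propose(counting_llm, "suggestion", selector=lambda cs: cs[0])
calls_with_selector = len(calls) - calls_without_selector
print(calls_without_selector, calls_with_selector)
```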

allenanie

Merging -- the code is runnable and the notebook is runnable as well.

a subclassed optimizer from OptoPrime to allow multiple candidates pe…

80faef5

…r iterations per step to allow to derive different strategies for candidate generation and selection

chinganc requested review from adith387, allenanie and chinganc December 2, 2024 18:27

chinganc self-assigned this Dec 2, 2024

expose opto-prime multi's config API to the class construction. Add c…

6e8ff95

…ode to evaluate the implementation.

allenanie self-assigned this Dec 24, 2024

chinganc added the enhancement label Dec 30, 2024

updated the __init__ and function signature

16fdfee

chinganc mentioned this pull request Jan 16, 2025

Update PR-20 - add improved candidate generation techniques and selector can take best of bread if not self refine is used for text generation #33

Closed

allenanie reviewed Jan 23, 2025

allenanie merged commit b83155c into microsoft:main Jan 23, 2025
1 of 2 checks passed
