@doxav
Contributor

@doxav doxav commented Nov 28, 2024

OptoPrime demonstrates excellent efficiency in leveraging enriched feedback for effective exploitation. However, introducing exploration capabilities could help mitigate issues such as plateaus and local optima.

This PR does not implement the full STOP algorithm; it is a first step toward exploration optimization. The generate_candidates and select_candidate functions are designed to be easy to override or redefine.

Xavier

PS:
  • I considered merging call_llm and generate_candidates for simplicity but decided against it to retain greater flexibility.
  • My current research focuses on collaborative LLM-human-in-the-loop feedback mechanisms, which I believe could be beneficial in both the step and backward parts of this optimization framework.

…r iterations per step to allow to derive different strategies for candidate generation and selection
@allenanie
Collaborator

@doxav it seems like you need to respond verbatim @microsoft-github-policy-service agree (with the "@" sign)

@doxav
Contributor Author

doxav commented Dec 1, 2024

@microsoft-github-policy-service agree

@chinganc chinganc self-assigned this Dec 2, 2024
@chinganc
Collaborator

chinganc commented Dec 2, 2024

@doxav Thanks for contributing. We will start reviewing this soon.

@doxav
Contributor Author

doxav commented Dec 6, 2024

@microsoft-github-policy-service agree

@allenanie allenanie self-assigned this Dec 24, 2024
@allenanie
Collaborator

I ran the notebook -- the implementation LGTM. I modified the API design a bit to allow configurations to be passed through optimizer construction. Also updated the evaluation script -- running the optimizer on MMLU-Physics, MMLU-ML, and GPQA.

@allenanie
Collaborator

I ran a quick evaluation on MMLU ML and Physics:

MMLU-Physics:
[0.9411764705882353, 0.9411764705882353, 0.9509803921568627, 0.9509803921568627, 0.9411764705882353]

MMLU-ML:
[0.8660714285714286, 0.8839285714285714, 0.8928571428571429, 0.875, 0.8660714285714286]

Looking at the implementation, if selector=None, OptoPrimeMulti will just pick a high-temperature generation because:

    def select_candidate(self, candidates: List[Dict]) -> Dict:  # Fixed type annotation
        """
        Select the best response based on the responses.
        Args:
            candidates (List[Dict]): List of candidate responses as dictionaries.
        Returns:
            Dict: The selected candidate or an empty dictionary if no candidates exist.
        """
        return candidates[-1] if candidates else {}  # Default to the last candidate

The issue with the selector design is that it selects between multiple candidates for each parameter update, which makes it harder to design external validation or verification of the parameter value.

@doxav
Contributor Author

doxav commented Dec 28, 2024

By default, select_candidate takes the last candidate, which is the one with the lowest temperature (0), because generate_candidates starts at the highest temperature and ends at the lowest (0).
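
The ordering described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the linear temperature schedule, the `num_responses` parameter, and the `fake_llm` stand-in are all assumptions made so the example runs on its own.

```python
from typing import Callable, Dict, List, Optional


def generate_candidates(
    call_llm: Callable[[str, float], str],
    prompt: str,
    num_responses: int = 3,
    temperature_range: Optional[List[float]] = None,
) -> List[Dict]:
    """Generate candidates from the highest temperature down to the lowest."""
    if temperature_range is None:
        temperature_range = [1.3, 0.0]
    high, low = max(temperature_range), min(temperature_range)
    candidates = []
    for i in range(num_responses):
        # Assumed linear interpolation; the final step lands exactly on `low`.
        frac = i / (num_responses - 1) if num_responses > 1 else 1.0
        temperature = high - frac * (high - low)
        candidates.append(
            {"temperature": temperature, "text": call_llm(prompt, temperature)}
        )
    return candidates


def select_candidate(candidates: List[Dict]) -> Dict:
    """Default selection: the last candidate, i.e. the lowest temperature."""
    return candidates[-1] if candidates else {}


def fake_llm(prompt: str, temperature: float) -> str:
    # Hypothetical stand-in so the sketch runs without any API access.
    return f"{prompt} @ T={temperature:.2f}"


candidates = generate_candidates(fake_llm, "suggest a fix")
chosen = select_candidate(candidates)  # the temperature-0 candidate
```

Under these assumptions, picking `candidates[-1]` is equivalent to picking the most deterministic generation, which is why the default behaves like a plain low-temperature call.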

I put in a dumb method as a proof of concept, but it can be overridden or enriched by:

  • a scoring/ranking method, if one is given for the task; otherwise we could use an LLM by default to select the best candidate.
  • an ensembling strategy like self-consistency: "Identify commonalities or alignments among those X answers to generate a more robust and consistent fifth answer.".
  • similarly to STOP: progressively test and override the generate_candidates and select_candidate strategies. This would require many more trials, and performance might be much lower and uncertain at the beginning (unless it is only activated when progress is at a plateau).
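
Two of the options above can be sketched as drop-in selectors. This is only an illustration under stated assumptions: the function names, the `score_fn` argument, and the `"answer"` key are hypothetical, and the majority vote is a simpler substitute for the LLM-based synthesis quoted in the self-consistency bullet, not that method itself.

```python
from collections import Counter
from typing import Callable, Dict, List


def select_by_score(
    candidates: List[Dict], score_fn: Callable[[Dict], float]
) -> Dict:
    """Return the highest-scoring candidate, or {} if there are none."""
    return max(candidates, key=score_fn) if candidates else {}


def select_by_majority(candidates: List[Dict], key: str = "answer") -> Dict:
    """Vote over candidate answers and return the first one with the winning answer."""
    if not candidates:
        return {}
    winner, _ = Counter(c[key] for c in candidates).most_common(1)[0]
    return next(c for c in candidates if c[key] == winner)


candidates = [
    {"answer": "B", "temperature": 1.3},
    {"answer": "A", "temperature": 0.65},
    {"answer": "B", "temperature": 0.0},
]
voted = select_by_majority(candidates)  # answer "B" wins 2 to 1
# Toy scoring function (an assumption): prefer lower temperatures.
coolest = select_by_score(candidates, score_fn=lambda c: -c["temperature"])
```

Both functions keep the same input and output shape as the default select_candidate (a list of dictionaries in, one dictionary out), so either could be swapped in without changing the caller.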

generate_candidates, on the other hand, is by default based on varying the temperature (creativity) of the same model; it could be replaced or enriched by using different models or roles.

Tell me what you think and prefer.

@chinganc
Collaborator

Sorry for the long wait. I'm back from the holidays and starting the review. @doxav Here are some comments. I'd like to hear what you want to build first before I start modifying it.

  1. What is _default_value_rewrite trying to achieve? In the current implementation, I don't think it has any effect: it always returns the value of the input argument, which is either passed explicitly or set by the default in the method's signature. I guess you want the default value to be set in __init__? In that case, you may need to set the default in the method's signature to e.g. None (or some other unused value) and detect it.

  2. There are several uses of temperature_range: List[float] = [1.3, 0.] to set the default value in a method's signature. This can lead to bugs because the list can be modified in place. Let's set it to None in the signature and add an if condition that sets it to [1.3, 0.] when None is seen.

  3. If by default when selector is not provided, you want to return the answer with the lowest temperature, why not also call generate_candidates differently to generate one response only to save compute as well?

  4. @allenanie The selector API makes sense to me. Perhaps we can write a selector class and have some common selectors implemented.
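
The concern in item 2 can be reproduced in isolation. The sketch below uses two hypothetical functions, `risky` and `safe`, that are not part of the PR; it relies only on standard Python behavior, where a default argument value is evaluated once when the function is defined.

```python
from typing import List, Optional


def risky(temperature_range: List[float] = [1.3, 0.0]) -> List[float]:
    # The default list is created once, at function definition time.
    temperature_range.sort()  # in-place change leaks into every later call
    return temperature_range


def safe(temperature_range: Optional[List[float]] = None) -> List[float]:
    if temperature_range is None:
        temperature_range = [1.3, 0.0]  # fresh list on every call
    temperature_range.sort()
    return temperature_range


risky()
leaked_default = risky.__defaults__[0]  # now [0.0, 1.3], no longer [1.3, 0.0]
safe()
safe_default = safe.__defaults__[0]  # still None
```

After a single call, the shared default of `risky` has silently changed order, which would flip a high-to-low temperature schedule into low-to-high for all subsequent calls.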

@chinganc chinganc added the enhancement New feature or request label Dec 30, 2024
@allenanie
Collaborator

allenanie commented Dec 31, 2024

@chinganc

  1. I added _default_value_rewrite because I wanted to provide an option to set configuration through __init__ while keeping the flexibility to change that configuration during the function call. I just realized this code does not achieve what I wanted. Let me change this...
  2. temperature_range: I don't remember whether I did this or it was in the original code...
  3. I'll leave this to @doxav
  4. Sounds good!

@allenanie
Collaborator

@chinganc

I resolved 1 and 2 -- the function signature / __init__ should allow the default overwriting behavior I wanted (by setting function signature default to be None, detect and then overwrite).
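
The None-sentinel pattern described here might look like the following. This is a sketch of the general idea, not the merged code: the class name `MultiCandidateConfig` and the `resolve` method are hypothetical, chosen only to show defaults set in __init__ and overridden per call.

```python
from typing import List, Optional


class MultiCandidateConfig:
    """Hypothetical stand-in for an optimizer that stores defaults at construction."""

    def __init__(
        self,
        num_responses: int = 3,
        temperature_range: Optional[List[float]] = None,
    ):
        self.num_responses = num_responses
        # Copy so later edits to the caller's list cannot change our default.
        self.temperature_range = (
            list(temperature_range) if temperature_range is not None else [1.3, 0.0]
        )

    def resolve(
        self,
        num_responses: Optional[int] = None,
        temperature_range: Optional[List[float]] = None,
    ) -> dict:
        """Use per-call values when given, otherwise fall back to __init__ defaults."""
        return {
            "num_responses": (
                self.num_responses if num_responses is None else num_responses
            ),
            "temperature_range": list(
                self.temperature_range
                if temperature_range is None
                else temperature_range
            ),
        }


opt = MultiCandidateConfig(num_responses=5)
from_init = opt.resolve()
overridden = opt.resolve(num_responses=1, temperature_range=[0.7, 0.0])
```

Testing with `is None` rather than truthiness matters here: a legitimate per-call value such as `0` or an empty list would otherwise be mistaken for "not provided".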

@doxav
Contributor Author

doxav commented Jan 2, 2025

(3.) Yes, totally agree with your proposition "If by default when selector is not provided, you want to return the answer with the lowest temperature, why not also call generate_candidates differently to generate one response only to save compute as well?"
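
The compute saving agreed on here can be sketched as below. The `propose` function, its parameters, and the `counting_llm` stand-in are assumptions for illustration; the thread does not show how the merged code handles this case.

```python
from typing import Callable, Dict, List, Optional


def propose(
    call_llm: Callable[[str, float], str],
    prompt: str,
    selector: Optional[Callable[[List[Dict]], Dict]] = None,
    temperatures: Optional[List[float]] = None,
) -> Dict:
    """Generate several candidates only when a selector can make use of them."""
    if temperatures is None:
        temperatures = [1.3, 0.65, 0.0]
    if selector is None:
        # No selector: one call at the lowest temperature instead of one per entry.
        low = min(temperatures)
        return {"temperature": low, "text": call_llm(prompt, low)}
    candidates = [
        {"temperature": t, "text": call_llm(prompt, t)} for t in temperatures
    ]
    return selector(candidates)


calls: List[float] = []


def counting_llm(prompt: str, temperature: float) -> str:
    # Hypothetical stand-in that records every call it receives.
    calls.append(temperature)
    return f"{prompt} @ T={temperature}"


single = propose(counting_llm, "suggest a fix")
calls_without_selector = len(calls)
picked = propose(counting_llm, "suggest a fix", selector=lambda cs: cs[0])
calls_with_selector = len(calls) - calls_without_selector
```

With the assumed three-step schedule, the no-selector path issues one LLM call instead of three while returning the same candidate the default selection would have chosen.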

Collaborator

@allenanie allenanie left a comment

Merging -- the code is runnable and the notebook is runnable as well.

@allenanie allenanie merged commit b83155c into microsoft:main Jan 23, 2025
1 of 2 checks passed