Implement Metadata for SMAC enabling Multi-Fidelity #771
base: main
@@ -0,0 +1,27 @@
# Optimizers

This directory contains wrappers for different optimizers to integrate them into MLOS.
They are implemented as child classes of the `BaseOptimizer` class defined in `optimizer.py`.

The main goal of these optimizers is to `suggest` configurations, possibly based on prior trial data, in order to find an optimum for some objective(s).
This process is driven through the `register` and `suggest` interfaces.

The following definitions are useful for understanding the implementation:

- `configuration`: a vector representation of a configuration of a system to be evaluated.
- `score`: the objective(s) associated with a configuration.
- `metadata`: additional information about the evaluation, such as the runtime budget used during the evaluation.
- `context`: additional (static) information about the evaluation used to extend the internal model used for suggesting samples.
  For instance, a descriptor of the VM size (vCore count and GB of RAM) and some descriptor of the workload.
  The intent is to allow either sharing or indexing of trial info between "similar" experiments in order to make the optimization process more efficient for new scenarios.

  > Note: This is not yet implemented.

The interface for these classes can be described as follows:

- `register`: takes a configuration, a score, and, optionally, metadata about the evaluation, and updates the model for future suggestions.
- `suggest`: returns a new configuration for evaluation.
  Some optimizers also return additional metadata from the evaluation that should be passed back during the register phase.
  This function can also optionally take a context (not yet implemented) and an argument to force the function to return the default configuration.
- `register_pending`: registers a configuration and metadata pair as pending with the optimizer.
- `get_observations`: returns all observations reported to the optimizer as a 4-tuple of DataFrames (config, score, context, metadata).
- `get_best_observations`: returns the best observations as a 4-tuple of best (config, score, context, metadata) DataFrames.
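The `register`/`suggest` loop described above can be sketched with a minimal, hypothetical random-search optimizer. This is an illustration only, not part of mlos_core: the class name, the single `x` parameter, and the fixed `budget` metadata are all made up for the example.

```python
import random
from typing import List, Optional, Tuple

import pandas as pd


class ToyOptimizer:
    """Minimal random-search optimizer following the interface described above."""

    def __init__(self, seed: int = 42) -> None:
        self._rng = random.Random(seed)
        # (config, score, metadata) triples; context is omitted in this sketch
        self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]] = []

    def suggest(self) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """Return a (config, metadata) pair to evaluate next."""
        config = pd.DataFrame({"x": [self._rng.uniform(0.0, 5.0)]})
        metadata = pd.DataFrame({"budget": [1.0]})  # e.g. a fidelity budget
        return config, metadata

    def register(self, *, configs: pd.DataFrame, scores: pd.DataFrame,
                 metadata: Optional[pd.DataFrame] = None) -> None:
        """Record an evaluated (config, score, metadata) observation."""
        self._observations.append((configs, scores, metadata))

    def get_best_observations(self, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """Return the n_max lowest-scoring observations."""
        configs = pd.concat([c for c, _, _ in self._observations]).reset_index(drop=True)
        scores = pd.concat([s for _, s, _ in self._observations]).reset_index(drop=True)
        idx = scores.nsmallest(n_max, columns=["score"], keep="first").index
        return configs.loc[idx], scores.loc[idx]


opt = ToyOptimizer()
for _ in range(20):
    config, metadata = opt.suggest()
    score = pd.DataFrame({"score": -config["x"]})  # maximize x by minimizing -x
    opt.register(configs=config, scores=score, metadata=metadata)
best_configs, best_scores = opt.get_best_observations()
```

The essential shape is the same as in mlos_core: `suggest` hands back metadata alongside the configuration, and the caller is responsible for passing that metadata back to `register`.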
@@ -56,9 +56,9 @@ def __init__(self, *,
             raise ValueError("Number of weights must match the number of optimization targets")

         self._space_adapter: Optional[BaseSpaceAdapter] = space_adapter
-        self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]] = []
+        self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []
         self._has_context: Optional[bool] = None
-        self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame]]] = []
+        self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []

     def __repr__(self) -> str:
         return f"{self.__class__.__name__}(space_adapter={self.space_adapter})"

@@ -98,7 +98,7 @@ def register(self, *, configs: pd.DataFrame, scores: pd.DataFrame,
             "Mismatched number of configs and context."
         assert configs.shape[1] == len(self.parameter_space.values()), \
             "Mismatched configuration shape."
-        self._observations.append((configs, scores, context))
+        self._observations.append((configs, scores, context, metadata))
         self._has_context = context is not None

         if self._space_adapter:

@@ -197,26 +197,48 @@ def register_pending(self, *, configs: pd.DataFrame,
         """
         pass    # pylint: disable=unnecessary-pass  # pragma: no cover

-    def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
+    def _get_observations(self, observations:
+                          List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]]
+                          ) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
         """
-        Returns the observations as a triplet of DataFrames (config, score, context).
+        Returns the observations as a quad of DataFrames (config, score, context, metadata)
+        for a specific set of observations.
+
+        Parameters
+        ----------
+        observations : List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]]
+            Observations to run the transformation on.

         Returns
         -------
-        observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
-            A triplet of (config, score, context) DataFrames of observations.
+        observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
+            A quad of (config, score, context, metadata) DataFrames of observations.
         """
-        if len(self._observations) == 0:
+        if len(observations) == 0:
             raise ValueError("No observations registered yet.")
-        configs = pd.concat([config for config, _, _ in self._observations]).reset_index(drop=True)
-        scores = pd.concat([score for _, score, _ in self._observations]).reset_index(drop=True)
+        configs = pd.concat([config for config, _, _, _ in observations]).reset_index(drop=True)
+        scores = pd.concat([score for _, score, _, _ in observations]).reset_index(drop=True)
         contexts = pd.concat([pd.DataFrame() if context is None else context
-                              for _, _, context in self._observations]).reset_index(drop=True)
-        return (configs, scores, contexts if len(contexts.columns) > 0 else None)
+                              for _, _, context, _ in observations]).reset_index(drop=True)
+        metadatas = pd.concat([pd.DataFrame() if metadata is None else metadata
+                               for _, _, _, metadata in observations]).reset_index(drop=True)
+        return (configs, scores, contexts if len(contexts.columns) > 0 else None,
+                metadatas if len(metadatas.columns) > 0 else None)
+
+    def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
+        """
+        Returns the observations as a quad of DataFrames (config, score, context, metadata).
+
+        Returns
+        -------
+        observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
+            A quad of (config, score, context, metadata) DataFrames of observations.
+        """
+        return self._get_observations(self._observations)

-    def get_best_observations(self, *, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
+    def get_best_observations(self, *, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame],
+                                                                Optional[pd.DataFrame]]:
         """
-        Get the N best observations so far as a triplet of DataFrames (config, score, context).
+        Get the N best observations so far as a quad of DataFrames (config, score, context, metadata).
         Default is N=1. The columns are ordered in ASCENDING order of the optimization targets.
         The function uses the `pandas.DataFrame.nsmallest(..., keep="first")` method under the hood.

> Review comment on `get_observations`: Think we discussed creating a NamedTuple or small DataClass for them instead so that they can be accessed by name in order to make it more readable.
>
> Reply (jsfreischuetz): If you want I can do this in this PR, or another follow-up PR.
>
> Reply: I think a predecessor PR would be better. Much like we did with adding the metadata args and named args first.
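The `pd.concat` pattern used in `_get_observations` above (substituting an empty DataFrame for a missing context/metadata entry, then collapsing to `None` when nothing contributed any columns) can be sketched in isolation. The helper name `concat_optional` is made up for this illustration:

```python
from typing import List, Optional

import pandas as pd


def concat_optional(frames: List[Optional[pd.DataFrame]]) -> Optional[pd.DataFrame]:
    # Substitute an empty DataFrame for missing entries so pd.concat accepts
    # the list, then collapse the result to None when no entry contributed
    # any columns (i.e. every entry was None).
    combined = pd.concat(
        [pd.DataFrame() if frame is None else frame for frame in frames]
    ).reset_index(drop=True)
    return combined if len(combined.columns) > 0 else None


mixed = concat_optional([pd.DataFrame({"budget": [1.0]}),
                         pd.DataFrame({"budget": [9.0]})])
```

Note that a `None` entry contributes zero rows, so mixing `None` with non-`None` entries in the same list would leave the result misaligned with the configs frame; the code relies on metadata being either consistently present or consistently absent across observations.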

@@ -227,15 +249,16 @@ def get_best_observations(self, *, n_max: int = 1) -> Tuple[pd.DataFrame, pd.Dat

         Returns
         -------
-        observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
-            A triplet of best (config, score, context) DataFrames of best observations.
+        observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
+            A quad of best (config, score, context, metadata) DataFrames of best observations.
         """
         if len(self._observations) == 0:
             raise ValueError("No observations registered yet.")
-        (configs, scores, contexts) = self.get_observations()
+        (configs, scores, contexts, metadatas) = self.get_observations()
         idx = scores.nsmallest(n_max, columns=self._optimization_targets, keep="first").index
         return (configs.loc[idx], scores.loc[idx],
-                None if contexts is None else contexts.loc[idx])
+                None if contexts is None else contexts.loc[idx],
+                None if metadatas is None else metadatas.loc[idx])

     def cleanup(self) -> None:
         """
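The `nsmallest`-based selection in `get_best_observations` can be demonstrated standalone with toy data (the values below are hypothetical): the index of the smallest scores is computed once and then used to slice every parallel frame, keeping the rows aligned.

```python
import pandas as pd

# Toy observation tables: row i across the frames describes one observation.
configs = pd.DataFrame({"x": [3, 1, 2]})
scores = pd.DataFrame({"score": [30.0, 10.0, 20.0]})
metadatas = pd.DataFrame({"budget": [1.0, 3.0, 9.0]})

# Pick the indices of the n_max smallest scores, then slice each frame with
# the same index so config, score, and metadata rows stay aligned.
n_max = 2
idx = scores.nsmallest(n_max, columns=["score"], keep="first").index
best_configs, best_scores, best_metadatas = (
    configs.loc[idx], scores.loc[idx], metadatas.loc[idx])
```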
@@ -0,0 +1,57 @@
#
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#
"""
Contains utils used for implementing the mlos_core optimizers.
"""
import inspect
from typing import Any, Callable, Dict, List, Optional

import pandas as pd


def to_metadata(metadata: Optional[pd.DataFrame]) -> Optional[List[pd.Series]]:
    """
    Converts a metadata DataFrame to a list of per-row metadata Series objects.

    Parameters
    ----------
    metadata : Optional[pd.DataFrame]
        The DataFrame to convert to metadata.

    Returns
    -------
    Optional[List[pd.Series]]
        The created metadata objects, or None if no metadata was given.
    """
    if metadata is None:
        return None
    return [idx_series[1] for idx_series in metadata.iterrows()]

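For illustration, the row-splitting behavior of `to_metadata` (redefined here so the snippet is self-contained; the column names are made up):

```python
from typing import List, Optional

import pandas as pd


def to_metadata(metadata: Optional[pd.DataFrame]) -> Optional[List[pd.Series]]:
    # One Series per DataFrame row; None passes through unchanged.
    if metadata is None:
        return None
    return [row for _, row in metadata.iterrows()]


rows = to_metadata(pd.DataFrame({"budget": [1.0, 9.0], "instance": ["a", "b"]}))
```

Each element of the result is a `pd.Series` indexed by the original column names, which is the per-trial shape downstream optimizer backends expect.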
def filter_kwargs(function: Callable, **kwargs: Any) -> Dict[str, Any]:
    """
    Filters the arguments provided in the kwargs dictionary down to the
    arguments that are legal for the called function.

    Parameters
    ----------
    function : Callable
        The function for which we filter kwargs.
    kwargs :
        The kwargs being filtered for the target function.

    Returns
    -------
    dict
        kwargs with the illegal arguments filtered out.
    """
    sig = inspect.signature(function)
    filter_keys = [
        param.name
        for param in sig.parameters.values()
        if param.kind == param.POSITIONAL_OR_KEYWORD
    ]
    filtered_dict = {
        filter_key: kwargs[filter_key] for filter_key in filter_keys & kwargs.keys()
    }
    return filtered_dict
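The signature-based filtering above can be exercised with a small standalone re-implementation (the `train` function and its parameters are invented for the example):

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(function: Callable, **kwargs: Any) -> Dict[str, Any]:
    # Keep only kwargs whose names match positional-or-keyword parameters
    # of the target function's signature.
    sig = inspect.signature(function)
    legal = {
        param.name
        for param in sig.parameters.values()
        if param.kind == param.POSITIONAL_OR_KEYWORD
    }
    return {key: kwargs[key] for key in legal & kwargs.keys()}


def train(seed: int, budget: float = 1.0) -> str:
    return f"seed={seed}, budget={budget}"


# `unknown_flag` is silently dropped because `train` does not accept it.
args = filter_kwargs(train, seed=42, budget=9.0, unknown_flag=True)
result = train(**args)
```

This lets a generic optimizer wrapper forward a shared kwargs dictionary to backends with differing signatures without raising `TypeError` for unexpected arguments; note that keyword-only parameters are intentionally not matched here.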
@@ -0,0 +1,99 @@
#
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#
"""
Tests for Optimizers using Metadata.
"""

import logging
from typing import Callable

import pandas as pd
import pytest

import ConfigSpace as CS

from smac import MultiFidelityFacade as MFFacade
from smac.intensifier.successive_halving import SuccessiveHalving

from mlos_core.optimizers import (
    OptimizerType, OptimizerFactory, BaseOptimizer)
from mlos_core.tests import SEED

_LOG = logging.getLogger(__name__)
_LOG.setLevel(logging.DEBUG)


def smac_verify_best(metadata: pd.DataFrame) -> bool:
    """
    Verify that the metadata returned by SMAC is in a legal state.

    Parameters
    ----------
    metadata : pd.DataFrame
        Metadata returned by SMAC.

    Returns
    -------
    bool
        Whether the returned metadata is valid.
    """
    max_budget = metadata["budget"].max()
    if isinstance(max_budget, float):
        return max_budget == 9
    return False


@pytest.mark.parametrize(('optimizer_type', 'verify', 'kwargs'), [
    # Enumerate all supported Optimizers
    *[(member, verify, {"seed": SEED, "facade": MFFacade, "intensifier": SuccessiveHalving,
                        "min_budget": 1, "max_budget": 9})
      for member, verify in [(OptimizerType.SMAC, smac_verify_best)]],
])
def test_optimizer_metadata(optimizer_type: OptimizerType,
                            verify: Callable[[pd.DataFrame], bool],
                            kwargs: dict) -> None:
    """
    Toy problem to test whether metadata is properly handled by each optimizer that supports it.
    """
    max_iterations = 100

    def objective(point: pd.DataFrame) -> pd.DataFrame:
        # mix of hyperparameter types; the score is simply x + y
        return pd.DataFrame({"score": point["x"] + point["y"]})

    input_space = CS.ConfigurationSpace(seed=SEED)
    # add a mix of numeric datatypes
    input_space.add_hyperparameter(CS.UniformIntegerHyperparameter(name='x', lower=0, upper=5))
    input_space.add_hyperparameter(CS.UniformFloatHyperparameter(name='y', lower=0.0, upper=5.0))

    optimizer: BaseOptimizer = OptimizerFactory.create(
        parameter_space=input_space,
        optimization_targets=['score'],
        optimizer_type=optimizer_type,
        optimizer_kwargs=kwargs,
    )

    with pytest.raises(ValueError, match="No observations"):
        optimizer.get_best_observations()

    with pytest.raises(ValueError, match="No observations"):
        optimizer.get_observations()

    for _ in range(max_iterations):
        config, metadata = optimizer.suggest()
        assert isinstance(metadata, pd.DataFrame)
        optimizer.register(configs=config, scores=objective(config), metadata=metadata)

    (all_configs, all_scores, all_contexts, all_metadata) = optimizer.get_observations()
    assert isinstance(all_configs, pd.DataFrame)
    assert isinstance(all_scores, pd.DataFrame)
    assert all_contexts is None
    assert isinstance(all_metadata, pd.DataFrame)
    assert verify(all_metadata)

    (best_configs, best_scores, best_contexts, best_metadata) = optimizer.get_best_observations()
    assert isinstance(best_configs, pd.DataFrame)
    assert isinstance(best_scores, pd.DataFrame)
    assert best_contexts is None
    assert isinstance(best_metadata, pd.DataFrame)
    assert verify(best_metadata)
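Why the test checks `max_budget == 9`: with successive halving over `min_budget=1` and `max_budget=9`, budgets form a geometric ladder, so the highest fidelity a trial can be run at is exactly the configured maximum. A sketch of that schedule, assuming a halving factor (`eta`) of 3, which matches the 1/3/9 spacing implied by the test but is an assumption about SMAC's default:

```python
def budget_ladder(min_budget: float, max_budget: float, eta: float = 3.0) -> list:
    # Successive halving evaluates configurations on a geometric budget
    # schedule; build it from max_budget downwards, then sort ascending.
    budgets = []
    budget = float(max_budget)
    while budget >= min_budget:
        budgets.append(budget)
        budget /= eta
    return sorted(budgets)
```

Under this schedule `smac_verify_best` passes exactly when at least one trial was promoted all the way to the top rung.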