The use of higher-order functions, i.e. functions that take other functions as parameters, can replace the strategy pattern, where an object supporting the respective function is passed as an argument.
Consider the following alternative implementations of a regression model evaluation use case, where the metric to use shall be parametrisable:
-
Because we require additional information on the metric in order to apply it correctly, specifically
- a string indicating its name for reporting and
- a flag indicating whether higher values are better,
the function itself is not sufficient and we thus add three parameters to the evaluation function's interface:
metric_fn: Callable[[np.ndarray, np.ndarray], float], metric_name: str, higher_is_better: bool,
-
object-oriented implementation
We define an abstract class
Metricwhich handles all the required aspects.class Metric(ABC): @abstractmethod def compute_value(self, y_ground_truth: np.ndarray, y_predicted: np.ndarray) -> float: pass @abstractmethod def get_name(self) -> str: pass @abstractmethod def is_larger_better(self) -> bool: pass
The evaluation function takes a single argument:
metric: Metric
Please take a close look at both Python files and think about the implications.
The OOP solution has significant advantages over the functional solution:
-
Encapsulation: The class-based approach allows the algorithm itself and additional information/operations to be appropriately grouped.
Grouping the actual computation (method
compute_value) and the meta-data (name and "larger is better" flag) in a representational unit (abstraction) leads to less cumbersome usage patterns (1 vs. 3 arguments) and furthermore helps to avoid errors: Users are not required to think about which combination of parameters is appropriate, as the intended combinations are pre-configured in classes.A class is simply a more general concept than a function and therefore is more amenable to complex use cases; classes soundly generalise to cases where an algorithmic method cannot reasonably be reduced to a single function.
-
Discoverability: The abstract base class provides a (searchable) type bound, which straightforwardly enables discovery of potential implementations. We need only to use our IDE's hierarchy view in order to discover all existing implementations of
Metricthat fit the class-based interface:By contrast, the functional interface, which specifies
Callable[[np.ndarray, np.ndarray], float], supports no such search. This can be a significant problem in cases where the functions to apply are not co-located with the function applying them; they could be anywhere within a very large codebase encompassing dozens of modules and thousands of lines of code. Only extensive documentation could avoid severe usability limitations. The OOP solution can do without; the type information is sufficient to discover all existing implementations. -
Parametrisability: Objects can straightforwardly parametrise their behaviour through attributes,
which alleviates the need for cumbersome currying, e.g. through alambdafunction, a local function (closure) orfunctools.partial.In our example, we used a lambda function to specify the metric's threshold parameter:
lambda t, u: compute_metric_rel_freq_error_within(t, u, max_error)
In the object-oriented case, we simply parametrise the object:
MetricRelFreqErrorWithin(max_error)
-
Logging and persistence: Objects have representations which can more readily be logged and stored.
A
lambdafunction or anonymous function lacks both, a representation amenable to logging and the possibility of serialisation. While the use offunctools.partialallows for serialisation (by converting the curried function to an object), it does not have a customizable representation for logging. The class, by contrast, gives us full control; we could implement__str__or__repr__in any way we please - and we optionally have fine-grained control over persistence by implementing__getstate__and__setstate__. -
Explicit type relationships.
In functional interface specifications, we use
Callableto declare the type of the function - or a previously definedProtocolin the case of a complex function signature involving keyword arguments. In either case, the relationship between the declared type and its implementors is, in typical duck typing fashion, normally an implicit one. By contrast, the object-oriented solution establishes an explicit type relationship. An explicit type relationship has the advantage of being checkable, i.e. a static type checker can test whether an interface is indeed implemented correctly (as far as types are concerned), whereas a divergence in duck-typed implementations will remain unnoticed: If a function signature changes (for whatever reason), static type checking will not reveal that the function no longer aligns with the type annotation of the higher-order function it was intended to be used in conjunction with.
These are already important reasons to prefer the object-oriented solution. But now consider the case where we want to support multiple metrics, as in our case study in the OOP essentials module. Since the function alone is not sufficient in our case - we also need the meta-data (name, higher is better) - we would have the following options to provide multiple metrics in the functional case:
-
a list of tuples bundling the function with the required metadata: the elements of the tuple can only be accessed by the corresponding index and the docstring would need to explain this in detail.
metric_tuples: List[Tuple[Callable[[np.ndarray, np.ndarray], float], str, bool]],
-
a list of dictionaries with keys such as "metric_fn", "metric_name", "higher_is_better", i.e
metrics: List[Dict[str, Any]],
where we completely lose type information for the values; static type checking cannot take place. Again, we would need to explain in detail how to construct the dictionary and would need checks to ensure that the format is adhered to. Since a dictionary is basically a primitive object with the aforementioned downsides, this would be a most questionable choice.
-
three separate lists
metric_fns: List[Callable[[np.ndarray, np.ndarray], float], metric_names: List[str], higher_is_better_flags: List[bool],
which we could iterate over using
zip. However, we then would need to check for consistency (equal lengths).
Compare these options to the straightforward object-oriented solution, where we simply specify
metrics: List[Metric],which gives us full static type checking support and is entirely self-documenting.
All things considered, we cannot think of many reasons to prefer the functional alternative. Of course, there are cases where the function is simple, does not need to be parametrised and does not need to be meaningfully logged, etc. - and in such cases, we could think about using a functional interface, as it could be the most concise solution. As a general rule, however, the object-oriented strategy pattern is the most elegant solution for injecting algorithms.
