Feature request: ability to run evaluators against only a subset of the data #1856

@dmontagu

Description

@DouweM and I discussed this. We want a way to select a specific field of the data in the EvaluatorContext before applying an existing evaluator (such as Equals, IsInstance, or LLMJudge). Here's some rough pseudocode that conveys what I had in mind:

from dataclasses import replace, dataclass
from typing import Callable, Any

from pydantic_evals import Dataset
from pydantic_evals.evaluators import Equals, Evaluator, EvaluatorContext


@dataclass
class Selector:
    attributes: list[str]
    
    def __call__(self, value: Any) -> Any:
        """Selects the attribute from the value based on the attributes list."""
        for attr in self.attributes:
            if isinstance(value, dict):
                value = value.get(attr)
            else:
                value = getattr(value, attr, None)
        return value

@dataclass
class SelectorEvaluator[InputT, OutputT, MetadataT](Evaluator[InputT, OutputT, MetadataT]):
    # The wrapped evaluator comes first: dataclass fields without defaults
    # must precede fields with defaults.
    evaluator: Evaluator[Any, Any, Any]

    input_selector: Selector | Callable[[InputT], Any] | None = None
    output_selector: Selector | Callable[[OutputT], Any] | None = None
    metadata_selector: Selector | Callable[[MetadataT | None], Any] | None = None

    def evaluate(self, ctx: EvaluatorContext[InputT, OutputT, MetadataT]):
        if self.input_selector is not None:
            ctx = replace(ctx, inputs=self.input_selector(ctx.inputs))
        if self.output_selector is not None:
            # Apply the same selector to the expected output so comparisons stay like-for-like.
            expected = ctx.expected_output
            if expected is not None:
                expected = self.output_selector(expected)
            ctx = replace(ctx, output=self.output_selector(ctx.output), expected_output=expected)
        if self.metadata_selector is not None and ctx.metadata is not None:
            ctx = replace(ctx, metadata=self.metadata_selector(ctx.metadata))
        # Delegate to the wrapped evaluator with the narrowed context.
        return self.evaluator.evaluate(ctx)
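
To make the intended behavior concrete, here is a small standalone sketch that runs without pydantic_evals installed. FakeContext, equals_expected, and select_output are hypothetical stand-ins invented for this illustration (they are not part of the library); only Selector is copied from the pseudocode above.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass
class Selector:
    attributes: list[str]

    def __call__(self, value: Any) -> Any:
        """Walk the attribute path, using dict lookup or getattr at each step."""
        for attr in self.attributes:
            if isinstance(value, dict):
                value = value.get(attr)
            else:
                value = getattr(value, attr, None)
        return value


# Hypothetical stand-in for EvaluatorContext, holding only the fields used here.
@dataclass
class FakeContext:
    inputs: Any
    output: Any
    expected_output: Any = None


def equals_expected(ctx: FakeContext) -> bool:
    """Stand-in for an existing evaluator: compares output to expected_output."""
    return ctx.output == ctx.expected_output


def select_output(ctx: FakeContext, selector: Callable[[Any], Any]) -> FakeContext:
    """Return a copy of ctx with the selector applied to output and expected_output."""
    expected = None if ctx.expected_output is None else selector(ctx.expected_output)
    return replace(ctx, output=selector(ctx.output), expected_output=expected)


ctx = FakeContext(
    inputs={"question": "capital of France?"},
    output={"answer": {"city": "Paris"}, "latency_ms": 120},
    expected_output={"answer": {"city": "Paris"}, "latency_ms": 95},
)

# Comparing the whole output fails because latency_ms differs.
whole_match = equals_expected(ctx)
# Comparing only output["answer"]["city"] succeeds.
field_match = equals_expected(select_output(ctx, Selector(["answer", "city"])))
```

Here whole_match is False and field_match is True, which is the point of the request: the wrapped evaluator stays unchanged and only sees the selected field. Note that a missing key or attribute silently resolves to None in this sketch, so whether a bad path should raise instead is an open design question.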
