Add support for Evals #215

@scottpersinger

Description

We should spec out an easy Eval system for testing agents:

  • Write tests using simple Yaml
  • Input prompt and expected output
  • Indicate expected tool calls

Like:

name: Weather Agent Eval
evals:
  - it: should not know the time
    input: What time is it?
    eval_response: I don't have access to time information.
  - it: should know the weather report
    input: what is the weather in Toronto?
    eval_judge: An accurate weather report was returned.
  - it: should call the weather tool
    input: what is the weather in Toronto?
    expected_tools:
      - name: get_weather
        eval_arguments: Lat and long for Toronto
        eval_response: the weather report for Toronto

This is just a first pass, but the idea is that the eval_* attributes mean an LLM evaluates the agent's response; in the case of a tool call, it could evaluate both the arguments and the tool's output.
