We should spec out an easy Eval system for testing agents:
- Write tests using simple Yaml
- Input prompt and expected output
- Indicate expected tool calls
Like:

```yaml
name: Weather Agent Eval
evals:
  - it: should not know the time
    input: What time is it?
    eval_response: I don't have access to time information.
  - it: should know the weather report
    input: what is the weather in Toronto?
    eval_judge: An accurate weather report was returned.
  - it: should call the weather tool
    input: what is the weather in Toronto?
    expected_tools:
      - name: get_weather
        eval_arguments: Lat and long for Toronto
        eval_response: the weather report for Toronto
```

This is just a first pass, but the idea is that the eval_xx attributes mean the LLM evaluates the response, or, in the case of a tool call, that it could evaluate both the inputs and the outputs.