Commit 9d307bc

DEV: Allow prompts to have multiple tests per config and followups with tools (#9)
This commit builds on #8.
- An eval may now have multiple tests denoted by `args` (multiple tests if args is an array, assume `args` is `tests`). See `tool_call_chains.yml`.
- An eval may have one or many `followups`. See `tool_call_no_tools.yml` (one) or `tool_call_chains.yml` (many).
- Followups may use tools or the typical prompt message we support.
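The `args` rule from the commit message can be sketched as follows. This is a minimal Python illustration, not the Discourse AI implementation: the helper name `normalize_tests` and the plain-dict config shape are assumptions made for the example.

```python
from typing import Any


def normalize_tests(args: Any) -> list:
    """Return a list of test cases from an eval's `args` value.

    Hypothetical helper: a mapping is treated as a single test,
    a list is treated as one test per element.
    """
    if isinstance(args, list):
        return [dict(test) for test in args]
    if isinstance(args, dict):
        return [dict(args)]
    raise TypeError(f"args must be a mapping or a list, got {type(args).__name__}")


# Single-test shape, as in tool_call_no_tools.yml (abbreviated).
single = {"prompt": "You are a helpful bot", "message": "echo the text sam"}

# Multi-test shape, as in tool_call_chains.yml (abbreviated).
many = [
    {
        "id": "addition-test",
        "prompts": ["You are a helpful bot"],
        "messages": ["Add 1 and 2"],
    },
]

print(len(normalize_tests(single)))
print(len(normalize_tests(many)))
```

Normalizing to a list up front lets the rest of a runner iterate over tests without caring which shape the config used.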
1 parent f0b3045 commit 9d307bc

3 files changed: +75 -0 lines changed


README.md

Lines changed: 11 additions & 0 deletions
@@ -3,3 +3,14 @@
 This repo is not a plugin, and is meant to be used in conjunction with [Discourse AI](https://github.com/discourse/discourse-ai) plugin.
 
 See https://github.com/discourse/discourse-ai?tab=readme-ov-file#evals for more information.
+
+
+#### Prompts
+
+Each eval config may contain a single or multiple test cases. Attributes (prompts, messages, followups) will be singular or plural accordingly.
+
+Single test case example, see
+- https://github.com/discourse/discourse-ai-evals/blob/main/tool_calls/tool_calls_with_no_tool.yml
+Multiple test case example, see
+- https://github.com/discourse/discourse-ai-evals/blob/main/translate/translate_topic_title.yml (with judge)
+- https://github.com/discourse/discourse-ai-evals/blob/main/tool_calls/tool_call_chains.yml (with multiple followups)

tool_calls/tool_call_chains.yml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+id: tool_call_chains
+name: Tool call chains
+description: Call multiple tools in multiple tests
+type: prompt
+args:
+  - id: addition-test
+    name: Addition
+    description: Test the addition works in subsequent tool calls
+    temperature: 0
+    stream: false
+    prompts:
+      - "You are a helpful bot"
+    messages:
+      - "Add 1 and 2"
+    tools:
+      -
+        name: "addition"
+        description: "Will add two numbers"
+        parameters:
+          - name: "text"
+            type: "string"
+            description: "the numbers to add"
+            required: true
+    followups:
+      -
+        tools: []
+        message:
+          type: "tool"
+          id: ["tool_call", "id"]
+          name: ["tool_call", "name"]
+          content: "3"
+      -
+        tools: []
+        message:
+          type: "user"
+          content: "add 4 to that"
+expected_output_regex: "add.*4.*3"
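One way to read the followup entries in this file: each followup carries a `message`, and values such as `["tool_call", "id"]` look like lookup paths into the preceding tool call. The Python sketch below illustrates that reading only. The helper names (`collect_followups`, `resolve`, `build_message`), the context layout, and the call id `call_1` are all hypothetical, and the real harness may resolve these values differently.

```python
def collect_followups(test: dict) -> list:
    """Accept either the plural `followups` list or a single `followup` mapping."""
    if "followups" in test:
        return list(test["followups"])
    if "followup" in test:
        return [test["followup"]]
    return []


def resolve(value, context: dict):
    """Treat a non-empty list of strings as a path into `context` (assumed semantics)."""
    if isinstance(value, list) and value and all(isinstance(p, str) for p in value):
        current = context
        for part in value:
            current = current[part]
        return current
    return value


def build_message(followup: dict, context: dict) -> dict:
    """Return the followup message with any path placeholders filled in."""
    return {key: resolve(val, context) for key, val in followup["message"].items()}


test = {
    "followups": [
        {
            "tools": [],
            "message": {
                "type": "tool",
                "id": ["tool_call", "id"],
                "name": ["tool_call", "name"],
                "content": "3",
            },
        },
        {"tools": [], "message": {"type": "user", "content": "add 4 to that"}},
    ]
}

# Made-up stand-in for the tool call returned by the model on the first turn.
context = {"tool_call": {"id": "call_1", "name": "addition"}}

messages = [build_message(f, context) for f in collect_followups(test)]
print(messages[0])
print(messages[1])
```

Under these assumptions the first followup becomes a tool result tied to the earlier call, and the second is passed through unchanged as a plain user message.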
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+id: tool_call_no_tools
+name: Tool calls with no tool
+description: Eval see what happens after a tool call comes back and we resubmit with no tools, does the llm get confused?
+type: prompt
+args:
+  output_thinking: true
+  temperature: 0
+  prompt: "You are a helpful bot"
+  message: "echo the text sam and then respond to me with the text done"
+  tools:
+    -
+      name: "echo"
+      description: "will echo the text"
+      parameters:
+        - name: "text"
+          type: "string"
+          description: "the text to echo"
+          required: true
+  followup:
+    tools: []
+    message:
+      type: "tool"
+      id: ["tool_call", "id"]
+      name: ["tool_call", "name"]
+      content: "content was echoed"
+expected_output_regex: "one"
+
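The `expected_output_regex` values in these two files can be checked by hand with a short Python sketch. It assumes the pattern is searched anywhere in the model output with no extra flags. The harness itself may apply the regex differently, and `output_matches` is a hypothetical helper, as are the sample outputs.

```python
import re


def output_matches(pattern: str, output: str) -> bool:
    """Return True if `pattern` is found anywhere in `output` (unanchored search)."""
    return re.search(pattern, output) is not None


# "one" is a substring of "done", so a reply of "done" passes this eval.
print(output_matches("one", "done"))  # True

# The search is case-sensitive under this assumption.
print(output_matches("one", "DONE"))  # False

# The chained-addition pattern needs "add", then "4", then "3", in that order.
print(output_matches("add.*4.*3", "add 4 to 3"))  # True
print(output_matches("add.*4.*3", "The result is 7"))  # False
```

Because the search is unanchored, a short pattern such as `one` accepts any output containing those three letters in sequence, not only the exact word.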
