Merged
11 changes: 11 additions & 0 deletions README.md
@@ -3,3 +3,14 @@
This repo is not a plugin and is meant to be used in conjunction with the [Discourse AI](https://github.com/discourse/discourse-ai) plugin.

See https://github.com/discourse/discourse-ai?tab=readme-ov-file#evals for more information.


#### Prompts

Each eval config may contain a single test case or multiple test cases. Attribute names are singular (`prompt`, `message`, `followup`) for a single test case and plural (`prompts`, `messages`, `followups`) for multiple test cases.

Single test case example:
- https://github.com/discourse/discourse-ai-evals/blob/main/tool_calls/tool_calls_with_no_tool.yml

Multiple test case examples:
- https://github.com/discourse/discourse-ai-evals/blob/main/translate/translate_topic_title.yml (with judge)
- https://github.com/discourse/discourse-ai-evals/blob/main/tool_calls/tool_call_chains.yml (with multiple followups)
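
The singular/plural convention described above could be handled by a loader along the following lines. This is only a sketch in Python: the helper name `as_list` and the rule that a config may not set both forms are assumptions made for illustration, not behavior confirmed by the Discourse AI eval runner.

```python
from typing import Any


def as_list(args: dict[str, Any], singular: str, plural: str) -> list[Any]:
    """Return the values stored under either the singular or the plural key.

    Hypothetical helper: the real eval runner may normalize differently.
    """
    if singular in args and plural in args:
        # Assumption: a config should use one form, not both.
        raise ValueError(f"use either {singular!r} or {plural!r}, not both")
    if plural in args:
        return list(args[plural])
    if singular in args:
        return [args[singular]]
    return []


# Single test case: singular attribute names.
single = {"prompt": "You are a helpful bot", "message": "echo the text sam"}
# Multiple test cases: plural attribute names.
multiple = {"prompts": ["You are a helpful bot"], "messages": ["Add 1 and 2"]}

print(as_list(single, "prompt", "prompts"))
print(as_list(multiple, "message", "messages"))
```

Normalizing both forms to a list lets the rest of a runner treat single and multiple test case configs the same way.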
37 changes: 37 additions & 0 deletions tool_calls/tool_call_chains.yml
@@ -0,0 +1,37 @@
id: tool_call_chains
name: Tool call chains
description: Call multiple tools in multiple tests
type: prompt
args:
  - id: addition-test
    name: Addition
    description: Test the addition works in subsequent tool calls
    temperature: 0
    stream: false
    prompts:
      - "You are a helpful bot"
    messages:
      - "Add 1 and 2"
    tools:
      -
        name: "addition"
        description: "Will add two numbers"
        parameters:
          - name: "text"
            type: "string"
            description: "the numbers to add"
            required: true
    followups:
      -
        tools: []
        message:
          type: "tool"
          id: ["tool_call", "id"]
          name: ["tool_call", "name"]
          content: "3"
      -
        tools: []
        message:
          type: "user"
          content: "add 4 to that"
expected_output_regex: "add.*4.*3"
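
The followup entries above give `id` and `name` as two-element lists such as `["tool_call", "id"]` rather than literal strings. One plausible reading, which the diff does not confirm, is that each list is a lookup path into the previous model response. A minimal Python sketch under that assumption follows; the `resolve` helper, the shape of `context`, and the value `call_123` are all hypothetical.

```python
from typing import Any


def resolve(value: Any, context: dict[str, Any]) -> Any:
    """Look up a list-of-keys path in context; return other values unchanged."""
    if not isinstance(value, list):
        return value
    current: Any = context
    for key in value:
        current = current[key]
    return current


# Hypothetical shape of the previous response; the real structure may differ.
context = {"tool_call": {"id": "call_123", "name": "addition"}}

# The first followup message from the config above.
followup_message = {
    "type": "tool",
    "id": ["tool_call", "id"],
    "name": ["tool_call", "name"],
    "content": "3",
}

resolved = {key: resolve(val, context) for key, val in followup_message.items()}
print(resolved)
```

Under this reading, the tool result is tied back to whichever tool call the model actually made, so the config does not need to hard-code a call id.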
27 changes: 27 additions & 0 deletions tool_calls/tool_calls_with_no_tool.yml
@@ -0,0 +1,27 @@
id: tool_call_no_tools
name: Tool calls with no tool
description: Eval see what happens after a tool call comes back and we resubmit with no tools, does the llm get confused?
type: prompt
args:
  output_thinking: true
  temperature: 0
  prompt: "You are a helpful bot"
  message: "echo the text sam and then respond to me with the text done"
  tools:
    -
      name: "echo"
      description: "will echo the text"
      parameters:
        - name: "text"
          type: "string"
          description: "the text to echo"
          required: true
  followup:
    tools: []
    message:
      type: "tool"
      id: ["tool_call", "id"]
      name: ["tool_call", "name"]
      content: "content was echoed"
expected_output_regex: "one"
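
Both configs in this change end with an `expected_output_regex`. As a rough illustration of how such a check might behave, the Python sketch below assumes the pattern is applied as an unanchored search over the model output; whether the actual evaluator does this, and with which regex flags, is not stated in this diff. The function name `output_matches` is hypothetical.

```python
import re


def output_matches(pattern: str, output: str) -> bool:
    """Return True if the pattern is found anywhere in the output.

    Assumption: unanchored search, default flags.
    """
    return re.search(pattern, output) is not None


# Pattern from tool_calls_with_no_tool.yml: "one" is a substring of "done".
print(output_matches("one", "done"))
# Pattern from tool_call_chains.yml against a hypothetical model output.
print(output_matches("add.*4.*3", "add 4 to 3"))
# An output that lacks the expected text does not match.
print(output_matches("one", "sam"))
```

Under unanchored search, the pattern `one` accepts any output containing "done", but also any other output containing "one", so a stricter pattern may be preferable if false positives matter.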