Commit 8ac172f

Added example for LLM autogen tests
1 parent 8b916ff commit 8ac172f

1 file changed: +64 -30 lines changed

README.md

Lines changed: 64 additions & 30 deletions
@@ -94,39 +94,73 @@ See the [Compliance Command (WIP)](#compliance-command-wip) section below for de
 }
 ```

-2. **Create tool and/or eval test files**:
-
-**`filesystem-tool-tests.yaml`**:
-
-```yaml
-tools:
-  expected_tool_list: ['write_file']
-  tests:
-    - name: 'Write file successfully'
-      tool: 'write_file'
-      params: { path: '/tmp/test.txt', content: 'Hello world' }
-      expect: { success: true }
-```
+2a. **Create tool and/or eval test files manually**:

-**`filesystem-eval-tests.yaml`**:
-
-```yaml
-evals:
-  models: ['claude-3-5-haiku-latest']
-  tests:
-    - name: 'LLM can write files'
-      prompt: 'Create a file at /tmp/greeting.txt with the content "Hello from Claude"'
-      expected_tool_calls:
-        required: ['write_file']
-      response_scorers:
-        - type: 'llm-judge'
-          criteria: 'Did the assistant successfully create the file?'
-          threshold: 0.8
-```
+**`filesystem-tool-tests.yaml`**:

-See the [Tools Testing](#tools-testing) and [Evals Testing](#evals-testing) sections for comprehensive syntax examples.
+```yaml
+tools:
+  expected_tool_list: ['write_file']
+  tests:
+    - name: 'Write file successfully'
+      tool: 'write_file'
+      params: { path: '/tmp/test.txt', content: 'Hello world' }
+      expect: { success: true }
+```
+
+**`filesystem-eval-tests.yaml`**:
+
+```yaml
+evals:
+  models: ['claude-3-5-haiku-latest']
+  tests:
+    - name: 'LLM can write files'
+      prompt: 'Create a file at /tmp/greeting.txt with the content "Hello from Claude"'
+      expected_tool_calls:
+        required: ['write_file']
+      response_scorers:
+        - type: 'llm-judge'
+          criteria: 'Did the assistant successfully create the file?'
+          threshold: 0.8
+```
+
+See the [Tools Testing](#tools-testing) and [Evals Testing](#evals-testing) sections for comprehensive syntax examples.
+
+2b. **Create tool and eval tests using an LLM**:
+
+Try out this prompt, replacing the server config information with your own:
+
+```
+Please create tool tests and eval tests for me to use with the mcp server tester tool.
+To see how to use it, read the documentation at: https://github.com/steviec/mcp-server-tester/ and then run:
+
+`npx -y mcp-server-tester --help`
+
+My server config file is at ./filesystem-server-config.json. To know what tools you need to create tests for, run this command:
+
+`npx -y @modelcontextprotocol/inspector --cli --config filesystem-server-config.json --server filesystem-server --method tools/list`
+
+Please follow these steps:
+
+1. Create tool tests
+   - Create a file called `tool-tests.yaml` that contains a single test for each tool. Follow these guidelines:
+     - Do NOT force an individual test to pass; if the expected output is not returned, the test should fail
+     - if there is a clear dependency between tool calls, you can chain them using the "calls" property
+   - Run the tests and confirm that the syntax is correct and that each test runs (they do not have to pass)
+
+2. Create eval tests
+   - Create a file called `eval-tests.yaml` with eval tests that will test the server's behavior. Follow these guidelines:
+     - start with a few simple evals, and then build up to more complex ones
+     - create between 5 and 10 eval tests
+   - Run the tests and confirm that the syntax is correct and that each test runs (they do not have to pass)
+
+3. Provide a summary, which includes:
+   - a list of the tools that are being tested and what you chose to test
+   - a list of the eval tests and your reasoning for why you chose them
+   - an explanation of how to run the tool tests and eval tests
+```
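The prompt above asks for "a single test for each tool". As a rough way to check an LLM-generated file against that guideline, the sketch below counts tests per tool. It is not part of mcp-server-tester: the function name `tools_without_single_test` and the `read_file` tool are hypothetical, the spec is a plain dict standing in for the parsed YAML, and the only field names assumed are the ones shown in `filesystem-tool-tests.yaml` (`tools`, `expected_tool_list`, `tests`, `tool`). Tests that chain tools through the "calls" property are not counted, since their layout is not shown here.

```python
from collections import Counter


def tools_without_single_test(spec: dict) -> dict[str, int]:
    """Return {tool_name: test_count} for each expected tool whose count is not exactly 1."""
    tools_section = spec.get("tools", {})
    expected = tools_section.get("expected_tool_list", [])
    # Count tests per tool; tests without a top-level 'tool' key are ignored (assumption).
    counts = Counter(
        test["tool"] for test in tools_section.get("tests", []) if "tool" in test
    )
    return {name: counts[name] for name in expected if counts[name] != 1}


# Hypothetical spec mirroring filesystem-tool-tests.yaml, plus one untested tool.
spec = {
    "tools": {
        "expected_tool_list": ["write_file", "read_file"],
        "tests": [
            {
                "name": "Write file successfully",
                "tool": "write_file",
                "params": {"path": "/tmp/test.txt", "content": "Hello world"},
                "expect": {"success": True},
            }
        ],
    }
}

print(tools_without_single_test(spec))  # {'read_file': 0}
```

An empty result would mean every tool in `expected_tool_list` has exactly one test. Whether a tool passes its test is still decided by running the tester itself.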

-3. **Run tests**:
+1. **Run tests**:

 ```bash
 # Run tools tests (fast, no API key needed)
