@@ -94,39 +94,73 @@ See the [Compliance Command (WIP)](#compliance-command-wip) section below for de
9494 }
9595 ```
9696
97- 2. **Create tool and/or eval test files**:
98-
99- **`filesystem-tool-tests.yaml`**:
100-
101- ```yaml
102- tools:
103-   expected_tool_list: ['write_file']
104-   tests:
105-     - name: 'Write file successfully'
106-       tool: 'write_file'
107-       params: { path: '/tmp/test.txt', content: 'Hello world' }
108-       expect: { success: true }
109- ```
97+ 2a. **Create tool and/or eval test files manually**:
11098
111- **`filesystem-eval-tests.yaml`**:
112-
113- ```yaml
114- evals:
115-   models: ['claude-3-5-haiku-latest']
116-   tests:
117-     - name: 'LLM can write files'
118-       prompt: 'Create a file at /tmp/greeting.txt with the content "Hello from Claude"'
119-       expected_tool_calls:
120-         required: ['write_file']
121-       response_scorers:
122-         - type: 'llm-judge'
123-           criteria: 'Did the assistant successfully create the file?'
124-           threshold: 0.8
125- ```
99+ **`filesystem-tool-tests.yaml`**:
126100
127- See the [Tools Testing](#tools-testing) and [Evals Testing](#evals-testing) sections for comprehensive syntax examples.
101+ ```yaml
102+ tools:
103+   expected_tool_list: ['write_file']
104+   tests:
105+     - name: 'Write file successfully'
106+       tool: 'write_file'
107+       params: { path: '/tmp/test.txt', content: 'Hello world' }
108+       expect: { success: true }
109+ ```
110+
111+ **`filesystem-eval-tests.yaml`**:
112+
113+ ```yaml
114+ evals:
115+   models: ['claude-3-5-haiku-latest']
116+   tests:
117+     - name: 'LLM can write files'
118+       prompt: 'Create a file at /tmp/greeting.txt with the content "Hello from Claude"'
119+       expected_tool_calls:
120+         required: ['write_file']
121+       response_scorers:
122+         - type: 'llm-judge'
123+           criteria: 'Did the assistant successfully create the file?'
124+           threshold: 0.8
125+ ```
126+
127+ See the [Tools Testing](#tools-testing) and [Evals Testing](#evals-testing) sections for comprehensive syntax examples.
128+
129+ 2b. **Create tool and eval tests using an LLM**:
130+
131+ Try out this prompt, replacing the server config information with your own:
132+
133+ ```
134+ Please create tool tests and eval tests for me to use with the mcp server tester tool.
135+ To see how to use it, read the documentation at: https://github.com/steviec/mcp-server-tester/ and then run:
136+
137+ `npx -y mcp-server-tester --help`
138+
139+ My server config file is at ./filesystem-server-config.json. To know what tools you need to create tests for, run this command:
140+
141+ `npx -y @modelcontextprotocol/inspector --cli --config filesystem-server-config.json --server filesystem-server --method tools/list`
142+
143+ Please follow these steps:
144+
145+ 1. Create tool tests
146+    - Create a file called `tool-tests.yaml` that contains a single test for each tool. Follow these guidelines:
147+      - Do NOT force an individual test to pass; if the expected output is not returned, the test should fail
148+      - if there is a clear dependency between tool calls, you can chain them using the "calls" property
149+    - Run the tests and confirm that the syntax is correct and that each test runs (they do not have to pass)
150+
151+ 2. Create eval tests
152+    - Create a file called `eval-tests.yaml` with eval tests that will test the server's behavior. Follow these guidelines:
153+      - start with a few simple evals, and then build up to more complex ones
154+      - create between 5 and 10 eval tests
155+    - Run the tests and confirm that the syntax is correct and that each test runs (they do not have to pass)
156+
157+ 3. Provide a summary, which includes:
158+    - a list of the tools that are being tested and what you chose to test
159+    - a list of the eval tests and your reasoning for why you chose them
160+    - an explanation of how to run the tool tests and eval tests
161+ ```
128162
129- 3. **Run tests**:
163+ 1. **Run tests**:
130164
131165 ```bash
132166 # Run tools tests (fast, no API key needed)