Added example for LLM autogen tests

steviec · steviec · commit 8ac172fe87d7 · 2025-07-23T17:28:02.000-07:00
diff --git a/README.md b/README.md
@@ -94,39 +94,73 @@ See the [Compliance Command (WIP)](#compliance-command-wip) section below for de
    }
    ```
 
-2. **Create tool and/or eval test files**:
-
-   **`filesystem-tool-tests.yaml`**:
-
-   ```yaml
-   tools:
-     expected_tool_list: ['write_file']
-     tests:
-       - name: 'Write file successfully'
-         tool: 'write_file'
-         params: { path: '/tmp/test.txt', content: 'Hello world' }
-         expect: { success: true }
-   ```
+2a. **Create tool and/or eval test files manually**:
 
-   **`filesystem-eval-tests.yaml`**:
-
-   ```yaml
-   evals:
-     models: ['claude-3-5-haiku-latest']
-     tests:
-       - name: 'LLM can write files'
-         prompt: 'Create a file at /tmp/greeting.txt with the content "Hello from Claude"'
-         expected_tool_calls:
-           required: ['write_file']
-         response_scorers:
-           - type: 'llm-judge'
-             criteria: 'Did the assistant successfully create the file?'
-             threshold: 0.8
-   ```
+**`filesystem-tool-tests.yaml`**:
 
-   See the [Tools Testing](#tools-testing) and [Evals Testing](#evals-testing) sections for comprehensive syntax examples.
+```yaml
+tools:
+  expected_tool_list: ['write_file']
+  tests:
+    - name: 'Write file successfully'
+      tool: 'write_file'
+      params: { path: '/tmp/test.txt', content: 'Hello world' }
+      expect: { success: true }
+```
+
+**`filesystem-eval-tests.yaml`**:
+
+```yaml
+evals:
+  models: ['claude-3-5-haiku-latest']
+  tests:
+    - name: 'LLM can write files'
+      prompt: 'Create a file at /tmp/greeting.txt with the content "Hello from Claude"'
+      expected_tool_calls:
+        required: ['write_file']
+      response_scorers:
+        - type: 'llm-judge'
+          criteria: 'Did the assistant successfully create the file?'
+          threshold: 0.8
+```
+
+See the [Tools Testing](#tools-testing) and [Evals Testing](#evals-testing) sections for comprehensive syntax examples.
+
+2b. **Create tool and eval tests using an LLM**:
+
+Try this prompt with an LLM coding agent that can browse the web and run shell commands, replacing the server config details with your own:
+
+```text
+Please create tool tests and eval tests for my MCP server using the mcp-server-tester tool.
+To learn how to use it, read the documentation at https://github.com/steviec/mcp-server-tester/ and then run:
+
+  `npx -y mcp-server-tester --help`
+
+My server config file is at ./filesystem-server-config.json. To find out which tools need tests, run this command:
+
+  `npx -y @modelcontextprotocol/inspector --cli --config filesystem-server-config.json --server filesystem-server --method tools/list`
+
+Please follow these steps:
+
+1. Create tool tests
+  - Create a file called `tool-tests.yaml` that contains a single test for each tool. Follow these guidelines:
+    - Do NOT force an individual test to pass; if the expected output is not returned, the test should fail
+    - If there is a clear dependency between tool calls, you can chain them using the "calls" property
+  - Run the tests and confirm that the syntax is correct and that each test runs (they do not have to pass)
+
+2. Create eval tests
+  - Create a file called `eval-tests.yaml` with eval tests that will test the server's behavior. Follow these guidelines:
+    - Start with a few simple evals, then build up to more complex ones
+    - Create between 5 and 10 eval tests
+  - Run the tests and confirm that the syntax is correct and that each test runs (they do not have to pass)
+
+3. Provide a summary, which includes:
+  - A list of the tools being tested and what you chose to test for each
+  - A list of the eval tests and your reasoning for choosing them
+  - Instructions for running the tool tests and eval tests
+```
 
 3. **Run tests**:
 
    ```bash
    # Run tools tests (fast, no API key needed)