evals/README.md
Lines changed: 45 additions & 1 deletion
@@ -143,6 +143,13 @@ To get the best performance out of Claude when using tools, follow these guideli
 - **Prioritize descriptions over examples.**
 While you can include examples of how to use a tool in its description or accompanying prompt, this is less important than having a clear and comprehensive explanation of the tool’s purpose and parameters.
 Only add examples **after** you’ve fully developed the description.
+
+## Optimize metadata for OpenAI models
+
+- Name – pair the domain with the action (calendar.create_event).
+- Description – start with “Use this when…” and call out disallowed cases (“Do not use for reminders”).
+- Parameter docs – describe each argument, include examples, and use enums for constrained values.
+- Read-only hint – annotate readOnlyHint: true on tools that never mutate state so ChatGPT can streamline confirmation.
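The four metadata guidelines above can be illustrated with a small sketch. It assumes an MCP-style tool shape (`name`, `description`, `inputSchema`, `annotations`). Only `calendar.create_event`, the "Use this when…" opener, the "Do not use for reminders" phrase, and `readOnlyHint` come from the README text; the `calendar.list_events` tool, its parameters, and the `lint_tool` helper are hypothetical and exist only for illustration.

```python
# Hypothetical tool definitions in an MCP-style shape (an assumption, not
# taken from the repository).
CREATE_EVENT = {
    "name": "calendar.create_event",  # domain + action
    "description": (
        "Use this when the user wants to add a new event to their calendar. "
        "Do not use for reminders."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "Event title, e.g. 'Team sync'.",
            },
            "visibility": {
                "type": "string",
                "enum": ["public", "private"],  # enum for a constrained value
                "description": "Who can see the event, e.g. 'private'.",
            },
        },
        "required": ["title"],
    },
    "annotations": {"readOnlyHint": False},  # creating an event mutates state
}

LIST_EVENTS = {
    "name": "calendar.list_events",  # hypothetical read-only counterpart
    "description": (
        "Use this when the user asks what is on their calendar. "
        "Do not use for creating or editing events."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "date": {
                "type": "string",
                "description": "Day to list, in ISO format, e.g. '2025-01-31'.",
            },
        },
        "required": ["date"],
    },
    "annotations": {"readOnlyHint": True},  # never mutates state
}


def lint_tool(tool: dict) -> list[str]:
    """Return a list of guideline violations (illustrative checks only)."""
    problems = []
    if "." not in tool["name"]:
        problems.append("name should pair domain and action")
    if not tool["description"].startswith("Use this when"):
        problems.append("description should start with 'Use this when'")
    for param, schema in tool["inputSchema"]["properties"].items():
        if not schema.get("description"):
            problems.append(f"parameter '{param}' has no description")
    return problems


if __name__ == "__main__":
    for tool in (CREATE_EVENT, LIST_EVENTS):
        print(tool["name"], lint_tool(tool))
```

A check like `lint_tool` is only a rough guard; it cannot tell whether a description is actually clear, which still needs manual review.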
 ---
 
 ## How to analyze and improve a specific tool
@@ -212,6 +219,43 @@ Always make improvements **manually**, based on your understanding of the proble
 LLMs are very likely to worsen the issue instead of fixing it.
 
 
-# References:
+# Tool definition patterns
+
+Based on analysis of [Cursor Agent Tools v1.0](https://raw.githubusercontent.com/x1xhlol/system-prompts-and-models-of-ai-tools/refs/heads/main/Cursor%20Prompts/Agent%20Tools%20v1.0.json), [Lovable Agent Tools](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Lovable/Agent%20Tools.json), and [Claude Code Tools](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Claude%20Code/claude-code-tools.json):
+
+## Tool description vs parameter description
+
+**Tool description** should contain:
+- What the tool does (core functionality)
+- When to use it (usage context)
+- Key limitations (what it doesn't do)
+- High-level behavior (how it works conceptually)
+
+**Parameter description** should contain:
+- Parameter-specific details (what each parameter does)
+- Input constraints (validation rules, formats)
+- Usage examples (specific examples for that parameter)
+- Parameter-specific guidance (how to use that specific parameter)
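As a rough sketch of the split described above, the following keeps tool-level text and parameter-level text in separate fields and renders them into one schema. The `read_file` tool, its parameters, the `Param` and `Tool` classes, and the `inputSchema` key are all hypothetical; none of them is taken from the analyzed tool sets.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    type: str
    description: str  # parameter details, constraints, examples, guidance
    required: bool = True


@dataclass
class Tool:
    name: str
    description: str  # what it does, when to use it, limitations, behavior
    params: list[Param] = field(default_factory=list)

    def to_schema(self) -> dict:
        """Render the tool as a JSON-Schema-style dictionary."""
        return {
            "name": self.name,
            "description": self.description,
            "inputSchema": {
                "type": "object",
                "properties": {
                    p.name: {"type": p.type, "description": p.description}
                    for p in self.params
                },
                "required": [p.name for p in self.params if p.required],
            },
        }


# Hypothetical example tool.
read_file = Tool(
    name="read_file",
    description=(
        "Reads a text file from the workspace and returns its contents. "  # what
        "Use it when you need to inspect a file before editing it. "       # when
        "It does not read binary files or directories. "                   # limits
        "Large files are returned in chunks selected by line range."       # behavior
    ),
    params=[
        Param(
            "path",
            "string",
            "Path relative to the workspace root, e.g. 'src/main.py'. "
            "Must not be absolute.",
        ),
        Param(
            "start_line",
            "integer",
            "1-based first line to return, e.g. 1. "
            "Defaults to the start of the file.",
            required=False,
        ),
    ],
)

schema = read_file.to_schema()
```

Keeping the two kinds of text in separate fields makes it harder for parameter formats to leak into the tool description, or for usage context to end up buried in a single parameter.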
+
+## Key patterns
+
+1. **Concise but comprehensive** - Avoid overly verbose descriptions
+2. **Semantic clarity** - Use language that matches user intent
+3. **Clear separation** - Tool purpose vs parameter-specific guidance
+4. **Operational constraints** - State limitations and boundaries
+5. **Contextual guidance** - Include usage instructions where relevant
+
+## References
 
 - [Example of a good tool description](https://docs.claude.com/en/docs/agents-and-tools/tool-use/implement-tool-use#example-of-a-good-tool-description)
evals/test-cases.json
Lines changed: 11 additions & 12 deletions
@@ -1,5 +1,5 @@
 {
-  "version": "1.1",
+  "version": "1.3",
   "testCases": [
     {
       "id": "fetch-actor-details-1",
@@ -66,15 +66,15 @@
       "id": "search-actors-1",
       "category": "search-actors",
       "query": "How to scrape Instagram posts",
-      "expectedTools": ["search-actors"],
-      "reference": "It must call the 'search-actors' tool with the query: 'Instagram posts' or similar"
+      "expectedTools": [],
+      "reference": "It should either explain how to scrape Instagram posts or call the 'search-actors' tool with the query: 'Instagram posts' or similar"
     },
     {
       "id": "search-actors-2",
       "category": "search-actors",
       "query": "What are the best Instagram scrapers?",
       "expectedTools": ["search-actors"],
-      "reference": "It must call the 'search-actors' tool with the query: 'Instagram scraper' or similar."
+      "reference": "It must call the 'search-actors' tool with the query: 'Instagram', 'Instagram scraper', or similar."
     },
     {
       "id": "search-actors-3",
@@ -196,9 +196,9 @@
     {
       "id": "search-vs-rag-1",
       "category": "search-actors",
-      "query": "Find posts about AI on Instagram",
+      "query": "Find posts about the Rock on Instagram",
       "expectedTools": ["search-actors"],
-      "reference": "It must call the 'search-actors' tool with the query: 'Instagram' or 'Instagram posts' or similar. It must not use extended queries such as 'Instagram posts AI' or any more detailed variations."
+      "reference": "It must call the 'search-actors' tool with the query: 'Instagram' or 'Instagram posts' or similar. It must not use extended queries such as 'Instagram posts the Rock' or any more detailed variations."
     },
     {
       "id": "search-vs-rag-2",
@@ -234,9 +234,8 @@
     {
       "id": "search-vs-rag-7",
       "category": "search-actors",
-      "query": "Fetch flight details for New York to London",
-      "expectedTools": ["search-actors"],
-      "reference": "It must call the 'search-actors' tool with the query: 'flight data' or 'flight booking' or similar"
+      "query": "Find one way flights from New York to London tomorrow",
+      "expectedTools": ["search-actors"]
     },
     {
       "id": "search-vs-rag-8",
@@ -391,8 +390,8 @@
     },
     {
       "id": "misleading-query-1",
-      "category": "search-actors",
-      "query": "What's the weather like today?",
+      "category": "apify-slash-rag-web-browser",
+      "query": "What's the weather like today in San Francisco?",
       "expectedTools": ["apify-slash-rag-web-browser"]
     },
     {
@@ -410,7 +409,7 @@
     {
       "id": "ambiguous-query-1",
       "category": "search-actors",
-      "query": "Instagram posts",
+      "query": "Get instagram posts",
       "expectedTools": ["search-actors"],
       "reference": "It must call the 'search-actors' tool with the query: 'Instagram posts' or similar"
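The test cases in evals/test-cases.json pair a `query` with `expectedTools` and, for some cases, a free-text `reference`. The diff does not show how the evaluator consumes these fields, so the following is only a sketch under stated assumptions: a non-empty `expectedTools` is taken to mean that every listed tool must appear among the tools the model called, and an empty list is taken to mean the tool choice is left to reference-based judging. The `check_tool_choice` function is hypothetical, and the embedded cases are abridged from the hunks above.

```python
import json
from typing import Optional

# Test cases abridged from the diff (some fields omitted).
CASES_JSON = """
{
  "version": "1.3",
  "testCases": [
    {
      "id": "search-actors-1",
      "category": "search-actors",
      "query": "How to scrape Instagram posts",
      "expectedTools": []
    },
    {
      "id": "misleading-query-1",
      "category": "apify-slash-rag-web-browser",
      "query": "What's the weather like today in San Francisco?",
      "expectedTools": ["apify-slash-rag-web-browser"]
    },
    {
      "id": "ambiguous-query-1",
      "category": "search-actors",
      "query": "Get instagram posts",
      "expectedTools": ["search-actors"],
      "reference": "It must call the 'search-actors' tool with the query: 'Instagram posts' or similar"
    }
  ]
}
"""


def check_tool_choice(case: dict, called_tools: list[str]) -> Optional[bool]:
    """Compare called tools against a case's expectedTools.

    Returns None when expectedTools is empty (assumed here to mean that the
    case is judged on its reference text instead), True when every expected
    tool was called, and False otherwise.
    """
    expected = case.get("expectedTools", [])
    if not expected:
        return None
    return all(tool in called_tools for tool in expected)


cases = {c["id"]: c for c in json.loads(CASES_JSON)["testCases"]}

if __name__ == "__main__":
    print(check_tool_choice(cases["misleading-query-1"], ["search-actors"]))
    print(check_tool_choice(cases["ambiguous-query-1"], ["search-actors"]))
    print(check_tool_choice(cases["search-actors-1"], []))
```

Whether the called tool used an acceptable query (for example 'Instagram posts' rather than an extended query) is not covered by this check; the `reference` text suggests that part is judged separately.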