Commit 6e235b9

fix: update dataset
1 parent e8c7650 commit 6e235b9

2 files changed: 56 additions and 13 deletions

evals/README.md

Lines changed: 45 additions & 1 deletion
@@ -143,6 +143,13 @@ To get the best performance out of Claude when using tools, follow these guideli
 - **Prioritize descriptions over examples.**
 While you can include examples of how to use a tool in its description or accompanying prompt, this is less important than having a clear and comprehensive explanation of the tool’s purpose and parameters.
 Only add examples **after** you’ve fully developed the description.
+
+## Optimize metadata for OpenAI models
+
+- Name – pair the domain with the action (calendar.create_event).
+- Description – start with “Use this when…” and call out disallowed cases (“Do not use for reminders”).
+- Parameter docs – describe each argument, include examples, and use enums for constrained values.
+- Read-only hint – annotate readOnlyHint: true on tools that never mutate state so ChatGPT can streamline confirmation.
 ---
 
 ## How to analyze and improve a specific tool
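The four metadata guidelines added in the hunk above can be illustrated with a minimal sketch. The tool `calendar.list_events`, its parameters, and the helper `check_tool_metadata` are hypothetical, and the field layout (`name`, `description`, `inputSchema`, `annotations`) is an assumption modeled on MCP-style tool definitions rather than anything this commit defines.

```python
import re

# Hypothetical tool definition; the field layout is an assumption
# modeled on MCP-style tools, not taken from this repository.
LIST_EVENTS_TOOL = {
    "name": "calendar.list_events",  # domain paired with the action
    "description": (
        "Use this when the user wants to see events already on their calendar. "
        "Do not use for reminders or for creating events."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "date": {
                "type": "string",
                "description": "Day to list, in ISO 8601 format, e.g. '2025-01-31'.",
            },
            "view": {
                "type": "string",
                "enum": ["day", "week", "month"],  # enum for a constrained value
                "description": "Time window around the date, e.g. 'week'.",
            },
        },
        "required": ["date"],
    },
    "annotations": {"readOnlyHint": True},  # this tool never mutates state
}


def check_tool_metadata(tool: dict) -> list[str]:
    """Return a list of guideline violations (empty if none were found)."""
    problems = []
    if not re.fullmatch(r"[a-z_]+\.[a-z_]+", tool.get("name", "")):
        problems.append("name should look like 'domain.action'")
    description = tool.get("description", "")
    if not description.startswith("Use this when"):
        problems.append("description should start with 'Use this when'")
    if "Do not use" not in description:
        problems.append("description should call out disallowed cases")
    properties = tool.get("inputSchema", {}).get("properties", {})
    for param, schema in properties.items():
        if not schema.get("description"):
            problems.append(f"parameter '{param}' has no description")
    return problems
```

The checks are plain string heuristics, so they can only flag obviously missing pieces; whether a description is actually clear still needs a human read.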
@@ -212,6 +219,43 @@ Always make improvements **manually**, based on your understanding of the proble
 LLMs are very likely to worsen the issue instead of fixing it.
 
 
-# References:
+# Tool definition patterns
+
+Based on analysis of [Cursor Agent Tools v1.0](https://raw.githubusercontent.com/x1xhlol/system-prompts-and-models-of-ai-tools/refs/heads/main/Cursor%20Prompts/Agent%20Tools%20v1.0.json), [Lovable Agent Tools](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Lovable/Agent%20Tools.json), and [Claude Code Tools](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Claude%20Code/claude-code-tools.json):
+
+## Tool description vs parameter description
+
+**Tool description** should contain:
+- What the tool does (core functionality)
+- When to use it (usage context)
+- Key limitations (what it doesn't do)
+- High-level behavior (how it works conceptually)
+
+**Parameter description** should contain:
+- Parameter-specific details (what each parameter does)
+- Input constraints (validation rules, formats)
+- Usage examples (specific examples for that parameter)
+- Parameter-specific guidance (how to use that specific parameter)
+
+## Key patterns
+
+1. **Concise but comprehensive** - Avoid overly verbose descriptions
+2. **Semantic clarity** - Use language that matches user intent
+3. **Clear separation** - Tool purpose vs parameter-specific guidance
+4. **Operational constraints** - State limitations and boundaries
+5. **Contextual guidance** - Include usage instructions where relevant
+
+## References
 
 - [Example of a good tool description](https://docs.claude.com/en/docs/agents-and-tools/tool-use/implement-tool-use#example-of-a-good-tool-description)
+- [Cursor Agent Tools v1.0](https://raw.githubusercontent.com/x1xhlol/system-prompts-and-models-of-ai-tools/refs/heads/main/Cursor%20Prompts/Agent%20Tools%20v1.0.json)
+- [Lovable Agent Tools](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Lovable/Agent%20Tools.json)
+- [Claude Code Tools](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Claude%20Code/claude-code-tools.json)
+- [OpenAI optimize metadata](https://developers.openai.com/apps-sdk/guides/optimize-metadata)
+
+NOTES:
+
+// System prompt - instructions mainly cursor (very similar instructions in copilot)
+// https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Cursor%20Prompts/Agent%20Prompt%20v1.2.txt
+// https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/VSCode%20Agent/Prompt.txt
+
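The split between tool description and parameter description added in the hunk above might look like the following sketch. The `read_file` tool, its parameters, and the helper `leaked_parameters` are hypothetical and are not taken from any of the referenced tool sets; the helper is only a rough heuristic for the "clear separation" pattern.

```python
# Hypothetical tool used only to illustrate the split; not a real definition.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": (
        "Reads a text file from the workspace and returns its contents. "  # what it does
        "Use it when you need to inspect a file before editing it. "       # when to use it
        "It does not read binary files or directories. "                   # key limitations
        "Large files are returned in chunks of lines."                     # high-level behavior
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                # details, constraint, and example live with the parameter
                "description": "Workspace-relative path, e.g. 'src/main.py'. Must not start with '/'.",
            },
            "start_line": {
                "type": "integer",
                "minimum": 1,
                "description": "1-based line to start reading from, e.g. 120. Defaults to 1.",
            },
        },
        "required": ["path"],
    },
}


def leaked_parameters(tool: dict) -> list[str]:
    """Names of parameters that the tool-level description mentions by name."""
    description = tool["description"]
    return [name for name in tool["inputSchema"]["properties"] if name in description]
```

A non-empty result from `leaked_parameters` suggests that parameter-level guidance has drifted into the tool description and could be moved next to the parameter it concerns.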

evals/test-cases.json

Lines changed: 11 additions & 12 deletions
@@ -1,5 +1,5 @@
 {
-"version": "1.1",
+"version": "1.3",
 "testCases": [
 {
 "id": "fetch-actor-details-1",
@@ -66,15 +66,15 @@
 "id": "search-actors-1",
 "category": "search-actors",
 "query": "How to scrape Instagram posts",
-"expectedTools": ["search-actors"],
-"reference": "It must call the 'search-actors' tool with the query: 'Instagram posts' or similar"
+"expectedTools": [],
+"reference": "Either it should explain how to scrape Instagram posts or call 'search-actors' tool with the query: 'Instagram posts' or similar"
 },
 {
 "id": "search-actors-2",
 "category": "search-actors",
 "query": "What are the best Instagram scrapers?",
 "expectedTools": ["search-actors"],
-"reference": "It must call the 'search-actors' tool with the query: 'Instagram scraper' or similar."
+"reference": "It must call the 'search-actors' tool with the query: `Instagram`, 'Instagram scraper', or similar."
 },
 {
 "id": "search-actors-3",
@@ -196,9 +196,9 @@
 {
 "id": "search-vs-rag-1",
 "category": "search-actors",
-"query": "Find posts about AI on Instagram",
+"query": "Find posts about the Rock on Instagram",
 "expectedTools": ["search-actors"],
-"reference": "It must call the 'search-actors' tool with the query: 'Instagram' or 'Instagram posts' or similar. It must not use extended queries such as 'Instagram posts AI' or any more detailed variations."
+"reference": "It must call the 'search-actors' tool with the query: 'Instagram' or 'Instagram posts' or similar. It must not use extended queries such as 'Instagram posts the Rock' or any more detailed variations."
 },
 {
 "id": "search-vs-rag-2",
@@ -234,9 +234,8 @@
 {
 "id": "search-vs-rag-7",
 "category": "search-actors",
-"query": "Fetch flight details for New York to London",
-"expectedTools": ["search-actors"],
-"reference": "It must call the 'search-actors' tool with the query: 'flight data' or 'flight booking' or similar"
+"query": "Find one way flights from New York to London tomorrow",
+"expectedTools": ["search-actors"]
 },
 {
 "id": "search-vs-rag-8",
@@ -391,8 +390,8 @@
 },
 {
 "id": "misleading-query-1",
-"category": "search-actors",
-"query": "What's the weather like today?",
+"category": "apify-slash-rag-web-browser",
+"query": "What's the weather like today in San Francisco?",
 "expectedTools": ["apify-slash-rag-web-browser"]
 },
 {
@@ -410,7 +409,7 @@
 {
 "id": "ambiguous-query-1",
 "category": "search-actors",
-"query": "Get instagram posts",
+"query": "Get instagram posts",
 "expectedTools": ["search-actors"],
 "reference": "It must call the 'search-actors' tool with the query: 'Instagram posts' or similar"
 },
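To make the `expectedTools` changes above concrete, here is a minimal sketch of how such a field could be checked against the tools a model actually called. The function `expected_tools_called` is hypothetical and is not the evaluator used in this repository. It assumes that an empty `expectedTools` list means no particular tool is required, and it leaves the free-text `reference` to be judged separately.

```python
# Two test cases copied from the diff above (fields trimmed to what is used here).
TEST_CASES = [
    {
        "id": "search-actors-1",
        "query": "How to scrape Instagram posts",
        "expectedTools": [],
    },
    {
        "id": "misleading-query-1",
        "query": "What's the weather like today in San Francisco?",
        "expectedTools": ["apify-slash-rag-web-browser"],
    },
]


def expected_tools_called(test_case: dict, called_tools: list[str]) -> bool:
    """True if every tool named in expectedTools appears in called_tools.

    Assumption: an empty expectedTools list imposes no requirement, so the
    check passes whether or not the model called a tool.
    """
    return set(test_case.get("expectedTools", [])).issubset(called_tools)
```

Under this assumption, `search-actors-1` passes both when the model answers directly and when it calls `search-actors`, which matches the relaxed wording of its new `reference`.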
