Commit d541dd2

Add Turkish tool-calling dataset v1 with synthetic single-turn conversations generated using ToolsGen and Qwen models
1 parent 99492b7 commit d541dd2

1 file changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# Turkish Tool Calling v1

A synthetic Turkish tool-calling dataset generated using [ToolsGen](https://github.com/atasoglu/toolsgen) with Qwen models via OpenRouter.

## Dataset Details

- **Generated with**: ToolsGen
- **Total Samples**: 1,000
- **Language**: Turkish
- **Format**: Single-turn conversations with tool calls

### Models Used

- **Problem Generator**: qwen/qwen3-235b-a22b-2507 (temp=1.0)
- **Tool Caller**: qwen/qwen3-235b-a22b-2507 (temp=0.0)
- **Judge**: qwen/qwen3-235b-a22b-2507 (temp=0.0)

## Dataset Structure

Each record contains:

```json
{
  "id": "record_000000",
  "language": "turkish",
  "tools": [...],
  "messages": [
    {"role": "user", "content": "İstanbul'da hava durumu nasıl?"}
  ],
  "assistant_calls": [
    {
      "id": "call_...",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"Istanbul, Turkey\"}"
      }
    }
  ],
  "problem_metadata": {...},
  "judge": {
    "tool_relevance": 0.4,
    "argument_quality": 0.38,
    "clarity": 0.2,
    "score": 0.98,
    "verdict": "accept",
    "rationale": "...",
    "rubric_version": "0.1.0",
    "model": "qwen/qwen3-235b-a22b-2507",
    "temperature": 0.0
  },
  "quality_tags": [],
  "tools_metadata": {"num_tools": 2}
}
```
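
Note that `arguments` in the record above is a JSON-encoded string rather than a nested object, so it needs one extra decoding step before use. The following is a minimal sketch of that step. The `decode_calls` helper and the `call_0` id are illustrative names, not part of the dataset or of ToolsGen.

```python
import json

# A trimmed copy of the example record; "call_0" is a placeholder id,
# since the real id is elided in the example.
record = {
    "messages": [{"role": "user", "content": "İstanbul'da hava durumu nasıl?"}],
    "assistant_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"Istanbul, Turkey\"}",
            },
        }
    ],
}


def decode_calls(record):
    """Return (function name, decoded arguments) pairs for one record."""
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in record["assistant_calls"]
    ]


decoded = decode_calls(record)
print(decoded)
```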

## Generation Details

### Configuration

- **Strategy**: Random tool sampling
- **Tools per sample**: 1-8 (k_min=1, k_max=8)
- **Max attempts**: 1
- **Train split**: 80%
- **Seed**: Random (1-10M range)
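
To make the sampling settings above concrete, here is a rough sketch of what random tool sampling with `k_min=1` and `k_max=8` could look like. It is not ToolsGen's actual implementation: `sample_tools`, the tool pool, and the way the seed is drawn are assumptions for illustration only.

```python
import random


def sample_tools(tool_pool, k_min=1, k_max=8, rng=None):
    """Pick between k_min and k_max distinct tools uniformly at random."""
    rng = rng or random.Random()
    upper = min(k_max, len(tool_pool))  # cannot sample more tools than exist
    k = rng.randint(min(k_min, upper), upper)
    return rng.sample(tool_pool, k)


# Hypothetical seed drawn from the 1-10M range listed in the configuration.
seed = random.randint(1, 10_000_000)
rng = random.Random(seed)

# Hypothetical pool of tool names standing in for real tool definitions.
pool = [f"tool_{i}" for i in range(20)]
subset = sample_tools(pool, rng=rng)
print(seed, subset)
```

Recording the drawn seed alongside the output is what would make a run reproducible even though the seed itself is random.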

### Quality Control

All samples passed through an LLM-as-a-judge evaluation with a multi-dimensional rubric:

- **Tool Relevance** (40%): Are the selected tools appropriate?
- **Argument Quality** (40%): Are arguments valid and plausible?
- **Clarity** (20%): Is the response complete and clear?

Samples with `score >= 0.7` and `verdict == "accept"` are included.
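
The inclusion rule can be expressed as a small predicate over the `judge` field. This is a sketch of the stated rule, not the generator's own filtering code. `passes_quality_gate` is an illustrative name, and the second and third records are made-up examples.

```python
def passes_quality_gate(record, min_score=0.7):
    """Apply the stated rule: score >= 0.7 and verdict == "accept"."""
    judge = record["judge"]
    return judge["score"] >= min_score and judge["verdict"] == "accept"


records = [
    {"id": "record_000000", "judge": {"score": 0.98, "verdict": "accept"}},
    # Hypothetical records, included only to show both conditions matter.
    {"id": "hypothetical_low", "judge": {"score": 0.55, "verdict": "reject"}},
    {"id": "hypothetical_mismatch", "judge": {"score": 0.75, "verdict": "reject"}},
]

kept = [r["id"] for r in records if passes_quality_gate(r)]
print(kept)
```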

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("atasoglu/turkish-tool-calling-v1")

# Access a sample
sample = dataset["train"][0]
print(sample["messages"])
print(sample["assistant_calls"])
```
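
For fine-tuning, it is often convenient to fold `assistant_calls` into the conversation as an assistant turn. The sketch below assumes the common chat-completions layout, where the assistant message carries a `tool_calls` list. That target layout and the `to_chat_messages` helper are assumptions, not something the dataset prescribes.

```python
def to_chat_messages(sample):
    """Append one assistant turn that carries the sample's tool calls."""
    assistant_turn = {
        "role": "assistant",
        "content": None,  # assumed convention: no text when only calling tools
        "tool_calls": sample["assistant_calls"],
    }
    # Copy the list so the original sample is left unmodified.
    return list(sample["messages"]) + [assistant_turn]


# Stand-in for dataset["train"][0], so the sketch runs without a download.
sample = {
    "messages": [{"role": "user", "content": "İstanbul'da hava durumu nasıl?"}],
    "assistant_calls": [
        {
            "id": "call_0",  # placeholder id
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"Istanbul, Turkey\"}",
            },
        }
    ],
}

chat = to_chat_messages(sample)
print(chat[-1]["tool_calls"][0]["function"]["name"])
```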

## Limitations

- Single-turn conversations only
- Turkish language only
- Synthetic data generated by LLMs (may contain artifacts)
- No actual tool execution or validation
- Judge scores are model-based assessments
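
Because calls are never executed or validated, a lightweight structural check before training may be worthwhile. The sketch below assumes each entry in `tools` follows the `{"type": "function", "function": {"name": ...}}` layout. The example record elides that field, so this layout is a guess and should be adapted to the real data. `check_calls` is an illustrative helper.

```python
import json


def check_calls(record):
    """Return a list of structural problems found in a record's tool calls."""
    # ASSUMPTION: tools use the {"function": {"name": ...}} layout.
    known = {tool["function"]["name"] for tool in record["tools"]}
    problems = []
    for call in record["assistant_calls"]:
        name = call["function"]["name"]
        if name not in known:
            problems.append(f"unknown tool: {name}")
        try:
            args = json.loads(call["function"]["arguments"])
        except json.JSONDecodeError:
            problems.append(f"{name}: arguments are not valid JSON")
            continue
        if not isinstance(args, dict):
            problems.append(f"{name}: arguments are not a JSON object")
    return problems


# Hypothetical records used only to exercise the check.
good = {
    "tools": [{"type": "function", "function": {"name": "get_weather"}}],
    "assistant_calls": [
        {"function": {"name": "get_weather", "arguments": "{\"location\": \"Istanbul, Turkey\"}"}}
    ],
}
bad = {
    "tools": [{"type": "function", "function": {"name": "get_weather"}}],
    "assistant_calls": [{"function": {"name": "get_time", "arguments": "{oops"}}],
}

good_problems = check_calls(good)
bad_problems = check_calls(bad)
print(good_problems, bad_problems)
```

This only catches malformed calls. It does not confirm that the arguments match a tool's parameter schema or that the call would succeed.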

## Citation

```bibtex
@software{toolsgen2025,
  title = {ToolsGen: Synthetic Tool-Calling Dataset Generator},
  author = {Ataşoğlu, Ahmet},
  year = {2025},
  url = {https://github.com/atasoglu/toolsgen}
}
```
108+
109+
## License
110+
111+
MIT License
