# Turkish Tool Calling v1

A synthetic Turkish tool-calling dataset generated using [ToolsGen](https://github.com/atasoglu/toolsgen) with Qwen models via OpenRouter.

## Dataset Details

- **Generated with**: ToolsGen
- **Total Samples**: 1,000
- **Language**: Turkish
- **Format**: Single-turn conversations with tool calls

### Models Used

- **Problem Generator**: qwen/qwen3-235b-a22b-2507 (temp=1.0)
- **Tool Caller**: qwen/qwen3-235b-a22b-2507 (temp=0.0)
- **Judge**: qwen/qwen3-235b-a22b-2507 (temp=0.0)

## Dataset Structure

Each record has the following structure (`...` marks content elided for brevity). The example user message, "İstanbul'da hava durumu nasıl?", means "What's the weather like in Istanbul?".

```json
{
  "id": "record_000000",
  "language": "turkish",
  "tools": [...],
  "messages": [
    {"role": "user", "content": "İstanbul'da hava durumu nasıl?"}
  ],
  "assistant_calls": [
    {
      "id": "call_...",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"Istanbul, Turkey\"}"
      }
    }
  ],
  "problem_metadata": {...},
  "judge": {
    "tool_relevance": 0.4,
    "argument_quality": 0.38,
    "clarity": 0.2,
    "score": 0.98,
    "verdict": "accept",
    "rationale": "...",
    "rubric_version": "0.1.0",
    "model": "qwen/qwen3-235b-a22b-2507",
    "temperature": 0.0
  },
  "quality_tags": [],
  "tools_metadata": {"num_tools": 2}
}
```

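Note that `function.arguments` in each entry of `assistant_calls` is a JSON-encoded string, not a nested object, so it has to be decoded before use. The sketch below shows one way to do that on a hand-written record mirroring the sample above; the helper name `decode_calls` and the call id `call_example` are hypothetical, and the elided `tools` and `problem_metadata` fields are left out.

```python
import json
from typing import Any


def decode_calls(record: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (function name, decoded arguments) for each assistant call."""
    decoded = []
    for call in record["assistant_calls"]:
        fn = call["function"]
        # "arguments" is stored as a JSON string, so parse it into a dict
        decoded.append((fn["name"], json.loads(fn["arguments"])))
    return decoded


# Hand-written record mirroring the structure shown above (hypothetical call id)
record = {
    "id": "record_000000",
    "language": "turkish",
    "messages": [{"role": "user", "content": "İstanbul'da hava durumu nasıl?"}],
    "assistant_calls": [
        {
            "id": "call_example",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"Istanbul, Turkey\"}",
            },
        }
    ],
}

for name, args in decode_calls(record):
    print(name, args["location"])  # prints: get_weather Istanbul, Turkey
```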
## Generation Details

### Configuration

- **Strategy**: Random tool sampling
- **Tools per sample**: 1-8 (k_min=1, k_max=8)
- **Max attempts**: 1 (no retries)
- **Train split**: 80% (the remaining 20% is held out)
- **Seed**: Random, drawn from the 1 to 10M range

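The sampling and split settings above can be illustrated with a short sketch. This is not ToolsGen's implementation: the uniform draw of `k` per sample, the tool pool, and the helper names `sample_tools` and `split_records` are all assumptions made for illustration.

```python
import random


def sample_tools(pool: list[str], rng: random.Random,
                 k_min: int = 1, k_max: int = 8) -> list[str]:
    """Pick between k_min and k_max distinct tools uniformly at random."""
    k = rng.randint(k_min, min(k_max, len(pool)))  # randint bounds are inclusive
    return rng.sample(pool, k)  # sampling without replacement: no duplicate tools


def split_records(records: list, rng: random.Random,
                  train_ratio: float = 0.8) -> tuple[list, list]:
    """Shuffle a copy of the records and cut it at train_ratio."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Seed drawn at random from the 1 to 10M range, then used for every draw
seed = random.randint(1, 10_000_000)
rng = random.Random(seed)

# Hypothetical pool of tool names
pool = [f"tool_{i}" for i in range(20)]
tools = sample_tools(pool, rng)
train, held_out = split_records(list(range(1000)), rng)
print(seed, len(tools), len(train), len(held_out))
```

Recording the seed alongside the data is what makes a run like this reproducible: recreating `random.Random(seed)` replays the same tool draws and the same split.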
### Quality Control

Every generated sample was evaluated by an LLM-as-a-judge using a multi-dimensional rubric:

- **Tool Relevance** (40%): Are the selected tools appropriate?
- **Argument Quality** (38%): Are the arguments valid and plausible?
- **Clarity** (20%): Is the response complete and clear?

In the sample record above, the three sub-scores sum to the overall `score` (0.4 + 0.38 + 0.2 = 0.98). Only samples with `score >= 0.7` and `verdict == "accept"` are included in the dataset.

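The inclusion rule can be written as a small filter. This sketch assumes, as in the sample record, that the overall `score` is the plain sum of the three already-weighted sub-scores; the function names `total_score` and `keep`, and the low-scoring judgement with a `"reject"` verdict, are hypothetical.

```python
import math


def total_score(judge: dict) -> float:
    """Sum the three rubric sub-scores (assumed to already be weighted)."""
    return judge["tool_relevance"] + judge["argument_quality"] + judge["clarity"]


def keep(judge: dict, threshold: float = 0.7) -> bool:
    """Inclusion rule stated above: score >= threshold and verdict == "accept"."""
    return judge["score"] >= threshold and judge["verdict"] == "accept"


# Judge output from the sample record above
sample = {"tool_relevance": 0.4, "argument_quality": 0.38, "clarity": 0.2,
          "score": 0.98, "verdict": "accept"}
# Hypothetical low-scoring judgement ("reject" is an assumed verdict label)
weak = {"tool_relevance": 0.2, "argument_quality": 0.15, "clarity": 0.1,
        "score": 0.45, "verdict": "reject"}

# Floating-point sums need a tolerance rather than ==
print(math.isclose(total_score(sample), sample["score"]))  # True
print(keep(sample), keep(weak))  # True False
```

Both conditions are required: a high score alone does not admit a sample whose verdict is anything other than `"accept"`.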
## Usage

```python
from datasets import load_dataset

dataset = load_dataset("atasoglu/turkish-tool-calling-v1")

# Access a sample
sample = dataset["train"][0]
print(sample["messages"])         # the user turn
print(sample["assistant_calls"])  # the expected tool calls
```

## Limitations

- Single-turn conversations only
- Turkish language only
- Synthetic data generated by LLMs (may contain artifacts)
- No actual tool execution or validation
- Judge scores are assessments by an LLM, not human annotations

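Because no call was ever executed, a lightweight structural check is one way to catch obvious problems before training. The sketch below assumes that entries in `tools` follow the OpenAI-style function-calling layout (the `tools` field is elided in the record above, so this layout is an assumption), and `check_call` is a hypothetical helper. It checks only tool names, JSON syntax, and required or declared argument names, not argument types or values; a full JSON Schema validator such as the `jsonschema` package would be stricter.

```python
import json


def check_call(call: dict, tools: list[dict]) -> list[str]:
    """Return a list of problems found in one tool call (empty if none).

    Assumes each tool looks like:
    {"type": "function", "function": {"name": ..., "parameters": {...}}}
    """
    specs = {t["function"]["name"]: t["function"] for t in tools}
    name = call["function"]["name"]
    if name not in specs:
        return [f"unknown tool: {name}"]
    try:
        args = json.loads(call["function"]["arguments"])
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    params = specs[name].get("parameters", {})
    problems = [f"missing required argument: {r}"
                for r in params.get("required", []) if r not in args]
    # Only flag unexpected names when the tool declares its properties
    known = params.get("properties", {})
    problems += [f"unexpected argument: {a}" for a in args if known and a not in known]
    return problems


# Hypothetical tool definition matching the sample call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]
good = {"function": {"name": "get_weather",
                     "arguments": "{\"location\": \"Istanbul, Turkey\"}"}}
bad = {"function": {"name": "get_weather", "arguments": "{}"}}
print(check_call(good, tools))  # []
print(check_call(bad, tools))   # ['missing required argument: location']
```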
## Citation

```bibtex
@software{toolsgen2025,
  title = {ToolsGen: Synthetic Tool-Calling Dataset Generator},
  author = {Ataşoğlu, Ahmet},
  year = {2025},
  url = {https://github.com/atasoglu/toolsgen}
}
```

## License

MIT License