Skip to content

Commit fa20249

Browse files
author
Olivier Chafik
committed
Add proper tool call docs to server README
1 parent 8a44b2f commit fa20249

File tree

1 file changed

+101
-7
lines changed

1 file changed

+101
-7
lines changed

examples/server/README.md

Lines changed: 101 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1117,17 +1117,111 @@ curl http://localhost:8080/v1/chat/completions \
11171117
}'
11181118
```
11191119

1120-
... and even tool usage (needs `--jinja` flag):
1120+
*Tool call support*
11211121

1122-
```shell
1123-
llama-server --jinja -hfr lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF -hff Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf -fa
1122+
[Function calling](https://platform.openai.com/docs/guides/function-calling) is supported for all models (see https://github.com/ggerganov/llama.cpp/pull/9639):
1123+
1124+
- Needs `--jinja` flag
1125+
- Native tool call formats supported:
1126+
- Llama 3.1 / 3.3 (including builtin tools support - tool names for `wolfram_alpha`, `web_search` / `brave_search`, `code_interpreter`), Llama 3.2
1127+
- Functionary v3.1 / v3.2
1128+
- Hermes 2/3, Qwen 2.5
1129+
- Mistral Nemo
1130+
- Firefunction v2
1131+
- DeepSeek R1 (WIP / seems reluctant to call any tools?)
1132+
1133+
<details>
1134+
<summary>Show some common templates and which format handler they use</summary>
1135+
1136+
| Template | Format |
1137+
|----------|--------|
1138+
| CohereForAI-c4ai-command-r-plus-default.jinja | generic tool calls |
1139+
| CohereForAI-c4ai-command-r-plus-rag.jinja | generic tool calls |
1140+
| CohereForAI-c4ai-command-r-plus-tool_use.jinja | generic tool calls |
1141+
| MiniMaxAI-MiniMax-Text-01.jinja | generic tool calls |
1142+
| NexaAIDev-Octopus-v2.jinja | generic tool calls |
1143+
| NousResearch-Hermes-2-Pro-Llama-3-8B-default.jinja | generic tool calls |
1144+
| NousResearch-Hermes-2-Pro-Llama-3-8B-tool_use.jinja | hermes 2 pro tool calls |
1145+
| NousResearch-Hermes-2-Pro-Mistral-7B-default.jinja | generic tool calls |
1146+
| NousResearch-Hermes-2-Pro-Mistral-7B-tool_use.jinja | hermes 2 pro tool calls |
1147+
| NousResearch-Hermes-3-Llama-3.1-70B-default.jinja | generic tool calls |
1148+
| NousResearch-Hermes-3-Llama-3.1-70B-tool_use.jinja | hermes 2 pro tool calls |
1149+
| OrionStarAI-Orion-14B-Chat.jinja | generic tool calls |
1150+
| Qwen-QwQ-32B-Preview.jinja | hermes 2 pro tool calls |
1151+
| Qwen-Qwen2-7B-Instruct.jinja | generic tool calls |
1152+
| Qwen-Qwen2-VL-7B-Instruct.jinja | generic tool calls |
1153+
| Qwen-Qwen2.5-7B-Instruct.jinja | hermes 2 pro tool calls |
1154+
| Qwen-Qwen2.5-Math-7B-Instruct.jinja | hermes 2 pro tool calls |
1155+
| TheBloke-FusionNet_34Bx2_MoE-AWQ.jinja | generic tool calls |
1156+
| abacusai-Fewshot-Metamath-OrcaVicuna-Mistral.jinja | generic tool calls |
1157+
| bofenghuang-vigogne-2-70b-chat.jinja | generic tool calls |
1158+
| databricks-dbrx-instruct.jinja | generic tool calls |
1159+
| deepseek-ai-DeepSeek-Coder-V2-Instruct.jinja | generic tool calls |
1160+
| deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja | deepseek r1 tool calls |
1161+
| deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja | deepseek r1 tool calls |
1162+
| deepseek-ai-DeepSeek-R1-Distill-Qwen-7B.jinja | deepseek r1 tool calls |
1163+
| deepseek-ai-DeepSeek-V2.5.jinja | deepseek r1 tool calls |
1164+
| deepseek-ai-deepseek-coder-33b-instruct.jinja | generic tool calls |
1165+
| google-gemma-2-2b-it.jinja | generic tool calls |
1166+
| google-gemma-7b-it.jinja | generic tool calls |
1167+
| indischepartij-MiniCPM-3B-OpenHermes-2.5-v2.jinja | generic tool calls |
1168+
| mattshumer-Reflection-Llama-3.1-70B.jinja | generic tool calls |
1169+
| meetkai-functionary-medium-v3.2.jinja | functionary v3.2 tool calls |
1170+
| meta-llama-Llama-3.1-8B-Instruct.jinja | llama 3.x tool calls (w/ builtin tools) |
1171+
| meta-llama-Llama-3.2-3B-Instruct.jinja | llama 3.x tool calls |
1172+
| meta-llama-Llama-3.3-70B-Instruct.jinja | llama 3.x tool calls (w/ builtin tools) |
1173+
| meta-llama-Meta-Llama-3.1-8B-Instruct.jinja | llama 3.x tool calls (w/ builtin tools) |
1174+
| microsoft-Phi-3-medium-4k-instruct.jinja | generic tool calls |
1175+
| microsoft-Phi-3-mini-4k-instruct.jinja | generic tool calls |
1176+
| microsoft-Phi-3-small-8k-instruct.jinja | generic tool calls |
1177+
| microsoft-Phi-3.5-mini-instruct.jinja | generic tool calls |
1178+
| microsoft-Phi-3.5-vision-instruct.jinja | generic tool calls |
1179+
| mistralai-Mistral-7B-Instruct-v0.2.jinja | generic tool calls |
1180+
| mistralai-Mistral-Large-Instruct-2407.jinja | mistral nemo tool calls |
1181+
| mistralai-Mistral-Large-Instruct-2411.jinja | generic tool calls |
1182+
| mistralai-Mistral-Nemo-Instruct-2407.jinja | mistral nemo tool calls |
1183+
| mistralai-Mixtral-8x7B-Instruct-v0.1.jinja | generic tool calls |
1184+
| mlabonne-AlphaMonarch-7B.jinja | generic tool calls |
1185+
| nvidia-Llama-3.1-Nemotron-70B-Instruct-HF.jinja | llama 3.x tool calls (w/ builtin tools) |
1186+
| openchat-openchat-3.5-0106.jinja | generic tool calls |
1187+
| teknium-OpenHermes-2.5-Mistral-7B.jinja | generic tool calls |
1188+
1189+
This table can be generated with:
11241190

1125-
# https://huggingface.co/meetkai/functionary-medium-v3.2
1126-
llama-server --jinja -hfr bartowski/functionary-medium-v3.2-GGUF -hff functionary-medium-v3.2-IQ4_XS.gguf -fa
1191+
```bash
1192+
./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null
1193+
1194+
</details>
11271195

1128-
# https://huggingface.co/meetkai/functionary-medium-v3.1
1129-
llama-server --jinja -hfr meetkai/functionary-medium-v3.1-GGUF -hff functionary-medium-llama-3.1.Q4_0.gguf -fa
1196+
- Generic tool call is supported when the template isn't recognized by native format handlers (you'll see `Chat format: Generic` in the logs).
1197+
- Use `--chat-template-file` to override the template when appropriate (see examples below)
1198+
- Generic support may consume more tokens and be less efficient than a model's native format.
11301199
1200+
- Run with:
1201+
1202+
```shell
1203+
# Native support:
1204+
llama-server --jinja -fa -hf bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
1205+
llama-server --jinja -fa -hf bartowski/Mistral-Nemo-Instruct-2407-GGUF:Q4_K_M
1206+
llama-server --jinja -fa -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q6_K
1207+
llama-server --jinja -fa -hf bartowski/functionary-small-v3.2-GGUF:Q4_K_M
1208+
llama-server --jinja -fa -hf bartowski/Hermes-2-Pro-Llama-3-8B-GGUF:Q4_K_M \
1209+
--chat-template-file <( python scripts/get_chat_template.py NousResearch/Hermes-2-Pro-Llama-3-8B )
1210+
1211+
# Native support requires the right template for these GGUFs:
1212+
llama-server --jinja -fa -hf bartowski/Hermes-3-Llama-3.1-8B-GGUF:Q4_K_M \
1213+
--chat-template-file <( python scripts/get_chat_template.py NousResearch/Hermes-3-Llama-3.1-8B tool_use )
1214+
llama-server --jinja -fa -hf bartowski/firefunction-v2-GGUF -hff firefunction-v2-IQ1_M.gguf \
1215+
--chat-template-file <( python scripts/get_chat_template.py fireworks-ai/firellama-3-firefunction-v2 )
1216+
1217+
# Generic format support
1218+
llama-server --jinja -fa -hf bartowski/Phi-3.5-mini-instruct-GGUF:Q4_K_M
1219+
llama-server --jinja -fa -hf bartowski/gemma-2-2b-it-GGUF:Q4_K_M
1220+
```
1221+
1222+
- Test in CLI:
1223+
1224+
```bash
11311225
curl http://localhost:8080/v1/chat/completions -d '{
11321226
"model": "gpt-3.5-turbo",
11331227
"tools": [

0 commit comments

Comments
 (0)