@@ -1117,17 +1117,111 @@ curl http://localhost:8080/v1/chat/completions \
 }'
 ```

-... and even tool usage (needs `--jinja` flag):
+*Tool call support*

-```shell
-llama-server --jinja -hfr lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF -hff Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf -fa
+[Function calling](https://platform.openai.com/docs/guides/function-calling) is supported for all models (see https://github.com/ggerganov/llama.cpp/pull/9639):
+
+- Needs `--jinja` flag
+- Native tool call formats supported:
+  - Llama 3.1 / 3.3 (including builtin tools support - tool names for `wolfram_alpha`, `web_search` / `brave_search`, `code_interpreter`), Llama 3.2
+  - Functionary v3.1 / v3.2
+  - Hermes 2/3, Qwen 2.5
+  - Mistral Nemo
+  - Firefunction v2
+  - DeepSeek R1 (WIP / seems reluctant to call any tools?)
+
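For illustration, a minimal tool-call request against a server started with one of the commands shown further below might look like the following sketch; the default address `http://localhost:8080` and the `get_weather` function definition are assumptions made for the example:

```shell
curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "user", "content": "What is the weather like in Paris today?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a given city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
      }
    }
  }]
}'
```

A model with tool-call support (native or generic) should normally answer with a `tool_calls` entry referencing `get_weather` rather than a plain-text reply.
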
+<details>
+<summary>Show some common templates and which format handler they use</summary>
+
+| Template | Format |
+|----------|--------|
+| CohereForAI-c4ai-command-r-plus-default.jinja | generic tool calls |
+| CohereForAI-c4ai-command-r-plus-rag.jinja | generic tool calls |
+| CohereForAI-c4ai-command-r-plus-tool_use.jinja | generic tool calls |
+| MiniMaxAI-MiniMax-Text-01.jinja | generic tool calls |
+| NexaAIDev-Octopus-v2.jinja | generic tool calls |
+| NousResearch-Hermes-2-Pro-Llama-3-8B-default.jinja | generic tool calls |
+| NousResearch-Hermes-2-Pro-Llama-3-8B-tool_use.jinja | hermes 2 pro tool calls |
+| NousResearch-Hermes-2-Pro-Mistral-7B-default.jinja | generic tool calls |
+| NousResearch-Hermes-2-Pro-Mistral-7B-tool_use.jinja | hermes 2 pro tool calls |
+| NousResearch-Hermes-3-Llama-3.1-70B-default.jinja | generic tool calls |
+| NousResearch-Hermes-3-Llama-3.1-70B-tool_use.jinja | hermes 2 pro tool calls |
+| OrionStarAI-Orion-14B-Chat.jinja | generic tool calls |
+| Qwen-QwQ-32B-Preview.jinja | hermes 2 pro tool calls |
+| Qwen-Qwen2-7B-Instruct.jinja | generic tool calls |
+| Qwen-Qwen2-VL-7B-Instruct.jinja | generic tool calls |
+| Qwen-Qwen2.5-7B-Instruct.jinja | hermes 2 pro tool calls |
+| Qwen-Qwen2.5-Math-7B-Instruct.jinja | hermes 2 pro tool calls |
+| TheBloke-FusionNet_34Bx2_MoE-AWQ.jinja | generic tool calls |
+| abacusai-Fewshot-Metamath-OrcaVicuna-Mistral.jinja | generic tool calls |
+| bofenghuang-vigogne-2-70b-chat.jinja | generic tool calls |
+| databricks-dbrx-instruct.jinja | generic tool calls |
+| deepseek-ai-DeepSeek-Coder-V2-Instruct.jinja | generic tool calls |
+| deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja | deepseek r1 tool calls |
+| deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja | deepseek r1 tool calls |
+| deepseek-ai-DeepSeek-R1-Distill-Qwen-7B.jinja | deepseek r1 tool calls |
+| deepseek-ai-DeepSeek-V2.5.jinja | deepseek r1 tool calls |
+| deepseek-ai-deepseek-coder-33b-instruct.jinja | generic tool calls |
+| google-gemma-2-2b-it.jinja | generic tool calls |
+| google-gemma-7b-it.jinja | generic tool calls |
+| indischepartij-MiniCPM-3B-OpenHermes-2.5-v2.jinja | generic tool calls |
+| mattshumer-Reflection-Llama-3.1-70B.jinja | generic tool calls |
+| meetkai-functionary-medium-v3.2.jinja | functionary v3.2 tool calls |
+| meta-llama-Llama-3.1-8B-Instruct.jinja | llama 3.x tool calls (w/ builtin tools) |
+| meta-llama-Llama-3.2-3B-Instruct.jinja | llama 3.x tool calls |
+| meta-llama-Llama-3.3-70B-Instruct.jinja | llama 3.x tool calls (w/ builtin tools) |
+| meta-llama-Meta-Llama-3.1-8B-Instruct.jinja | llama 3.x tool calls (w/ builtin tools) |
+| microsoft-Phi-3-medium-4k-instruct.jinja | generic tool calls |
+| microsoft-Phi-3-mini-4k-instruct.jinja | generic tool calls |
+| microsoft-Phi-3-small-8k-instruct.jinja | generic tool calls |
+| microsoft-Phi-3.5-mini-instruct.jinja | generic tool calls |
+| microsoft-Phi-3.5-vision-instruct.jinja | generic tool calls |
+| mistralai-Mistral-7B-Instruct-v0.2.jinja | generic tool calls |
+| mistralai-Mistral-Large-Instruct-2407.jinja | mistral nemo tool calls |
+| mistralai-Mistral-Large-Instruct-2411.jinja | generic tool calls |
+| mistralai-Mistral-Nemo-Instruct-2407.jinja | mistral nemo tool calls |
+| mistralai-Mixtral-8x7B-Instruct-v0.1.jinja | generic tool calls |
+| mlabonne-AlphaMonarch-7B.jinja | generic tool calls |
+| nvidia-Llama-3.1-Nemotron-70B-Instruct-HF.jinja | llama 3.x tool calls (w/ builtin tools) |
+| openchat-openchat-3.5-0106.jinja | generic tool calls |
+| teknium-OpenHermes-2.5-Mistral-7B.jinja | generic tool calls |
+
+This table can be generated with:

-# https://huggingface.co/meetkai/functionary-medium-v3.2
-llama-server --jinja -hfr bartowski/functionary-medium-v3.2-GGUF -hff functionary-medium-v3.2-IQ4_XS.gguf -fa
+```bash
+./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null
+```
+</details>

-# https://huggingface.co/meetkai/functionary-medium-v3.1
-llama-server --jinja -hfr meetkai/functionary-medium-v3.1-GGUF -hff functionary-medium-llama-3.1.Q4_0.gguf -fa
+- Generic tool call is supported when the template isn't recognized by native format handlers (you'll see `Chat format: Generic` in the logs).
+  - Use `--chat-template-file` to override the template when appropriate (see examples below)
+  - Generic support may consume more tokens and be less efficient than a model's native format.

+- Run with:
+
+```shell
+# Native support:
+llama-server --jinja -fa -hf bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
+llama-server --jinja -fa -hf bartowski/Mistral-Nemo-Instruct-2407-GGUF:Q4_K_M
+llama-server --jinja -fa -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q6_K
+llama-server --jinja -fa -hf bartowski/functionary-small-v3.2-GGUF:Q4_K_M
+llama-server --jinja -fa -hf bartowski/Hermes-2-Pro-Llama-3-8B-GGUF:Q4_K_M \
+    --chat-template-file <( python scripts/get_chat_template.py NousResearch/Hermes-2-Pro-Llama-3-8B )
+
+# Native support requires the right template for these GGUFs:
+llama-server --jinja -fa -hf bartowski/Hermes-3-Llama-3.1-8B-GGUF:Q4_K_M \
+    --chat-template-file <( python scripts/get_chat_template.py NousResearch/Hermes-3-Llama-3.1-8B tool_use )
+llama-server --jinja -fa -hf bartowski/firefunction-v2-GGUF -hff firefunction-v2-IQ1_M.gguf \
+    --chat-template-file <( python scripts/get_chat_template.py fireworks-ai/firellama-3-firefunction-v2 )
+
+# Generic format support
+llama-server --jinja -fa -hf bartowski/Phi-3.5-mini-instruct-GGUF:Q4_K_M
+llama-server --jinja -fa -hf bartowski/gemma-2-2b-it-GGUF:Q4_K_M
+```
+
+- Test in CLI:
+
+```bash
 curl http://localhost:8080/v1/chat/completions -d '{
   "model": "gpt-3.5-turbo",
   "tools": [