Commit a2e9153

[None][doc] Add K2 tool calling examples (NVIDIA#6667)
Signed-off-by: Lanyu Liao <[email protected]>
Co-authored-by: Lanyu Liao <[email protected]>
1 parent 83dbc6c commit a2e9153

2 files changed

+328
-0
lines changed

Lines changed: 127 additions & 0 deletions
# K2 (Kimi-K2-Instruct)

## Overview

Kimi K2 is Moonshot AI's Mixture-of-Experts model with 32 billion activated parameters and 1 trillion total parameters. It achieves state-of-the-art performance in frontier knowledge, math, and coding among non-thinking models. Notably, K2 also excels at agentic tasks, demonstrating strong performance on complex, multi-step workflows.

## Prerequisites for Tool Calling in Kimi-K2

The K2 model supports tool calling. The official guide can be found at [tool_call_guidance](https://huggingface.co/moonshotai/Kimi-K2-Instruct/blob/main/docs/tool_call_guidance.md).
As described in the official guide, a tool-calling exchange with Kimi-K2 involves the following steps:

1. Pass the function descriptions to Kimi-K2.
2. Kimi-K2 decides to make a function call and returns the information needed for the call to the user.
3. The user performs the function call, collects the results, and passes them back to Kimi-K2.
4. Kimi-K2 continues generating based on the function call results until it believes it has enough information to answer the user.
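The four steps above can be sketched as a minimal driver loop. `call_model` is a hypothetical stub standing in for a real Kimi-K2 request; the replies it returns are invented for illustration only:

```python
# The four-step tool-calling loop in miniature. `call_model` is a stub, not
# a real client call; its replies are hard-coded for this sketch.
def call_model(messages, tools):
    # Stub: request get_weather on the first turn, answer once a tool
    # result is present in the conversation history.
    if any(m["role"] == "tool" for m in messages):
        return {"content": "It is cloudy in Shanghai.", "tool_call": None}
    return {
        "content": None,
        "tool_call": {"name": "get_weather", "arguments": {"location": "shanghai"}},
    }


def get_weather(location):
    return "Cloudy"  # stand-in tool


tools = [{"name": "get_weather"}]  # step 1: function descriptions
messages = [{"role": "user", "content": "What's the weather like in shanghai today?"}]

while True:
    reply = call_model(messages, tools)  # step 2: model decides on a call
    if reply["tool_call"] is None:  # step 4: model has enough information
        print(reply["content"])
        break
    call = reply["tool_call"]  # step 3: run the tool, pass the result back
    result = get_weather(**call["arguments"])
    messages.append({"role": "tool", "content": result})
```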
Tools are the primary way to define callable functions for K2. Each tool requires:

- A unique name
- A clear description
- A JSON schema defining the expected parameters
A possible tool description (you may refer to [Using tools](https://huggingface.co/docs/hugs/guides/function-calling) for more information) looks as follows:

```python
# Collect the tool descriptions in tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information. Call this tool when the user needs to get weather information",
        "parameters": {
            "type": "object",
            "required": ["location"],
            "properties": {
                "location": {
                    "type": "string",
                    "description": "location name",
                },
            },
        },
    },
}]
```
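As a sketch of how such a schema can be used on the client side, the following hand-rolled check (an illustration, not part of any K2 or OpenAI API) verifies parsed arguments against the `required` and `properties` fields of the tool's parameter schema:

```python
# Hand-rolled sketch: check that a dict of parsed arguments satisfies the
# "required" and (string-typed) "properties" fields of a parameter schema.
schema = {
    "type": "object",
    "required": ["location"],
    "properties": {"location": {"type": "string", "description": "location name"}},
}


def check_arguments(args: dict, schema: dict) -> bool:
    # Every required key must be present.
    if not all(key in args for key in schema.get("required", [])):
        return False
    # Only the "string" type matters for this tool.
    for key, spec in schema.get("properties", {}).items():
        if key in args and spec.get("type") == "string" and not isinstance(args[key], str):
            return False
    return True


print(check_arguments({"location": "shanghai"}, schema))  # True
print(check_arguments({}, schema))  # False
```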
Kimi currently supports two main approaches to tool calling:

1. *Use `openai.OpenAI` to send messages to Kimi-K2 together with tool descriptions.*
   In this mode, the tool descriptions are passed as an argument to `client.chat.completions.create`, and the tool-call details can be read directly from the corresponding fields of the response.
2. *Manually parse the tool-call requests from the outputs generated by Kimi-K2.*
   The tool-call requests generated by Kimi-K2 are wrapped in `<|tool_calls_section_begin|>` and `<|tool_calls_section_end|>`, with each tool call wrapped in `<|tool_call_begin|>` and `<|tool_call_end|>`. The tool ID and arguments are separated by `<|tool_call_argument_begin|>`. The tool ID has the format `functions.{func_name}:{idx}`, from which the function name can be parsed.

**Note that TensorRT-LLM does not support the first approach for now. If you deploy K2 with TensorRT-LLM, you need to manually parse the tool-call requests from the outputs.**
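The wrapping format used by the second approach can be parsed with a short regex. A minimal sketch follows; the sample string is hand-written for illustration, not real model output:

```python
import re

# Hand-written sample mimicking the Kimi-K2 tool-call markup.
sample = (
    "<|tool_calls_section_begin|>"
    "<|tool_call_begin|>functions.get_weather:0<|tool_call_argument_begin|>"
    '{"location": "shanghai"}<|tool_call_end|>'
    "<|tool_calls_section_end|>"
)

# One match per tool call: the tool ID, then the raw argument payload.
call_pattern = re.compile(
    r"<\|tool_call_begin\|>\s*(?P<tool_id>[\w\.]+:\d+)\s*"
    r"<\|tool_call_argument_begin\|>\s*(?P<args>.*?)\s*<\|tool_call_end\|>",
    re.DOTALL,
)

parsed = []
for m in call_pattern.finditer(sample):
    # The tool ID looks like functions.get_weather:0.
    func_name = m.group("tool_id").split(".")[1].split(":")[0]
    parsed.append((func_name, m.group("args")))

print(parsed)  # [('get_weather', '{"location": "shanghai"}')]
```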
The next section walks through an example that deploys the K2 model with TensorRT-LLM and then manually parses the tool-call results.

## Example: Manually Parsing Tool-Call Requests from Kimi-K2 Outputs
First, launch a server using `trtllm-serve`:

```bash
cat > ./extra_llm_api_options.yaml <<EOF
# define your extra parameters here
cuda_graph_config:
  batch_sizes:
    - 1
    - 4
enable_attention_dp: False
EOF

trtllm-serve \
    --model /path_to_model/Kimi-K2-Instruct/ \
    --backend pytorch \
    --tp_size 8 \
    --ep_size 8 \
    --extra_llm_api_options extra_llm_api_options.yaml
```
Run the script [kimi_k2_tool_calling_example.py](./kimi_k2_tool_calling_example.py), which performs the following steps:

1. The client provides tool definitions and a user prompt to the LLM server.
2. Instead of answering the prompt directly, the LLM server responds with a selected tool and corresponding arguments based on the user prompt.
3. The client calls the selected tool with the arguments and retrieves the results.
For example, you can query "What's the weather like in shanghai today?" with the following command:

```bash
python kimi_k2_tool_calling_example.py \
    --model "moonshotai/Kimi-K2-Instruct" \
    --prompt "What's the weather like in shanghai today?"
```
The output would look similar to:

```txt
[The original output from Kimi-K2]: <|tool_calls_section_begin|>
<|tool_call_begin|>functions.get_weather:0<|tool_call_argument_begin|>{"location": "shanghai"}<|tool_call_end|>
<|tool_calls_section_end|>user

[The tool-call requests parsed from the output]: [{'id': 'functions.get_weather:0', 'type': 'function', 'function': {'name': 'get_weather', 'arguments': '{"location": "shanghai"}'}}]

[Tool call result]: tool_name=get_weather, tool_result=Cloudy
```
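Given one parsed entry of this shape, the dispatch step can be sketched as follows; `get_weather` is stubbed here to mirror the full script's behavior, and the `tool_call` dict is copied from the sample output:

```python
import json

# Dispatch sketch: decode the JSON `arguments` string of a parsed tool-call
# entry and invoke the matching local tool. `get_weather` is a local stub.
def get_weather(location):
    return "Cloudy" if location.lower() == "shanghai" else "Rainy"


tool_map = {"get_weather": get_weather}

tool_call = {
    "id": "functions.get_weather:0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "shanghai"}'},
}

tool_name = tool_call["function"]["name"]
kwargs = json.loads(tool_call["function"]["arguments"])  # JSON string -> dict
tool_result = tool_map[tool_name](**kwargs)
print(f"tool_name={tool_name}, tool_result={tool_result}")
```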
The tool call works successfully:

- In `[The original output from Kimi-K2]`, the LLM selects the correct tool `get_weather` and provides the appropriate arguments.
- In `[The tool-call requests parsed from the output]`, the client parses the LLM response.
- In `[Tool call result]`, the client executes the tool function and gets the result.
Let's try another query, "What's the weather like in beijing today?", this time using a predefined system prompt to constrain the output format:

```bash
python kimi_k2_tool_calling_example.py \
    --model "moonshotai/Kimi-K2-Instruct" \
    --prompt "What's the weather like in beijing today?" \
    --specify_output_format
```
The output would look like:

```txt
[The original output from Kimi-K2]: [get_weather(location='beijing')]user

[The tool-call requests parsed from the output]: [{'type': 'function', 'function': {'name': 'get_weather', 'arguments': {'location': 'beijing'}}}]

[Tool call result]: tool_name=get_weather, tool_result=Sunny
```

Once again, the tool call succeeds, and this time the original output from Kimi-K2 follows the specified format.
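The constrained `[api_name(key='value', ...)]` format can likewise be parsed with a small regex plus `ast.literal_eval`, along the lines of what the full script does:

```python
import ast
import re

# Sketch of parsing the constrained "[api_name(key='value', ...)]" format
# produced under the predefined system prompt.
text = "[get_weather(location='beijing')]"

calls = []
for name, body in re.findall(r"(\w+)\s*\(([^)]*)\)", text):
    # Each argument is key=value; literal_eval turns 'beijing' into a str.
    kwargs = {
        k: ast.literal_eval(v.strip())
        for k, v in re.findall(r"(\w+)\s*=\s*([^,]+)", body)
    }
    calls.append({"type": "function", "function": {"name": name, "arguments": kwargs}})

print(calls)
```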
**Note that, without guided decoding or other deterministic tool adapters, K2 sometimes deviates from the specified output format. Because TensorRT-LLM does not currently support K2 with guided decoding, you have to parse the tool calls carefully from the raw model output to ensure they meet the required format.**
Lines changed: 201 additions & 0 deletions
#
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import argparse
import ast
import json
import re

from openai import OpenAI
SPECIFY_OUTPUT_FORMAT_PROMPT = """You are an AI assistant with the role name "assistant." \
Based on the provided API specifications and conversation history from steps 1 to t, \
generate the API requests that the assistant should call in step t+1. \
The API requests should be output in the format [api_name(key1='value1', key2='value2', ...)], \
replacing api_name with the actual API name, key1, key2, etc., with the actual parameter names, \
and value1, value2, etc., with the actual parameter values. The output should start with a square bracket "[" and end with a square bracket "]".
If there are multiple API requests, separate them with commas, for example: \
[api_name(key1='value1', key2='value2', ...), api_name(key1='value1', key2='value2', ...), ...]. \
Do not include any other explanations, prompts, or API call results in the output.
If the API parameter description does not specify otherwise, the parameter is optional \
(parameters mentioned in the user input need to be included in the output; if not mentioned, they do not need to be included).
If the API parameter description does not specify the required format for the value, use the user's original text for the parameter value. \
If the API requires no parameters, output the API request directly in the format [api_name()], and do not invent any nonexistent parameter names.

API Specifications:
{tools}"""

NOT_SPECIFY_OUTPUT_FORMAT_PROMPT = """Important: Only give the tool call requests, \
do not include any other explanations, prompts, or API call results in the output.
The tool call requests generated by you are wrapped by \
<|tool_calls_section_begin|> and <|tool_calls_section_end|>, with each tool call wrapped by <|tool_call_begin|> and <|tool_call_end|>. \
The tool ID and arguments are separated by <|tool_call_argument_begin|>. The format of the tool ID is functions.func_name:idx, \
from which we can parse the function name.

API Specifications:
{tools}"""


def get_weather(location: str):
    if location.lower() == "beijing":
        return "Sunny"
    elif location.lower() == "shanghai":
        return "Cloudy"
    else:
        return "Rainy"


# Tool name -> callable mapping for easy dispatch later
tool_map = {"get_weather": get_weather}


# ref: https://huggingface.co/moonshotai/Kimi-K2-Instruct/blob/main/docs/tool_call_guidance.md
def extract_tool_call_info(tool_call_rsp: str):
    if '<|tool_calls_section_begin|>' not in tool_call_rsp:
        # No tool calls
        return []
    pattern = r"<\|tool_calls_section_begin\|>(.*?)<\|tool_calls_section_end\|>"
    tool_calls_sections = re.findall(pattern, tool_call_rsp, re.DOTALL)

    # Extract multiple tool calls
    func_call_pattern = r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
    tool_calls = []
    for match in re.findall(func_call_pattern, tool_calls_sections[0],
                            re.DOTALL):
        function_id, function_args = match
        # function_id looks like: functions.get_weather:0
        function_name = function_id.split('.')[1].split(':')[0]
        tool_calls.append({
            "id": function_id,
            "type": "function",
            "function": {
                "name": function_name,
                "arguments": function_args
            }
        })
    return tool_calls


def parse_specified_format_tool_calls(text: str):
    # Match calls of the form api_name(key1='value1', key2='value2', ...)
    pattern = re.compile(r'(\w+)\s*\(([^)]*)\)')
    tool_calls = []

    for m in pattern.finditer(text):
        api_name, kv_body = m.group(1), m.group(2)

        kv_pattern = re.compile(r'(\w+)\s*=\s*([^,]+)')
        kwargs = {}
        for k, v in kv_pattern.findall(kv_body):
            try:
                kwargs[k] = ast.literal_eval(v.strip())
            except Exception:
                kwargs[k] = v.strip()

        tool_calls.append({
            "type": "function",
            "function": {
                "name": api_name,
                "arguments": kwargs
            }
        })

    return tool_calls


def get_tools():
    # Collect the tool descriptions in tools
    return [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description":
            "Get weather information. Call this tool when the user needs to get weather information",
            "parameters": {
                "type": "object",
                "required": ["location"],
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "Location name",
                    }
                }
            }
        }
    }]


def get_tool_call_requests(args, client):
    model = args.model
    tools = get_tools()
    # Select the prompt template, then substitute the tool descriptions into it.
    system_prompt = (SPECIFY_OUTPUT_FORMAT_PROMPT if args.specify_output_format
                     else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT).format(tools=tools)
    messages = [{
        "role": "system",
        "content": system_prompt
    }, {
        "role": "user",
        "content": args.prompt
    }]

    response = client.chat.completions.create(model=model,
                                              messages=messages,
                                              max_tokens=256,
                                              temperature=0.0)

    output = response.choices[0].message.content
    tool_calls = parse_specified_format_tool_calls(
        output) if args.specify_output_format else extract_tool_call_info(
            output)
    print(f"[The original output from Kimi-K2]: {output}\n")
    print(f"[The tool-call requests parsed from the output]: {tool_calls}\n")
    return tool_calls, messages


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model",
                        type=str,
                        default="moonshotai/Kimi-K2-Instruct")
    parser.add_argument("--prompt",
                        type=str,
                        default="What's the weather like in Shanghai today?")
    parser.add_argument("--specify_output_format",
                        action="store_true",
                        default=False)

    args = parser.parse_args()

    # Start the trtllm-serve server before running this script
    client = OpenAI(
        api_key="tensorrt_llm",
        base_url="http://localhost:8000/v1",
    )

    tool_calls, messages = get_tool_call_requests(args, client)

    for tool_call in tool_calls:
        tool_name = tool_call['function']['name']
        if args.specify_output_format:
            tool_arguments = tool_call['function']['arguments']
        else:
            tool_arguments = json.loads(tool_call['function']['arguments'])
        tool_function = tool_map[tool_name]
        tool_result = tool_function(**tool_arguments)
        print(
            f"[Tool call result]: tool_name={tool_name}, tool_result={tool_result}\n"
        )
