Commit 8adca48

feat: added tool_call_parser; updated sglang to 0.5.2 (#26)
1 parent ec910bd commit 8adca48

7 files changed: +51 -584 lines changed
.runpod/hub.json

Lines changed: 15 additions & 7 deletions
@@ -21,21 +21,29 @@
 }
 },
 {
-"key": "HF_TOKEN",
+"key": "TOOL_CALL_PARSER",
 "input": {
-"name": "Access Token",
+"name": "Tool Call Parser",
 "type": "string",
-"description": "Hugging Face access token for gated & private models",
+"description": "Defines the parser used to interpret tool call responses",
 "default": "",
-"required": false
+"required": false,
+"advanced": true,
+"options": [
+{ "value": "llama3", "label": "llama3" },
+{ "value": "llama4", "label": "llama4" },
+{ "value": "mistral", "label": "mistral" },
+{ "value": "qwen25", "label": "qwen25" },
+{ "value": "deepseekv3", "label": "deepseekv3" }
+]
 }
 },
 {
-"key": "TOOL_CALL_PARSER",
+"key": "REASONING_PARSER",
 "input": {
-"name": "Tool Call Parser",
+"name": "Reasoning Parser",
 "type": "string",
-"description": "Defines the parser used to interpret tool call responses",
+"description": "Defines the parser used to interpret reasoning traces",
 "default": "",
 "required": false,
 "advanced": true,
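
The `options` list added above constrains what the Hub UI offers, but nothing in this diff shows a runtime check. As a rough sketch only, a value could be checked against one such entry like this. The helper names `allowed_values` and `is_valid` are hypothetical and not part of the repository, and the entry is copied from the hunk rather than loaded from `.runpod/hub.json`, whose top-level layout is not visible here.

```python
import json

# One config entry, copied from the hub.json hunk above.
ENTRY = json.loads("""
{
  "key": "TOOL_CALL_PARSER",
  "input": {
    "name": "Tool Call Parser",
    "type": "string",
    "description": "Defines the parser used to interpret tool call responses",
    "default": "",
    "required": false,
    "advanced": true,
    "options": [
      { "value": "llama3", "label": "llama3" },
      { "value": "llama4", "label": "llama4" },
      { "value": "mistral", "label": "mistral" },
      { "value": "qwen25", "label": "qwen25" },
      { "value": "deepseekv3", "label": "deepseekv3" }
    ]
  }
}
""")


def allowed_values(entry):
    """Hypothetical helper: list the option values declared for an entry."""
    return [opt["value"] for opt in entry["input"].get("options", [])]


def is_valid(entry, value):
    """Hypothetical helper: accept an empty value only if the entry is optional,
    and any non-empty value only if it appears in the declared options."""
    if value == "":
        return not entry["input"].get("required", False)
    options = allowed_values(entry)
    # Entries without an options list accept any non-empty string.
    return not options or value in options
```

With the entry above, `is_valid(ENTRY, "llama3")` and `is_valid(ENTRY, "")` both hold, while an unlisted name such as `"gpt"` is rejected.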

Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
-FROM lmsysorg/sglang:v0.4.6.post4-cu124
+FROM lmsysorg/sglang:v0.5.2-cu126
 
 # Install uv package manager
 RUN curl -Ls https://astral.sh/uv/install.sh | sh \
-&& ln -s /root/.local/bin/uv /usr/local/bin/uv
+&& ln -sf /root/.local/bin/uv /usr/local/bin/uv
 ENV PATH="/root/.local/bin:${PATH}"
 
 # Set working directory to the one already used by the base image

README.md

Lines changed: 13 additions & 1 deletion
@@ -51,7 +51,19 @@ All behaviour is controlled through environment variables:
 | `ENABLE_P2P_CHECK` | Enable P2P check for GPU access | false | boolean (true or false) |
 | `ENABLE_FLASHINFER_MLA` | Enable FlashInfer MLA optimization | false | boolean (true or false) |
 | `TRITON_ATTENTION_REDUCE_IN_FP32` | Cast Triton attention reduce op to FP32 | false | boolean (true or false) |
-| `TOOL_CALL_PARSER` | Defines the parser used to interpret responses | qwen25 | "llama3", "llama4", "mistral", "qwen25", "deepseekv3" |
+| `TOOL_CALL_PARSER` | Defines the parser used to interpret responses | | "llama3", "llama4", "mistral", "qwen25", "deepseekv3" |
+| `REASONING_PARSER` | Defines the parser used for reasoning traces | | "llama3", "llama4", "mistral", "qwen25", "deepseekv3" |
+
+## Tool/Function Calling and Reasoning
+
+- **Tool/Function calling**: Set the `TOOL_CALL_PARSER` environment variable to match your model family. Supported values: `llama3`, `llama4`, `mistral`, `qwen25`, `deepseekv3`. If unset, this worker does not pass `--tool-call-parser` to SGLang.
+
+- Example (docker-compose): add `TOOL_CALL_PARSER=llama3` under `environment:`.
+- Example (RunPod Hub): set the `TOOL_CALL_PARSER` env var in the UI.
+
+- **Reasoning**: Set the `REASONING_PARSER` environment variable to match your model family if you want to enable reasoning traces parsing. If unset, this worker does not pass `--reasoning-parser` to SGLang.
+- Example (docker-compose): add `# REASONING_PARSER=llama3` under `environment:` (uncomment to use).
+- Example (RunPod Hub): set the `REASONING_PARSER` env var in the UI.
 
 ## API Usage
 

docker-compose.yml

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ services:
 - ATTENTION_BACKEND=flashinfer
 - SAMPLING_BACKEND=flashinfer
 - TOOL_CALL_PARSER=llama3
+# - REASONING_PARSER=llama3
 - HF_TOKEN=${HF_TOKEN}
 
 # make it work locally with <= 8 GB VRAM

docs/conventions.md

Lines changed: 18 additions & 0 deletions
@@ -61,3 +61,21 @@ chore(deps): update requirements.txt
 - Test changes before committing
 - Write descriptive commit messages
 - Keep commits focused and atomic
+
+## Configuration Conventions
+
+- Single source of truth: use `.runpod/hub.json` for endpoint configuration.
+
+- Define environment variables, UI options, and allowed CUDA versions here.
+- Do not add or rely on `worker-config.json` (removed).
+
+- CUDA policy:
+
+- Minimum supported CUDA is 12.6.
+- Base images must match this (e.g., `lmsysorg/sglang:vX.Y.Z-cu126`).
+- Keep `allowedCudaVersions` in `hub.json` at 12.6 or higher.
+
+- Tool/function calling and reasoning:
+- `TOOL_CALL_PARSER`: required to enable tool/function calling; no runtime default is applied. If unset, `--tool-call-parser` is not passed to SGLang.
+- `REASONING_PARSER`: required to enable reasoning trace parsing; no runtime default is applied. If unset, `--reasoning-parser` is not passed to SGLang.
+- Choose a parser matching the model family (e.g., `llama3`, `llama4`, `mistral`, `qwen25`, `deepseekv3`).
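
The CUDA policy above could be checked mechanically against a base image tag. The following is a minimal sketch, not something this commit adds: `cuda_from_tag` and `meets_policy` are hypothetical names, and the parsing assumes the tag ends in `-cuNNN` with a two-digit major version (so `cu126` means 12.6), which matches the two tags that appear in the Dockerfile diff.

```python
import re

# Minimum supported CUDA version, per the conventions above.
MIN_CUDA = (12, 6)


def cuda_from_tag(image):
    """Hypothetical helper: extract (major, minor) from a trailing '-cuNNN' suffix.

    Assumes a two-digit major version; returns None if no suffix is found.
    """
    match = re.search(r"-cu(\d{2})(\d+)$", image)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_policy(image):
    """Hypothetical helper: True if the image's CUDA version is at least MIN_CUDA."""
    version = cuda_from_tag(image)
    return version is not None and version >= MIN_CUDA
```

Under these assumptions the new base image `lmsysorg/sglang:v0.5.2-cu126` passes, and the previous `lmsysorg/sglang:v0.4.6.post4-cu124` does not.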

engine.py

Lines changed: 2 additions & 1 deletion
@@ -60,7 +60,8 @@ def start_server(self):
 "LOAD_BALANCE_METHOD": "--load-balance-method",
 "ATTENTION_BACKEND": "--attention-backend",
 "SAMPLING_BACKEND": "--sampling-backend",
-"TOOL_CALL_PARSER": "--tool-call-parser"
+"TOOL_CALL_PARSER": "--tool-call-parser",
+"REASONING_PARSER": "--reasoning-parser",
 }
 
 # Boolean flags
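
The hunk shows only the mapping from environment variables to SGLang flags, not the code that consumes it. A plausible reading, consistent with the README statement that an unset variable means the flag is not passed, is sketched below. `ENV_TO_FLAG` and `build_cli_args` are hypothetical names, the mapping is a subset of the one in `start_server`, and treating an empty string like an unset variable is an assumption.

```python
import os

# Subset of the env-var-to-flag mapping shown in the engine.py diff.
ENV_TO_FLAG = {
    "ATTENTION_BACKEND": "--attention-backend",
    "SAMPLING_BACKEND": "--sampling-backend",
    "TOOL_CALL_PARSER": "--tool-call-parser",
    "REASONING_PARSER": "--reasoning-parser",
}


def build_cli_args(env=None):
    """Hypothetical helper: emit '<flag> <value>' pairs for mapped variables.

    Variables that are unset or empty (assumption) contribute nothing,
    so the corresponding flag is simply omitted from the command line.
    """
    env = os.environ if env is None else env
    args = []
    for name, flag in ENV_TO_FLAG.items():
        value = env.get(name, "").strip()
        if value:
            args.extend([flag, value])
    return args
```

For example, `build_cli_args({"TOOL_CALL_PARSER": "llama3"})` yields `["--tool-call-parser", "llama3"]`, and leaving `REASONING_PARSER` unset produces no `--reasoning-parser` argument at all.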
