alecsolder commented Oct 31, 2025

Run the server with:

CUDA_VISIBLE_DEVICES=0 HF_HUB_OFFLINE=1 vllm serve openai/gpt-oss-20b --tool-server=localhost:8081/browser,localhost:8081/python --structured_outputs_config='{"enable_in_reasoning": true, "reasoning_parser": "openai_gptoss"}'

Then you can make requests like this:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Search for vLLM performance.",
    "tools": [{"type": "web_search_preview"}]
  }'
curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Look up the weather roughly in san francisco.",
    "enable_response_messages": true,
    "tools": [{
      "type": "function",
      "name": "get_weatherrrrr",
      "description": "Get current temperature for provided coordinates in celsius.",
      "parameters": {
        "type": "object",
        "properties": {
          "latitude": {"type": "number"},
          "longitude": {"type": "number"}
        },
        "required": ["latitude", "longitude"],
        "additionalProperties": false
      },
      "strict": true
    }]
  }'
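Because `"strict": true` constrains generation so that tool-call arguments conform to the declared JSON schema, the arguments coming back should always validate against it. A minimal stdlib sketch of what that guarantee means for this particular flat schema (the validator is illustrative, not vLLM's implementation):

```python
import json

# The "parameters" schema from the request above.
SCHEMA = {
    "type": "object",
    "properties": {
        "latitude": {"type": "number"},
        "longitude": {"type": "number"},
    },
    "required": ["latitude", "longitude"],
    "additionalProperties": False,
}

def validate_args(raw: str, schema: dict = SCHEMA) -> bool:
    """Check that a JSON argument string satisfies this simple, flat
    object schema: required keys present, numeric types, no extras."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(args, dict):
        return False
    props = schema["properties"]
    if schema.get("additionalProperties") is False and set(args) - set(props):
        return False  # extra keys are rejected in strict mode
    if any(key not in args for key in schema.get("required", [])):
        return False  # a required key is missing
    # JSON Schema "number" accepts ints and floats, but not booleans.
    return all(
        isinstance(args[k], (int, float)) and not isinstance(args[k], bool)
        for k in args
    )
```

With strict mode on, outputs like `'{"latitude": 37.77, "longitude": -122.42}'` pass, while missing or extra keys would be impossible for the model to emit in the first place.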

The chat format is then guided so that tool-call headers are constrained to exactly the tools declared in that specific request.
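In the gpt-oss harmony chat format, a tool call is addressed via a recipient in the message header (e.g. `to=functions.get_weatherrrrr`). A minimal sketch of the idea behind the guiding: derive the set of valid recipients from the request's tools, and only allow headers that address one of them (the recipient strings and built-in tool mapping here are illustrative assumptions, not vLLM's actual grammar):

```python
def allowed_recipients(tools: list[dict]) -> set[str]:
    """Collect the recipients a response may address, based on the
    tools declared in the request. The built-in mapping below is a
    hypothetical example."""
    recipients: set[str] = set()
    for tool in tools:
        if tool["type"] == "function":
            recipients.add(f"functions.{tool['name']}")
        elif tool["type"] == "web_search_preview":
            recipients.add("browser.search")
    return recipients

def header_is_valid(recipient: str, tools: list[dict]) -> bool:
    """A guided decoder would restrict generation to this set as tokens
    are produced; here we simply check a finished header against it."""
    return recipient in allowed_recipients(tools)
```

So a request declaring only `get_weatherrrrr` can never produce a call addressed to some other function name, typo'd or otherwise.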

