From a3aefd70e7b267d7c5c66bc0a8c135ac88eebe9d Mon Sep 17 00:00:00 2001
From: Aaron Pham
Date: Mon, 25 Aug 2025 14:27:52 -0400
Subject: [PATCH 1/4] Update GPT-OSS documentation with function calling

---
 OpenAI/GPT-OSS.md | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/OpenAI/GPT-OSS.md b/OpenAI/GPT-OSS.md
index 49022fd..a504543 100644
--- a/OpenAI/GPT-OSS.md
+++ b/OpenAI/GPT-OSS.md
@@ -169,7 +169,15 @@ mcp run -t sse python_server.py:mcp
 vllm serve ... --tool-server ip-1:port-1,ip-2:port-2
 ```
 
-The URLs are expected to be MCP SSE servers that implement `instructions` in server info and well documented tools. The tools will be injected into the system prompt for the model to enable them.
+The URLs are expected to be MCP SSE servers that implement `instructions` in server info and well-documented tools. The tools will be injected into the system prompt so that the model can use them.
+
+### Function calling
+
+vLLM supports function calling for the Chat Completion API. Make sure to run your gpt-oss models with the following:
+
+```bash
+vllm serve openai/gpt-oss-20b --tool-call-parser openai --reasoning-parser openai_gptoss --enable-auto-tool-choice
+```
 
 ## Accuracy Evaluation Panels
 
@@ -265,8 +273,8 @@ Meaning:
 | Response API | ✅ | ✅ | ✅ | ✅ | ✅ |
 | Response API with Background Mode | ✅ | ✅ | ✅ | ❌ | ✅ |
 | Response API with Streaming | ✅ | ✅ | ❌ | ❌ | ❌ |
-| Chat Completion API | ✅ | ✅ | ❌ | ❌ | ❌ |
-| Chat Completion API with Streaming | ✅ | ✅ | ❌ | ❌ | ❌ |
+| Chat Completion API | ✅ | ✅ | ❌ | ❌ | ✅ |
+| Chat Completion API with Streaming | ✅ | ✅ | ❌ | ❌ | ✅ |
 
 If you want to use offline inference, you can treat vLLM as a token-in-token-out service and pass in tokens that are already formatted with Harmony.

From e165b6baa0bf049302289ead270e04217071d9da Mon Sep 17 00:00:00 2001
From: Aaron Pham
Date: Fri, 5 Sep 2025 00:47:56 -0400
Subject: [PATCH 2/4] chore: update Chen's rewording

Co-authored-by: Chen Zhang
---
 OpenAI/GPT-OSS.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/OpenAI/GPT-OSS.md b/OpenAI/GPT-OSS.md
index 85a5d93..f834442 100644
--- a/OpenAI/GPT-OSS.md
+++ b/OpenAI/GPT-OSS.md
@@ -173,11 +173,10 @@ The URLs are expected to be MCP SSE servers that implement `instructions` in ser
 
 ### Function calling
 
-vLLM supports function calling for the Chat Completion API. Make sure to run your gpt-oss models with the following:
+vLLM also supports calling user-defined functions. Make sure to run your gpt-oss models with the following arguments.
 
 ```bash
-vllm serve openai/gpt-oss-20b --tool-call-parser openai --reasoning-parser openai_gptoss --enable-auto-tool-choice
-```
+vllm serve ... --tool-call-parser openai --reasoning-parser openai_gptoss --enable-auto-tool-choice
 
 ## Accuracy Evaluation Panels
 

From d664785f155585add2f0cde73dc1816e55eda5ad Mon Sep 17 00:00:00 2001
From: Aaron Pham
Date: Fri, 5 Sep 2025 00:50:06 -0400
Subject: [PATCH 3/4] Update OpenAI/GPT-OSS.md

---
 OpenAI/GPT-OSS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/OpenAI/GPT-OSS.md b/OpenAI/GPT-OSS.md
index f834442..a9c972b 100644
--- a/OpenAI/GPT-OSS.md
+++ b/OpenAI/GPT-OSS.md
@@ -176,7 +176,7 @@ The URLs are expected to be MCP SSE servers that implement `instructions` in ser
 vLLM also supports calling user-defined functions. Make sure to run your gpt-oss models with the following arguments.
 
 ```bash
-vllm serve ... --tool-call-parser openai --reasoning-parser openai_gptoss --enable-auto-tool-choice
+vllm serve ... --tool-call-parser openai --enable-auto-tool-choice
 
 ## Accuracy Evaluation Panels
 
From 14ed30849693560908b5a1922420fe351dacd547 Mon Sep 17 00:00:00 2001
From: Aaron Pham
Date: Fri, 5 Sep 2025 00:51:58 -0400
Subject: [PATCH 4/4] Revise gpt-oss documentation for clarity and updates

Updated installation and usage instructions for gpt-oss, including commands
for setting up tool servers and running evaluations.
---
 OpenAI/GPT-OSS.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/OpenAI/GPT-OSS.md b/OpenAI/GPT-OSS.md
index a9c972b..4064c43 100644
--- a/OpenAI/GPT-OSS.md
+++ b/OpenAI/GPT-OSS.md
@@ -153,7 +153,7 @@ One premier feature of gpt-oss is the ability to call tools directly, called "bu
 * By default, we integrate with the reference library's browser (with `ExaBackend`) and demo Python interpreter via docker container. In order to use the search backend, you need to get access to [exa.ai](http://exa.ai) and set `EXA_API_KEY=` as an environment variable. For Python, either have docker available, or set `PYTHON_EXECUTION_BACKEND=dangerously_use_uv` to dangerously allow model-generated code snippets to be executed on the same machine. Please note that `PYTHON_EXECUTION_BACKEND=dangerously_use_uv` needs `gpt-oss>=0.0.5`.
 
-```
+```bash
 uv pip install gpt-oss
 
 vllm serve ... --tool-server demo
 ```
@@ -162,7 +162,7 @@ vllm serve ... --tool-server demo
 
 * Please note that the default options are simply for demo purposes. For production usage, vLLM itself can act as an MCP client to multiple services. Here is an [example tool server](https://github.com/openai/gpt-oss/tree/main/gpt-oss-mcp-server) that vLLM can work with; they wrap the demo tools:
 
-```
+```bash
 mcp run -t sse browser_server.py:mcp
 
 mcp run -t sse python_server.py:mcp
@@ -177,6 +177,7 @@ vLLM also supports calling user-defined functions. Make sure to run your gpt-oss
 
 ```bash
 vllm serve ... --tool-call-parser openai --enable-auto-tool-choice
+```
 
 ## Accuracy Evaluation Panels
 
@@ -184,7 +185,7 @@ OpenAI recommends using the gpt-oss reference library to perform evaluation.
 
 First, deploy the model with vLLM:
 
-```
+```bash
 # Example deployment on 8xH100
 vllm serve openai/gpt-oss-120b \
 --tensor_parallel_size 8 \
@@ -197,7 +198,7 @@ vllm serve openai/gpt-oss-120b \
 
 Then, run the evaluation with gpt-oss. The following command will run all three reasoning effort levels.
 
-```
+```bash
 mkdir -p /tmp/gpqa_openai
 OPENAI_API_KEY=empty python -m gpt_oss.evals --model openai/gpt-oss-120b --eval gpqa --n-threads 128
 ```
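Once a server is running with the function-calling flags added in these patches, any OpenAI-compatible client can exercise it through vLLM's standard `/v1/chat/completions` endpoint. The request below is a minimal sketch of that flow: it assumes the server is listening on vLLM's default `localhost:8000`, and the `get_weather` tool (with its `city` parameter) is a hypothetical example, not anything defined in the documentation above.

```bash
# Minimal function-calling request against a vLLM server started with, e.g.:
#   vllm serve openai/gpt-oss-20b --tool-call-parser openai --enable-auto-tool-choice
# Assumes the default address (localhost:8000); "get_weather" is a hypothetical tool.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "What is the weather in Boston?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
```

If the model elects to call the function, the response's `choices[0].message` carries a `tool_calls` array (the name and JSON arguments that `--tool-call-parser openai` extracted from the model's Harmony output) instead of plain text; the client then runs the function, appends the result as a `"role": "tool"` message, and calls the endpoint again to get the final answer.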