-
Hi, I'm using this library together with vLLM, which serves Mistral Small through its OpenAI-compatible server. Until now, everything worked great. Today I tried tool calling on the ChatEndpoint, but instead of the tools being called, the tool calls came back as plain text in the output.
I know this is probably out of scope for this library. Maybe you can help me identify the problem.

To Reproduce

This is the code I'm using:

```csharp
var tools = new List<Tool>
{
    Tool.GetOrCreateTool(typeof(WeatherService), nameof(WeatherService.GetCurrentWeatherAsync))
};
var chatRequest = new ChatRequest(_messages, tools, "auto", Model.Id);
var sequence = 0L;
var response = await Api.ChatEndpoint.StreamCompletionAsync(chatRequest, async chatStreamingResponse =>
{
    if (Streaming != null)
    {
        var delta = chatStreamingResponse.FirstChoice?.Delta?.Content;

        if (delta != null && delta.Length > 0)
        {
            await Streaming.Invoke(sequence, delta, streamingData, this);
            ++sequence;
        }
    }
}, false, _tokenSource.Token);
```

Expected behavior

The tools should be called.
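For context, this is roughly the flow I expected to be able to use once the tool calls come back as structured data. This is only a sketch based on my reading of the library's non-streaming tool-calling example; the exact types and overloads may differ:

```csharp
// Sketch: non-streaming request, then let the library dispatch each tool call.
var response = await Api.ChatEndpoint.GetCompletionAsync(chatRequest, _tokenSource.Token);

if (response.FirstChoice.Message.ToolCalls is { Count: > 0 } toolCalls)
{
    // Keep the assistant message that requested the tools.
    _messages.Add(response.FirstChoice.Message);

    foreach (var toolCall in toolCalls)
    {
        // The library looks up the registered function and binds the JSON arguments.
        var functionResult = await toolCall.InvokeFunctionAsync();
        _messages.Add(new Message(toolCall, functionResult));
    }
}
```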
-
Not a bug, since I don't officially support vLLM. Transferring this to Discussions. I do use vLLM myself, but I have not attempted function calling with it.
-
Hi, and thanks for your answer. And thanks for all the work you put into this library, it's awesome :) I can understand that you don't want to support anything other than the official OpenAI server. Anyway, I have no problem working around this limitation. I wrote some test code and it's working fine. Basically, I check whether the response starts with [TOOL_CALLS] and handle it differently. Here's the current code I'm using:

```csharp
if (isToolCalls)
{
    string content = choice.Message.Content;
    var toolCalls = content.Split("[TOOL_CALLS]", StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);

    foreach (var toolCall in toolCalls)
    {
        // Each segment looks like "<functionName>_<callId>{ ...json arguments... }".
        var jsonIdx = toolCall.IndexOf('{');
        var call = toolCall.Substring(0, jsonIdx);
        var callParts = call.Split('_');
        var json = toolCall.Substring(jsonIdx + 1);
        var toolFunctionName = callParts[0];
        var toolCallId = callParts[1];

        // For testing, return a random static result instead of invoking the tool.
        var r = new Random();
        var a = r.Next(10, 40);
        var b = r.Next(40, 80);
        _messages.Add(new Message(toolCallId, toolFunctionName, new List<Content> { $"The temperature is between {a} and {b} degrees." }));
    }

    return await SendMessage(null, null, streamingData);
}
```

You see, I have the JSON string, the toolFunctionName and the toolCallId. I'm just returning a static tool result message instead of calling the tool directly, and this code works. And now, here's my question: the code to look up the function and call it with the right parameters from the JSON is probably already somewhere in the library. Is there a way to use it? Something like the sketch below:
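To illustrate what I mean, here's a rough, untested helper I put together. This is not existing library API; `ToolDispatcher.InvokeParsedToolCallAsync` is just my own reflection-based dispatcher, and it assumes the JSON payload uses the method's parameter names:

```csharp
using System;
using System.Reflection;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

public static class ToolDispatcher
{
    // Resolve the parsed function name on a known service type via reflection
    // and bind each parameter from the JSON arguments object.
    public static async Task<string> InvokeParsedToolCallAsync(Type serviceType, string functionName, string argumentsJson)
    {
        var method = serviceType.GetMethod(functionName, BindingFlags.Public | BindingFlags.Static)
            ?? throw new MissingMethodException(serviceType.Name, functionName);

        using var doc = JsonDocument.Parse(argumentsJson);
        var parameters = method.GetParameters();
        var args = new object?[parameters.Length];
        var options = new JsonSerializerOptions
        {
            PropertyNameCaseInsensitive = true,
            Converters = { new JsonStringEnumConverter() }
        };

        for (var i = 0; i < parameters.Length; i++)
        {
            if (doc.RootElement.TryGetProperty(parameters[i].Name!, out var value))
            {
                args[i] = JsonSerializer.Deserialize(value.GetRawText(), parameters[i].ParameterType, options);
            }
        }

        // Support both Task<string> and plain string return types.
        var result = method.Invoke(null, args);
        return result is Task<string> task ? await task : result?.ToString() ?? string.Empty;
    }
}
```

Used from my parsing loop above it would look roughly like this (note that my `json` variable drops the opening `{`, so it has to be re-added):

```csharp
var functionResult = await ToolDispatcher.InvokeParsedToolCallAsync(typeof(WeatherService), toolFunctionName, "{" + json);
_messages.Add(new Message(toolCallId, toolFunctionName, new List<Content> { functionResult }));
```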
-
I have another question: how can I call a tool with a custom parameter? Just a simple example:

```csharp
[Function("Get the current temperature for the current user location.")]
public static async Task<string> GetCurrentWeatherAsync(
    Context context,
    [FunctionParameter("The units the user has requested temperature in. Typically this is based on the user's location.")] WeatherUnit unit)
{
    var userId = context.UserId;
    var userLocation = await whatever.GetLocationAsync(userId);
    return await temp.ByLocationAsync(userLocation, unit);
}
```

and then invoke it with something like `var functionResult = await toolCall.InvokeFunctionAsync(userContext);`. Real-world use: in the tool function, I need to be able to identify the user, or send something over the connected WebSocket to the client, independent of the output.
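In the meantime, the workaround I'm sketching is an ambient context instead of an extra method parameter, so the tool function only declares the parameters the model should fill. This is an untested sketch; `ToolCallContext`, `LocationService` and `TemperatureService` are just my own placeholder names, and the `[Function]`/`[FunctionParameter]` attributes are the library's as used above:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Ambient per-request context, so a statically-invoked tool function can see
// which user triggered it without the model having to pass it as a parameter.
public static class ToolCallContext
{
    private static readonly AsyncLocal<string?> _userId = new();

    public static string? UserId
    {
        get => _userId.Value;
        set => _userId.Value = value;
    }
}

public static class WeatherService
{
    [Function("Get the current temperature for the current user location.")]
    public static async Task<string> GetCurrentWeatherAsync(
        [FunctionParameter("The units the user has requested temperature in. Typically this is based on the user's location.")] WeatherUnit unit)
    {
        // Read the ambient user id set by the caller before the tool was invoked.
        var userId = ToolCallContext.UserId
            ?? throw new InvalidOperationException("No user context set for this tool call.");

        var userLocation = await LocationService.GetLocationAsync(userId);      // placeholder service
        return await TemperatureService.GetByLocationAsync(userLocation, unit); // placeholder service
    }
}
```

Before handling the tool calls for a request I'd set `ToolCallContext.UserId = currentUserId;`, and because AsyncLocal flows with the async execution context, the value is visible inside the tool function even across awaits.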
Sorry, I'm not using Docker.
Anyway, with this pending PR (vllm-project/vllm#19425), everything is working now. It's a rework of the tool-call parsing for Mistral models, and it's now compatible with this library 👍 Sorry for the hassle.