Description
Checks
- I have updated to the latest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched the existing issues and there are no duplicates of my issue
Strands Version
1.12.0
Python Version
3.13.5
Operating System
Ubuntu 22.04.4 LTS
Installation Method
pip
Steps to Reproduce
I have a vLLM server with a Qwen3-coder model and am hitting the OpenAI compatible endpoint.
Steps to reproduce:
- pip install strands-agents[openai]
- Configure the OpenAI model provider with the vLLM base_url and model (see below)
- Configure strands with the OpenAI model provider
- Invoke the agent
from strands import Agent, tool
from strands.models.openai import OpenAIModel

@tool
def get_weather(city: str) -> str:
    """Returns weather info for the specified city."""
    return f"The weather in {city} is sunny with a high of 85F and a low of 58F"

model = OpenAIModel(
    client_args={
        "api_key": "EMPTY",
        "base_url": "<my base url>",
    },
    # **model_config
    model_id="<my Qwen3-coder model path>",
    params={
        "max_tokens": 1000,
        "temperature": 0.7,
    },
)

# Initialize the agent with tools, model, and configuration
agent = Agent(
    model=model,
    tools=[get_weather],
    system_prompt="You are the chief meteorologist at a TV station. When asked, you should look up the weather in the specified city and provide a colorful response",
)

result = agent("What is the weather in San Francisco?")
print(result.message)
I've also tried replacing the get_weather tool with the calculator tool from the strands-agents-tools package, and that doesn't invoke the tool either.
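For reference, the calculator variant I tried was roughly this (a minimal sketch; I'm assuming the documented strands_tools import path and reusing the same model object as above):

from strands import Agent
from strands_tools import calculator  # provided by the strands-agents-tools package

# Same OpenAIModel instance configured above, only the tool is swapped
agent = Agent(model=model, tools=[calculator])
result = agent("What is 12 * 34?")
print(result.message)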
Expected Behavior
"The weather in San Francisco is sunny with a high of 85F and a low of 58F" or something like that.
FWIW, the same tool, when used with the openai-agents SDK (with the relevant annotation) against the same vLLM server + model combination, generates the weather report as expected.
Actual Behavior
The result of running the code prints out:
<tool_call>
<function=get_weather>
<parameter=city>
San Francisco
</parameter>
</function>
</tool_call>
instead of the weather.
Here's the entire result:
AgentResult(stop_reason='end_turn',
message={'content': [{'text': '<tool_call>\n'
'<function=get_weather>\n'
'<parameter=city>\n'
'San Francisco\n'
'</parameter>\n'
'</function>\n'
'</tool_call>'}],
'role': 'assistant'},
metrics=EventLoopMetrics(cycle_count=1,
tool_metrics={},
cycle_durations=[0.3705911636352539],
traces=[<strands.telemetry.metrics.Trace object at 0x7fc473363460>],
accumulated_usage={'inputTokens': 310,
'outputTokens': 23,
'totalTokens': 333},
accumulated_metrics={'latencyMs': 0}),
state={})
Additional Context
I also issued a direct chat.completions request against the server using the OpenAI client. Here's that response:
ChatCompletion(id='chatcmpl-6f6f42cf28014456893aee0f6fa61b2e', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-15fa2e41b38a4a5aadece55724056f4a', function=Function(arguments='{"city": "San Francisco"}', name='get_weather'), type='function')], reasoning_content=None), stop_reason=None, token_ids=None)], created=1760142049, model='<my Qwen3-coder model name>', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=23, prompt_tokens=300, total_tokens=323, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
The finish_reason is set to tool_calls as expected, and the tool call and arguments seem fine to me.
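The direct request was along these lines (a minimal sketch; the tool schema mirrors get_weather above and the placeholders stand in for my actual base URL and model path):

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="<my base url>")

response = client.chat.completions.create(
    model="<my Qwen3-coder model path>",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "returns weather info for the specified city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(response)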
Possible Solution
No response
Related Issues
No response