Description
The `LocalClient` provider always sends requests in the Ollama-native format — a flat `prompt` string to `POST /api/generate`. This format is specific to Ollama and breaks with any other local inference server that uses the OpenAI-compatible API (`POST /v1/chat/completions` with a `messages` array).
Affected servers: LM Studio, vLLM, LocalAI, llama.cpp server, Llamafile, and any other OpenAI-compatible backend.
Steps to Reproduce
- Install LM Studio and start a local server, e.g. `http://192.168.x.x:1234/v1`
- In AI Agent HA, set AI Provider → Local Model, Local API URL → `http://192.168.x.x:1234/v1`
- Send any prompt from the panel
Result: LM Studio returns an error:
`Unexpected endpoint or method (POST /)`
HA logs show the request was sent as a flat `prompt` string to `POST /api/generate` (Ollama format) rather than the OpenAI-compatible `POST /v1/chat/completions` with a `messages` array.
Expected: Requests to `/v1`-style endpoints should use the OpenAI-compatible format automatically.
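For reference, the same server responds normally when the request uses the OpenAI-compatible chat endpoint. A minimal sketch outside Home Assistant (host, port, and model name are placeholders for whatever LM Studio is serving):

```python
import asyncio
import aiohttp

# Sanity check outside Home Assistant: the same server answers normally when
# sent the OpenAI-compatible format. Host, port, and model name are placeholders.
async def check_openai_compat() -> None:
    payload = {
        "model": "local-model",  # whatever model LM Studio has loaded
        "messages": [{"role": "user", "content": "Say hello"}],
        "stream": False,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "http://192.168.x.x:1234/v1/chat/completions", json=payload
        ) as resp:
            data = await resp.json()
            print(data["choices"][0]["message"]["content"])

asyncio.run(check_openai_compat())
```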
Root Cause
`LocalClient.get_response` unconditionally builds an Ollama payload and sends it to whatever URL the user configured, with no inspection of the URL or detection of the target server type:
```python
# agent.py — LocalClient (current)
payload = {
    "model": self.model,
    "prompt": user_message,  # Ollama-native flat string
    "stream": False,
}
async with session.post(self.url, ...)  # blindly posts to configured URL
```
LM Studio and OpenAI-compatible servers expect:
```python
payload = {
    "model": self.model,
    "messages": messages,  # OpenAI-style messages array
    "temperature": 0.7,
    "stream": False,
}
# POST to /v1/chat/completions, not /api/generate
```
Fix
URL-based auto-detection in `LocalClient.__init__`: if the configured URL contains `/v1`, use the OpenAI-compatible `messages` format; otherwise fall back to the existing Ollama-native `prompt` format. No configuration change required from users — the URL they already enter determines which format is used.
| `local_url` | Request format |
| --- | --- |
| `http://host:1234/v1` | OpenAI-compatible (`POST /v1/chat/completions`, `messages` array) |
| `http://host:11434/api/generate` | Ollama-native (`prompt` string) — unchanged |
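A minimal sketch of the detection logic (illustrative only; the helper name `_build_payload` is hypothetical, and the actual code in the PR may differ):

```python
# Sketch of the proposed detection: pick the request format from the configured URL.
class LocalClient:
    def __init__(self, url: str, model: str) -> None:
        self.model = model
        # Heuristic: OpenAI-compatible servers expose their API under /v1
        self.openai_compatible = "/v1" in url
        if self.openai_compatible and not url.rstrip("/").endswith("/chat/completions"):
            url = url.rstrip("/") + "/chat/completions"
        self.url = url

    def _build_payload(self, user_message: str, messages: list[dict]) -> dict:
        if self.openai_compatible:
            return {
                "model": self.model,
                "messages": messages,  # OpenAI-style messages array
                "temperature": 0.7,
                "stream": False,
            }
        return {
            "model": self.model,
            "prompt": user_message,  # Ollama-native flat string
            "stream": False,
        }
```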
A working implementation is available in my fork: JLay2026/ai_agent_ha PR #47
That PR also fixes a related issue: the frontend panel had a 60-second hardcoded timeout, while local models on consumer hardware commonly take 60–180 seconds — causing false `Request timed out` errors even when the backend was still processing. The fix raises the frontend timeout to 300 seconds, matching the `aiohttp.ClientTimeout(total=300)` already set on all backend clients.
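For context, a request path that tolerates slow local inference looks roughly like this (illustrative only, not the exact code from the PR):

```python
import aiohttp

# The backend already waits up to 300 s per request, so the frontend timeout
# has to be at least as long to avoid false "timed out" errors.
async def post_with_long_timeout(url: str, payload: dict) -> dict:
    timeout = aiohttp.ClientTimeout(total=300)  # matches the existing backend setting
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, json=payload) as resp:
            return await resp.json()
```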
Workaround (until merged)
Use the OpenAI provider instead of Local Model, with:
- API Key: any non-empty string (e.g. `lmstudio`)
- Custom Base URL: `http://<lmstudio-host>:1234/v1`

This routes through `OpenAIClient`, which already sends the correct format.
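The workaround can be verified independently of Home Assistant with the `openai` Python client pointed at the same base URL (placeholder host and model name; the key value here is just a placeholder):

```python
from openai import OpenAI

# Placeholder host and model name; the API key is an arbitrary non-empty string,
# mirroring the workaround configuration above.
client = OpenAI(base_url="http://192.168.x.x:1234/v1", api_key="lmstudio")
resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp.choices[0].message.content)
```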
Environment
- AI Agent HA v1.08.7
- LM Studio 0.3.x
- Home Assistant 2025.x