bug: LocalClient sends Ollama-native POST /api/generate format to LM Studio / OpenAI-compatible servers #57

@JLay2026

Description

The LocalClient provider always sends requests in the Ollama-native format — a flat prompt string to POST /api/generate. This format is specific to Ollama and breaks with any other local inference server that uses the OpenAI-compatible API (POST /v1/chat/completions with a messages array).

Affected servers: LM Studio, vLLM, LocalAI, llama.cpp server, Llamafile, and any other OpenAI-compatible backend.


Steps to Reproduce

  1. Install LM Studio and start a local server on e.g. http://192.168.x.x:1234/v1
  2. In AI Agent HA, set AI Provider → Local Model, Local API URL → http://192.168.x.x:1234/v1
  3. Send any prompt from the panel

Result: LM Studio returns an error:

Unexpected endpoint or method (POST /)

HA logs show the request was sent as a flat prompt string to POST /api/generate (Ollama format) rather than the OpenAI-compatible POST /v1/chat/completions with a messages array.

Expected: Requests to /v1-style endpoints should use the OpenAI-compatible format automatically.
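
For anyone wanting to reproduce the mismatch outside Home Assistant, here is a minimal sketch of the two request shapes against an LM Studio base URL. Host, port, and model name are placeholders, and this is plain aiohttp, not the integration's code:

# Standalone repro sketch (not the integration's code); host, port, and model name are placeholders.
import asyncio
import aiohttp

BASE = "http://192.168.x.x:1234/v1"   # LM Studio OpenAI-compatible base URL

async def main():
    async with aiohttp.ClientSession() as session:
        # Ollama-native shape (what LocalClient currently sends): rejected by LM Studio
        ollama_payload = {"model": "local-model", "prompt": "hello", "stream": False}
        async with session.post(BASE, json=ollama_payload) as resp:
            print("Ollama-style request:", resp.status, await resp.text())

        # OpenAI-compatible shape: what a /v1 server expects
        openai_payload = {
            "model": "local-model",
            "messages": [{"role": "user", "content": "hello"}],
            "stream": False,
        }
        async with session.post(f"{BASE}/chat/completions", json=openai_payload) as resp:
            print("OpenAI-style request:", resp.status)

asyncio.run(main())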


Root Cause

LocalClient.get_response unconditionally builds an Ollama payload and sends it to whatever URL the user configured, with no inspection of the URL or detection of the target server type:

# agent.py — LocalClient (current)
payload = {
    "model": self.model,
    "prompt": user_message,   # Ollama-native flat string
    "stream": False,
}
async with session.post(self.url, ...)  # blindly posts to configured URL

LM Studio and OpenAI-compatible servers expect:

payload = {
    "model": self.model,
    "messages": messages,     # OpenAI-style messages array
    "temperature": 0.7,
    "stream": False,
}
# POST to /v1/chat/completions, not /api/generate

Fix

URL-based auto-detection in LocalClient.__init__: if the configured URL contains /v1, use the OpenAI-compatible messages format; otherwise fall back to the existing Ollama-native prompt format. No configuration change required from users — the URL they already enter determines the path.

local_url                          Request format
http://host:1234/v1                OpenAI-compatible (POST /v1/chat/completions, messages array)
http://host:11434/api/generate     Ollama-native (prompt string) — unchanged
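
A minimal sketch of that detection, assuming the client keeps self.url, self.model, and an aiohttp session. Names other than get_response are illustrative, and the actual implementation in the PR may differ:

# agent.py: LocalClient (sketch of the proposed detection, not the PR's exact code)
class LocalClient:
    def __init__(self, url: str, model: str, session) -> None:
        self.url = url
        self.model = model
        self.session = session
        # "/v1" in the configured URL implies an OpenAI-compatible server
        self.openai_compatible = "/v1" in url

    async def get_response(self, user_message: str) -> str:
        if self.openai_compatible:
            # OpenAI-compatible: messages array posted to /v1/chat/completions
            endpoint = self.url.rstrip("/") + "/chat/completions"
            payload = {
                "model": self.model,
                "messages": [{"role": "user", "content": user_message}],
                "temperature": 0.7,
                "stream": False,
            }
            async with self.session.post(endpoint, json=payload) as resp:
                data = await resp.json()
                return data["choices"][0]["message"]["content"]

        # Ollama-native: flat prompt string posted to the configured URL (unchanged)
        payload = {"model": self.model, "prompt": user_message, "stream": False}
        async with self.session.post(self.url, json=payload) as resp:
            data = await resp.json()
            return data["response"]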

A working implementation is available in my fork: JLay2026/ai_agent_ha PR #47

That PR also fixes a related issue: the frontend panel had a hardcoded 60-second timeout, while local models on consumer hardware commonly take 60–180 seconds to respond — causing false "Request timed out" errors even while the backend was still processing. The fix raises the frontend timeout to 300 seconds, matching the aiohttp.ClientTimeout(total=300) already set on all backend clients.
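
For reference, the backend limit the frontend value is being aligned with is the standard aiohttp client timeout (the session construction shown here is illustrative, not the exact agent.py code):

import aiohttp

async def make_session():
    # Backend clients already allow 300 s per request; the frontend limit is raised to match.
    timeout = aiohttp.ClientTimeout(total=300)
    return aiohttp.ClientSession(timeout=timeout)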


Workaround (until merged)

Use the OpenAI provider instead of Local Model, with:

  • API Key: any non-empty string (e.g. lmstudio)
  • Custom Base URL: http://<lmstudio-host>:1234/v1

This routes through OpenAIClient, which already sends the correct format.
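
A quick way to confirm the server accepts the OpenAI-compatible format before switching providers (host and model name are placeholders; the key is any non-empty string, as noted above):

# Verify the LM Studio endpoint speaks the OpenAI-compatible API; placeholders as noted above.
import requests

resp = requests.post(
    "http://<lmstudio-host>:1234/v1/chat/completions",
    headers={"Authorization": "Bearer lmstudio"},   # any non-empty key
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "ping"}],
        "stream": False,
    },
    timeout=300,
)
print(resp.status_code, resp.json()["choices"][0]["message"]["content"])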


Environment

  • AI Agent HA v1.08.7
  • LM Studio 0.3.x
  • Home Assistant 2025.x
