Conversation
- Add lint, build_tests, license_tests, pip_audit workflows (@dev)
- Add docs/, QUICK_FACTS.md, FAQ.md, AUDIT.md, SUGGESTIONS.md, MAINTENANCE_REPORT.md
- Migrate setup.py -> pyproject.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Anthropic Claude, Google Gemini, Cohere, HuggingFace TGI, and AWS
Bedrock-compatible endpoints, plus OpenAI /models & /embeddings stubs
and Ollama /api/show, /api/ps, /api/pull, /api/push, /api/embeddings.
New files:
- schemas/anthropic.py, schemas/gemini.py: Pydantic schemas with Field constraints
- anthropic.py: POST /v1/messages with SSE streaming (Anthropic format)
- gemini.py: POST /v1beta/models/{id}:generateContent + :streamGenerateContent
- cohere.py: POST /v1/chat, /v1/generate, /v1/embed
- huggingface_tgi.py: POST /generate, /generate_stream, GET /info, /health
- aws_bedrock.py: POST /model/{id}/invoke, /invoke-with-response-stream,
/model/{id}/converse; multi-model response normalisation by model prefix
Modified files:
- chat.py: add GET /v1/models, POST /v1/embeddings; fix response_model=None
on streaming routes; fix Python 3.11 f-string nested dict literals
- ollama.py: add GET /api/show, /api/ps; POST /api/pull, /api/push,
/api/embeddings; typed Pydantic request schemas; response_model=None
test/unittests/test_compat_routers.py: 29 tests, all passing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each compat router now mounts under a prefix that identifies the API vendor:

- /v1/... OpenAI (chat_router — unchanged)
- /api/... Ollama (ollama_router — unchanged)
- /anthropic/v1/... Anthropic Claude (was /v1 — conflicted with OpenAI)
- /gemini/v1beta/... Google Gemini
- /cohere/v1/... Cohere (was /v1 — conflicted with OpenAI /v1/chat)
- /tgi/... HuggingFace TGI
- /bedrock/model/... AWS Bedrock (was /model — now scoped)

Benefits: all routers coexist in one FastAPI app with no path conflicts; Swagger UI groups endpoints clearly by vendor tag + prefix.

Fix Python 3.11 f-string nested dict literals in chat.py (used temp vars).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
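The vendor-prefix scheme above can be sketched as a plain mapping with longest-prefix resolution; `VENDOR_PREFIXES` and `vendor_for_path` are illustrative names for this sketch, not code from the repository:

```python
# Hypothetical sketch of the vendor-prefix scheme described in the commit
# message above. Not repository code: names are made up for illustration.
VENDOR_PREFIXES = {
    "/v1": "openai",              # chat_router (unchanged)
    "/api": "ollama",             # ollama_router (unchanged)
    "/anthropic/v1": "anthropic",
    "/gemini/v1beta": "gemini",
    "/cohere/v1": "cohere",
    "/tgi": "huggingface-tgi",
    "/bedrock/model": "aws-bedrock",
}

def vendor_for_path(path: str):
    """Resolve a request path to its vendor by longest matching prefix."""
    best = None
    for prefix, vendor in VENDOR_PREFIXES.items():
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, vendor)
    return best[1] if best else None
```

Because every non-OpenAI prefix now carries a vendor segment, no two entries can claim the same path, which is the property that lets all routers coexist in one app.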
…cy paths with deprecation

OpenAI and Ollama routers now mount under vendor-namespaced prefixes:

- /openai/v1/... (was /v1/...)
- /ollama/api/... (was /api/...)

Legacy paths (/v1/... and /api/...) are preserved via register_deprecated_routes(), which re-mounts the same route handlers at the old paths with deprecated=True. An HTTP middleware (add_deprecation_middleware()) injects response headers on every request to a legacy path:

Deprecation: true
Link: </openai/v1/...>; rel="successor-version"

This gives clients a clear migration signal without breaking existing integrations.

New file: ovos_persona_server/deprecated_routers.py
test: 31 passing (4 new tests for deprecated path behaviour)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
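The legacy-to-successor rewrite described above can be sketched as follows; `_LEGACY_PREFIX_MAP`, `build_successor_path`, and `deprecation_headers` are illustrative stand-ins assuming only the two prefixes listed in the commit, not the actual repository code:

```python
# Illustrative sketch: map a legacy path to its vendor-namespaced successor
# and build the headers a deprecation middleware would inject.
# Assumption: only the two prefix moves named in the commit message.
_LEGACY_PREFIX_MAP = {
    "/v1": "/openai/v1",    # OpenAI routes moved under /openai
    "/api": "/ollama/api",  # Ollama routes moved under /ollama
}

def build_successor_path(path: str):
    """Return the successor path for a legacy path, or None if not legacy."""
    for old, new in _LEGACY_PREFIX_MAP.items():
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return None

def deprecation_headers(path: str) -> dict:
    """Headers a middleware could attach to responses on legacy paths."""
    successor = build_successor_path(path)
    if successor is None:
        return {}
    return {
        "Deprecation": "true",
        "Link": f'<{successor}>; rel="successor-version"',
    }
```

Clients polling the old paths keep getting 200s but can watch for the `Deprecation` header to plan their migration.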
Review skipped: draft detected.
The bots have finished their work. Take a look! 🤖

I've aggregated the results of the automated checks for this PR below.

🔍 Lint
Another piece of the puzzle! 🧩
❌ ruff: issues found — see job log

🔨 Build Tests
Everything is bolted down and ready to go. 🔩
✅ All versions pass
⚖️ License Check
Verifying the source of all binary files. 💾
✅ No license violations found (52 packages).

License distribution: 14× MIT, 11× MIT License, 6× Apache Software License, 5× Apache-2.0, 3× BSD-3-Clause, 2× ISC License (ISCL), 2× PSF-2.0, 2× Python Software Foundation License, +7 more

Full breakdown — 52 packages
Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔒 Security (pip-audit)
I've checked the security of our API endpoints. 🌐
✅ No known vulnerabilities found (71 packages scanned).

The automation never sleeps, but I might reboot. 💤
Adds docs/api-compatibility.md (7 APIs + 2 deprecated, curl examples), docs/deprecation.md, docs/streaming.md, docs/embeddings.md, docs/bedrock-models.md. Updates FAQ.md (20+ Q&As), QUICK_FACTS.md, AUDIT.md, SUGGESTIONS.md, MAINTENANCE_REPORT.md.

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: document all seven canonical and two deprecated API layers exhaustively
- Impact: docs/ fully populated, FAQ.md 20+ entries
- Verified via: uv run pytest test/ -v (41 passed)

New TestOllamaRouter: tags, chat, show, ps, pull, embeddings (501). New TestDeprecatedOllamaRouter: /api/... returns 200 + Deprecation header. Additional Bedrock converse structure test.

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: validate Ollama router endpoints and /api/... deprecation headers
- Impact: 31 → 41 tests
- Verified via: uv run pytest test/ -v (41 passed)

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: Expose OVOS persona via A2A protocol; implement AgentExecutor that bridges Persona.stream() (sync) to async A2A event queue via asyncio.to_thread
- Impact: New a2a.py with OVOSPersonaAgentExecutor, _agent_card, create_a2a_application; a2a-sdk is optional
- Verified via: 13 unit tests, 95% coverage

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: Mount A2A Starlette sub-app at /a2a when --a2a-base-url is provided; add optional [a2a] dependency
- Impact: create_persona_app gains a2a_base_url param; __main__.py exposes --a2a-base-url flag; pyproject.toml adds [a2a] and [dev] extras

… FAQ

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: Full test coverage without live A2A server; document new A2A endpoint for operators
- Impact: 13 tests passing; docs/a2a.md and FAQ.md updated
- Verified via: pytest test/unittests/test_a2a.py -v --cov=ovos_persona_server.a2a

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: Rewrite README with all 8 API surfaces, A2A section, persona config examples, troubleshooting; update docs/index.md with A2A entry; expand docs/a2a.md with endpoints, curl examples, streaming, multi-turn, architecture; update FAQ.md with A2A section
- Impact: README.md rewritten; docs/index.md and docs/a2a.md updated; FAQ.md expanded
Actionable comments posted: 7
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ovos_persona_server/ollama.py (1)
188-188: ⚠️ Potential issue | 🟠 Major

Change streaming response media type to `application/x-ndjson`.

The Ollama API specifies that streaming responses must use the `application/x-ndjson` content type (newline-delimited JSON). Using `application/json` misrepresents the actual stream format and can cause compatible clients to fail or buffer until EOF. (docs.ollama.com/api/streaming)

♻️ Proposed fix

```diff
-    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/json")
+    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/x-ndjson")
-    return StreamingResponse(streaming_ollama_generate_response(), media_type="application/json")
+    return StreamingResponse(streaming_ollama_generate_response(), media_type="application/x-ndjson")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/ollama.py` at line 188, The response uses the wrong content type; update the StreamingResponse in the function that returns the streaming Ollama chat (the return that calls streaming_ollama_chat_response()) to use media_type "application/x-ndjson" instead of "application/json" so the newline-delimited JSON stream is correctly advertised to clients.
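The framing difference behind this finding can be shown with a small sketch; the chunk dicts below are made up for the example, and the two helpers are illustrative rather than code from the server:

```python
import json

def to_ndjson(chunks):
    """Frame chunks as newline-delimited JSON (application/x-ndjson),
    the format Ollama streaming clients expect: one JSON object per line."""
    for chunk in chunks:
        yield json.dumps(chunk) + "\n"

def to_sse(chunks):
    """Frame the same chunks as Server-Sent Events (text/event-stream),
    which uses a 'data: ' prefix and a blank-line terminator instead."""
    for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n"

# Example chunks (made up for illustration)
chunks = [{"response": "Hel", "done": False}, {"response": "lo", "done": True}]
ndjson = "".join(to_ndjson(chunks))
# Each NDJSON line is an independently parseable JSON object:
parsed = [json.loads(line) for line in ndjson.strip().split("\n")]
```

A client reading NDJSON splits on newlines and parses each line as it arrives, which is why advertising `application/json` (a single document) can make clients buffer until EOF.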
🟡 Minor comments (14)
ovos_persona_server/aws_bedrock.py-174-186 (1)
174-186: ⚠️ Potential issue | 🟡 Minor

Same sync-in-async streaming concern as TGI router.

The synchronous `persona.stream(messages)` iteration in an async generator (line 177) can block the event loop. Consider the same executor-based approach suggested for the TGI router.

Additionally, the blind `Exception` catch at line 181 would benefit from logging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/aws_bedrock.py` around lines 174 - 186, The async generator _stream currently iterates the synchronous persona.stream(messages) directly (blocking) and swallows exceptions; fix by moving the synchronous streaming into a short sync helper that collects chunks and returns them (e.g., def _collect_stream(persona, messages): accumulated = []; for chunk in persona.stream(messages): if chunk: accumulated.append(chunk); return accumulated), then call it inside the async _stream with loop.run_in_executor(None, _collect_stream, persona, messages) to avoid blocking the event loop; after receiving the collected chunks, yield the same data events as before; replace the bare except Exception with logging (e.g., logger.exception or logger.error with exc info) and still yield the error event so failures are both logged and sent to the client.

ovos_persona_server/huggingface_tgi.py-93-109 (1)
93-109: ⚠️ Potential issue | 🟡 Minor

Synchronous iteration in async generator may block event loop.

`persona.stream(messages)` appears to be a synchronous generator. Iterating it directly in an async generator (line 97) will block the event loop during each `next()` call, degrading concurrency under load.

Consider wrapping with `asyncio.to_thread` or running in an executor if `persona.stream` performs blocking I/O.

💡 Potential fix using run_in_executor

```diff
+import asyncio
+from functools import partial
 async def _stream() -> AsyncGenerator[str, None]:
     accumulated = []
     token_count = 0
+    loop = asyncio.get_event_loop()
     try:
-        for chunk in persona.stream(messages):
+        stream_iter = iter(persona.stream(messages))
+        while True:
+            chunk = await loop.run_in_executor(None, next, stream_iter, None)
+            if chunk is None:
+                break
             if chunk:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/huggingface_tgi.py` around lines 93 - 109, The async generator _stream currently iterates persona.stream(messages) synchronously which can block the event loop; change the iteration to run the blocking generator on a worker thread or executor (e.g., use asyncio.to_thread or loop.run_in_executor) so each call to persona.stream/messages yields is performed off the event loop, then await results and continue building accumulated/token_count and yielding the same data events; ensure exception handling for the worker task preserves the existing error yield behavior.

pyproject.toml-8-8 (1)
8-8: ⚠️ Potential issue | 🟡 Minor

Stale description: Flask → FastAPI.

The description says "simple flask server" but the server uses FastAPI (also noted in AUDIT.md). Update to reflect the actual framework.

📝 Proposed fix

```diff
-description = "simple flask server to host OpenVoiceOS persona plugins as a service"
+description = "FastAPI server to host OpenVoiceOS persona plugins as a service"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 8, Update the package description string in pyproject.toml to reflect the actual framework: change the description field value that currently reads "simple flask server to host OpenVoiceOS persona plugins as a service" to mention FastAPI instead (e.g., "simple FastAPI server to host OpenVoiceOS persona plugins as a service") so the description matches the implementation and AUDIT.md; edit the description = "... " entry accordingly.

.github/workflows/notify_matrix.yml-10-10 (1)
10-10: ⚠️ Potential issue | 🟡 Minor

Align workflow ref with other workflows: use `@dev` instead of `@master`.

This workflow uses `@master` while all other workflows in the repository use `@dev` (release_workflow.yml, publish_stable.yml, pip_audit.yml, lint.yml, license_tests.yml, build_tests.yml). Either update this to `@dev` for consistency or add a comment explaining why this workflow requires a different ref.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/notify_matrix.yml at line 10, Update the workflow action reference to match other repo workflows by changing the uses ref in the notify_matrix.yml step from OpenVoiceOS/gh-automations/.github/workflows/notify-matrix.yml@master to ...@dev (or, if there is a deliberate reason to keep `@master`, add a brief inline comment above that uses line explaining why this workflow needs `@master`); specifically edit the line containing the uses string so it ends with `@dev` or add the explanatory comment directly above it.

QUICK_FACTS.md-3-4 (1)
3-4: ⚠️ Potential issue | 🟡 Minor

Include the optional A2A surface in this inventory.

This page reads like the package/API quick reference, but it currently omits the conditional `/a2a` mount, the `a2a_base_url` parameter on `create_persona_app`, the `--a2a-base-url` CLI flag, and the added `test/unittests/test_a2a.py` coverage. Either document the A2A surface here or explicitly scope the page to the seven compatibility routers only.

Also applies to: 17-18, 28-29, 30-64
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@QUICK_FACTS.md` around lines 3 - 4, Update the QUICK_FACTS.md inventory to include the optional A2A surface: mention the conditional /a2a mount, document the a2a_base_url parameter on create_persona_app and the --a2a-base-url CLI flag, and note that test/unittests/test_a2a.py adds coverage for this surface; alternatively, explicitly state that the page only documents the seven compatibility routers and exclude A2A—be sure to reference the create_persona_app option and CLI flag names so readers can find the conditional behavior.

docs/embeddings.md-5-11 (1)
5-11: ⚠️ Potential issue | 🟡 Minor

This intro contradicts the table.

Line 5 says there are “three endpoints across two routers”, but Lines 9-11 list `chat_router`, `ollama_router`, and `cohere_router`. This should say three routers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/embeddings.md` around lines 5 - 11, Update the first sentence in the embeddings intro to say "three endpoints across three routers" (not "two routers") so it matches the table listing chat_router, ollama_router, and cohere_router; verify the surrounding text that references persona.solvers.loaded_modules and the get_embeddings mechanism remains accurate and consistent with ovos_persona_server/chat.py, ovos_persona_server/ollama.py and ovos_persona_server/cohere.py.

docs/streaming.md-13-20 (1)
13-20: ⚠️ Potential issue | 🟡 Minor

Mark the example fences with a language.

`markdownlint` is already flagging these examples (MD040), and Line 13 also trips MD038 because of the trailing space inside the `data: ` code span. Adding `text`/`json` fence labels and trimming that code span will keep the page lint-clean.

Also applies to: 37-47, 61-79, 93-96, 110-114, 128-132, 146-150
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/streaming.md` around lines 13 - 20, The fenced examples showing streaming chunks use unlabeled code fences and include a trailing space in the inline code span `data: `; update each example fence to include an explicit language label (e.g., ```text or ```json) and remove the trailing space inside the `data:` code span so it reads `data:` to satisfy markdownlint (MD040 and MD038). Apply the same change to all similar examples in the document (the other chunked examples referenced) so every fenced block is labeled and no `data:` code spans contain trailing spaces.

docs/api-compatibility.md-3-3 (1)
3-3: ⚠️ Potential issue | 🟡 Minor

The intro over-generalizes model handling.

Line 3 says the request `model` is ignored and the persona name is used in all responses, but the Anthropic section below documents `"model": "<requested model>"`. Please narrow the global claim or move model/auth behavior into provider-specific notes so the page does not contradict itself.

Also applies to: 118-127
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/api-compatibility.md` at line 3, The intro over-generalizes how the request `model` and auth headers are handled; update the intro sentence about the `model` field and auth headers so it no longer states they are universally ignored, and instead say that persona-backed behavior varies by provider and refer readers to provider-specific sections (e.g., the Anthropic section which documents `"model": "<requested model>"`) for exact behavior; alternatively, remove the blanket claim and move the note about `model`/auth handling into each provider subsection (including the Anthropic subsection and the section covering lines referenced previously) so the global intro does not contradict provider-specific docs.

docs/deprecation.md-22-25 (1)
22-25: ⚠️ Potential issue | 🟡 Minor

Annotate this header example with a language.

This fence is currently tripping `markdownlint` MD040. Adding `text` is enough here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/deprecation.md` around lines 22 - 25, The fenced code block in the header example is missing a language tag and trips markdownlint MD040; update the triple-backtick fence that contains "Deprecation: true\nLink: </openai/v1/chat/completions>; rel=\"successor-version\"" by adding a language identifier (e.g., text) after the opening ``` so the block becomes ```text, leaving the content unchanged.

README.md-254-257 (1)
254-257: ⚠️ Potential issue | 🟡 Minor

Missing closing backticks for code block.

The code block starting at line 254 is missing its closing triple backticks before the "Then restart the server" line.

📝 Proposed fix: close the bash fence after the `uv pip install 'ovos-persona-server[a2a]'` line so "Then restart the server." renders outside the code block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 254 - 257, The code block in README.md that begins with "uv pip install 'ovos-persona-server[a2a]'" is missing its closing triple backticks; update the README.md by adding a closing ``` after that line so the "Then restart the server." text is rendered outside the code block and the fenced code block is properly terminated.

ovos_persona_server/anthropic.py-38-43 (1)
38-43: ⚠️ Potential issue | 🟡 Minor

Potential `AttributeError` when flattening content blocks.

If `msg.content` is a list but contains blocks without a `text` attribute (e.g., image blocks in future expansions), the generator expression will raise `AttributeError`. Consider adding defensive handling.

🛡️ Proposed fix to handle missing text attribute

```diff
 else:
-    content = " ".join(block.text for block in msg.content)
+    content = " ".join(
+        block.text for block in msg.content if hasattr(block, "text") and block.text
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/anthropic.py` around lines 38 - 43, The loop that flattens message content (iterating request.messages and using the generator expression to join block.text) can raise AttributeError if a block lacks a text attribute; update the handling inside that loop (where msg.content is treated as a list) to defensively iterate blocks and use a safe extraction (e.g., check hasattr(block, "text") or isinstance(block, str) and skip or substitute an empty string for non-text blocks) before joining, then append the resulting safe content in messages.append({"role": msg.role, "content": content}) so missing text attributes no longer raise.

ovos_persona_server/anthropic.py-105-107 (1)
105-107: ⚠️ Potential issue | 🟡 Minor

Exception details exposed in SSE error event.

The raw exception message is interpolated directly into the SSE payload. This could leak internal implementation details or sensitive information to clients.

🛡️ Proposed fix to sanitize error output

```diff
 except Exception as exc:
-    yield f"event: error\ndata: {{\"type\":\"error\",\"error\":{{\"message\":\"{exc}\"}}}}\n\n"
+    yield f"event: error\ndata: {{\"type\":\"error\",\"error\":{{\"message\":\"Internal server error\"}}}}\n\n"
     return
```

Alternatively, log the actual exception server-side for debugging while returning a generic message to clients.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/anthropic.py` around lines 105 - 107, The SSE error handler currently interpolates the raw exception (exc) into the SSE payload in the except block; instead, log the full exception server-side (use logger.exception or similar in the same scope) and change the yielded SSE event produced by the except handler (the yield that references exc) to a sanitized generic error message (e.g., "Internal server error" or a short error code) that does not expose internal details; keep the exception variable name exc and the yield location so you update the exact generator/except block accordingly.

ovos_persona_server/cohere.py-85-87 (1)
85-87: ⚠️ Potential issue | 🟡 Minor

Unused variable `gen_id` in non-streaming chat path.

`gen_id` is generated on line 86 but never used in the non-streaming response (lines 94-105). Consider removing it or using it in the response if intended.

🧹 Proposed fix to remove unused variable

```diff
 msg_id = _new_id()
-gen_id = _new_id()
 if not request.stream:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/cohere.py` around lines 85 - 87, Remove the unused gen_id assignment or wire it into the non-streaming response; specifically, delete the gen_id = _new_id() line in ovos_persona_server/cohere.py (or, if gen_id was intended to identify the generated response, include gen_id in the response payload/metadata where msg_id is used in the non-streaming path) so there are no unused variables; update any response construction code that should carry a generation id to reference gen_id instead of leaving it unused.

ovos_persona_server/chat.py-388-398 (1)
388-398: ⚠️ Potential issue | 🟡 Minor

Missing error handling for `get_embeddings` call.

If `embed_solver.get_embeddings(t)` raises an exception, it will propagate as an unhandled 500 error without a clear message. Consider wrapping in try/except for consistency with other endpoints.

🛡️ Proposed fix to add error handling

```diff
 texts = request_body.input if isinstance(request_body.input, list) else [request_body.input]
 data = []
-for i, t in enumerate(texts):
-    vec = embed_solver.get_embeddings(t)
-    data.append({"object": "embedding", "embedding": vec, "index": i})
+try:
+    for i, t in enumerate(texts):
+        vec = embed_solver.get_embeddings(t)
+        data.append({"object": "embedding", "embedding": vec, "index": i})
+except Exception as exc:
+    raise HTTPException(
+        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+        detail=f"Embeddings generation failed: {exc}",
+    ) from exc
 return JSONResponse({
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/chat.py` around lines 388 - 398, The call to embed_solver.get_embeddings in the loop can raise and is unhandled; wrap the per-text embedding call inside a try/except (around embed_solver.get_embeddings(t) within the loop over texts) to catch exceptions, log the error, and return a JSONResponse with an error message and appropriate status code (e.g., 500) instead of letting the exception propagate; ensure you still build and return the successful "object":"list" response when no error occurs and reference the existing variables/functions embed_solver.get_embeddings, texts, data, and the JSONResponse return path so the handler consistently mirrors error handling used elsewhere.
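The response shape under discussion can be sketched independently of the server; `build_embeddings_response` and `fake_embed` below are hypothetical stand-ins for the route handler and `embed_solver.get_embeddings`, written only to illustrate the error-wrapping pattern and the OpenAI-style payload:

```python
def build_embeddings_response(texts, embed_fn, model="persona"):
    """Build an OpenAI-style /v1/embeddings payload, wrapping each
    per-text embedding call so one failure yields a clear error dict
    instead of an unhandled exception."""
    data = []
    for i, t in enumerate(texts):
        try:
            vec = embed_fn(t)
        except Exception as exc:
            return {"error": {"message": f"Embeddings generation failed: {exc}"}}
        data.append({"object": "embedding", "embedding": vec, "index": i})
    prompt_tokens = sum(len(t.split()) for t in texts)  # crude whitespace count
    return {
        "object": "list",
        "data": data,
        "model": model,
        "usage": {"prompt_tokens": prompt_tokens, "total_tokens": prompt_tokens},
    }

def fake_embed(text):
    # Stand-in embedder: deterministic toy vector, not a real model.
    return [float(len(text)), 0.0]

resp = build_embeddings_response(["hello world", "hi"], fake_embed)
```

In the real route the failure branch would become an `HTTPException(500, ...)` rather than an error dict, but the control flow is the same.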
🧹 Nitpick comments (16)
ovos_persona_server/aws_bedrock.py (2)
245-248: Token usage computed twice with same expression.
`sum(len(m["content"].split()) for m in messages)` is evaluated twice (lines 246 and 248). Extract to a local variable for clarity and efficiency.

♻️ Proposed fix

```diff
 text = persona.chat(messages)
+input_tokens = sum(len(m["content"].split()) for m in messages)
+output_tokens = len((text or "").split())
 return JSONResponse({
     "output": {
         "message": {
             "role": "assistant",
             "content": [{"text": text or ""}],
         }
     },
     "stopReason": "end_turn",
     "usage": {
-        "inputTokens": sum(len(m["content"].split()) for m in messages),
-        "outputTokens": len((text or "").split()),
-        "totalTokens": sum(len(m["content"].split()) for m in messages) + len((text or "").split()),
+        "inputTokens": input_tokens,
+        "outputTokens": output_tokens,
+        "totalTokens": input_tokens + output_tokens,
     },
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/aws_bedrock.py` around lines 245 - 248, The token counts for input are computed twice in the "usage" dict (the expression sum(len(m["content"].split()) for m in messages) is duplicated); extract that value to a local variable (e.g., input_tokens = sum(...) computed once before building the dict) and then use input_tokens for "inputTokens" and to compute "totalTokens" (add len((text or "").split()) to input_tokens) so the computation is done once and reused in the creation of the "usage" object.
37-61: Format detection relies on body fields rather than model_id prefix.

As noted in SUGGESTIONS.md item 5, checking `"messages" in body` first (line 38) could cause misparsing if a request targeting a Titan model accidentally includes a `messages` key. The `model_id` parameter is available but not used for disambiguation.

Consider checking the `model_id` prefix first for known families before falling back to body field heuristics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/aws_bedrock.py` around lines 37 - 61, The parsing logic currently assumes Converse format whenever "messages" is in body (using variables body, msgs, result), which can misclassify requests for specific model families; update the code that branches on request format to first inspect the model_id parameter (e.g., check model_id.startswith for known prefixes like "titan", "claude", "gpt" or your project's canonical family strings) to select the correct format handler, and only fall back to the existing body-based heuristics (the current messages/system parsing logic) if model_id is absent or unrecognized; keep the existing handling of body["system"] and content list merging (sys_text, text) but gate that block behind the model_id-based decision.

.github/workflows/publish_stable.yml (1)
9-16: Consider pinning the reusable workflow to a stable ref for production releases.

Using `@dev` for the reusable workflow in a stable release pipeline introduces risk: breaking changes in the `gh-automations` dev branch could unexpectedly fail releases. Consider pinning to a tagged version or `@master` once the automations repository stabilizes.

The bot actor exclusion on line 9 and removal of the `setup_py` input (now handled via `pyproject.toml`) are appropriate changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/publish_stable.yml around lines 9 - 16, The reusable workflow reference currently uses a floating dev ref ("uses: OpenVoiceOS/gh-automations/.github/workflows/publish-stable.yml@dev"), which risks breaking stable releases; update that "uses:" invocation to pin to a stable ref (for example a specific tag like `@v1.2.3` or the repository's main branch such as `@master` once it's stable) so the publish_stable workflow consistently runs against a known-good version, and keep the other inputs (branch, version_file, publish_pypi, sync_dev) unchanged.

ovos_persona_server/huggingface_tgi.py (2)
132-148: Hardcoded `/info` values may mislead clients.

The `/info` endpoint returns static placeholder values (e.g., `max_input_length: 4096`, `version: "2.0.0"`) that don't reflect actual persona capabilities. Clients relying on these for context window management could encounter unexpected truncation.

Consider documenting that these are placeholders or deriving values from persona configuration if available.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/huggingface_tgi.py` around lines 132 - 148, The /info response currently returns hardcoded placeholder fields (see the JSONResponse returned where persona.name is used) which can mislead clients; update the handler that returns this JSONResponse to populate fields like max_input_length, max_total_tokens, version, sha, model_dtype, model_device_type, etc. from the persona configuration or the actual model metadata if available (fall back to clearly labeled placeholders only when no metadata exists), or add an explicit "placeholder": true/notice field to the response so clients know values are not authoritative. Ensure you reference the JSONResponse return in huggingface_tgi.py and use persona or its metadata methods to derive the real values.
107-109: Consider logging exceptions before returning error to client.

The blind `Exception` catch is acceptable for SSE resilience, but silently discarding the exception makes debugging difficult. Adding a log statement would aid troubleshooting.

🔧 Proposed fix

```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 except Exception as exc:
+    logger.exception("Error during TGI streaming")
     yield f"data:{json.dumps({'error': str(exc)})}\n\n"
     return
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/huggingface_tgi.py` around lines 107 - 109, The except Exception as exc block currently yields the error to the SSE client but discards the exception; update that block to log the exception before yielding by calling the module/class logger (e.g. logger.exception(...) or logger.error(..., exc_info=True)) so the stacktrace and message are recorded; specifically modify the except Exception as exc in huggingface_tgi.py (the block that does yield f"data:{json.dumps({'error': str(exc)})}\n\n") to log exc (using logger.exception or logging.getLogger(__name__).exception) and then continue to yield and return as before.

pyproject.toml (1)
14-21: Keywords appear to be from a different project.

The keywords (`lang`, `detect`, `translate`) suggest language detection/translation, but this is a persona/LLM server. Consider updating to relevant terms.

📝 Suggested keywords

```diff
 keywords = [
-    "plugin",
-    "lang",
-    "detect",
-    "translate",
+    "persona",
+    "llm",
+    "api",
+    "openai",
+    "ollama",
     "OVOS",
     "OpenVoiceOS",
 ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 14 - 21, The keywords array in pyproject.toml currently contains unrelated language/translation terms; update the "keywords" list to reflect this project (persona/LLM server) by replacing entries like "lang", "detect", "translate" with relevant terms such as "persona", "llm", "assistant", "server", "chatbot", "conversational-ai", "virtual-assistant", "OpenVoiceOS" (i.e., edit the keywords list symbol in pyproject.toml to contain project-appropriate tags).

docs/index.md (1)
33-47: Add language specifier to fenced code block.

The architecture diagram should have a language specifier to satisfy markdownlint (MD040). Use `text` or `plaintext` for ASCII diagrams.

📝 Proposed fix
````diff
-```
+```text
 HTTP request
 └─ FastAPI app [__init__.py:create_persona_app]
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/index.md` around lines 33 - 47: add a language specifier to the fenced code block that contains the ASCII architecture diagram in docs/index.md (the block showing "HTTP request └─ FastAPI app [__init__.py:create_persona_app] ..." and nested routers/OVOSPersonaAgentExecutor/Persona.stream) so markdownlint MD040 is satisfied; change the opening triple backticks to include the specifier (e.g., ```text or ```plaintext) and keep the closing triple backticks unchanged.

docs/a2a.md (1)
139-148: Add language specifier to fenced code block.

The architecture diagram code block should specify a language (e.g., `text` or `plaintext`) for consistency and to satisfy markdown linters.

📝 Proposed fix
````diff
-```
+```text
 A2A client request
 └─ POST /a2a/
    └─ A2AStarletteApplication [a2a-sdk]
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/a2a.md` around lines 139 - 148: add a language specifier (e.g., "text" or "plaintext") to the fenced code block containing the architecture diagram so markdown linters accept it; update the opening fence for the block that starts with "A2A client request" (the block showing A2AStarletteApplication, DefaultRequestHandler, OVOSPersonaAgentExecutor and Persona.stream) to use ```text (or ```plaintext) instead of bare ``` to mark it as plain text.

ovos_persona_server/deprecated_routers.py (1)
64-82: Deprecation warning logged on every request — consider rate-limiting.

The middleware logs a `WARNING` on every request to a legacy path. In high-traffic scenarios, this could flood logs. Consider logging only once per path (using a set to track logged paths) or at `INFO` level.

♻️ Optional: Log deprecation warning only once per path
```diff
+_warned_paths: set = set()
+
 @app.middleware("http")
 async def deprecation_header_middleware(request: Request, call_next: Callable) -> Response:
     """Inject Deprecation + Link headers for legacy endpoint paths."""
     path = request.url.path
     is_legacy = any(
         path.startswith(prefix + "/") or path == prefix
         for prefix in _LEGACY_PREFIX_MAP
     )
     response: Response = await call_next(request)
     if is_legacy:
         successor = _build_successor_path(path)
-        LOG.warning(
-            "Deprecated path accessed: %s — use %s instead",
-            path,
-            successor,
-        )
+        if path not in _warned_paths:
+            LOG.warning(
+                "Deprecated path accessed: %s — use %s instead",
+                path,
+                successor,
+            )
+            _warned_paths.add(path)
         response.headers["Deprecation"] = "true"
         response.headers["Link"] = f'<{successor}>; rel="successor-version"'
     return response
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/deprecated_routers.py` around lines 64 - 82: the middleware deprecation_header_middleware currently emits LOG.warning for every legacy request and should be rate-limited: add a module-level set (e.g., _logged_deprecated_paths = set()) and, inside deprecation_header_middleware, check whether the accessed legacy path (use the same logic with _LEGACY_PREFIX_MAP and _build_successor_path to form the key) has already been logged; only call LOG.warning and add the path to _logged_deprecated_paths the first time it is seen (or alternatively change LOG.warning to LOG.info if you prefer lower verbosity). Ensure thread-safety if necessary for your runtime (e.g., minimal locking) when updating _logged_deprecated_paths.

test/unittests/test_a2a.py (1)
29-29: PEP 484: Use explicit `Optional` for nullable default parameter.

The `parts` parameter defaults to `None` but the type hint is `List[Any]`, not `Optional[List[Any]]`.

📝 Proposed fix
```diff
+from typing import Any, List, Optional
+
 class _FakeMessage:
-    def __init__(self, text: str = "", parts: List[Any] = None, **kwargs: Any) -> None:
+    def __init__(self, text: str = "", parts: Optional[List[Any]] = None, **kwargs: Any) -> None:
         if parts is not None:
             self.parts = parts
         else:
             self.parts = [_FakePart(root=_FakeTextPart(text))]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/unittests/test_a2a.py` at line 29: the constructor signature declares parts: List[Any] = None but should use an explicit nullable type; update the __init__ method signature to annotate parts as Optional[List[Any]] and add Optional to the typing imports (or use typing.Optional) so the default None is type-correct; keep the default value None and no other behavior changes in __init__.

test/unittests/test_compat_routers.py (3)
107-108: Use a more descriptive variable name instead of `l`.

The variable name `l` is ambiguous (can be confused with `1` or `I`). Consider using `line` or `data_line`.

♻️ Proposed fix

```diff
-    lines = [l for l in resp.text.splitlines() if l.startswith("data:")]
+    lines = [line for line in resp.text.splitlines() if line.startswith("data:")]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/unittests/test_compat_routers.py` around lines 107 - 108, Rename the ambiguous loop variable `l` in the list comprehension that builds `lines` from `resp.text.splitlines()` to a clearer name like `line` or `data_line` (e.g., `lines = [line for line in resp.text.splitlines() if line.startswith("data:")]`) and update any references to that variable accordingly so the intent is clear and avoids confusion with `1` or `I`.
198-199: Use a more descriptive variable name instead of `l`.

Same issue as above — use `line` instead of `l`.

♻️ Proposed fix

```diff
-    events = [l for l in resp.text.splitlines() if l.startswith("event:")]
+    events = [line for line in resp.text.splitlines() if line.startswith("event:")]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/unittests/test_compat_routers.py` around lines 198 - 199, Replace the ambiguous loop variable `l` with a descriptive name `line` in the list comprehensions that build `events` and `event_types` in test_compat_routers (change `[l for l in resp.text.splitlines() if l.startswith("event:")]` to `[line for line in resp.text.splitlines() if line.startswith("event:")]` and similarly update the second comprehension `event_types = [e.split(": ", 1)[1] for e in events]` if it uses a single-letter variable elsewhere), so both comprehensions use `line` for clarity and consistency with surrounding code.
278-280: Use more descriptive variable names instead of `l`.

Same ambiguous variable name issue in both list comprehensions.

♻️ Proposed fix

```diff
-    lines = [l.strip() for l in resp.text.splitlines() if l.strip()]
-    events = [json.loads(l) for l in lines if l.startswith("{")]
+    lines = [line.strip() for line in resp.text.splitlines() if line.strip()]
+    events = [json.loads(line) for line in lines if line.startswith("{")]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/unittests/test_compat_routers.py` around lines 278 - 280: the two list comprehensions use the ambiguous variable name `l`; update them to use a descriptive name like `line` to improve readability: change the comprehension that builds `lines` from `[l.strip() for l in resp.text.splitlines() if l.strip()]` to use `line`, and likewise change `events = [json.loads(l) for l in lines if l.startswith("{")]` to use `line` so it becomes `events = [json.loads(line) for line in lines if line.startswith("{")]`; keep the surrounding logic (resp, lines, events, assert) unchanged.

ovos_persona_server/schemas/anthropic.py (1)
15-19: Consider whether the `"system"` role is needed in `AnthropicMessage`.

Anthropic's API typically handles system prompts via the top-level `system` field in the request, not as a message role. Including `"system"` in the allowed roles may accept requests that don't match the actual Anthropic API behavior. However, since `_normalise_messages()` in `anthropic.py` handles this by prepending the top-level `system` separately, this is likely intentional for flexibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/schemas/anthropic.py` around lines 15 - 19: the allowed "system" role in AnthropicMessage is misleading for Anthropic's API; change the role Literal on class AnthropicMessage to only allow "user" and "assistant" (e.g., Literal["user","assistant"]) and update any code that constructs or validates messages accordingly so system prompts are supplied via the top-level system handling in _normalise_messages; ensure references to "system" messages are removed from AnthropicMessage usages and rely on _normalise_messages to prepend the top-level system content.

ovos_persona_server/cohere.py (2)
182-182: Same media type consideration for generate streaming.

Apply the same NDJSON media type consideration here for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/cohere.py` at line 182, The response currently returns StreamingResponse(_stream(), media_type="application/json"); update it to use the NDJSON media type (e.g., "application/x-ndjson") for consistency with the other generate streaming implementation, i.e., change the media_type argument on the StreamingResponse call that wraps _stream(), and verify that the _stream() generator yields newline-delimited JSON records so the content-type matches the payload.
129-129: Consider using NDJSON media type for streaming responses.

Cohere's streaming API returns newline-delimited JSON (NDJSON). While `application/json` works, `application/x-ndjson` or `text/event-stream` may be more accurate for client compatibility.

♻️ Proposed fix

```diff
-    return StreamingResponse(_stream(), media_type="application/json")
+    return StreamingResponse(_stream(), media_type="application/x-ndjson")
```

Note: Verify Cohere's actual content-type header to ensure compatibility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_persona_server/cohere.py` at line 129, The streaming response currently returns media_type="application/json"; change it to use an NDJSON-compatible media type by setting StreamingResponse(..., media_type="application/x-ndjson") (or "text/event-stream" if you intend SSE), or better, detect and forward the Cohere response content-type (e.g. read the upstream response.headers.get('content-type') and pass it into StreamingResponse) so the returned media type matches Cohere's actual streaming format; update the return that wraps _stream() in ovos_persona_server/cohere.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ovos_persona_server/a2a.py`:
- Around line 159-180: The code currently buffers all output by calling
list(self._persona.stream(messages)) in execute, preventing incremental SSE
delivery; change to drain persona.stream as chunks arrive by running the
synchronous Persona.stream in a background thread and streaming results into the
async path (e.g., push chunks into a thread-safe queue from a background
callable and consume that queue in execute), then create and await
event_queue.enqueue_event(TaskArtifactUpdateEvent(...)) for each chunk as it is
dequeued (use append/last_chunk logic based on streaming sequence), and handle
errors and a termination sentinel so streaming stops cleanly instead of waiting
for the full list.
In `@ovos_persona_server/deprecated_routers.py`:
- Around line 114-117: The code incorrectly assumes route.path includes the
router prefix; instead of slicing route.path by canonical_prefix, set relative =
route.path directly and build legacy_path = new_prefix + relative (ensuring you
normalize slashes to avoid double or missing slashes). Update the logic around
the variables route.path, canonical_prefix, new_prefix, relative, and
legacy_path to use the route.path as-is when constructing the legacy route.
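The slash-safe join this item asks for can be sketched as follows (the helper name `build_legacy_path` is illustrative, not the module's actual function):

```python
def build_legacy_path(new_prefix: str, route_path: str) -> str:
    """Join a legacy prefix onto a router-relative path, normalising
    slashes so the seam has exactly one separator."""
    return new_prefix.rstrip("/") + "/" + route_path.lstrip("/")
```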
In `@ovos_persona_server/gemini.py`:
- Around line 115-124: The async generator _stream currently calls the
synchronous persona.stream(messages) directly, blocking the event loop; change
_stream to offload the blocking call to a background thread (e.g., use
asyncio.to_thread or loop.run_in_executor) when iterating persona.stream so
chunks are produced without blocking; specifically, wrap the
persona.stream(messages) iteration (inside async def _stream) in an
asyncio.to_thread call or call persona.stream via to_thread and then iterate its
returned iterator, then keep using _build_response(chunk, model_id) and yielding
the JSON SSE lines as before; follow the same offloading pattern used in a2a.py
to avoid event-loop blocking.
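The per-chunk offloading described above can be sketched as a small bridge that advances the blocking iterator one step at a time on a worker thread (the helper name `astream` is hypothetical; the real code would wrap `persona.stream(messages)`):

```python
import asyncio
from typing import AsyncIterator, Iterator


async def astream(sync_iter: Iterator[str]) -> AsyncIterator[str]:
    """Advance a blocking iterator one item at a time on a worker thread,
    so the event loop stays free between chunks."""
    done = object()  # sentinel returned when the iterator is exhausted
    while True:
        chunk = await asyncio.to_thread(next, sync_iter, done)
        if chunk is done:
            return
        yield chunk
```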
In `@ovos_persona_server/ollama.py`:
- Around line 519-521: The response is missing the required model field in
OllamaEmbedResponse; set the model to the identifier used by the embedder before
returning — e.g., after computing vec from
embed_solver.get_embeddings(input_text) construct OllamaEmbedResponse with
embeddings=[vec] and model=embed_solver.model (or embed_solver.model_name if
that property is used by your embed_solver), falling back to a sensible string
if neither exists; update the return in the block that builds input_text, calls
embed_solver.get_embeddings and returns JSONResponse so schema validation
passes.
- Around line 486-521: In ollama_embeddings, don't join list inputs into one
string; instead detect if request_body.input is a list and for each item call
the chosen embed_solver.get_embeddings(item) to produce a list of vectors,
preserving input order and returning OllamaEmbedResponse(embeddings=[...]) where
embeddings is a List[List[float]] (for a single string input still wrap the
single vector in a list). Update the branch that currently sets input_text and
calls embed_solver.get_embeddings to handle both cases (string -> single vector
wrapped in list; list -> map/loop calling get_embeddings per element), keeping
the rest of the logic (solver selection via persona.solvers.loaded_modules)
unchanged.
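The string-vs-list branching above amounts to the following sketch, assuming a per-text embedding callable with signature `str -> List[float]` (the helper name `embed_inputs` is hypothetical):

```python
from typing import Callable, List, Union


def embed_inputs(
    get_embeddings: Callable[[str], List[float]],
    request_input: Union[str, List[str]],
) -> List[List[float]]:
    """Return one vector per input item, preserving order; a single
    string still yields a list containing one vector."""
    if isinstance(request_input, str):
        return [get_embeddings(request_input)]
    return [get_embeddings(item) for item in request_input]
```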
- Around line 376-441: The /show handler is incorrectly a GET and both /show and
/ps are missing Ollama-required fields; change the decorator on the show
function from `@ollama_router.get` to `@ollama_router.post`, extend the OllamaModel
and OllamaModelDetails dataclasses to include the missing fields (for show:
parameters, license, template, modelfile, capabilities; for ps/details/runtime:
expires_at, size_vram, context_length or include a runtime sub-object returned
by ps), populate those fields with sensible defaults or derived values in show()
and ps() (use persona.name, parent_model_str, timestamp() for expires_at or null
semantics) and ensure model.model_dump() includes the new fields so the /show
and /ps responses match Ollama's API contract (refer to functions show, ps and
classes OllamaModel, OllamaModelDetails).
In `@pyproject.toml`:
- Around line 22-26: The package runtime is missing uvicorn which is
imported/used by the console entry point; add "uvicorn" to the dependencies
array in pyproject.toml so the runtime dependency is installed. Specifically,
update the dependencies list that currently contains "fastapi", "ovos-persona",
and "pydantic" to include "uvicorn" so that __main__.py's main() (which calls
uvicorn.run()) and the ovos-persona-server entry point work correctly.
---
Outside diff comments:
In `@ovos_persona_server/ollama.py`:
- Line 188: The response uses the wrong content type; update the
StreamingResponse in the function that returns the streaming Ollama chat (the
return that calls streaming_ollama_chat_response()) to use media_type
"application/x-ndjson" instead of "application/json" so the newline-delimited
JSON stream is correctly advertised to clients.
---
Minor comments:
In @.github/workflows/notify_matrix.yml:
- Line 10: Update the workflow action reference to match other repo workflows by
changing the uses ref in the notify_matrix.yml step from
OpenVoiceOS/gh-automations/.github/workflows/notify-matrix.yml@master to ...@dev
(or, if there is a deliberate reason to keep `@master`, add a brief inline comment
above that uses line explaining why this workflow needs `@master`); specifically
edit the line containing the uses string so it ends with `@dev` or add the
explanatory comment directly above it.
In `@docs/api-compatibility.md`:
- Line 3: The intro over-generalizes how the request `model` and auth headers
are handled; update the intro sentence about the `model` field and auth headers
so it no longer states they are universally ignored, and instead say that
persona-backed behavior varies by provider and refer readers to
provider-specific sections (e.g., the Anthropic section which documents
`"model": "<requested model>"`) for exact behavior; alternatively, remove the
blanket claim and move the note about `model`/auth handling into each provider
subsection (including the Anthropic subsection and the section covering lines
referenced previously) so the global intro does not contradict provider-specific
docs.
In `@docs/deprecation.md`:
- Around line 22-25: The fenced code block in the header example is missing a
language tag and trips markdownlint MD040; update the triple-backtick fence that
contains "Deprecation: true\nLink: </openai/v1/chat/completions>;
rel=\"successor-version\"" by adding a language identifier (e.g., text) after
the opening ``` so the block becomes ```text, leaving the content unchanged.
In `@docs/embeddings.md`:
- Around line 5-11: Update the first sentence in the embeddings intro to say
"three endpoints across three routers" (not "two routers") so it matches the
table listing chat_router, ollama_router, and cohere_router; verify the
surrounding text that references persona.solvers.loaded_modules and the
get_embeddings mechanism remains accurate and consistent with
ovos_persona_server/chat.py, ovos_persona_server/ollama.py and
ovos_persona_server/cohere.py.
In `@docs/streaming.md`:
- Around line 13-20: The fenced examples showing streaming chunks use unlabeled
code fences and include a trailing space in the inline code span `data: `;
update each example fence to include an explicit language label (e.g., ```text
or ```json) and remove the trailing space inside the `data:` code span so it
reads `data:` to satisfy markdownlint (MD040 and MD038). Apply the same change
to all similar examples in the document (the other chunked examples referenced)
so every fenced block is labeled and no `data:` code spans contain trailing
spaces.
In `@ovos_persona_server/anthropic.py`:
- Around line 38-43: The loop that flattens message content (iterating
request.messages and using the generator expression to join block.text) can
raise AttributeError if a block lacks a text attribute; update the handling
inside that loop (where msg.content is treated as a list) to defensively iterate
blocks and use a safe extraction (e.g., check hasattr(block, "text") or
isinstance(block, str) and skip or substitute an empty string for non-text
blocks) before joining, then append the resulting safe content in
messages.append({"role": msg.role, "content": content}) so missing text
attributes no longer raise.
- Around line 105-107: The SSE error handler currently interpolates the raw
exception (exc) into the SSE payload in the except block; instead, log the full
exception server-side (use logger.exception or similar in the same scope) and
change the yielded SSE event produced by the except handler (the yield that
references exc) to a sanitized generic error message (e.g., "Internal server
error" or a short error code) that does not expose internal details; keep the
exception variable name exc and the yield location so you update the exact
generator/except block accordingly.
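A minimal sketch of the log-then-sanitise pattern (the helper name `format_sse_error` is illustrative; the real generator would yield this from its `except` block):

```python
import json
import logging

logger = logging.getLogger(__name__)


def format_sse_error() -> str:
    """Call from inside an `except` block: the full traceback is logged
    server-side, while the SSE client only sees a generic message."""
    logger.exception("Error during SSE streaming")
    return f"data: {json.dumps({'error': 'Internal server error'})}\n\n"
```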
In `@ovos_persona_server/aws_bedrock.py`:
- Around line 174-186: The async generator _stream currently iterates the
synchronous persona.stream(messages) directly (blocking) and swallows
exceptions; fix by moving the synchronous streaming into a short sync helper
that collects chunks and returns them (e.g., def _collect_stream(persona,
messages): accumulated = []; for chunk in persona.stream(messages): if chunk:
accumulated.append(chunk); return accumulated), then call it inside the async
_stream with loop.run_in_executor(None, _collect_stream, persona, messages) to
avoid blocking the event loop; after receiving the collected chunks, yield the
same data events as before; replace the bare except Exception with logging
(e.g., logger.exception or logger.error with exc info) and still yield the error
event so failures are both logged and sent to the client.
In `@ovos_persona_server/chat.py`:
- Around line 388-398: The call to embed_solver.get_embeddings in the loop can
raise and is unhandled; wrap the per-text embedding call inside a try/except
(around embed_solver.get_embeddings(t) within the loop over texts) to catch
exceptions, log the error, and return a JSONResponse with an error message and
appropriate status code (e.g., 500) instead of letting the exception propagate;
ensure you still build and return the successful "object":"list" response when
no error occurs and reference the existing variables/functions
embed_solver.get_embeddings, texts, data, and the JSONResponse return path so
the handler consistently mirrors error handling used elsewhere.
In `@ovos_persona_server/cohere.py`:
- Around line 85-87: Remove the unused gen_id assignment or wire it into the
non-streaming response; specifically, delete the gen_id = _new_id() line in
ovos_persona_server/cohere.py (or, if gen_id was intended to identify the
generated response, include gen_id in the response payload/metadata where msg_id
is used in the non-streaming path) so there are no unused variables; update any
response construction code that should carry a generation id to reference gen_id
instead of leaving it unused.
In `@ovos_persona_server/huggingface_tgi.py`:
- Around line 93-109: The async generator _stream currently iterates
persona.stream(messages) synchronously which can block the event loop; change
the iteration to run the blocking generator on a worker thread or executor
(e.g., use asyncio.to_thread or loop.run_in_executor) so each call to
persona.stream/messages yields is performed off the event loop, then await
results and continue building accumulated/token_count and yielding the same data
events; ensure exception handling for the worker task preserves the existing
error yield behavior.
In `@pyproject.toml`:
- Line 8: Update the package description string in pyproject.toml to reflect the
actual framework: change the description field value that currently reads
"simple flask server to host OpenVoiceOS persona plugins as a service" to
mention FastAPI instead (e.g., "simple FastAPI server to host OpenVoiceOS
persona plugins as a service") so the description matches the implementation and
AUDIT.md; edit the description = "... " entry accordingly.
In `@QUICK_FACTS.md`:
- Around line 3-4: Update the QUICK_FACTS.md inventory to include the optional
A2A surface: mention the conditional /a2a mount, document the a2a_base_url
parameter on create_persona_app and the --a2a-base-url CLI flag, and note that
test/unittests/test_a2a.py adds coverage for this surface; alternatively,
explicitly state that the page only documents the seven compatibility routers
and exclude A2A—be sure to reference the create_persona_app option and CLI flag
names so readers can find the conditional behavior.
In `@README.md`:
- Around line 254-257: The code block in README.md that begins with "uv pip
install 'ovos-persona-server[a2a]'" is missing its closing triple backticks;
update the README.md by adding a closing ``` after that line so the "Then
restart the server." text is rendered outside the code block and the fenced code
block is properly terminated.
---
Nitpick comments:
In @.github/workflows/publish_stable.yml:
- Around line 9-16: The reusable workflow reference currently uses a floating
dev ref ("uses:
OpenVoiceOS/gh-automations/.github/workflows/publish-stable.yml@dev"), which
risks breaking stable releases; update that "uses:" invocation to pin to a
stable ref (for example a specific tag like `@v1.2.3` or the repository's main
branch such as `@master` once it's stable) so the publish_stable workflow
consistently runs against a known-good version, and keep the other inputs
(branch, version_file, publish_pypi, sync_dev) unchanged.
In `@ovos_persona_server/aws_bedrock.py`:
- Around line 245-248: The token counts for input are computed twice in the
"usage" dict (the expression sum(len(m["content"].split()) for m in messages) is
duplicated); extract that value to a local variable (e.g., input_tokens =
sum(...) computed once before building the dict) and then use input_tokens for
"inputTokens" and to compute "totalTokens" (add len((text or "").split()) to
input_tokens) so the computation is done once and reused in the creation of the
"usage" object.
- Around line 37-61: The parsing logic currently assumes Converse format
whenever "messages" is in body (using variables body, msgs, result), which can
misclassify requests for specific model families; update the code that branches
on request format to first inspect the model_id parameter (e.g., check
model_id.startswith for known prefixes like "titan", "claude", "gpt" or your
project's canonical family strings) to select the correct format handler, and
only fall back to the existing body-based heuristics (the current
messages/system parsing logic) if model_id is absent or unrecognized; keep the
existing handling of body["system"] and content list merging (sys_text, text)
but gate that block behind the model_id-based decision.
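The model-id-first dispatch can be sketched like this; the prefix table is purely illustrative (the project's canonical family strings may differ), and the body-shape fallback mirrors the existing heuristic:

```python
from typing import Any, Dict

# Illustrative family prefixes — replace with the project's canonical strings.
_FAMILY_PREFIXES = {
    "anthropic.": "claude",
    "amazon.titan": "titan",
    "meta.llama": "llama",
}


def detect_request_format(model_id: str, body: Dict[str, Any]) -> str:
    """Pick a handler by model family first; fall back to body-shape
    heuristics only when the model id is absent or unrecognised."""
    for prefix, family in _FAMILY_PREFIXES.items():
        if model_id and model_id.startswith(prefix):
            return family
    return "converse" if "messages" in body else "text"
```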
In `@ovos_persona_server/huggingface_tgi.py`:
- Around line 132-148: The /info response currently returns hardcoded
placeholder fields (see the JSONResponse returned where persona.name is used)
which can mislead clients; update the handler that returns this JSONResponse to
populate fields like max_input_length, max_total_tokens, version, sha,
model_dtype, model_device_type, etc. from the persona configuration or the
actual model metadata if available (fall back to clearly labeled placeholders
only when no metadata exists), or add an explicit "placeholder": true/notice
field to the response so clients know values are not authoritative. Ensure you
reference the JSONResponse return in huggingface_tgi.py and use persona or its
metadata methods to derive the real values.
```python
# Persona.stream is synchronous — run in a thread pool
chunks = await asyncio.to_thread(
    lambda: list(self._persona.stream(messages))
)

for i, chunk in enumerate(chunks):
    if not chunk:
        continue
    artifact = Artifact(
        parts=[Part(root=TextPart(text=chunk))],
        artifact_id=str(i),
        name=f"chunk-{i}",
    )
    await event_queue.enqueue_event(
        TaskArtifactUpdateEvent(
            task_id=context.task_id,
            context_id=context.context_id,
            artifact=artifact,
            append=(i > 0),
            last_chunk=(i == len(chunks) - 1),
        )
    )
```
Streaming is defeated by collecting all chunks before emitting events.
The current implementation calls list(self._persona.stream(messages)) which collects all chunks into memory before emitting any TaskArtifactUpdateEvent. This defeats the purpose of streaming — clients won't receive any events until the entire response is generated.
For true SSE streaming, chunks should be emitted as they arrive. However, since persona.stream() is synchronous and runs in asyncio.to_thread, achieving true streaming requires a different approach (e.g., using a queue or async generator with run_in_executor).
If true streaming is not required for the initial implementation, consider documenting this limitation. Otherwise, refactor to emit events incrementally.
💡 Potential approach for true streaming

One approach is to use a thread-safe queue:

```python
import queue
from concurrent.futures import ThreadPoolExecutor


async def execute(self, context: "RequestContext", event_queue: "EventQueue") -> None:
    user_text = self._extract_user_text(context.message)
    messages = [{"role": "user", "content": user_text}]
    chunk_queue: queue.Queue = queue.Queue()

    def _stream_to_queue():
        try:
            for chunk in self._persona.stream(messages):
                chunk_queue.put(("chunk", chunk))
            chunk_queue.put(("done", None))
        except Exception as e:
            chunk_queue.put(("error", e))

    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(max_workers=1)
    loop.run_in_executor(executor, _stream_to_queue)

    i = 0
    while True:
        # Poll queue with small sleep to yield control
        try:
            msg_type, data = chunk_queue.get_nowait()
        except queue.Empty:
            await asyncio.sleep(0.01)
            continue
        if msg_type == "done":
            break
        elif msg_type == "error":
            raise data
        elif msg_type == "chunk" and data:
            # emit artifact event...
            i += 1
```

Alternatively, document the current behavior as "buffered streaming" where chunks are sentence-level but the full response is awaited before delivery.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ovos_persona_server/a2a.py` around lines 159 - 180, The code currently
buffers all output by calling list(self._persona.stream(messages)) in execute,
preventing incremental SSE delivery; change to drain persona.stream as chunks
arrive by running the synchronous Persona.stream in a background thread and
streaming results into the async path (e.g., push chunks into a thread-safe
queue from a background callable and consume that queue in execute), then create
and await event_queue.enqueue_event(TaskArtifactUpdateEvent(...)) for each chunk
as it is dequeued (use append/last_chunk logic based on streaming sequence), and
handle errors and a termination sentinel so streaming stops cleanly instead of
waiting for the full list.
```python
# route.path already contains the canonical prefix, e.g.
# '/openai/v1/chat/completions'. Replace it with the legacy prefix.
relative = route.path[len(canonical_prefix):]  # '/chat/completions'
legacy_path = new_prefix + relative            # '/v1/chat/completions'
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify what route.path contains in FastAPI routers
ast-grep --pattern $'@$_.post(
  $_,
  $$$
)'
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1814

🏁 Script executed:

```bash
#!/bin/bash
# Read the deprecated_routers.py file around lines 114-117
cat -n ovos_persona_server/deprecated_routers.py | sed -n '100,130p'
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1925

🏁 Script executed:

```bash
#!/bin/bash
# Check router definitions in chat.py and ollama.py
echo "=== chat.py router definition ==="
head -50 ovos_persona_server/chat.py | grep -A 5 "chat_router"
echo -e "\n=== ollama.py router definition ==="
head -70 ovos_persona_server/ollama.py | grep -A 5 "ollama_router"
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 852

🏁 Script executed:

```bash
#!/bin/bash
# Check the _LEGACY_PREFIX_MAP and router registration to understand the flow
grep -A 10 "_LEGACY_PREFIX_MAP\|register_deprecated_routes" ovos_persona_server/deprecated_routers.py | head -40
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1341

🏁 Script executed:

```bash
#!/bin/bash
# Look for where register_deprecated_routes is called and what routers are passed
grep -r "register_deprecated_routes" ovos_persona_server/ --include="*.py" -B 2 -A 2
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1507

🏁 Script executed:

```bash
#!/bin/bash
# Check the full register_deprecated_routes function to see how it calls _legacy_router
sed -n '80,150p' ovos_persona_server/deprecated_routers.py
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 2807

🏁 Script executed:

```bash
#!/bin/bash
# Verify the actual route.path by checking if there's any test or debug output
grep -r "route.path" ovos_persona_server/ --include="*.py" -B 2 -A 2
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1320

🏁 Script executed:

```bash
#!/bin/bash
# Create a minimal test to verify FastAPI route.path behavior
python3 << 'EOF'
from fastapi import APIRouter
from fastapi.routing import APIRoute

# Simulate the actual router setup
router = APIRouter(prefix="/openai/v1", tags=["test"])

@router.post("/chat/completions")
async def test_route():
    return {"msg": "test"}

# Check what route.path contains
for route in router.routes:
    if isinstance(route, APIRoute):
        print(f"route.path: {route.path}")
        print(f"route.methods: {route.methods}")

        # Simulate the buggy calculation
        canonical_prefix = "/openai/v1"
        relative = route.path[len(canonical_prefix):]
        print(f"\nWith buggy code:")
        print(f"  canonical_prefix: {canonical_prefix}")
        print(f"  route.path[len(canonical_prefix):]: {relative}")
        print(f"  Expected relative: /chat/completions")

        # Simulate the fixed calculation
        new_prefix = "/v1"
        legacy_path_buggy = new_prefix + relative
        legacy_path_fixed = new_prefix + route.path
        print(f"\nResult:")
        print(f"  Buggy legacy_path: {legacy_path_buggy}")
        print(f"  Fixed legacy_path: {legacy_path_fixed}")
        print(f"  Expected: /v1/chat/completions")
EOF
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 194

🏁 Script executed:

```bash
#!/bin/bash
# Search for any tests or existing code that demonstrates route.path behavior
grep -r "\.path" ovos_persona_server/ --include="*.py" | grep -i route | head -20
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 383

🏁 Script executed:

```bash
#!/bin/bash
# Check if there are any tests that verify the deprecated routes work correctly
find . -name "*test*" -type f -name "*.py" | head -10
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 153
🌐 Web query:
FastAPI APIRoute path attribute what does route.path contain with router prefix
💡 Result:
In FastAPI, the `route.path` attribute of an `APIRoute` instance contains the declared path from the path operation decorator (e.g., `"/items/{item_id}"`), without the router prefix. The full path (including prefix) used for matching and OpenAPI generation is available via `route.path_format`. When an `APIRouter` with `prefix="/api"` has `@router.get("/items/{item_id}")`, then:

- `route.path` = `"/items/{item_id}"`
- `route.path_format` = `"/api/items/{item_id}"`

This is inherited from Starlette's `Route` class, where `path` is the template passed to the constructor and `path_format` is the compiled pattern with the prefix applied during router inclusion. Example from usage:

```python
for route in request.app.routes:
    if isinstance(route, APIRoute):
        print(route.path)         # e.g. "/items/{item_id}"
        print(route.path_format)  # e.g. "/api/items/{item_id}"
```

Sources confirm `route.path` is used to list the base route path in middleware and iteration over `app.routes` (result 1), while prefixes are handled separately in `APIRouter.include_router` (results 6, 9, 10). Starlette source shows `path` as the input template (result 24) and `path_format` as the formatted version (inferred from routing.py snippets).
Citations:
- 1: https://stackoverflow.com/questions/67586256/how-to-get-defined-route-paths-in-fastapi
- 2: https://fastapi.tiangolo.com/reference/apirouter/
- 3: https://fastapi.tiangolo.com/az/reference/apirouter/
- 4: https://fastapi.tiangolo.com/tutorial/bigger-applications/
- 5: https://github.com/fastapi/fastapi/blob/master/fastapi/routing.py
- 6: https://github.com/encode/starlette/blob/master/starlette/routing.py
🏁 Script executed:

```bash
#!/bin/bash
# Check test_compat_routers.py to see if it tests the deprecated routes
cat -n test/unittests/test_compat_routers.py
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 21171

🏁 Script executed:

```bash
#!/bin/bash
# Run the deprecated route tests to confirm they actually pass
cd test/unittests && python -m pytest test_compat_routers.py::TestDeprecatedOpenAIChatRouter::test_deprecated_chat_still_works -v 2>&1 || true
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1380

🏁 Script executed:

```bash
#!/bin/bash
# Let's check what routes are actually being generated by adding debug output
# First, let's see if there's a way to inspect the actual routes
grep -r "for route in" ovos_persona_server/ --include="*.py" -B 3 -A 10
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1576

🏁 Script executed:

```bash
#!/bin/bash
# Let's write a simple test script to verify the actual behavior
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

# Simulate the logic to see what actually happens
canonical_prefix = "/openai/v1"
new_prefix = "/v1"
route_path = "/chat/completions"

print(f"canonical_prefix: {canonical_prefix}")
print(f"new_prefix: {new_prefix}")
print(f"route.path (what we expect): {route_path}")
print()

# Current buggy code
relative = route_path[len(canonical_prefix):]
legacy_path_buggy = new_prefix + relative
print(f"With current code:")
print(f"  relative = route.path[len(canonical_prefix):] = {route_path}[{len(canonical_prefix)}:] = '{relative}'")
print(f"  legacy_path = new_prefix + relative = '{new_prefix}' + '{relative}' = '{legacy_path_buggy}'")
print()

# Fixed code
legacy_path_fixed = new_prefix + route_path
print(f"With fixed code:")
print(f"  legacy_path = new_prefix + route.path = '{new_prefix}' + '{route_path}' = '{legacy_path_fixed}'")
print()

print(f"Test expects: /v1/chat/completions")
print(f"Current code produces: {legacy_path_buggy}")
print(f"Fixed code produces: {legacy_path_fixed}")
print()

if legacy_path_buggy == "/v1/chat/completions":
    print("❌ Current code works (but comment is still wrong about route.path)")
else:
    print("✓ Current code is buggy as claimed")
EOF
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 605
Critical: route.path does not contain the router prefix — legacy routes will have malformed paths.
The comment incorrectly states that route.path already contains the canonical prefix (e.g., /openai/v1/chat/completions). In FastAPI, APIRoute.path contains only the path defined in the route decorator, not the router's prefix. For chat_router with prefix="/openai/v1" and @router.post("/chat/completions"), the route.path is just "/chat/completions".
This causes line 116 to produce incorrect paths:

- `"/chat/completions"[len("/openai/v1"):]` → `"letions"` (truncates the first 10 characters)
- `legacy_path = "/v1" + "letions"` → `"/v1letions"` (broken route)

Expected: `/v1/chat/completions`
The fix is to use route.path directly:
Proposed fix

```diff
-    # route.path already contains the canonical prefix, e.g.
-    # '/openai/v1/chat/completions'. Replace it with the legacy prefix.
-    relative = route.path[len(canonical_prefix):]  # '/chat/completions'
-    legacy_path = new_prefix + relative  # '/v1/chat/completions'
+    # route.path is the path defined on the route (e.g., '/chat/completions'),
+    # NOT including the router's prefix. Use it directly with the legacy prefix.
+    legacy_path = new_prefix + route.path  # '/v1/chat/completions'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ovos_persona_server/deprecated_routers.py` around lines 114 - 117, The code
incorrectly assumes route.path includes the router prefix; instead of slicing
route.path by canonical_prefix, set relative = route.path directly and build
legacy_path = new_prefix + relative (ensuring you normalize slashes to avoid
double or missing slashes). Update the logic around the variables route.path,
canonical_prefix, new_prefix, relative, and legacy_path to use the route.path
as-is when constructing the legacy route.
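The agent prompt asks for slash normalisation when joining `new_prefix` and `route.path`. A minimal sketch of such a helper (the function name is illustrative, not from the repository):

```python
def build_legacy_path(new_prefix: str, route_path: str) -> str:
    """Join a legacy prefix with an APIRoute.path.

    route.path excludes the router's own prefix, so the two are simply
    concatenated; slashes are normalised to avoid '//' or a missing
    separator.
    """
    return new_prefix.rstrip("/") + "/" + route_path.lstrip("/")


print(build_legacy_path("/v1", "/chat/completions"))  # /v1/chat/completions
```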
```python
async def _stream() -> AsyncGenerator[str, None]:
    """Yield SSE events with GeminiResponse chunks."""
    try:
        for chunk in persona.stream(messages):
            if chunk:
                resp = _build_response(chunk, model_id)
                yield f"data: {json.dumps(resp.model_dump())}\n\n"
    except Exception as exc:
        yield f"data: {json.dumps({'error': str(exc)})}\n\n"
```
Blocking synchronous call in async generator — will block the event loop.
persona.stream(messages) is a synchronous call (as confirmed by the A2A module using asyncio.to_thread), but here it's called directly inside an async generator without offloading to a thread. This will block the FastAPI event loop during streaming, degrading concurrency for other requests.
Consider using asyncio.to_thread or run_in_executor to avoid blocking, similar to the pattern in a2a.py.
🐛 Proposed fix using asyncio.to_thread

```diff
+import asyncio
+
 async def _stream() -> AsyncGenerator[str, None]:
     """Yield SSE events with GeminiResponse chunks."""
     try:
-        for chunk in persona.stream(messages):
+        # Run synchronous stream in thread pool to avoid blocking event loop
+        chunks = await asyncio.to_thread(lambda: list(persona.stream(messages)))
+        for chunk in chunks:
             if chunk:
                 resp = _build_response(chunk, model_id)
                 yield f"data: {json.dumps(resp.model_dump())}\n\n"
     except Exception as exc:
         yield f"data: {json.dumps({'error': str(exc)})}\n\n"
```

Note: This still collects all chunks before yielding (same limitation as A2A). For true streaming, a more complex approach with a thread-safe queue would be needed.
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 122-122: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ovos_persona_server/gemini.py` around lines 115 - 124, The async generator
_stream currently calls the synchronous persona.stream(messages) directly,
blocking the event loop; change _stream to offload the blocking call to a
background thread (e.g., use asyncio.to_thread or loop.run_in_executor) when
iterating persona.stream so chunks are produced without blocking; specifically,
wrap the persona.stream(messages) iteration (inside async def _stream) in an
asyncio.to_thread call or call persona.stream via to_thread and then iterate its
returned iterator, then keep using _build_response(chunk, model_id) and yielding
the JSON SSE lines as before; follow the same offloading pattern used in a2a.py
to avoid event-loop blocking.
```python
@ollama_router.get("/show")
async def show(persona: Persona = Depends(get_default_persona)) -> JSONResponse:
    """Show model card (Ollama-compatible stub).

    Args:
        persona: Injected persona instance.

    Returns:
        Static model card for the loaded persona.
    """
    solvers: List[str] = list(persona.solvers.loaded_modules.keys())
    models_in_config: set = {persona.config.get(s, {}).get("model")
                             for s in persona.solvers.loaded_modules.keys()}
    parent_model_str: str = "|".join(filter(None, models_in_config))

    details: OllamaModelDetails = OllamaModelDetails(
        format="json",
        family="ovos-persona",
        families=solvers,
        parent_model=parent_model_str,
        parameter_size="",
        quantization_level="",
    )
    model: OllamaModel = OllamaModel(
        name=persona.name,
        model=persona.name,
        digest="sha256:placeholder_digest",
        size=0,
        modified_at=timestamp(),
        details=details,
    )
    return JSONResponse(model.model_dump())


@ollama_router.get("/ps")
async def ps(persona: Persona = Depends(get_default_persona)) -> JSONResponse:
    """List running models (Ollama-compatible stub).

    Args:
        persona: Injected persona instance.

    Returns:
        Ollama /api/ps format with the loaded persona listed as running.
    """
    solvers: List[str] = list(persona.solvers.loaded_modules.keys())
    models_in_config: set = {persona.config.get(s, {}).get("model")
                             for s in persona.solvers.loaded_modules.keys()}
    parent_model_str: str = "|".join(filter(None, models_in_config))

    details: OllamaModelDetails = OllamaModelDetails(
        format="json",
        family="ovos-persona",
        families=solvers,
        parent_model=parent_model_str,
        parameter_size="",
        quantization_level="",
    )
    model: OllamaModel = OllamaModel(
        name=persona.name,
        model=persona.name,
        digest="sha256:placeholder_digest",
        size=0,
        modified_at=timestamp(),
        details=details,
    )
    return JSONResponse({"models": [model.model_dump()]})
```
🧩 Analysis chain
🌐 Web query:
Ollama API /api/show endpoint POST request specification
💡 Result:
The Ollama /api/show endpoint is a POST request used to retrieve detailed information about a specific model.

Endpoint: `POST http://localhost:11434/api/show` (default base URL)

Request body (JSON):
- `model`: string (required) - the name of the model to show (e.g., "gemma3", "llama3.2")
- `verbose`: boolean (optional, default false) - if true, returns full data for verbose response fields like detailed tokenizer info

Example curl request:

```bash
curl http://localhost:11434/api/show -d '{ "model": "gemma3" }'
```

Or with verbose:

```bash
curl http://localhost:11434/api/show -d '{ "model": "gemma3", "verbose": true }'
```

The response is a JSON object containing model details such as parameters, license, template, modelfile, capabilities (e.g., completion, vision), modified_at timestamp, details (format, family, quantization_level, etc.), and model_info (architecture-specific parameters). This specification is from the official Ollama OpenAPI schema and documentation.
Citations:
- 1: https://docs.ollama.com/api-reference/show-model-details
- 2: https://github.com/ollama/ollama/blob/main/docs/api.md
- 3: https://ollama.readthedocs.io/en/api/
🌐 Web query:
Ollama API /api/ps endpoint runtime fields specification
💡 Result:
The Ollama API endpoint `GET /api/ps` returns a list of models currently loaded into memory. The response is a JSON object with a top-level "models" array, where each item is an object describing a running model. The fields for each model (Ps schema) are:

- `name`: string (e.g., "gemma3" or "mistral:latest") - the loaded model name.
- `model`: string (e.g., "gemma3") - the base model name.
- `size`: integer (e.g., 6591830464) - model size in bytes.
- `digest`: string (e.g., "a2af6cc3eb7fa8be8504abaf9b04e88f17a119ec3f04a3addf55f92841195f5a") - SHA256 digest.
- `details`: object containing:
  - `parent_model`: string (e.g., "").
  - `format`: string (e.g., "gguf").
  - `family`: string (e.g., "gemma3").
  - `families`: array of strings (e.g., ["gemma3"]).
  - `parameter_size`: string (e.g., "4.3B").
  - `quantization_level`: string (e.g., "Q4_K_M").
- `expires_at`: string (ISO 8601 timestamp, e.g., "2025-10-17T16:47:07.93355-07:00") - when the model will be unloaded.
- `size_vram`: integer (e.g., 5333539264) - VRAM usage in bytes.
- `context_length`: integer (e.g., 4096) - model's context length.

Example response:

```json
{
  "models": [
    {
      "name": "gemma3",
      "model": "gemma3",
      "size": 6591830464,
      "digest": "a2af6cc3eb7fa8be8504abaf9b04e88f17a119ec3f04a3addf55f92841195f5a",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "gemma3",
        "families": ["gemma3"],
        "parameter_size": "4.3B",
        "quantization_level": "Q4_K_M"
      },
      "expires_at": "2025-10-17T16:47:07.93355-07:00",
      "size_vram": 5333539264,
      "context_length": 4096
    }
  ]
}
```

This specification is from the official OpenAPI schema and examples in the documentation.
Citations:
- 1: https://docs.ollama.com/api/ps
- 2: https://ollama.readthedocs.io/en/api/
- 3: https://github.com/ollama/ollama/blob/main/docs/api.md
🏁 Script executed:

```bash
# Find the OllamaModel and OllamaModelDetails class definitions
rg -A 20 "class OllamaModel" ovos_persona_server/
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 4050

🏁 Script executed:

```bash
# Also search for OllamaModelDetails
rg -A 20 "class OllamaModelDetails" ovos_persona_server/
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 2153

🏁 Script executed:

```bash
# Check what the /tags endpoint returns to understand if show and ps reuse it
rg -B 5 -A 20 "@ollama_router.get\(\"/tags\"\)" ovos_persona_server/
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 57

🏁 Script executed:

```bash
# Check the current decorator for /show
rg -B 2 "async def show" ovos_persona_server/
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 274

🏁 Script executed:

```bash
# Check the /tags endpoint to understand the reuse claim
rg -B 5 -A 25 "async def tags" ovos_persona_server/ollama.py
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1541
Change /show from GET to POST and add missing Ollama API fields to both endpoints.
The /show endpoint is incorrectly decorated with @ollama_router.get but should be POST per Ollama's specification. Additionally, both /show and /ps are using the same model shape as /tags, which is incompatible with Ollama's actual API contracts:
- `/show` response must include `parameters`, `license`, `template`, `modelfile`, and `capabilities` fields
- `/ps` response must include runtime fields: `expires_at`, `size_vram`, and `context_length`
The current OllamaModel and OllamaModelDetails classes lack these fields, so Ollama-compatible clients will break on these endpoints.
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 377-377: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
[warning] 411-411: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ovos_persona_server/ollama.py` around lines 376 - 441, The /show handler is
incorrectly a GET and both /show and /ps are missing Ollama-required fields;
change the decorator on the show function from `@ollama_router.get` to
`@ollama_router.post`, extend the OllamaModel and OllamaModelDetails dataclasses
to include the missing fields (for show: parameters, license, template,
modelfile, capabilities; for ps/details/runtime: expires_at, size_vram,
context_length or include a runtime sub-object returned by ps), populate those
fields with sensible defaults or derived values in show() and ps() (use
persona.name, parent_model_str, timestamp() for expires_at or null semantics)
and ensure model.model_dump() includes the new fields so the /show and /ps
responses match Ollama's API contract (refer to functions show, ps and classes
OllamaModel, OllamaModelDetails).
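A sketch of what the extended schemas could look like. Field names follow the Ollama spec quoted above; the defaults are illustrative, and stdlib `dataclasses` stand in for the project's Pydantic models so the snippet is self-contained:

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class OllamaModelDetails:
    parent_model: str = ""
    format: str = "gguf"
    family: str = "ovos-persona"
    families: List[str] = field(default_factory=list)
    parameter_size: str = ""
    quantization_level: str = ""


@dataclass
class OllamaShowResponse:
    # Fields required by POST /api/show per the Ollama spec quoted above
    parameters: str = ""
    license: str = ""
    template: str = ""
    modelfile: str = ""
    capabilities: List[str] = field(default_factory=lambda: ["completion"])
    details: OllamaModelDetails = field(default_factory=OllamaModelDetails)
    modified_at: str = ""


@dataclass
class OllamaPsModel:
    # Runtime fields required by GET /api/ps per the Ollama spec quoted above
    name: str
    model: str
    size: int = 0
    digest: str = ""
    details: OllamaModelDetails = field(default_factory=OllamaModelDetails)
    expires_at: Optional[str] = None
    size_vram: int = 0
    context_length: int = 0


show_payload = asdict(OllamaShowResponse(license="Apache-2.0"))
print("capabilities" in show_payload and "modelfile" in show_payload)
```

In the actual codebase these would be Pydantic models (as the existing schemas are), with `show()` and `ps()` populating the new fields from persona state.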
```python
@ollama_router.post("/embeddings")
async def ollama_embeddings(
    request_body: OllamaEmbedRequest,
    persona: Persona = Depends(get_default_persona),
) -> JSONResponse:
    """Generate embeddings (Ollama-compatible stub).

    Delegates to persona's embeddings solver if available.

    Args:
        request_body: Ollama embed request with model and input.
        persona: Injected persona instance.

    Returns:
        Ollama-format embeddings response or 501 if not supported.

    Raises:
        HTTPException: 501 if no embeddings solver is configured.
    """
    solver_names = list(persona.solvers.loaded_modules.keys())
    embed_solver = None
    for name in solver_names:
        solver = persona.solvers.loaded_modules[name]
        if hasattr(solver, "get_embeddings"):
            embed_solver = solver
            break

    if embed_solver is None:
        raise HTTPException(
            status_code=status.HTTP_501_NOT_IMPLEMENTED,
            detail="No embeddings solver configured for this persona.",
        )

    input_text = request_body.input if isinstance(request_body.input, str) else " ".join(request_body.input)
    vec = embed_solver.get_embeddings(input_text)
    return JSONResponse(OllamaEmbedResponse(embeddings=[vec]).model_dump())
```
🧩 Analysis chain
🏁 Script executed:

```bash
cat -n ovos_persona_server/ollama.py | sed -n '1,50p'
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1814

🏁 Script executed:

```bash
# Find the data model definitions for OllamaEmbedRequest and OllamaEmbedResponse
rg -n "class OllamaEmbedRequest|class OllamaEmbedResponse" ovos_persona_server/
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 233

🏁 Script executed:

```bash
# Get broader context of the ollama.py file
wc -l ovos_persona_server/ollama.py
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 108
🌐 Web query:
Ollama API embeddings endpoint POST /api/embed batch input response format
💡 Result:
The Ollama embeddings endpoint is `POST /api/embed` (note: the deprecated /api/embeddings uses "prompt" instead of "input").

Batch input: JSON object with `model`: string (required), `input`: array of strings (for batch), optional `options`, `truncate`, `keep_alive`. Example request:

```json
{
  "model": "embeddinggemma",
  "input": ["First sentence", "Second sentence", "Third sentence"]
}
```

Response: JSON object with `model`: string, `embeddings`: array of arrays (one float array per input string), timings like `total_duration`, `load_duration`. Example response structure:

{ "model": "embeddinggemma", "embeddings": [ [0.010071029, -0.0017594862, ...], [...] ], "total_duration": 14143917, ... }

Official docs confirm /api/embed supports batch via an "input" array, returning a parallel embeddings array. Deprecated /api/embeddings used "prompt_batch" / "embedding_batch" in early proposals, but the current standard is /api/embed.
Citations:
🏁 Script executed:

```bash
cat -n ovos_persona_server/schemas/ollama.py | sed -n '204,240p'
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 3174
Process each input separately to preserve batch semantics.
The Ollama embeddings endpoint accepts input as either a string or array and returns one embedding per input item. The current code joins array inputs into a single string, causing it to return one embedding instead of a per-item embedding array. This breaks compatibility with batch callers and changes both output cardinality and semantics.
Handle string and array inputs separately: when input is an array, call get_embeddings() for each item and collect the results into a list, matching the expected OllamaEmbedResponse structure of embeddings: List[List[float]] (one embedding vector per input).
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 489-489: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ovos_persona_server/ollama.py` around lines 486 - 521, In ollama_embeddings,
don't join list inputs into one string; instead detect if request_body.input is
a list and for each item call the chosen embed_solver.get_embeddings(item) to
produce a list of vectors, preserving input order and returning
OllamaEmbedResponse(embeddings=[...]) where embeddings is a List[List[float]]
(for a single string input still wrap the single vector in a list). Update the
branch that currently sets input_text and calls embed_solver.get_embeddings to
handle both cases (string -> single vector wrapped in list; list -> map/loop
calling get_embeddings per element), keeping the rest of the logic (solver
selection via persona.solvers.loaded_modules) unchanged.
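The per-item handling described above can be sketched as a small helper; `embed_fn` is an illustrative stand-in for `embed_solver.get_embeddings`, not the repository's API:

```python
from typing import Callable, List, Union


def embed_inputs(
    embed_fn: Callable[[str], List[float]],
    raw_input: Union[str, List[str]],
) -> List[List[float]]:
    """Return one embedding vector per input item, preserving order.

    A single string still yields a one-element list, matching the
    embeddings: List[List[float]] shape of OllamaEmbedResponse.
    """
    items = [raw_input] if isinstance(raw_input, str) else list(raw_input)
    return [embed_fn(item) for item in items]


# Toy embedder standing in for the persona's embeddings solver
def fake_embed(text: str) -> List[float]:
    return [float(len(text)), 0.0]


print(embed_inputs(fake_embed, "hello"))             # one vector, wrapped in a list
print(embed_inputs(fake_embed, ["a", "bb", "ccc"]))  # one vector per input item
```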
```python
input_text = request_body.input if isinstance(request_body.input, str) else " ".join(request_body.input)
vec = embed_solver.get_embeddings(input_text)
return JSONResponse(OllamaEmbedResponse(embeddings=[vec]).model_dump())
```
Populate the required model field before returning.
Line 521 constructs OllamaEmbedResponse with only embeddings, but model is required by the schema. Any successful embeddings request will fail validation on the happy path.
🐛 Proposed fix

```diff
-    vec = embed_solver.get_embeddings(input_text)
-    return JSONResponse(OllamaEmbedResponse(embeddings=[vec]).model_dump())
+    vec = embed_solver.get_embeddings(input_text)
+    return JSONResponse(
+        OllamaEmbedResponse(model=persona.name, embeddings=[vec]).model_dump()
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ovos_persona_server/ollama.py` around lines 519 - 521, The response is
missing the required model field in OllamaEmbedResponse; set the model to the
identifier used by the embedder before returning — e.g., after computing vec
from embed_solver.get_embeddings(input_text) construct OllamaEmbedResponse with
embeddings=[vec] and model=embed_solver.model (or embed_solver.model_name if
that property is used by your embed_solver), falling back to a sensible string
if neither exists; update the return in the block that builds input_text, calls
embed_solver.get_embeddings and returns JSONResponse so schema validation
passes.
```toml
dependencies = [
    "fastapi",
    "ovos-persona",
    "pydantic",
]
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify uvicorn is used in the codebase
rg -n "import uvicorn|from uvicorn" ovos_persona_server/
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 124

🏁 Script executed:

```bash
# Check __main__.py to see how uvicorn is used
cat -n ovos_persona_server/__main__.py
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1655

🏁 Script executed:

```bash
# Check the full pyproject.toml for uvicorn in all sections
cat -n pyproject.toml
```

Repository: OpenVoiceOS/ovos-persona-server
Length of output: 1511
Add uvicorn to runtime dependencies.
The console script entry point (ovos-persona-server) calls main() in __main__.py, which imports and uses uvicorn.run() at line 38, but uvicorn is not listed in the dependencies.
Fix

```diff
 dependencies = [
     "fastapi",
     "ovos-persona",
     "pydantic",
+    "uvicorn",
 ]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` around lines 22 - 26, The package runtime is missing uvicorn
which is imported/used by the console entry point; add "uvicorn" to the
dependencies array in pyproject.toml so the runtime dependency is installed.
Specifically, update the dependencies list that currently contains "fastapi",
"ovos-persona", and "pydantic" to include "uvicorn" so that __main__.py's main()
(which calls uvicorn.run()) and the ovos-persona-server entry point work
correctly.
Summary by CodeRabbit

New Features
- `/models` endpoint for model listing.
- `workflow_dispatch` trigger.

Documentation

Chores

Tests