**Walkthrough**

The pull request migrates the application from Flask to FastAPI, introducing a modular router-based architecture with dependency injection for persona management. It adds comprehensive Pydantic schemas for OpenAI-compatible and Ollama-compatible endpoints, replaces inline route handlers with separate modules, and updates the CLI to use uvicorn instead of Flask's development server.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    actor User
    participant CLI as __main__.py
    participant App as FastAPI App
    participant Persona as Persona Manager
    participant ChatRouter as /v1 Router
    participant OllamaRouter as /api Router
    User->>CLI: python -m ovos_persona_server --persona path/to/persona.json
    CLI->>App: create_persona_app(persona_path)
    App->>Persona: Load persona from JSON
    Persona->>App: Initialize default_persona
    App->>ChatRouter: Register router with lifespan
    App->>OllamaRouter: Register router with lifespan
    CLI->>App: uvicorn.run(host=0.0.0.0, port=8337)
    App-->>User: Server listening
    User->>ChatRouter: POST /v1/chat/completions
    ChatRouter->>Persona: get_default_persona()
    Persona-->>ChatRouter: Return Persona instance
    ChatRouter->>ChatRouter: Convert messages & generate ID
    alt Streaming Requested
        ChatRouter->>ChatRouter: persona.stream()
        ChatRouter-->>User: Server-Sent Events (chunks)
    else Non-Streaming
        ChatRouter->>ChatRouter: persona.chat()
        ChatRouter-->>User: CreateChatCompletionResponse (JSON)
    end
    User->>OllamaRouter: POST /api/chat
    OllamaRouter->>Persona: get_default_persona()
    Persona-->>OllamaRouter: Return Persona instance
    OllamaRouter->>OllamaRouter: Convert OllamaChatRequest
    alt Streaming Requested
        OllamaRouter->>OllamaRouter: persona.stream()
        OllamaRouter-->>User: Line-delimited JSON (chunks)
    else Non-Streaming
        OllamaRouter->>OllamaRouter: persona.chat()
        OllamaRouter-->>User: OllamaChatResponse (JSON)
    end
```
**Estimated code review effort**

🎯 4 (Complex) | ⏱️ ~45 minutes

Rationale: The diff spans nine files with significant heterogeneity: a core architectural migration from Flask to FastAPI, introduction of two new modular routers with async streaming logic, two comprehensive schema modules (~40 Pydantic classes total), and a new dependency injection pattern. While individual router endpoints follow similar patterns, the variety of concerns (streaming response handling, error handling, schema validation, lifespan management, Ollama/OpenAI API compatibility layers) and the density of logic in chat.py and ollama.py require careful reasoning across each component.
**Pre-merge checks and finishing touches**

✅ Passed checks (3 passed)
Actionable comments posted: 11
🧹 Nitpick comments (16)
ovos_persona_server/schemas/__init__.py (1)
1-6: Consider re-exporting public schemas. Expose commonly used request/response models here to give users a stable import path (e.g., `from .openai_chat import CreateChatCompletionRequest, ...`).
ovos_persona_server/persona.py (1)
18-25: Docstring misleads: this dependency doesn't load the persona. It only returns the preloaded global or raises. Reword to avoid "loads and returns".

Apply:

```diff
-    """
-    Asynchronously loads and returns the default persona.
+    """
+    Return the default persona initialized at app startup.
```

ovos_persona_server/chat.py (5)
29-31: Rename the unused lifespan arg to an underscore to silence linters.

Apply:

```diff
-async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
+async def lifespan(_: FastAPI) -> AsyncGenerator[None, None]:
```
9-12: Use secrets for request IDs (more robust, fewer deps). Replace random.choices with secrets.token_urlsafe; drop the random/string imports.

Apply:

```diff
-import random
-import string
+from secrets import token_urlsafe
@@
-    completion_id: str = ''.join(random.choices(string.ascii_letters + string.digits, k=28))
+    completion_id: str = token_urlsafe(21)  # ~28 chars, URL-safe
@@
-    completion_id: str = ''.join(random.choices(string.ascii_letters + string.digits, k=28))
+    completion_id: str = token_urlsafe(21)
```

Also applies to: 72-73, 238-239
138-139: Only include usage in the final stream chunk when requested. OpenAI-style SSE includes usage only in the last chunk, and only when stream_options.include_usage is true.

Apply:

```diff
-                    system_fingerprint=None,
-                    usage=CompletionUsage(prompt_tokens=0, completion_tokens=0, total_tokens=0)
+                    system_fingerprint=None
@@
-                    system_fingerprint=None,
-                    usage=CompletionUsage(prompt_tokens=0, completion_tokens=current_completion_tokens,
-                                          total_tokens=current_completion_tokens)
+                    system_fingerprint=None
```

The existing final chunk already adds usage conditionally (lines 180-188).

Also applies to: 156-158, 167-188
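The gating the comment describes boils down to a small decision: attach usage only on the final chunk, and only when the client opted in. A sketch with illustrative names (not the PR's code):

```python
from typing import Optional

def usage_for_chunk(is_final: bool, include_usage: bool, completion_tokens: int) -> Optional[dict]:
    """Return a usage payload only for the final chunk when the client opted in."""
    if is_final and include_usage:
        return {"prompt_tokens": 0,
                "completion_tokens": completion_tokens,
                "total_tokens": completion_tokens}
    return None  # intermediate chunks (and opted-out streams) omit usage

# Intermediate chunk: no usage attached
assert usage_for_chunk(False, True, 5) is None
# Final chunk with stream_options.include_usage=true: usage attached
assert usage_for_chunk(True, True, 5)["total_tokens"] == 5
```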
193-193: Add SSE headers to avoid buffering. Helps proxies/servers stream correctly.

Apply:

```diff
-    return StreamingResponse(streaming_chat_response(), media_type="text/event-stream")
+    return StreamingResponse(
+        streaming_chat_response(),
+        media_type="text/event-stream",
+        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"}
+    )
```

And likewise for the legacy endpoint's StreamingResponse.
Also applies to: 316-316
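For reference, server-sent events frame each payload as a `data:` line followed by a blank line, and OpenAI-style streams end with a literal `[DONE]` sentinel. A minimal framing sketch (helper names are illustrative):

```python
import json

def sse_event(payload: dict) -> str:
    """Frame one JSON payload as a server-sent event."""
    return f"data: {json.dumps(payload)}\n\n"

def sse_done() -> str:
    """Terminate an OpenAI-style stream."""
    return "data: [DONE]\n\n"

chunk = sse_event({"id": "chatcmpl-123", "choices": []})
assert chunk.startswith("data: ") and chunk.endswith("\n\n")
assert sse_done() == "data: [DONE]\n\n"
```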
120-123: Avoid a blind except Exception; at least log context or narrow to expected errors. Catching all exceptions obscures failures. Consider catching known persona errors, or log and attach a trace id.

Example:

```diff
-    except Exception as e:
+    except Exception as e:
+        # log exception details here with request_id/completion_id
         raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                             detail=f"Persona chat failed: {e}") from e
```

The same applies to the legacy non-streaming branch and the streaming except block.

Also applies to: 271-274, 297-299
ovos_persona_server/__init__.py (2)
1-8: Docstring overclaims DB initialization not present in code. It mentions "unified SQLite database initialization using SQLAlchemy", but no such logic exists. Align the docs with actual behavior to avoid confusion.

37-39: Avoid variable shadowing and use DI soon. Assigning to both a local and a global with the same name ("persona") harms readability. Also add a TODO reference to migrate to DI.

```diff
-    # TODO - move to dependency injection
-    ovos_persona_server.persona.default_persona = persona = Persona(persona["name"], persona)
+    # TODO: move to dependency injection (FastAPI Depends) instead of a global
+    persona_obj = Persona(persona_data["name"], persona_data)
+    ovos_persona_server.persona.default_persona = persona_obj
```

ovos_persona_server/schemas/ollama.py (2)
85-91: Constrain roles to valid values. Use Literal["system", "user", "assistant", "tool"] (or an Enum) for role instead of a bare str to catch client errors early and improve the schema.

```diff
-    role: str = Field(..., description="The role of the message, either 'system', 'user', 'assistant', or 'tool'.")
+    role: Literal["system", "user", "assistant", "tool"] = Field(..., description="...")
```

52-60: Guard against double-encoding in the arguments serializer. If arguments is already a string, json.dumps will add quotes. Handle both dict and str.

```diff
     def serialize_arguments(self, arguments: Dict[str, Any]) -> str:
         """
@@
-        return json.dumps(arguments)
+        if isinstance(arguments, str):
+            return arguments
+        return json.dumps(arguments)
```
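The guard boils down to the following plain-function sketch of the serializer body (outside Pydantic), showing the double-encoding bug it prevents:

```python
import json
from typing import Any, Dict, Union

def serialize_arguments(arguments: Union[Dict[str, Any], str]) -> str:
    """Serialize tool-call arguments without double-encoding pre-serialized strings."""
    if isinstance(arguments, str):
        return arguments  # already a JSON string; dumping again would wrap it in quotes
    return json.dumps(arguments)

assert serialize_arguments({"a": 1}) == '{"a": 1}'
assert serialize_arguments('{"a": 1}') == '{"a": 1}'   # unchanged
assert json.dumps('{"a": 1}') == '"{\\"a\\": 1}"'      # the double-encoding the guard avoids
```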
35-44: lifespan param is unused; rename to _app. Silences ARG001 and clarifies intent.

```diff
-async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
+async def lifespan(_app: FastAPI) -> AsyncGenerator[None, None]:
```

59-66: FastAPI Depends flagged by Ruff B008 (false positive). Using Depends(...) in parameter defaults is idiomatic FastAPI. If Ruff enforces B008, add # noqa: B008 per parameter or ignore the rule in tool config for FastAPI modules. Would you like a small .ruff.toml snippet to ignore B008 under ovos_persona_server/**? [Based on learnings]

Also applies to: 190-198
222-224: Remove the unused variable suffix. It is assigned but never used.

```diff
-    suffix: Optional[str] = request_body.suffix
```
343-350: Typing for the set literal may break on Python versions before 3.9 (PEP 585 built-in generics landed in 3.9). Use Set from typing for broader compatibility, and filter out None before building parent_model.

```diff
-    models_in_config: set[Optional[str]] = {persona.config.get(s, {}).get("model")
-                                            for s in persona.solvers.loaded_modules.keys()}
-    parent_model_str: str = "|".join(filter(None, models_in_config))
+    from typing import Set
+    models_in_config: Set[Optional[str]] = {persona.config.get(s, {}).get("model")
+                                            for s in persona.solvers.loaded_modules.keys()}
+    parent_model_str: str = "|".join([m for m in models_in_config if m])
```
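The None-filtering part is easy to verify in isolation (the model names here are fabricated illustration data):

```python
from typing import Optional, Set

# A set built from per-solver configs may contain None for solvers without a model
models_in_config: Set[Optional[str]] = {"gpt-4", None, "llama3"}

# Keep only real model names before joining; sort for a deterministic result
parent_model_str = "|".join(sorted(m for m in models_in_config if m))
assert parent_model_str == "gpt-4|llama3"
```

Sorting is an addition over the PR's code, but without it the set iteration order makes the joined string nondeterministic across runs.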
84-86: Prefer default_factory for list/dict defaults across models. Several fields default to [] or {}. While Pydantic v2 protects against shared state, default_factory communicates intent and avoids edge cases.

Example pattern:

```diff
-    tool_calls: Optional[List[ChatCompletionMessageToolCall]] = Field([], description="...")
+    tool_calls: Optional[List[ChatCompletionMessageToolCall]] = Field(default_factory=list, description="...")
```

Apply similarly to the other list/dict fields in this module.
Also applies to: 96-101, 107-121, 147-151, 163-171, 177-178, 180-186, 188-194, 200-210, 232-236, 293-295, 303-305, 311-315, 327-334, 340-369, 372-396, 398-409, 436-445, 447-456, 489-493
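The shared-mutable-default pitfall that default_factory guards against is easiest to see in plain dataclasses, where a bare `[]` default is rejected outright (a stdlib illustration, not Pydantic itself):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Message:
    # default_factory gives each instance its own fresh list;
    # `tool_calls: List[str] = []` would raise ValueError at class creation
    tool_calls: List[str] = field(default_factory=list)

a, b = Message(), Message()
a.tool_calls.append("call_1")
assert b.tool_calls == []  # b is unaffected: no shared state between instances
```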
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- ovos_persona_server/__init__.py (1 hunks)
- ovos_persona_server/__main__.py (1 hunks)
- ovos_persona_server/chat.py (1 hunks)
- ovos_persona_server/ollama.py (1 hunks)
- ovos_persona_server/persona.py (1 hunks)
- ovos_persona_server/schemas/__init__.py (1 hunks)
- ovos_persona_server/schemas/ollama.py (1 hunks)
- ovos_persona_server/schemas/openai_chat.py (1 hunks)
- requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
ovos_persona_server/__main__.py (1)

ovos_persona_server/__init__.py (1)
- create_persona_app (20-67)

ovos_persona_server/ollama.py (2)

ovos_persona_server/persona.py (1)
- get_default_persona (17-36)

ovos_persona_server/schemas/ollama.py (9)
- OllamaChatResponse (8-38)
- OllamaTagsResponse (194-201)
- OllamaChatRequest (93-115)
- OllamaGenerateRequest (118-151)
- OllamaModelDetails (154-171)
- OllamaModel (174-191)
- OllamaChatMessage (72-90)
- OllamaEmbedRequest (204-222)
- OllamaEmbedResponse (225-240)

ovos_persona_server/chat.py (2)

ovos_persona_server/persona.py (1)
- get_default_persona (17-36)

ovos_persona_server/schemas/openai_chat.py (10)
- CreateChatCompletionRequest (336-369)
- CreateChatCompletionResponse (372-395)
- CreateChatCompletionStreamResponse (398-408)
- ChatCompletionResponseMessage (297-304)
- ChatCompletionChoice (307-314)
- ChatCompletionStreamChoice (317-324)
- CompletionUsage (327-333)
- FinishReason (21-27)
- CreateCompletionRequest (460-480)
- CreateCompletionResponse (483-493)
🪛 Ruff (0.14.0)
ovos_persona_server/persona.py
35-35: f-string without any placeholders
Remove extraneous f prefix
(F541)
ovos_persona_server/__main__.py
23-23: Possible binding to all interfaces
(S104)
ovos_persona_server/ollama.py
36-36: Unused function argument: app
(ARG001)
60-60: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
165-165: Do not catch blind exception: Exception
(BLE001)
191-191: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
222-222: Local variable suffix is assigned to but never used
Remove assignment to unused variable suffix
(F841)
303-303: Do not catch blind exception: Exception
(BLE001)
328-328: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
ovos_persona_server/chat.py
30-30: Unused function argument: app
(ARG001)
49-49: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
72-72: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
160-160: Do not catch blind exception: Exception
(BLE001)
199-199: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
231-231: f-string without any placeholders
Remove extraneous f prefix
(F541)
234-234: f-string without any placeholders
Remove extraneous f prefix
(F541)
238-238: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
297-297: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (1)
ovos_persona_server/ollama.py (1)
111-131: /chat non-stream: OK shape, but use NDJSON for the streaming companion. No change needed here; see the streaming comment below.
Excerpt from ovos_persona_server/__init__.py (diff context; the hunk also removes the old Flask handlers, such as the `@app.route("/api/chat")` endpoint that read model, messages, tools, and stream from request.json):

```python
def create_persona_app(persona_path: str) -> FastAPI:
    """
    Creates and configures the FastAPI application for the Persona Server.

    Args:
        persona_path (Optional[str]): Optional path to a persona JSON file.
                                      If provided, it overrides the default
                                      persona path from settings or environment.

    Returns:
        FastAPI: The configured FastAPI application instance.
    """
```
Harden persona loading and align signature/docs
- Signature requires str but doc says Optional[str]. Pick one and enforce.
- Add explicit error handling for missing/invalid files and invalid JSON.
- Avoid reusing variable name persona for both dict and Persona instance.
Apply:
```diff
-def create_persona_app(persona_path: str) -> FastAPI:
+def create_persona_app(persona_path: str) -> FastAPI:
@@
-    with open(persona_path) as f:
-        persona = json.load(f)
-    persona["name"] = persona.get("name") or os.path.basename(persona_path)
+    try:
+        with open(persona_path, encoding="utf-8") as f:
+            persona_data = json.load(f)
+    except FileNotFoundError as e:
+        raise RuntimeError(f"Persona file not found: {persona_path}") from e
+    except json.JSONDecodeError as e:
+        raise RuntimeError(f"Invalid persona JSON at {persona_path}: {e}") from e
+    persona_data["name"] = persona_data.get("name") or os.path.basename(persona_path)
```

Also applies to: 33-36
🤖 Prompt for AI Agents
In ovos_persona_server/__init__.py around lines 20-31 (also apply same changes
to lines 33-36), change the function signature to accept Optional[str] (from
typing import Optional) and update the docstring to match; then add explicit
checks and exception handling: if persona_path is None, use settings/default
path; if the resolved path does not exist raise a ValueError or HTTPException
with a clear message; when reading the file wrap the open/read in try/except to
catch IOErrors and raise a clear error; parse JSON inside a try/except to catch
JSONDecodeError and raise a descriptive error; lastly, avoid reusing the name
persona for two types—use persona_data for the raw dict and persona_obj for the
constructed Persona instance.
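A standalone sketch of the hardened loading path (the function name and RuntimeError choice are illustrative; the real code builds a FastAPI app around this):

```python
import json
import os
from typing import Any, Dict

def load_persona_data(persona_path: str) -> Dict[str, Any]:
    """Read and validate a persona JSON file, failing with clear errors."""
    try:
        with open(persona_path, encoding="utf-8") as f:
            persona_data = json.load(f)
    except FileNotFoundError as e:
        raise RuntimeError(f"Persona file not found: {persona_path}") from e
    except json.JSONDecodeError as e:
        raise RuntimeError(f"Invalid persona JSON at {persona_path}: {e}") from e
    # Fall back to the file name when the persona has no explicit name
    persona_data["name"] = persona_data.get("name") or os.path.basename(persona_path)
    return persona_data
```

Keeping the raw dict (`persona_data`) and the constructed `Persona` instance under distinct names also resolves the shadowing flagged above.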
```python
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],  # Allows all origins
        allow_credentials=True,
        allow_methods=["*"],  # Allows all methods (GET, POST, etc.)
        allow_headers=["*"],  # Allows all headers
    )
```
CORS: wildcard origins + credentials is unsafe/invalid
allow_origins=["*"] together with allow_credentials=True is not permitted by browsers and weakens security. Either a) specify explicit origins, or b) keep wildcard but set allow_credentials=False.
Minimal fix:
```diff
-        allow_origins=["*"],  # Allows all origins
-        allow_credentials=True,
+        allow_origins=os.getenv("OVOS_ALLOWED_ORIGINS", "*").split(","),
+        allow_credentials=False,
```

I can wire this to settings/env and add tests.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In ovos_persona_server/__init__.py around lines 50-56, the CORS middleware
currently uses allow_origins=["*"] while allow_credentials=True which is
invalid/unsafe; change to load an explicit list of allowed origins from
configuration or environment and pass that list to allow_origins (e.g.,
ALLOWED_ORIGINS from settings), keep allow_credentials=True only if
ALLOWED_ORIGINS is not ["*"], otherwise set allow_credentials=False; add
validation to raise or log if ALLOWED_ORIGINS contains "*" while
allow_credentials is True and add tests to cover both the explicit-origins and
wildcard-without-credentials scenarios.
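The rule the prompt describes can be captured in a small pure function (the function name and CSV convention are illustrative, not from the PR):

```python
from typing import List, Tuple

def resolve_cors(origins_csv: str) -> Tuple[List[str], bool]:
    """Return (allow_origins, allow_credentials) that browsers will accept.

    Credentials are only allowed when origins are explicit; a wildcard
    origin forces allow_credentials=False, since browsers reject the
    combination of "*" with credentialed requests.
    """
    origins = [o.strip() for o in origins_csv.split(",") if o.strip()]
    allow_credentials = "*" not in origins
    return origins, allow_credentials

assert resolve_cors("*") == (["*"], False)
assert resolve_cors("https://app.example.com") == (["https://app.example.com"], True)
```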
```diff
     parser.add_argument("--persona", help="Path to persona .json file", default=None, type=str)
     parser.add_argument("--host", help="Host address to bind to", default="0.0.0.0", type=str)
     parser.add_argument("--port", help="Port to run server on", default=8337, type=int)
     args: Any = parser.parse_args()  # Using Any for args as argparse.Namespace is dynamic

-    app = get_app(os.path.expanduser(args.persona))
+    app = create_persona_app(args.persona)

-    app.run(host=args.host, port=args.port, debug=False)
+    uvicorn.run(app, port=args.port, host=args.host, log_level="debug")
```
Make --persona required to avoid runtime crash.
create_persona_app opens the given path unconditionally; passing None (current default) will crash. Make the CLI arg required.
Apply:
```diff
-    parser.add_argument("--persona", help="Path to persona .json file", default=None, type=str)
+    parser.add_argument("--persona", help="Path to persona .json file", required=True, type=str)
```

Optional:

- Default host to 127.0.0.1 for dev and let users opt in to 0.0.0.0.
- Make --log-level configurable (default info) instead of hardcoded debug.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    parser.add_argument("--persona", help="Path to persona .json file", required=True, type=str)
    parser.add_argument("--host", help="Host address to bind to", default="0.0.0.0", type=str)
    parser.add_argument("--port", help="Port to run server on", default=8337, type=int)
    args: Any = parser.parse_args()  # Using Any for args as argparse.Namespace is dynamic

    app = create_persona_app(args.persona)

    uvicorn.run(app, port=args.port, host=args.host, log_level="debug")
```
🧰 Tools
🪛 Ruff (0.14.0)
23-23: Possible binding to all interfaces
(S104)
🤖 Prompt for AI Agents
In ovos_persona_server/__main__.py around lines 22 to 29, the CLI currently
allows --persona to be omitted which causes create_persona_app to crash when
passed None; change the parser.add_argument for --persona to make it required
(remove default and add required=True) so the program fails fast with argparse
if not provided. While editing, optionally change the --host default to
"127.0.0.1" and add a new --log-level argument (default "info") that is passed
into uvicorn.run instead of the hardcoded "debug" to make logging configurable.
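A sketch of the CLI with both suggestions applied (argument names mirror the PR; the builder function and defaults shown here are illustrative):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="OVOS Persona Server")
    # required=True makes argparse fail fast instead of crashing later on a None path
    parser.add_argument("--persona", help="Path to persona .json file", required=True, type=str)
    parser.add_argument("--host", help="Host address to bind to", default="127.0.0.1", type=str)
    parser.add_argument("--port", help="Port to run server on", default=8337, type=int)
    # configurable instead of a hardcoded "debug"
    parser.add_argument("--log-level", help="uvicorn log level", default="info", type=str)
    return parser

args = build_parser().parse_args(["--persona", "persona.json"])
assert args.host == "127.0.0.1" and args.port == 8337 and args.log_level == "info"
```

With this shape, the server would be started as `uvicorn.run(app, host=args.host, port=args.port, log_level=args.log_level)`.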
```python
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=f"Failed to generate completion: token_ids support not yet implemented")
    elif isinstance(prompt, list) and all(isinstance(p, list) and all(isinstance(i, int) for i in p) for p in prompt):
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=f"Failed to generate completion: token_ids support not yet implemented")
    else:
```
Use 501 for unimplemented token_id prompts and fix f-strings.
These paths aren’t implemented; return Not Implemented and remove extraneous f-strings.
Apply:
```diff
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
@@
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
```

📝 Committable suggestion
```python
        raise HTTPException(
            status_code=status.HTTP_501_NOT_IMPLEMENTED,
            detail="Failed to generate completion: token_ids support not yet implemented"
        )
    elif isinstance(prompt, list) and all(isinstance(p, list) and all(isinstance(i, int) for i in p) for p in prompt):
        raise HTTPException(
            status_code=status.HTTP_501_NOT_IMPLEMENTED,
            detail="Failed to generate completion: token_ids support not yet implemented"
        )
    else:
```
🧰 Tools
🪛 Ruff (0.14.0)
231-231: f-string without any placeholders
Remove extraneous f prefix
(F541)
234-234: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In ovos_persona_server/chat.py around lines 230 to 235, the branches raising
HTTPException for token_id prompts should return HTTP 501 Not Implemented and
not use unnecessary f-strings; replace
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR with
status.HTTP_501_NOT_IMPLEMENTED (or the corresponding numeric 501 constant) on
both raises and change detail=f"...token_ids support not yet implemented" to a
plain string detail="token_ids support not yet implemented" (remove the
f-prefix) so the responses correctly indicate unimplemented functionality.
```python
    async def streaming_ollama_chat_response() -> AsyncGenerator[str, None]:
        """
        Asynchronously streams Ollama-compatible chat completion chunks.
        """
        start_time: float = time.time()
        # Placeholders for metrics in streaming chunks
        streaming_load_duration: int = 0
        streaming_prompt_eval_count: int = 0
        streaming_prompt_eval_duration: int = 0
        streaming_eval_count: int = 0
        streaming_eval_duration: int = 0

        try:
            for chunk in persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit):
                if chunk:
                    # Increment eval_count for each chunk (approximate)
                    streaming_eval_count += len(chunk.split())  # Simple token count approximation
                    yield json.dumps({
                        "model": persona.name,
                        "created_at": ts,
                        "message": {"role": "assistant", "content": chunk},
                        "done": False,
                        # Update total_duration dynamically for streaming chunks
                        "total_duration": int((time.time() - start_time) * 1_000_000_000),
                        "load_duration": streaming_load_duration,
                        "prompt_eval_count": streaming_prompt_eval_count,
                        "prompt_eval_duration": streaming_prompt_eval_duration,
                        "eval_count": streaming_eval_count,
                        "eval_duration": streaming_eval_duration  # This would be cumulative in a real scenario
                    }) + "\n"
        except Exception as e:
            # Handle streaming errors gracefully
            yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
            return  # Stop the generator

        end_time: float = time.time()
        streaming_total_duration: int = int((end_time - start_time) * 1_000_000_000)

        yield json.dumps({
            "model": persona.name,
            "created_at": ts,
            "message": {"role": "assistant", "content": ""},  # Empty content for final done chunk
            "done": True,
            "total_duration": streaming_total_duration,
            "load_duration": streaming_load_duration,
            "prompt_eval_count": streaming_prompt_eval_count,
            "prompt_eval_duration": streaming_prompt_eval_duration,
            "eval_count": streaming_eval_count,
            "eval_duration": streaming_eval_duration,
            "done_reason": done_reason
        }) + "\n"

    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/json")
```
🛠️ Refactor suggestion | 🟠 Major
Streaming content-type should be NDJSON and avoid blocking the event loop
- Ollama streams line-delimited JSON; use application/x-ndjson.
- Iterating a sync generator inside an async def can block the event loop. Use Starlette's iterate_in_threadpool.
- Handle asyncio.CancelledError separately to avoid masking client disconnects.
```diff
+from starlette.concurrency import iterate_in_threadpool
+import asyncio
@@
     async def streaming_ollama_chat_response() -> AsyncGenerator[str, None]:
@@
-        try:
-            for chunk in persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit):
+        try:
+            async for chunk in iterate_in_threadpool(persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit)):
                 if chunk:
@@
-        except Exception as e:
+        except asyncio.CancelledError:
+            raise
+        except Exception as e:
             # Handle streaming errors gracefully
             yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
             return  # Stop the generator
@@
-    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/json")
+    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/x-ndjson")
```

As per FastAPI/Starlette best practices for streaming. [Based on learnings]
🧰 Tools
🪛 Ruff (0.14.0)
165-165: Do not catch blind exception: Exception
(BLE001)
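The NDJSON framing itself is simple: one JSON object per line, with a final `"done": true` marker, parsed line-by-line by the client. A minimal sketch with fabricated chunk data (the generator name is illustrative):

```python
import json
from typing import Iterator

def ndjson_chunks(pieces: Iterator[str], model: str) -> Iterator[str]:
    """Yield Ollama-style line-delimited JSON chunks, ending with a done marker."""
    for piece in pieces:
        yield json.dumps({"model": model,
                          "message": {"role": "assistant", "content": piece},
                          "done": False}) + "\n"
    yield json.dumps({"model": model,
                      "message": {"role": "assistant", "content": ""},
                      "done": True, "done_reason": "stop"}) + "\n"

# A client reassembles the stream by parsing each line independently
stream = "".join(ndjson_chunks(iter(["Hel", "lo"]), "demo"))
parsed = [json.loads(line) for line in stream.splitlines()]
assert [p["done"] for p in parsed] == [False, False, True]
```

Because every line is a complete JSON document, clients never need a streaming JSON parser, which is why the `application/x-ndjson` media type matters.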
```python
    async def streaming_ollama_generate_response() -> AsyncGenerator[str, None]:
        """
        Asynchronously streams Ollama-compatible generation chunks.
        """
        start_time: float = time.time()
        # Placeholders for metrics in streaming chunks
        streaming_load_duration: int = 0
        streaming_prompt_eval_count: int = 0
        streaming_prompt_eval_duration: int = 0
        streaming_eval_count: int = 0
        streaming_eval_duration: int = 0

        try:
            # Use persona.stream for streaming generation
            for chunk in persona.stream(messages, lang=sess.lang, units=sess.system_unit):
                if chunk:
                    # Increment eval_count for each chunk (approximate)
                    streaming_eval_count += len(chunk.split())  # Simple token count approximation
                    yield json.dumps({
                        "model": persona.name,
                        "created_at": ts,
                        "response": chunk,  # Ollama /generate uses "response" key for content
                        "done": False,
                        "total_duration": int((time.time() - start_time) * 1_000_000_000),
                        "load_duration": streaming_load_duration,
                        "prompt_eval_count": streaming_prompt_eval_count,
                        "prompt_eval_duration": streaming_prompt_eval_duration,
                        "eval_count": streaming_eval_count,
                        "eval_duration": streaming_eval_duration  # This would be cumulative in a real scenario
                    }) + "\n"
        except Exception as e:
            yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
            return

        end_time: float = time.time()
        streaming_total_duration: int = int((end_time - start_time) * 1_000_000_000)

        yield json.dumps({
            "model": persona.name,
            "created_at": ts,
            "response": "",  # Empty content for final done chunk in /generate
            "done": True,
            "total_duration": streaming_total_duration,
            "load_duration": streaming_load_duration,
            "prompt_eval_count": streaming_prompt_eval_count,
            "prompt_eval_duration": streaming_prompt_eval_duration,
            "eval_count": streaming_eval_count,
            "eval_duration": streaming_eval_duration,
            "done_reason": done_reason
        }) + "\n"

    return StreamingResponse(streaming_ollama_generate_response(), media_type="application/json")
```
🛠️ Refactor suggestion | 🟠 Major
/generate streaming: same NDJSON + threadpool + CancelledError pattern
Apply the same changes as in /chat streaming:
```diff
+from starlette.concurrency import iterate_in_threadpool
+import asyncio
@@
-        try:
-            # Use persona.stream for streaming generation
-            for chunk in persona.stream(messages, lang=sess.lang, units=sess.system_unit):
+        try:
+            async for chunk in iterate_in_threadpool(persona.stream(messages, lang=sess.lang, units=sess.system_unit)):
                 if chunk:
@@
-        except Exception as e:
+        except asyncio.CancelledError:
+            raise
+        except Exception as e:
             yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
             return
```

🧰 Tools
🪛 Ruff (0.14.0)
303-303: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
ovos_persona_server/ollama.py lines 273-324: the /generate streaming handler
currently iterates persona.stream directly and returns JSON/plain application,
but needs the same NDJSON, threadpool offload, and CancelledError handling used
in /chat streaming; change the generator to run persona.stream in an executor
(e.g., asyncio.to_thread or loop.run_in_executor) so blocking IO doesn't block
the event loop, yield each chunk as NDJSON (one JSON object per line) with
"done": False, catch asyncio.CancelledError separately to stop streaming cleanly
without emitting an error chunk, catch other Exceptions to emit an error NDJSON
chunk and return, and finally emit the final "done": True NDJSON chunk; also
return StreamingResponse with media_type "application/x-ndjson".
```python
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=f"Failed to load persona")
```
Remove unnecessary f-string.
No placeholders used.
Apply:
```diff
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to load persona")
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail="Failed to load persona"
+        )
```

📝 Committable suggestion
```python
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Failed to load persona"
        )
```
🧰 Tools
🪛 Ruff (0.14.0)
35-35: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In ovos_persona_server/persona.py around lines 34 to 35, the HTTPException
detail uses an unnecessary f-string with no placeholders; replace the f-string
with a plain string literal "Failed to load persona" (remove the leading f) so
the detail is a normal string.
model: Optional[str] = Field(..., description="The name of the model that generated the response.")
created_at: str = Field(..., description="The timestamp when the response was created (ISO 8601 format).")
message: Dict[str, str] = Field(..., description="The message object containing the role and content of the assistant's reply.")
done: bool = Field(..., description="Indicates if the response is the final one in a stream.")
context: Optional[List[int]] = Field(None, description="An encoding of the conversation used, can be sent in the next request to keep conversational memory (list of token IDs).")
total_duration: Optional[int] = Field(None, description="Time spent generating the response in nanoseconds.")
load_duration: Optional[int] = Field(None, description="Time spent loading the model into memory in nanoseconds.")
prompt_eval_count: Optional[int] = Field(None, description="Number of tokens in the prompt.")
prompt_eval_duration: Optional[int] = Field(None, description="Time spent evaluating the prompt in nanoseconds.")
eval_count: Optional[int] = Field(None, description="Number of tokens in the response.")
eval_duration: Optional[int] = Field(None, description="Time spent generating the response in nanoseconds.")
done_reason: Optional[str] = Field(None, description="The reason the model stopped generating (e.g., 'stop', 'unload').")
🛠️ Refactor suggestion | 🟠 Major
Required fields typed as Optional with Field(...)
Several fields are marked Optional[...] but declared required with Field(...). This confuses validation and OpenAPI. Make them required (drop Optional) or truly optional (default None).
Examples to fix:
- model: Optional[str] = Field(..., description="The name of the model ...")
+ model: str = Field(..., description="The name of the model ...")
- model: Optional[str] = Field(..., description="The model to use for the chat.")
+ model: str = Field(..., description="The model to use for the chat.")
- model: Optional[str] = Field(..., description="The model name to use for generation.")
+ model: str = Field(..., description="The model name to use for generation.")
- model: Optional[str] = Field(..., description="The base model name.")
+ model: str = Field(..., description="The base model name.")
- model: Optional[str] = Field(..., description="The model name to use for generating embeddings.")
+ model: str = Field(..., description="The model name to use for generating embeddings.")
- model: Optional[str] = Field(..., description="The name of the model that generated the embeddings.")
+ model: str = Field(..., description="The name of the model that generated the embeddings.")

As per coding guidelines for Pydantic v2. [Based on learnings]
Also applies to: 108-116, 139-151, 186-191, 201-201, 218-223, 236-241
🤖 Prompt for AI Agents
In ovos_persona_server/schemas/ollama.py around lines 27 to 38 (and similarly at
108-116, 139-151, 186-191, 201, 218-223, 236-241), several fields are declared
as Optional[...] but use Field(...) which makes them required in Pydantic v2;
update each field to either be truly optional by setting Optional[...] with a
default of None (e.g., Field(None, ...)) or make them required by removing
Optional[...] and keeping Field(...), ensuring the type and default align so
OpenAPI/validation correctly reflects required vs optional.
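For context, a minimal illustration of the Pydantic v2 semantics at issue: the annotation controls whether `None` is an accepted value, while the default (`...` vs `None`) controls whether the field must be provided. The class names here are hypothetical, not taken from the schema module.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Mixed(BaseModel):
    # Optional[...] + Field(...): the field is REQUIRED (must be provided)
    # but nullable (None is a valid value) -- usually not what was intended.
    model: Optional[str] = Field(..., description="required but nullable")


class RequiredModel(BaseModel):
    # Required and non-nullable: drop Optional, keep Field(...).
    model: str = Field(..., description="required, non-null")


class TrulyOptional(BaseModel):
    # Optional and defaulted: may be omitted, defaults to None.
    model: Optional[str] = Field(None, description="optional, defaults to None")
```

Instantiating `Mixed()` with no arguments raises a `ValidationError` despite the `Optional` annotation, which is why the mix confuses both validation and the generated OpenAPI schema.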
class CreateChatCompletionRequest(BaseModel):
    """
    Request body for creating a chat completion.
    """
    messages: List[
        Union[
            ChatCompletionRequestSystemMessage,
            ChatCompletionRequestUserMessage,
            ChatCompletionRequestAssistantMessage,  # This one is not explicitly defined in the OpenAPI, but used in examples. Adding for completeness.
            ChatCompletionRequestToolMessage,
            ChatCompletionRequestFunctionMessage,  # Deprecated
            ChatCompletionRequestDeveloperMessage
        ]
    ] = Field([ChatCompletionRequestMessage(role="user", content="hello world")], description="A list of messages comprising the conversation so far. [Example Python code](https://cookbook.openai.com/examples/how_to_format_messages_for_chat_completions).")
Default for messages uses the wrong model and a mutable value

- The Union excludes ChatCompletionRequestMessage; the default should be ChatCompletionRequestUserMessage.
- Avoid mutable defaults; use default_factory.
- ] = Field([ChatCompletionRequestMessage(role="user", content="hello world")], description="A list of messages comprising the conversation so far. ...")
+ ] = Field(
+ default_factory=lambda: [ChatCompletionRequestUserMessage(content="hello world")],
+ description="A list of messages comprising the conversation so far. ..."
+ )

As per Pydantic v2 best practices. [Based on learnings]
🤖 Prompt for AI Agents
In ovos_persona_server/schemas/openai_chat.py around lines 336-349, the messages
field currently uses a mutable list literal and the wrong default model; change
the Field to use default_factory returning a list with a
ChatCompletionRequestUserMessage instance (e.g. default_factory=lambda:
[ChatCompletionRequestUserMessage(role="user", content="hello world")]) instead
of the inline list, and ensure the Union of allowed message types remains
correct (use ChatCompletionRequestUserMessage as the default element type). This
removes the mutable default and fixes the default message model.
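A self-contained sketch of the suggested fix; `Message` and `ChatRequest` are simplified stand-ins for the schema classes under review, assuming Pydantic v2.

```python
from typing import List

from pydantic import BaseModel, Field


class Message(BaseModel):
    role: str = "user"
    content: str


class ChatRequest(BaseModel):
    # default_factory builds a fresh list for every instance, and the
    # default element is an instance of a type the annotation actually allows.
    messages: List[Message] = Field(
        default_factory=lambda: [Message(content="hello world")],
        description="A list of messages comprising the conversation so far.",
    )
```

With this shape, mutating `request.messages` on one instance cannot leak into the default seen by another, and the OpenAPI schema advertises a default that validates against the declared type.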
fastapi
ovos-persona
pydantic
Add uvicorn and pin runtime deps for reproducibility.
uvicorn is required by __main__.py but not declared; installs will fail to run the server. Also consider pinning versions to stable ranges.
Apply:
fastapi
ovos-persona
pydantic
+uvicorn[standard]

Optionally pin:
-fastapi
-pydantic
+fastapi>=0.119,<0.120
+pydantic>=2.11,<3
+# keep ovos-persona as constrained by your release process
+# and consider pinning uvicorn too
+uvicorn[standard]>=0.30,<0.31

Based on learnings.
📝 Committable suggestion
fastapi
ovos-persona
pydantic
uvicorn[standard]
🤖 Prompt for AI Agents
In requirements.txt lines 1-3, uvicorn is missing and runtime deps are unpinned
which hurts reproducibility; update the file to add uvicorn (required by
__main__.py) and pin the runtime packages to stable version ranges (e.g.,
specific major.minor or caret-style ranges) so installs are reproducible — add a
uvicorn entry and replace bare package names with pinned versions or ranges for
fastapi, ovos-persona, pydantic (and uvicorn) consistent with tested
compatibility.
split from #11
Summary by CodeRabbit
New Features
Chores