
refactor: migrate to fastapi #14

Merged
JarbasAl merged 1 commit into dev from fastapi
Oct 17, 2025

Conversation

@JarbasAl
Member

@JarbasAl JarbasAl commented Oct 17, 2025

split from #11

Summary by CodeRabbit

  • New Features

    • Added OpenAI-compatible chat completions and legacy completions endpoints with streaming support
    • Added Ollama-compatible API endpoints for chat, generation, and model discovery
  • Chores

    • Migrated from Flask to FastAPI framework for improved performance and maintainability
    • Enhanced command-line interface with explicit argument parsing and type hints

@coderabbitai

coderabbitai bot commented Oct 17, 2025

Walkthrough

The pull request migrates the application from Flask to FastAPI, introducing a modular router-based architecture with dependency injection for persona management. It adds comprehensive Pydantic schemas for OpenAI-compatible and Ollama-compatible endpoints, replaces inline route handlers with separate modules, and updates the CLI to use uvicorn instead of Flask's development server.

Changes

Cohort / File(s) Summary
Core Application Refactoring
ovos_persona_server/__init__.py, ovos_persona_server/__main__.py
Replaced the Flask-based setup with a FastAPI-based create_persona_app function: the old get_app entry point was removed in favor of create_persona_app(persona_path: str) -> FastAPI. Updated the CLI to use uvicorn.run with explicit argument parsing and type hints.
Persona Dependency Management
ovos_persona_server/persona.py
New module providing a FastAPI dependency pattern with module-level default_persona variable and async get_default_persona() function for endpoint injection. Returns HTTP 500 on missing persona.
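As a rough illustration of that pattern (the real module raises fastapi.HTTPException with status 500; a plain RuntimeError stands in here so the sketch has no framework dependency):

```python
from typing import Optional

# Module-level persona set once at app startup
# (stand-in for ovos_persona_server/persona.py).
default_persona: Optional[object] = None


async def get_default_persona() -> object:
    """FastAPI-style dependency: return the preloaded persona or fail.

    The real module raises fastapi.HTTPException(status_code=500, ...) instead
    of RuntimeError.
    """
    if default_persona is None:
        raise RuntimeError("No default persona configured")
    return default_persona
```

Endpoints then declare `persona = Depends(get_default_persona)` and never touch the global directly.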
Chat Completions Router
ovos_persona_server/chat.py
New FastAPI router module exposing OpenAI-compatible endpoints: POST /v1/chat/completions and POST /v1/completions. Supports both streaming (Server-Sent Events) and non-streaming responses with token accounting, unique IDs, and timestamps. Includes lifespan context manager for persona initialization.
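The Server-Sent Events framing used by OpenAI-compatible streaming can be sketched without the framework: each chunk is serialized as one `data:` line, and the stream terminates with `data: [DONE]`. Field values below are illustrative, not taken from the module.

```python
import json


def sse_chunk(completion_id: str, model: str, delta: str) -> str:
    """Serialize one OpenAI-style streaming chunk as an SSE event."""
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
    }
    return f"data: {json.dumps(payload)}\n\n"


def sse_done() -> str:
    """Terminator the OpenAI streaming API sends after the last chunk."""
    return "data: [DONE]\n\n"
```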
Ollama Compatibility Router
ovos_persona_server/ollama.py
New FastAPI router module exposing Ollama-compatible endpoints: POST /api/chat, POST /api/generate, and GET /api/tags. Supports streaming and non-streaming responses, converts Ollama request objects to persona-compatible format, includes error handling, and returns model metadata.
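Unlike the SSE endpoints, Ollama streams newline-delimited JSON: one complete object per line, with `"done": true` on the final line. A minimal framing sketch (field set abbreviated from the real response):

```python
import json


def ndjson_chunk(model: str, content: str, done: bool) -> str:
    """Serialize one Ollama-style streaming line (NDJSON: one JSON object per line)."""
    return json.dumps({
        "model": model,
        "message": {"role": "assistant", "content": content},
        "done": done,
    }) + "\n"
```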
Schema Definitions
ovos_persona_server/schemas/__init__.py
Module docstring documenting the schemas package for OpenAI-compatible and Ollama API interactions.
OpenAI Schema Models
ovos_persona_server/schemas/openai_chat.py
Comprehensive Pydantic models for OpenAI API compatibility: CreateChatCompletionRequest, CreateChatCompletionResponse, CreateChatCompletionStreamResponse, message types (system, user, assistant, tool, function, developer), tool/function calls, content parts (text, image), usage statistics, and legacy completions models. Roughly 30 classes with detailed typing and Field metadata.
Ollama Schema Models
ovos_persona_server/schemas/ollama.py
Pydantic models for Ollama API compatibility: OllamaChatRequest, OllamaChatResponse, OllamaGenerateRequest, OllamaTagsResponse, message types, tool/function calls, model metadata, and embedding schemas. ~11 classes with support for multimodal messages and tools.
Dependencies
requirements.txt
Replaced flask with fastapi; added pydantic; retained ovos-persona.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as __main__.py
    participant App as FastAPI App
    participant Persona as Persona Manager
    participant ChatRouter as /v1 Router
    participant OllamaRouter as /api Router

    User->>CLI: python -m ovos_persona_server --persona path/to/persona.json
    CLI->>App: create_persona_app(persona_path)
    App->>Persona: Load persona from JSON
    Persona->>App: Initialize default_persona
    App->>ChatRouter: Register router with lifespan
    App->>OllamaRouter: Register router with lifespan
    CLI->>App: uvicorn.run(host=0.0.0.0, port=8337)
    App-->>User: Server listening

    User->>ChatRouter: POST /v1/chat/completions
    ChatRouter->>Persona: get_default_persona()
    Persona-->>ChatRouter: Return Persona instance
    ChatRouter->>ChatRouter: Convert messages & generate ID
    alt Streaming Requested
        ChatRouter->>ChatRouter: persona.stream()
        ChatRouter-->>User: Server-Sent Events (chunks)
    else Non-Streaming
        ChatRouter->>ChatRouter: persona.chat()
        ChatRouter-->>User: CreateChatCompletionResponse (JSON)
    end

    User->>OllamaRouter: POST /api/chat
    OllamaRouter->>Persona: get_default_persona()
    Persona-->>OllamaRouter: Return Persona instance
    OllamaRouter->>OllamaRouter: Convert OllamaChatRequest
    alt Streaming Requested
        OllamaRouter->>OllamaRouter: persona.stream()
        OllamaRouter-->>User: Line-delimited JSON (chunks)
    else Non-Streaming
        OllamaRouter->>OllamaRouter: persona.chat()
        OllamaRouter-->>User: OllamaChatResponse (JSON)
    end
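For reference, the two request bodies exercised in the diagram share the same message list but target different routes; the values below are placeholders, with the model field mapping to the persona name:

```python
import json

messages = [{"role": "user", "content": "hello"}]

# Body for POST /v1/chat/completions (OpenAI-compatible route)
openai_body = {"model": "my-persona", "messages": messages, "stream": True}

# Body for POST /api/chat (Ollama-compatible route)
ollama_body = {"model": "my-persona", "messages": messages, "stream": True}

print(json.dumps(openai_body))
```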

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Rationale: The diff spans nine files with significant heterogeneity—a core architectural migration from Flask to FastAPI, introduction of two new modular routers with async streaming logic, two comprehensive schema modules (~40 Pydantic classes total), and a new dependency injection pattern. While individual router endpoints follow similar patterns, the variety of concerns (streaming response handling, error handling, schema validation, lifespan management, Ollama/OpenAI API compatibility layers) and the density of logic in chat.py and ollama.py require careful reasoning across each component.

Poem

🐰 FastAPI paths now lead the way,
Where schemas dance in Pydantic's bay!
From Flask's old chains we've hopped so free,
With routers modular, clean, and spry!
Persona dependency flows on high,
As streaming responses gently fly!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "refactor: migrate to fastapi" is directly aligned with the main changes across the entire changeset. The title clearly and concisely summarizes the primary objective—migrating the project from Flask to FastAPI—which is the overarching theme connecting all modifications. The changes span multiple files including the main entry points (__init__.py, __main__.py), new FastAPI routers (chat.py, ollama.py), dependency injection setup (persona.py), schema definitions, and requirements updates. The title is specific and avoids vague language, using the conventional "refactor:" prefix to indicate the nature of the change. A teammate scanning the project history would immediately understand that this PR involves a framework migration to FastAPI.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fastapi

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@JarbasAl JarbasAl mentioned this pull request Oct 17, 2025
@JarbasAl JarbasAl merged commit 3ec8f26 into dev Oct 17, 2025
2 of 3 checks passed
@JarbasAl JarbasAl deleted the fastapi branch October 17, 2025 18:46

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (16)
ovos_persona_server/schemas/__init__.py (1)

1-6: Consider re-exporting public schemas.

Expose commonly used request/response models here to give users a stable import path (e.g., from .openai_chat import CreateChatCompletionRequest, ...).

ovos_persona_server/persona.py (1)

18-25: Docstring misleads: this dependency doesn’t load the persona.

It only returns the preloaded global or raises. Reword to avoid “loads and returns”.

Apply:

-    """
-    Asynchronously loads and returns the default persona.
+    """
+    Return the default persona initialized at app startup.
ovos_persona_server/chat.py (5)

29-31: Rename unused lifespan arg to underscore to silence linters.

Apply:

-async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
+async def lifespan(_: FastAPI) -> AsyncGenerator[None, None]:

9-12: Use secrets for request IDs (more robust, fewer deps).

Replace random.choices with secrets.token_urlsafe; drop random/string imports.

Apply:

-import random
-import string
+from secrets import token_urlsafe
@@
-    completion_id: str = ''.join(random.choices(string.ascii_letters + string.digits, k=28))
+    completion_id: str = token_urlsafe(21)  # ~28 chars URL-safe
@@
-    completion_id: str = ''.join(random.choices(string.ascii_letters + string.digits, k=28))
+    completion_id: str = token_urlsafe(21)

Also applies to: 72-73, 238-239
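The length equivalence claimed in the diff is exact: 21 random bytes is 168 bits, which base64url-encodes to 168 / 6 = 28 characters with no padding.

```python
import string
from secrets import token_urlsafe

ALPHABET = set(string.ascii_letters + string.digits + "-_")

token = token_urlsafe(21)  # 21 random bytes -> exactly 28 base64url characters
assert len(token) == 28
assert set(token) <= ALPHABET
```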


138-139: Only include usage in the final stream chunk when requested.

OpenAI-style SSE includes usage only in the last chunk and only when stream_options.include_usage is true.

Apply:

-            system_fingerprint=None,
-            usage=CompletionUsage(prompt_tokens=0, completion_tokens=0, total_tokens=0)
+            system_fingerprint=None
@@
-                        system_fingerprint=None,
-                        usage=CompletionUsage(prompt_tokens=0, completion_tokens=current_completion_tokens,
-                                              total_tokens=current_completion_tokens)
+                        system_fingerprint=None

The existing final chunk already adds usage conditionally (lines 180-188).

Also applies to: 156-158, 167-188


193-193: Add SSE headers to avoid buffering.

Helps proxies/servers stream correctly.

Apply:

-    return StreamingResponse(streaming_chat_response(), media_type="text/event-stream")
+    return StreamingResponse(
+        streaming_chat_response(),
+        media_type="text/event-stream",
+        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"}
+    )

And likewise for the legacy endpoint’s StreamingResponse.

Also applies to: 316-316


120-123: Avoid blind except Exception; at least log context or narrow expected errors.

Catching all exceptions obscures failures. Consider catching known persona errors, or log and attach a trace id.

Example:

-    except Exception as e:
+    except Exception as e:
+        # log exception details here with request_id/completion_id
         raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                             detail=f"Persona chat failed: {e}") from e

Same idea for the legacy non-streaming branch and the streaming except block.

Also applies to: 271-274, 297-299

ovos_persona_server/__init__.py (2)

1-8: Docstring overclaims DB initialization not present in code

Mentions “unified SQLite database initialization using SQLAlchemy” but no such logic exists. Align docs to actual behavior to avoid confusion.


37-39: Avoid variable shadowing and use DI soon

Assigning to both local and global with the same name (“persona”) harms readability. Also add a TODO reference to migrate to DI.

-    # TODO - move to dependency injection
-    ovos_persona_server.persona.default_persona = persona = Persona(persona["name"], persona)
+    # TODO: move to dependency injection (FastAPI Depends) instead of a global
+    persona_obj = Persona(persona_data["name"], persona_data)
+    ovos_persona_server.persona.default_persona = persona_obj
ovos_persona_server/schemas/ollama.py (2)

85-91: Constrain roles to valid values

Use Literal["system","user","assistant","tool"] (or an Enum) for role instead of bare str to catch client errors early and improve the schema.

-    role: str = Field(..., description="The role of the message, either 'system', 'user', 'assistant', or 'tool'.")
+    role: Literal["system","user","assistant","tool"] = Field(..., description="...")
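The same constraint can be expressed in plain Python; with Pydantic, annotating the field as this Literal makes validation reject anything outside the set. The helper below is illustrative, not from the module:

```python
from typing import Literal, get_args

Role = Literal["system", "user", "assistant", "tool"]
VALID_ROLES = frozenset(get_args(Role))


def check_role(role: str) -> str:
    """Reject roles outside the Ollama message vocabulary."""
    if role not in VALID_ROLES:
        raise ValueError(f"invalid role: {role!r}; expected one of {sorted(VALID_ROLES)}")
    return role
```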

52-60: Guard against double-encoding in arguments serializer

If arguments is already a string, json.dumps will add quotes. Handle both dict and str.

-    def serialize_arguments(self, arguments: Dict[str, Any]) -> str:
+    def serialize_arguments(self, arguments: Dict[str, Any]) -> str:
         """
@@
-        return json.dumps(arguments)
+        if isinstance(arguments, str):
+            return arguments
+        return json.dumps(arguments)
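The double-encoding hazard is easy to reproduce: json.dumps applied to an already-serialized string wraps it in another layer of quotes. A standalone version of the guarded serializer along the lines of the diff:

```python
import json
from typing import Any


def serialize_arguments(arguments: Any) -> str:
    """Pass already-serialized strings through; JSON-encode everything else."""
    if isinstance(arguments, str):
        return arguments
    return json.dumps(arguments)
```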
ovos_persona_server/ollama.py (4)

35-44: lifespan param is unused; rename to _app

Silence ARG001 and clarify intent.

-async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
+async def lifespan(_app: FastAPI) -> AsyncGenerator[None, None]:

59-66: FastAPI Depends flagged by Ruff B008 (false positive)

Using Depends(...) in parameters is idiomatic FastAPI. If Ruff enforces B008, add # noqa: B008 per param or ignore in tool config for FastAPI modules.
Would you like a small .ruff.toml snippet to ignore B008 under ovos_persona_server/**? [Based on learnings]

Also applies to: 190-198


222-224: Remove unused variable suffix

suffix is assigned but not used.

-    suffix: Optional[str] = request_body.suffix

343-350: Typing for set literal breaks on Python 3.8 and earlier

Use Set[str] from typing for broader compatibility and filter out None before building parent_model.

-    models_in_config: set[Optional[str]] = {persona.config.get(s, {}).get("model")
-                                            for s in persona.solvers.loaded_modules.keys()}
-    parent_model_str: str = "|".join(filter(None, models_in_config))
+    from typing import Set
+    models_in_config: Set[Optional[str]] = {persona.config.get(s, {}).get("model")
+                                            for s in persona.solvers.loaded_modules.keys()}
+    parent_model_str: str = "|".join([m for m in models_in_config if m])
ovos_persona_server/schemas/openai_chat.py (1)

84-86: Prefer default_factory for list/dict defaults across models

Several fields default to [] or {}. While Pydantic v2 protects against shared state, default_factory communicates intent and avoids edge cases.

Example pattern:

-    tool_calls: Optional[List[ChatCompletionMessageToolCall]] = Field([], description="...")
+    tool_calls: Optional[List[ChatCompletionMessageToolCall]] = Field(default_factory=list, description="...")

Apply similarly for other list/dict fields in this module.

Also applies to: 96-101, 107-121, 147-151, 163-171, 177-178, 180-186, 188-194, 200-210, 232-236, 293-295, 303-305, 311-315, 327-334, 340-369, 372-396, 398-409, 436-445, 447-456, 489-493
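The pitfall default_factory guards against is Python's shared-mutable-default behavior, shown here with plain functions; Pydantic v2 copies defaults so a bare [] is safe there, but default_factory=list states the intent unambiguously:

```python
def append_shared(item, bucket=[]):   # one list object shared across all calls
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):  # fresh list per call: the default_factory idea
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```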

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc14232 and 5a35623.

📒 Files selected for processing (9)
  • ovos_persona_server/__init__.py (1 hunks)
  • ovos_persona_server/__main__.py (1 hunks)
  • ovos_persona_server/chat.py (1 hunks)
  • ovos_persona_server/ollama.py (1 hunks)
  • ovos_persona_server/persona.py (1 hunks)
  • ovos_persona_server/schemas/__init__.py (1 hunks)
  • ovos_persona_server/schemas/ollama.py (1 hunks)
  • ovos_persona_server/schemas/openai_chat.py (1 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
ovos_persona_server/__main__.py (1)
ovos_persona_server/__init__.py (1)
  • create_persona_app (20-67)
ovos_persona_server/ollama.py (2)
ovos_persona_server/persona.py (1)
  • get_default_persona (17-36)
ovos_persona_server/schemas/ollama.py (9)
  • OllamaChatResponse (8-38)
  • OllamaTagsResponse (194-201)
  • OllamaChatRequest (93-115)
  • OllamaGenerateRequest (118-151)
  • OllamaModelDetails (154-171)
  • OllamaModel (174-191)
  • OllamaChatMessage (72-90)
  • OllamaEmbedRequest (204-222)
  • OllamaEmbedResponse (225-240)
ovos_persona_server/chat.py (2)
ovos_persona_server/persona.py (1)
  • get_default_persona (17-36)
ovos_persona_server/schemas/openai_chat.py (10)
  • CreateChatCompletionRequest (336-369)
  • CreateChatCompletionResponse (372-395)
  • CreateChatCompletionStreamResponse (398-408)
  • ChatCompletionResponseMessage (297-304)
  • ChatCompletionChoice (307-314)
  • ChatCompletionStreamChoice (317-324)
  • CompletionUsage (327-333)
  • FinishReason (21-27)
  • CreateCompletionRequest (460-480)
  • CreateCompletionResponse (483-493)
🪛 Ruff (0.14.0)
ovos_persona_server/persona.py

35-35: f-string without any placeholders

Remove extraneous f prefix

(F541)

ovos_persona_server/__main__.py

23-23: Possible binding to all interfaces

(S104)

ovos_persona_server/ollama.py

36-36: Unused function argument: app

(ARG001)


60-60: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


165-165: Do not catch blind exception: Exception

(BLE001)


191-191: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


222-222: Local variable suffix is assigned to but never used

Remove assignment to unused variable suffix

(F841)


303-303: Do not catch blind exception: Exception

(BLE001)


328-328: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

ovos_persona_server/chat.py

30-30: Unused function argument: app

(ARG001)


49-49: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


72-72: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


160-160: Do not catch blind exception: Exception

(BLE001)


199-199: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


231-231: f-string without any placeholders

Remove extraneous f prefix

(F541)


234-234: f-string without any placeholders

Remove extraneous f prefix

(F541)


238-238: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


297-297: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (1)
ovos_persona_server/ollama.py (1)

111-131: /chat non-stream: OK shape, but use NDJSON for streaming companion

No change needed here. See streaming comment below.

Comment on lines +20 to +31
 def create_persona_app(persona_path: str) -> FastAPI:
     """
     Creates and configures the FastAPI application for the Persona Server.

-    return app.response_class(streaming(), mimetype="text/event-stream")
     Args:
         persona_path (Optional[str]): Optional path to a persona JSON file.
                                       If provided, it overrides the default
                                       persona path from settings or environment.

-    ############
-    # Ollama api compat
-    @app.route("/api/chat", methods=["POST"])
-    def chat():
-        model = request.json.get("model")
-        messages = request.json.get("messages")
-        tools = request.json.get("tools")
-        stream = request.json.get("stream")
     Returns:
         FastAPI: The configured FastAPI application instance.
     """

⚠️ Potential issue | 🟠 Major

Harden persona loading and align signature/docs

  • Signature requires str but doc says Optional[str]. Pick one and enforce.
  • Add explicit error handling for missing/invalid files and invalid JSON.
  • Avoid reusing variable name persona for both dict and Persona instance.

Apply:

-def create_persona_app(persona_path: str) -> FastAPI:
+def create_persona_app(persona_path: str) -> FastAPI:
@@
-    with open(persona_path) as f:
-        persona = json.load(f)
-        persona["name"] = persona.get("name") or os.path.basename(persona_path)
+    try:
+        with open(persona_path, encoding="utf-8") as f:
+            persona_data = json.load(f)
+    except FileNotFoundError as e:
+        raise RuntimeError(f"Persona file not found: {persona_path}") from e
+    except json.JSONDecodeError as e:
+        raise RuntimeError(f"Invalid persona JSON at {persona_path}: {e}") from e
+    persona_data["name"] = persona_data.get("name") or os.path.basename(persona_path)

Also applies to: 33-36

🤖 Prompt for AI Agents
In ovos_persona_server/__init__.py around lines 20-31 (also apply same changes
to lines 33-36), change the function signature to accept Optional[str] (from
typing import Optional) and update the docstring to match; then add explicit
checks and exception handling: if persona_path is None, use settings/default
path; if the resolved path does not exist raise a ValueError or HTTPException
with a clear message; when reading the file wrap the open/read in try/except to
catch IOErrors and raise a clear error; parse JSON inside a try/except to catch
JSONDecodeError and raise a descriptive error; lastly, avoid reusing the name
persona for two types—use persona_data for the raw dict and persona_obj for the
constructed Persona instance.
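A self-contained version of the hardened loader this comment asks for; the Persona construction is omitted and only the file handling is shown:

```python
import json
import os


def load_persona_data(persona_path: str) -> dict:
    """Read a persona JSON file with explicit errors for the common failure modes."""
    try:
        with open(persona_path, encoding="utf-8") as f:
            persona_data = json.load(f)
    except FileNotFoundError as e:
        raise RuntimeError(f"Persona file not found: {persona_path}") from e
    except json.JSONDecodeError as e:
        raise RuntimeError(f"Invalid persona JSON at {persona_path}: {e}") from e
    # Fall back to the filename when the persona declares no name.
    persona_data["name"] = persona_data.get("name") or os.path.basename(persona_path)
    return persona_data
```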

Comment on lines +50 to +56
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],  # Allows all origins
        allow_credentials=True,
        allow_methods=["*"],  # Allows all methods (GET, POST, etc.)
        allow_headers=["*"],  # Allows all headers
    )

⚠️ Potential issue | 🟠 Major

CORS: wildcard origins + credentials is unsafe/invalid

allow_origins=["*"] together with allow_credentials=True is not permitted by browsers and weakens security. Either a) specify explicit origins, or b) keep wildcard but set allow_credentials=False.

Minimal fix:

-        allow_origins=["*"],  # Allows all origins
-        allow_credentials=True,
+        allow_origins=os.getenv("OVOS_ALLOWED_ORIGINS", "*").split(","),
+        allow_credentials=False,

I can wire this to settings/env and add tests.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In ovos_persona_server/__init__.py around lines 50-56, the CORS middleware
currently uses allow_origins=["*"] while allow_credentials=True which is
invalid/unsafe; change to load an explicit list of allowed origins from
configuration or environment and pass that list to allow_origins (e.g.,
ALLOWED_ORIGINS from settings), keep allow_credentials=True only if
ALLOWED_ORIGINS is not ["*"], otherwise set allow_credentials=False; add
validation to raise or log if ALLOWED_ORIGINS contains "*" while
allow_credentials is True and add tests to cover both the explicit-origins and
wildcard-without-credentials scenarios.
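One way to express the rule this comment describes; the env var name OVOS_ALLOWED_ORIGINS is a hypothetical choice, not an existing setting:

```python
import os
from typing import List, Optional, Tuple


def resolve_cors(env_value: Optional[str] = None) -> Tuple[List[str], bool]:
    """Derive (allow_origins, allow_credentials) so wildcard never pairs with credentials."""
    raw = env_value if env_value is not None else os.getenv("OVOS_ALLOWED_ORIGINS", "*")
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    # Browsers reject Access-Control-Allow-Origin: * together with credentials.
    allow_credentials = origins != ["*"]
    return origins, allow_credentials
```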

Comment on lines +22 to +29
     parser.add_argument("--persona", help="Path to persona .json file", default=None, type=str)
     parser.add_argument("--host", help="Host address to bind to", default="0.0.0.0", type=str)
     parser.add_argument("--port", help="Port to run server on", default=8337, type=int)
     args: Any = parser.parse_args()  # Using Any for args as argparse.Namespace is dynamic

-    app = get_app(os.path.expanduser(args.persona))
+    app = create_persona_app(args.persona)

-    app.run(host=args.host, port=args.port, debug=False)
+    uvicorn.run(app, port=args.port, host=args.host, log_level="debug")

⚠️ Potential issue | 🔴 Critical

Make --persona required to avoid runtime crash.

create_persona_app opens the given path unconditionally; passing None (current default) will crash. Make the CLI arg required.

Apply:

-    parser.add_argument("--persona", help="Path to persona .json file", default=None, type=str)
+    parser.add_argument("--persona", help="Path to persona .json file", required=True, type=str)

Optional:

  • Default host to 127.0.0.1 for dev and let users opt-in to 0.0.0.0.
  • Make --log-level configurable (default info) instead of hardcoded debug.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    parser.add_argument("--persona", help="Path to persona .json file", default=None, type=str)
+    parser.add_argument("--persona", help="Path to persona .json file", required=True, type=str)
     parser.add_argument("--host", help="Host address to bind to", default="0.0.0.0", type=str)
     parser.add_argument("--port", help="Port to run server on", default=8337, type=int)
     args: Any = parser.parse_args()  # Using Any for args as argparse.Namespace is dynamic
     app = create_persona_app(args.persona)
     uvicorn.run(app, port=args.port, host=args.host, log_level="debug")
🧰 Tools
🪛 Ruff (0.14.0)

23-23: Possible binding to all interfaces

(S104)

🤖 Prompt for AI Agents
In ovos_persona_server/__main__.py around lines 22 to 29, the CLI currently
allows --persona to be omitted which causes create_persona_app to crash when
passed None; change the parser.add_argument for --persona to make it required
(remove default and add required=True) so the program fails fast with argparse
if not provided. While editing, optionally change the --host default to
"127.0.0.1" and add a new --log-level argument (default "info") that is passed
into uvicorn.run instead of the hardcoded "debug" to make logging configurable.
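Putting the required flag and the optional suggestions together (127.0.0.1 as default host and a --log-level flag are the reviewer's proposals, not the PR's current behavior):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ovos-persona-server")
    parser.add_argument("--persona", help="Path to persona .json file",
                        required=True, type=str)
    parser.add_argument("--host", help="Host address to bind to",
                        default="127.0.0.1", type=str)
    parser.add_argument("--port", help="Port to run server on",
                        default=8337, type=int)
    parser.add_argument("--log-level", help="uvicorn log level", default="info",
                        choices=["critical", "error", "warning", "info", "debug", "trace"])
    return parser
```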

Comment on lines +230 to +235
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=f"Failed to generate completion: token_ids support not yet implemented")
    elif isinstance(prompt, list) and all(isinstance(p, list) and all(isinstance(i, int) for i in p) for p in prompt):
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=f"Failed to generate completion: token_ids support not yet implemented")
    else:

⚠️ Potential issue | 🟡 Minor

Use 501 for unimplemented token_id prompts and fix f-strings.

These paths aren’t implemented; return Not Implemented and remove extraneous f-strings.

Apply:

-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
@@
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
📝 Committable suggestion


Suggested change
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
     elif isinstance(prompt, list) and all(isinstance(p, list) and all(isinstance(i, int) for i in p) for p in prompt):
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
     else:
🧰 Tools
🪛 Ruff (0.14.0)

231-231: f-string without any placeholders

Remove extraneous f prefix

(F541)


234-234: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In ovos_persona_server/chat.py around lines 230 to 235, the branches raising
HTTPException for token_id prompts should return HTTP 501 Not Implemented and
not use unnecessary f-strings; replace
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR with
status.HTTP_501_NOT_IMPLEMENTED (or the corresponding numeric 501 constant) on
both raises and change detail=f"...token_ids support not yet implemented" to a
plain string detail="token_ids support not yet implemented" (remove the
f-prefix) so the responses correctly indicate unimplemented functionality.

Comment on lines +135 to +187
    async def streaming_ollama_chat_response() -> AsyncGenerator[str, None]:
        """
        Asynchronously streams Ollama-compatible chat completion chunks.
        """
        start_time: float = time.time()
        # Placeholders for metrics in streaming chunks
        streaming_load_duration: int = 0
        streaming_prompt_eval_count: int = 0
        streaming_prompt_eval_duration: int = 0
        streaming_eval_count: int = 0
        streaming_eval_duration: int = 0

        try:
            for chunk in persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit):
                if chunk:
                    # Increment eval_count for each chunk (approximate)
                    streaming_eval_count += len(chunk.split())  # Simple token count approximation
                    yield json.dumps({
                        "model": persona.name,
                        "created_at": ts,
                        "message": {"role": "assistant", "content": chunk},
                        "done": False,
                        # Update total_duration dynamically for streaming chunks
                        "total_duration": int((time.time() - start_time) * 1_000_000_000),
                        "load_duration": streaming_load_duration,
                        "prompt_eval_count": streaming_prompt_eval_count,
                        "prompt_eval_duration": streaming_prompt_eval_duration,
                        "eval_count": streaming_eval_count,
                        "eval_duration": streaming_eval_duration  # This would be cumulative in a real scenario
                    }) + "\n"
        except Exception as e:
            # Handle streaming errors gracefully
            yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
            return  # Stop the generator

        end_time: float = time.time()
        streaming_total_duration: int = int((end_time - start_time) * 1_000_000_000)

        yield json.dumps({
            "model": persona.name,
            "created_at": ts,
            "message": {"role": "assistant", "content": ""},  # Empty content for final done chunk
            "done": True,
            "total_duration": streaming_total_duration,
            "load_duration": streaming_load_duration,
            "prompt_eval_count": streaming_prompt_eval_count,
            "prompt_eval_duration": streaming_prompt_eval_duration,
            "eval_count": streaming_eval_count,
            "eval_duration": streaming_eval_duration,
            "done_reason": done_reason
        }) + "\n"

    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/json")

🛠️ Refactor suggestion | 🟠 Major

Streaming content-type should be NDJSON and avoid blocking the event loop

  • Ollama streams line-delimited JSON; use application/x-ndjson.
  • Iterating a sync generator in an async def can block the loop. Use Starlette’s iterate_in_threadpool.
  • Handle asyncio.CancelledError separately to avoid masking client disconnects.
+from starlette.concurrency import iterate_in_threadpool
+import asyncio
@@
-    async def streaming_ollama_chat_response() -> AsyncGenerator[str, None]:
+    async def streaming_ollama_chat_response() -> AsyncGenerator[str, None]:
@@
-        try:
-            for chunk in persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit):
+        try:
+            async for chunk in iterate_in_threadpool(persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit)):
                 if chunk:
@@
-        except Exception as e:
+        except asyncio.CancelledError:
+            raise
+        except Exception as e:
             # Handle streaming errors gracefully
             yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
             return  # Stop the generator
@@
-    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/json")
+    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/x-ndjson")

As per FastAPI/Starlette best practices for streaming. [Based on learnings]

🧰 Tools
🪛 Ruff (0.14.0)

165-165: Do not catch blind exception: Exception

(BLE001)
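The offloading pattern the review suggests can be sketched without Starlette at all. Below is a minimal equivalent using `asyncio.to_thread` to drain a blocking sync generator while keeping the event loop responsive; `sync_chunks` and `stream_ndjson` are illustrative names, not code from the PR:

```python
import asyncio
import json

def sync_chunks():
    # Stand-in for persona.stream(): a blocking, synchronous generator
    yield "hello"
    yield "world"

async def stream_ndjson():
    """Drain a sync generator without blocking the event loop,
    emitting one JSON object per line (NDJSON)."""
    gen = sync_chunks()
    lines = []
    while True:
        # next() runs in a worker thread, so slow persona backends
        # don't stall other requests on the same loop
        chunk = await asyncio.to_thread(next, gen, None)
        if chunk is None:
            break
        lines.append(json.dumps({"message": {"content": chunk}, "done": False}) + "\n")
    # Final chunk mirrors Ollama's terminating "done": true object
    lines.append(json.dumps({"message": {"content": ""}, "done": True}) + "\n")
    return lines

lines = asyncio.run(stream_ndjson())
```

Starlette's `iterate_in_threadpool` wraps essentially this loop for you and is the cleaner choice inside a `StreamingResponse`; the sketch only shows why the offload matters.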

Comment on lines +273 to +324
async def streaming_ollama_generate_response() -> AsyncGenerator[str, None]:
"""
Asynchronously streams Ollama-compatible generation chunks.
"""
start_time: float = time.time()
# Placeholders for metrics in streaming chunks
streaming_load_duration: int = 0
streaming_prompt_eval_count: int = 0
streaming_prompt_eval_duration: int = 0
streaming_eval_count: int = 0
streaming_eval_duration: int = 0

try:
# Use persona.stream for streaming generation
for chunk in persona.stream(messages, lang=sess.lang, units=sess.system_unit):
if chunk:
# Increment eval_count for each chunk (approximate)
streaming_eval_count += len(chunk.split()) # Simple token count approximation
yield json.dumps({
"model": persona.name,
"created_at": ts,
"response": chunk, # Ollama /generate uses "response" key for content
"done": False,
"total_duration": int((time.time() - start_time) * 1_000_000_000),
"load_duration": streaming_load_duration,
"prompt_eval_count": streaming_prompt_eval_count,
"prompt_eval_duration": streaming_prompt_eval_duration,
"eval_count": streaming_eval_count,
"eval_duration": streaming_eval_duration # This would be cumulative in a real scenario
}) + "\n"
except Exception as e:
yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
return

end_time: float = time.time()
streaming_total_duration: int = int((end_time - start_time) * 1_000_000_000)

yield json.dumps({
"model": persona.name,
"created_at": ts,
"response": "", # Empty content for final done chunk in /generate
"done": True,
"total_duration": streaming_total_duration,
"load_duration": streaming_load_duration,
"prompt_eval_count": streaming_prompt_eval_count,
"prompt_eval_duration": streaming_prompt_eval_duration,
"eval_count": streaming_eval_count,
"eval_duration": streaming_eval_duration,
"done_reason": done_reason
}) + "\n"

return StreamingResponse(streaming_ollama_generate_response(), media_type="application/json")

🛠️ Refactor suggestion | 🟠 Major

/generate streaming: same NDJSON + threadpool + CancelledError pattern

Apply the same changes as in /chat streaming:

+from starlette.concurrency import iterate_in_threadpool
+import asyncio
@@
-        try:
-            # Use persona.stream for streaming generation
-            for chunk in persona.stream(messages, lang=sess.lang, units=sess.system_unit):
+        try:
+            async for chunk in iterate_in_threadpool(persona.stream(messages, lang=sess.lang, units=sess.system_unit)):
                 if chunk:
@@
-        except Exception as e:
+        except asyncio.CancelledError:
+            raise
+        except Exception as e:
             yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
             return
🧰 Tools
🪛 Ruff (0.14.0)

303-303: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
ovos_persona_server/ollama.py lines 273-324: the /generate streaming handler
currently iterates persona.stream directly and returns JSON/plain application,
but needs the same NDJSON, threadpool offload, and CancelledError handling used
in /chat streaming; change the generator to run persona.stream in an executor
(e.g., asyncio.to_thread or loop.run_in_executor) so blocking IO doesn't block
the event loop, yield each chunk as NDJSON (one JSON object per line) with
"done": False, catch asyncio.CancelledError separately to stop streaming cleanly
without emitting an error chunk, catch other Exceptions to emit an error NDJSON
chunk and return, and finally emit the final "done": True NDJSON chunk; also
return StreamingResponse with media_type "application/x-ndjson".

Comment on lines +34 to +35
raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to load persona")

⚠️ Potential issue | 🟡 Minor

Remove unnecessary f-string.

No placeholders used.

Apply:

-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to load persona")
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail="Failed to load persona"
+        )
🧰 Tools
🪛 Ruff (0.14.0)

35-35: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In ovos_persona_server/persona.py around lines 34 to 35, the HTTPException
detail uses an unnecessary f-string with no placeholders; replace the f-string
with a plain string literal "Failed to load persona" (remove the leading f) so
the detail is a normal string.

Comment on lines +27 to +38
model: Optional[str] = Field(..., description="The name of the model that generated the response.")
created_at: str = Field(..., description="The timestamp when the response was created (ISO 8601 format).")
message: Dict[str, str] = Field(..., description="The message object containing the role and content of the assistant's reply.")
done: bool = Field(..., description="Indicates if the response is the final one in a stream.")
context: Optional[List[int]] = Field(None, description="An encoding of the conversation used, can be sent in the next request to keep conversational memory (list of token IDs).")
total_duration: Optional[int] = Field(None, description="Time spent generating the response in nanoseconds.")
load_duration: Optional[int] = Field(None, description="Time spent loading the model into memory in nanoseconds.")
prompt_eval_count: Optional[int] = Field(None, description="Number of tokens in the prompt.")
prompt_eval_duration: Optional[int] = Field(None, description="Time spent evaluating the prompt in nanoseconds.")
eval_count: Optional[int] = Field(None, description="Number of tokens in the response.")
eval_duration: Optional[int] = Field(None, description="Time spent generating the response in nanoseconds.")
done_reason: Optional[str] = Field(None, description="The reason the model stopped generating (e.g., 'stop', 'unload').")

🛠️ Refactor suggestion | 🟠 Major

Required fields typed as Optional with Field(...)

Several fields are marked Optional[...] but declared required with Field(...). This confuses validation and OpenAPI. Make them required (drop Optional) or truly optional (default None).

Examples to fix:

-    model: Optional[str] = Field(..., description="The name of the model ...")
+    model: str = Field(..., description="The name of the model ...")

-    model: Optional[str] = Field(..., description="The model to use for the chat.")
+    model: str = Field(..., description="The model to use for the chat.")

-    model: Optional[str] = Field(..., description="The model name to use for generation.")
+    model: str = Field(..., description="The model name to use for generation.")

-    model: Optional[str] = Field(..., description="The base model name.")
+    model: str = Field(..., description="The base model name.")

-    model: Optional[str] = Field(..., description="The model name to use for generating embeddings.")
+    model: str = Field(..., description="The model name to use for generating embeddings.")

-    model: Optional[str] = Field(..., description="The name of the model that generated the embeddings.")
+    model: str = Field(..., description="The name of the model that generated the embeddings.")

As per coding guidelines for Pydantic v2. [Based on learnings]

Also applies to: 108-116, 139-151, 186-191, 201-201, 218-223, 236-241

🤖 Prompt for AI Agents
In ovos_persona_server/schemas/ollama.py around lines 27 to 38 (and similarly at
108-116, 139-151, 186-191, 201, 218-223, 236-241), several fields are declared
as Optional[...] but use Field(...) which makes them required in Pydantic v2;
update each field to either be truly optional by setting Optional[...] with a
default of None (e.g., Field(None, ...)) or make them required by removing
Optional[...] and keeping Field(...), ensuring the type and default align so
OpenAPI/validation correctly reflects required vs optional.
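The required-vs-optional distinction can be verified directly. A small sketch under Pydantic v2, with an illustrative model name rather than the PR's actual schema:

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class ChatChunk(BaseModel):
    # Required: no Optional, and Field(...) means the caller must supply it
    model: str = Field(..., description="Name of the model that generated the response.")
    # Truly optional: Optional[...] paired with a None default
    total_duration: Optional[int] = Field(None, description="Generation time in nanoseconds.")

ok = ChatChunk(model="persona-1")

try:
    ChatChunk()  # missing the required "model" field
    missing_model_raises = False
except ValidationError:
    missing_model_raises = True
```

Mixing `Optional[str]` with `Field(...)` produces a field that OpenAPI renders as nullable yet validation treats as required, which is the inconsistency flagged above.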

Comment on lines +336 to +349
class CreateChatCompletionRequest(BaseModel):
"""
Request body for creating a chat completion.
"""
messages: List[
Union[
ChatCompletionRequestSystemMessage,
ChatCompletionRequestUserMessage,
ChatCompletionRequestAssistantMessage, # This one is not explicitly defined in the OpenAPI, but used in examples. Adding for completeness.
ChatCompletionRequestToolMessage,
ChatCompletionRequestFunctionMessage, # Deprecated
ChatCompletionRequestDeveloperMessage
]
] = Field([ChatCompletionRequestMessage(role="user", content="hello world")], description="A list of messages comprising the conversation so far. [Example Python code](https://cookbook.openai.com/examples/how_to_format_messages_for_chat_completions).")

⚠️ Potential issue | 🔴 Critical

Default for messages uses wrong model and a mutable value

  • Union excludes ChatCompletionRequestMessage; default should be ChatCompletionRequestUserMessage.
  • Avoid mutable defaults; use default_factory.
-    ] = Field([ChatCompletionRequestMessage(role="user", content="hello world")], description="A list of messages comprising the conversation so far. ...")
+    ] = Field(
+        default_factory=lambda: [ChatCompletionRequestUserMessage(content="hello world")],
+        description="A list of messages comprising the conversation so far. ..."
+    )

As per Pydantic v2 best practices. [Based on learnings]

🤖 Prompt for AI Agents
In ovos_persona_server/schemas/openai_chat.py around lines 336-349, the messages
field currently uses a mutable list literal and the wrong default model; change
the Field to use default_factory returning a list with a
ChatCompletionRequestUserMessage instance (e.g. default_factory=lambda:
[ChatCompletionRequestUserMessage(role="user", content="hello world")]) instead
of the inline list, and ensure the Union of allowed message types remains
correct (use ChatCompletionRequestUserMessage as the default element type). This
removes the mutable default and fixes the default message model.
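A minimal sketch of the `default_factory` fix; `UserMessage` and `ChatRequest` are simplified stand-ins for the PR's message models:

```python
from pydantic import BaseModel, Field

class UserMessage(BaseModel):
    role: str = "user"
    content: str

class ChatRequest(BaseModel):
    # default_factory builds a fresh list for every instance,
    # so no mutable default is shared between requests
    messages: list[UserMessage] = Field(
        default_factory=lambda: [UserMessage(content="hello world")]
    )

a = ChatRequest()
b = ChatRequest()
a.messages.append(UserMessage(content="second"))  # must not leak into b
```

Pydantic v2 does deep-copy plain mutable defaults, but `default_factory` stays the idiomatic form and also makes it easy to default to the correct model type from the `Union`.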

Comment on lines +1 to +3
fastapi
ovos-persona
pydantic

⚠️ Potential issue | 🟠 Major

Add uvicorn and pin runtime deps for reproducibility.

uvicorn is required by __main__.py but not declared; installs will fail to run the server. Also consider pinning versions to stable ranges.

Apply:

 fastapi
 ovos-persona
 pydantic
+uvicorn[standard]

Optionally pin:

-fastapi
-pydantic
+fastapi>=0.119,<0.120
+pydantic>=2.11,<3
+# keep ovos-persona as constrained by your release process
+# and consider pinning uvicorn too
+uvicorn[standard]>=0.30,<0.31

Based on learnings.

🤖 Prompt for AI Agents
In requirements.txt lines 1-3, uvicorn is missing and runtime deps are unpinned
which hurts reproducibility; update the file to add uvicorn (required by
__main__.py) and pin the runtime packages to stable version ranges (e.g.,
specific major.minor or caret-style ranges) so installs are reproducible — add a
uvicorn entry and replace bare package names with pinned versions or ranges for
fastapi, ovos-persona, pydantic (and uvicorn) consistent with tested
compatibility.
