
refactor: migrate to fastapi #14

Merged
JarbasAl merged 1 commit into dev from fastapi
Oct 17, 2025

Conversation

@JarbasAl
Member

@JarbasAl JarbasAl commented Oct 17, 2025

split from #11

Summary by CodeRabbit

  • New Features

    • Added OpenAI-compatible chat completions and legacy completions endpoints with streaming support
    • Added Ollama-compatible API endpoints for chat, generation, and model discovery
  • Chores

    • Migrated from Flask to FastAPI framework for improved performance and maintainability
    • Enhanced command-line interface with explicit argument parsing and type hints

@coderabbitai

coderabbitai bot commented Oct 17, 2025

Walkthrough

The pull request migrates the application from Flask to FastAPI, introducing a modular router-based architecture with dependency injection for persona management. It adds comprehensive Pydantic schemas for OpenAI-compatible and Ollama-compatible endpoints, replaces inline route handlers with separate modules, and updates the CLI to use uvicorn instead of Flask's development server.

Changes

Cohort / File(s) Summary
Core Application Refactoring
ovos_persona_server/__init__.py, ovos_persona_server/__main__.py
Replaced the Flask-based setup with a FastAPI-based create_persona_app function: the old get_app entry point was removed in favor of create_persona_app(persona_path: str) -> FastAPI. Updated the CLI to use uvicorn.run with explicit argument parsing and type hints.
Persona Dependency Management
ovos_persona_server/persona.py
New module providing a FastAPI dependency pattern with module-level default_persona variable and async get_default_persona() function for endpoint injection. Returns HTTP 500 on missing persona.
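As a rough illustration of that pattern (the real module raises fastapi.HTTPException with status 500; a plain RuntimeError stands in here so the sketch has no framework dependency):

```python
from typing import Optional

# Module-level persona set once at app startup
# (stand-in for ovos_persona_server/persona.py).
default_persona: Optional[object] = None


async def get_default_persona() -> object:
    """FastAPI-style dependency: return the preloaded persona or fail.

    The real module raises fastapi.HTTPException(status_code=500, ...) instead
    of RuntimeError.
    """
    if default_persona is None:
        raise RuntimeError("No default persona configured")
    return default_persona
```

Endpoints then declare `persona = Depends(get_default_persona)` and never touch the global directly.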
Chat Completions Router
ovos_persona_server/chat.py
New FastAPI router module exposing OpenAI-compatible endpoints: POST /v1/chat/completions and POST /v1/completions. Supports both streaming (Server-Sent Events) and non-streaming responses with token accounting, unique IDs, and timestamps. Includes lifespan context manager for persona initialization.
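The Server-Sent Events framing used by OpenAI-compatible streaming can be sketched without the framework: each chunk is serialized as one `data:` line, and the stream terminates with `data: [DONE]`. Field values below are illustrative, not taken from the module.

```python
import json


def sse_chunk(completion_id: str, model: str, delta: str) -> str:
    """Serialize one OpenAI-style streaming chunk as an SSE event."""
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
    }
    return f"data: {json.dumps(payload)}\n\n"


def sse_done() -> str:
    """Terminator the OpenAI streaming API sends after the last chunk."""
    return "data: [DONE]\n\n"
```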
Ollama Compatibility Router
ovos_persona_server/ollama.py
New FastAPI router module exposing Ollama-compatible endpoints: POST /api/chat, POST /api/generate, and GET /api/tags. Supports streaming and non-streaming responses, converts Ollama request objects to persona-compatible format, includes error handling, and returns model metadata.
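Unlike the SSE endpoints, Ollama streams newline-delimited JSON: one complete object per line, with `"done": true` on the final line. A minimal framing sketch (field set abbreviated from the real response):

```python
import json


def ndjson_chunk(model: str, content: str, done: bool) -> str:
    """Serialize one Ollama-style streaming line (NDJSON: one JSON object per line)."""
    return json.dumps({
        "model": model,
        "message": {"role": "assistant", "content": content},
        "done": done,
    }) + "\n"
```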
Schema Definitions
ovos_persona_server/schemas/__init__.py
Module docstring documenting the schemas package for OpenAI-compatible and Ollama API interactions.
OpenAI Schema Models
ovos_persona_server/schemas/openai_chat.py
Comprehensive Pydantic models for OpenAI API compatibility: CreateChatCompletionRequest, CreateChatCompletionResponse, CreateChatCompletionStreamResponse, message types (system, user, assistant, tool, function, developer), tool/function calls, content parts (text, image), usage statistics, and legacy completions models. Roughly 30 classes with detailed typing and Field metadata.
Ollama Schema Models
ovos_persona_server/schemas/ollama.py
Pydantic models for Ollama API compatibility: OllamaChatRequest, OllamaChatResponse, OllamaGenerateRequest, OllamaTagsResponse, message types, tool/function calls, model metadata, and embedding schemas. ~11 classes with support for multimodal messages and tools.
Dependencies
requirements.txt
Replaced flask with fastapi; added pydantic; retained ovos-persona.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as __main__.py
    participant App as FastAPI App
    participant Persona as Persona Manager
    participant ChatRouter as /v1 Router
    participant OllamaRouter as /api Router

    User->>CLI: python -m ovos_persona_server --persona path/to/persona.json
    CLI->>App: create_persona_app(persona_path)
    App->>Persona: Load persona from JSON
    Persona->>App: Initialize default_persona
    App->>ChatRouter: Register router with lifespan
    App->>OllamaRouter: Register router with lifespan
    CLI->>App: uvicorn.run(host=0.0.0.0, port=8337)
    App-->>User: Server listening

    User->>ChatRouter: POST /v1/chat/completions
    ChatRouter->>Persona: get_default_persona()
    Persona-->>ChatRouter: Return Persona instance
    ChatRouter->>ChatRouter: Convert messages & generate ID
    alt Streaming Requested
        ChatRouter->>ChatRouter: persona.stream()
        ChatRouter-->>User: Server-Sent Events (chunks)
    else Non-Streaming
        ChatRouter->>ChatRouter: persona.chat()
        ChatRouter-->>User: CreateChatCompletionResponse (JSON)
    end

    User->>OllamaRouter: POST /api/chat
    OllamaRouter->>Persona: get_default_persona()
    Persona-->>OllamaRouter: Return Persona instance
    OllamaRouter->>OllamaRouter: Convert OllamaChatRequest
    alt Streaming Requested
        OllamaRouter->>OllamaRouter: persona.stream()
        OllamaRouter-->>User: Line-delimited JSON (chunks)
    else Non-Streaming
        OllamaRouter->>OllamaRouter: persona.chat()
        OllamaRouter-->>User: OllamaChatResponse (JSON)
    end
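For reference, the two request bodies exercised in the diagram share the same message list but target different routes; the values below are placeholders, with the model field mapping to the persona name:

```python
import json

messages = [{"role": "user", "content": "hello"}]

# Body for POST /v1/chat/completions (OpenAI-compatible route)
openai_body = {"model": "my-persona", "messages": messages, "stream": True}

# Body for POST /api/chat (Ollama-compatible route)
ollama_body = {"model": "my-persona", "messages": messages, "stream": True}

print(json.dumps(openai_body))
```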

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Rationale: The diff spans nine files with significant heterogeneity—a core architectural migration from Flask to FastAPI, introduction of two new modular routers with async streaming logic, two comprehensive schema modules (~40 Pydantic classes total), and a new dependency injection pattern. While individual router endpoints follow similar patterns, the variety of concerns (streaming response handling, error handling, schema validation, lifespan management, Ollama/OpenAI API compatibility layers) and the density of logic in chat.py and ollama.py require careful reasoning across each component.

Poem

🐰 FastAPI paths now lead the way,
Where schemas dance in Pydantic's bay!
From Flask's old chains we've hopped so free,
With routers modular, clean, and spry!
Persona dependency flows on high,
As streaming responses gently fly!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "refactor: migrate to fastapi" is directly aligned with the main changes across the entire changeset. The title clearly and concisely summarizes the primary objective—migrating the project from Flask to FastAPI—which is the overarching theme connecting all modifications. The changes span multiple files including the main entry points (__init__.py, __main__.py), new FastAPI routers (chat.py, ollama.py), dependency injection setup (persona.py), schema definitions, and requirements updates. The title is specific and avoids vague language, using the conventional "refactor:" prefix to indicate the nature of the change. A teammate scanning the project history would immediately understand that this PR involves a framework migration to FastAPI.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fastapi

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@JarbasAl JarbasAl mentioned this pull request Oct 17, 2025
@JarbasAl JarbasAl merged commit 3ec8f26 into dev Oct 17, 2025
2 of 3 checks passed
@JarbasAl JarbasAl deleted the fastapi branch October 17, 2025 18:46

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (16)
ovos_persona_server/schemas/__init__.py (1)

1-6: Consider re-exporting public schemas.

Expose commonly used request/response models here to give users a stable import path (e.g., from .openai_chat import CreateChatCompletionRequest, ...).

ovos_persona_server/persona.py (1)

18-25: Docstring misleads: this dependency doesn’t load the persona.

It only returns the preloaded global or raises. Reword to avoid “loads and returns”.

Apply:

-    """
-    Asynchronously loads and returns the default persona.
+    """
+    Return the default persona initialized at app startup.
ovos_persona_server/chat.py (5)

29-31: Rename unused lifespan arg to underscore to silence linters.

Apply:

-async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
+async def lifespan(_: FastAPI) -> AsyncGenerator[None, None]:

9-12: Use secrets for request IDs (more robust, fewer deps).

Replace random.choices with secrets.token_urlsafe; drop random/string imports.

Apply:

-import random
-import string
+from secrets import token_urlsafe
@@
-    completion_id: str = ''.join(random.choices(string.ascii_letters + string.digits, k=28))
+    completion_id: str = token_urlsafe(21)  # ~28 chars URL-safe
@@
-    completion_id: str = ''.join(random.choices(string.ascii_letters + string.digits, k=28))
+    completion_id: str = token_urlsafe(21)

Also applies to: 72-73, 238-239
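The length equivalence claimed in the diff is exact: 21 random bytes is 168 bits, which base64url-encodes to 168 / 6 = 28 characters with no padding.

```python
import string
from secrets import token_urlsafe

ALPHABET = set(string.ascii_letters + string.digits + "-_")

token = token_urlsafe(21)  # 21 random bytes -> exactly 28 base64url characters
assert len(token) == 28
assert set(token) <= ALPHABET
```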


138-139: Only include usage in the final stream chunk when requested.

OpenAI-style SSE includes usage only in the last chunk and only when stream_options.include_usage is true.

Apply:

-            system_fingerprint=None,
-            usage=CompletionUsage(prompt_tokens=0, completion_tokens=0, total_tokens=0)
+            system_fingerprint=None
@@
-                        system_fingerprint=None,
-                        usage=CompletionUsage(prompt_tokens=0, completion_tokens=current_completion_tokens,
-                                              total_tokens=current_completion_tokens)
+                        system_fingerprint=None

The existing final chunk already adds usage conditionally (lines 180-188).

Also applies to: 156-158, 167-188


193-193: Add SSE headers to avoid buffering.

Helps proxies/servers stream correctly.

Apply:

-    return StreamingResponse(streaming_chat_response(), media_type="text/event-stream")
+    return StreamingResponse(
+        streaming_chat_response(),
+        media_type="text/event-stream",
+        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"}
+    )

And likewise for the legacy endpoint’s StreamingResponse.

Also applies to: 316-316


120-123: Avoid blind except Exception; at least log context or narrow expected errors.

Catching all exceptions obscures failures. Consider catching known persona errors, or log and attach a trace id.

Example:

-    except Exception as e:
+    except Exception as e:
+        # log exception details here with request_id/completion_id
         raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                             detail=f"Persona chat failed: {e}") from e

Same idea for the legacy non-streaming branch and the streaming except block.

Also applies to: 271-274, 297-299

ovos_persona_server/__init__.py (2)

1-8: Docstring overclaims DB initialization not present in code

Mentions “unified SQLite database initialization using SQLAlchemy” but no such logic exists. Align docs to actual behavior to avoid confusion.


37-39: Avoid variable shadowing and use DI soon

Assigning to both local and global with the same name (“persona”) harms readability. Also add a TODO reference to migrate to DI.

-    # TODO - move to dependency injection
-    ovos_persona_server.persona.default_persona = persona = Persona(persona["name"], persona)
+    # TODO: move to dependency injection (FastAPI Depends) instead of a global
+    persona_obj = Persona(persona_data["name"], persona_data)
+    ovos_persona_server.persona.default_persona = persona_obj
ovos_persona_server/schemas/ollama.py (2)

85-91: Constrain roles to valid values

Use Literal["system","user","assistant","tool"] (or an Enum) for role instead of bare str to catch client errors early and improve the schema.

-    role: str = Field(..., description="The role of the message, either 'system', 'user', 'assistant', or 'tool'.")
+    role: Literal["system","user","assistant","tool"] = Field(..., description="...")
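The same constraint can be expressed in plain Python; with Pydantic, annotating the field as this Literal makes validation reject anything outside the set. The helper below is illustrative, not from the module:

```python
from typing import Literal, get_args

Role = Literal["system", "user", "assistant", "tool"]
VALID_ROLES = frozenset(get_args(Role))


def check_role(role: str) -> str:
    """Reject roles outside the Ollama message vocabulary."""
    if role not in VALID_ROLES:
        raise ValueError(f"invalid role: {role!r}; expected one of {sorted(VALID_ROLES)}")
    return role
```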

52-60: Guard against double-encoding in arguments serializer

If arguments is already a string, json.dumps will add quotes. Handle both dict and str.

-    def serialize_arguments(self, arguments: Dict[str, Any]) -> str:
+    def serialize_arguments(self, arguments: Dict[str, Any]) -> str:
         """
@@
-        return json.dumps(arguments)
+        if isinstance(arguments, str):
+            return arguments
+        return json.dumps(arguments)
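The double-encoding hazard is easy to reproduce: json.dumps applied to an already-serialized string wraps it in another layer of quotes. A standalone version of the guarded serializer along the lines of the diff:

```python
import json
from typing import Any


def serialize_arguments(arguments: Any) -> str:
    """Pass already-serialized strings through; JSON-encode everything else."""
    if isinstance(arguments, str):
        return arguments
    return json.dumps(arguments)
```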
ovos_persona_server/ollama.py (4)

35-44: lifespan param is unused; rename to _app

Silence ARG001 and clarify intent.

-async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
+async def lifespan(_app: FastAPI) -> AsyncGenerator[None, None]:

59-66: FastAPI Depends flagged by Ruff B008 (false positive)

Using Depends(...) in parameters is idiomatic FastAPI. If Ruff enforces B008, add # noqa: B008 per param or ignore in tool config for FastAPI modules.
Would you like a small .ruff.toml snippet to ignore B008 under ovos_persona_server/**? [Based on learnings]

Also applies to: 190-198


222-224: Remove unused variable suffix

suffix is assigned but not used.

-    suffix: Optional[str] = request_body.suffix

343-350: Typing for set literal breaks on Python 3.8 and earlier

Use Set[str] from typing for broader compatibility and filter out None before building parent_model.

-    models_in_config: set[Optional[str]] = {persona.config.get(s, {}).get("model")
-                                            for s in persona.solvers.loaded_modules.keys()}
-    parent_model_str: str = "|".join(filter(None, models_in_config))
+    from typing import Set
+    models_in_config: Set[Optional[str]] = {persona.config.get(s, {}).get("model")
+                                            for s in persona.solvers.loaded_modules.keys()}
+    parent_model_str: str = "|".join([m for m in models_in_config if m])
ovos_persona_server/schemas/openai_chat.py (1)

84-86: Prefer default_factory for list/dict defaults across models

Several fields default to [] or {}. While Pydantic v2 protects against shared state, default_factory communicates intent and avoids edge cases.

Example pattern:

-    tool_calls: Optional[List[ChatCompletionMessageToolCall]] = Field([], description="...")
+    tool_calls: Optional[List[ChatCompletionMessageToolCall]] = Field(default_factory=list, description="...")

Apply similarly for other list/dict fields in this module.

Also applies to: 96-101, 107-121, 147-151, 163-171, 177-178, 180-186, 188-194, 200-210, 232-236, 293-295, 303-305, 311-315, 327-334, 340-369, 372-396, 398-409, 436-445, 447-456, 489-493
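The pitfall default_factory guards against is Python's shared-mutable-default behavior, shown here with plain functions; Pydantic v2 copies defaults so a bare [] is safe there, but default_factory=list states the intent unambiguously:

```python
def append_shared(item, bucket=[]):   # one list object shared across all calls
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):  # fresh list per call: the default_factory idea
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```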

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc14232 and 5a35623.

📒 Files selected for processing (9)
  • ovos_persona_server/__init__.py (1 hunks)
  • ovos_persona_server/__main__.py (1 hunks)
  • ovos_persona_server/chat.py (1 hunks)
  • ovos_persona_server/ollama.py (1 hunks)
  • ovos_persona_server/persona.py (1 hunks)
  • ovos_persona_server/schemas/__init__.py (1 hunks)
  • ovos_persona_server/schemas/ollama.py (1 hunks)
  • ovos_persona_server/schemas/openai_chat.py (1 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
ovos_persona_server/__main__.py (1)
ovos_persona_server/__init__.py (1)
  • create_persona_app (20-67)
ovos_persona_server/ollama.py (2)
ovos_persona_server/persona.py (1)
  • get_default_persona (17-36)
ovos_persona_server/schemas/ollama.py (9)
  • OllamaChatResponse (8-38)
  • OllamaTagsResponse (194-201)
  • OllamaChatRequest (93-115)
  • OllamaGenerateRequest (118-151)
  • OllamaModelDetails (154-171)
  • OllamaModel (174-191)
  • OllamaChatMessage (72-90)
  • OllamaEmbedRequest (204-222)
  • OllamaEmbedResponse (225-240)
ovos_persona_server/chat.py (2)
ovos_persona_server/persona.py (1)
  • get_default_persona (17-36)
ovos_persona_server/schemas/openai_chat.py (10)
  • CreateChatCompletionRequest (336-369)
  • CreateChatCompletionResponse (372-395)
  • CreateChatCompletionStreamResponse (398-408)
  • ChatCompletionResponseMessage (297-304)
  • ChatCompletionChoice (307-314)
  • ChatCompletionStreamChoice (317-324)
  • CompletionUsage (327-333)
  • FinishReason (21-27)
  • CreateCompletionRequest (460-480)
  • CreateCompletionResponse (483-493)
🪛 Ruff (0.14.0)
ovos_persona_server/persona.py

35-35: f-string without any placeholders

Remove extraneous f prefix

(F541)

ovos_persona_server/__main__.py

23-23: Possible binding to all interfaces

(S104)

ovos_persona_server/ollama.py

36-36: Unused function argument: app

(ARG001)


60-60: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


165-165: Do not catch blind exception: Exception

(BLE001)


191-191: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


222-222: Local variable suffix is assigned to but never used

Remove assignment to unused variable suffix

(F841)


303-303: Do not catch blind exception: Exception

(BLE001)


328-328: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

ovos_persona_server/chat.py

30-30: Unused function argument: app

(ARG001)


49-49: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


72-72: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


160-160: Do not catch blind exception: Exception

(BLE001)


199-199: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


231-231: f-string without any placeholders

Remove extraneous f prefix

(F541)


234-234: f-string without any placeholders

Remove extraneous f prefix

(F541)


238-238: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


297-297: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (1)
ovos_persona_server/ollama.py (1)

111-131: /chat non-stream: OK shape, but use NDJSON for streaming companion

No change needed here. See streaming comment below.

Comment on lines +20 to +31
 def create_persona_app(persona_path: str) -> FastAPI:
     """
     Creates and configures the FastAPI application for the Persona Server.

-    return app.response_class(streaming(), mimetype="text/event-stream")
     Args:
         persona_path (Optional[str]): Optional path to a persona JSON file.
                                       If provided, it overrides the default
                                       persona path from settings or environment.

-    ############
-    # Ollama api compat
-    @app.route("/api/chat", methods=["POST"])
-    def chat():
-        model = request.json.get("model")
-        messages = request.json.get("messages")
-        tools = request.json.get("tools")
-        stream = request.json.get("stream")
     Returns:
         FastAPI: The configured FastAPI application instance.
     """

⚠️ Potential issue | 🟠 Major

Harden persona loading and align signature/docs

  • Signature requires str but doc says Optional[str]. Pick one and enforce.
  • Add explicit error handling for missing/invalid files and invalid JSON.
  • Avoid reusing variable name persona for both dict and Persona instance.

Apply:

-def create_persona_app(persona_path: str) -> FastAPI:
+def create_persona_app(persona_path: str) -> FastAPI:
@@
-    with open(persona_path) as f:
-        persona = json.load(f)
-        persona["name"] = persona.get("name") or os.path.basename(persona_path)
+    try:
+        with open(persona_path, encoding="utf-8") as f:
+            persona_data = json.load(f)
+    except FileNotFoundError as e:
+        raise RuntimeError(f"Persona file not found: {persona_path}") from e
+    except json.JSONDecodeError as e:
+        raise RuntimeError(f"Invalid persona JSON at {persona_path}: {e}") from e
+    persona_data["name"] = persona_data.get("name") or os.path.basename(persona_path)

Also applies to: 33-36

🤖 Prompt for AI Agents
In ovos_persona_server/__init__.py around lines 20-31 (also apply same changes
to lines 33-36), change the function signature to accept Optional[str] (from
typing import Optional) and update the docstring to match; then add explicit
checks and exception handling: if persona_path is None, use settings/default
path; if the resolved path does not exist raise a ValueError or HTTPException
with a clear message; when reading the file wrap the open/read in try/except to
catch IOErrors and raise a clear error; parse JSON inside a try/except to catch
JSONDecodeError and raise a descriptive error; lastly, avoid reusing the name
persona for two types—use persona_data for the raw dict and persona_obj for the
constructed Persona instance.
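A self-contained version of the hardened loader this comment asks for; the Persona construction is omitted and only the file handling is shown:

```python
import json
import os


def load_persona_data(persona_path: str) -> dict:
    """Read a persona JSON file with explicit errors for the common failure modes."""
    try:
        with open(persona_path, encoding="utf-8") as f:
            persona_data = json.load(f)
    except FileNotFoundError as e:
        raise RuntimeError(f"Persona file not found: {persona_path}") from e
    except json.JSONDecodeError as e:
        raise RuntimeError(f"Invalid persona JSON at {persona_path}: {e}") from e
    # Fall back to the filename when the persona declares no name.
    persona_data["name"] = persona_data.get("name") or os.path.basename(persona_path)
    return persona_data
```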

Comment on lines +50 to +56
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],  # Allows all origins
        allow_credentials=True,
        allow_methods=["*"],  # Allows all methods (GET, POST, etc.)
        allow_headers=["*"],  # Allows all headers
    )

⚠️ Potential issue | 🟠 Major

CORS: wildcard origins + credentials is unsafe/invalid

allow_origins=["*"] together with allow_credentials=True is not permitted by browsers and weakens security. Either a) specify explicit origins, or b) keep wildcard but set allow_credentials=False.

Minimal fix:

-        allow_origins=["*"],  # Allows all origins
-        allow_credentials=True,
+        allow_origins=os.getenv("OVOS_ALLOWED_ORIGINS", "*").split(","),
+        allow_credentials=False,

I can wire this to settings/env and add tests.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In ovos_persona_server/__init__.py around lines 50-56, the CORS middleware
currently uses allow_origins=["*"] while allow_credentials=True which is
invalid/unsafe; change to load an explicit list of allowed origins from
configuration or environment and pass that list to allow_origins (e.g.,
ALLOWED_ORIGINS from settings), keep allow_credentials=True only if
ALLOWED_ORIGINS is not ["*"], otherwise set allow_credentials=False; add
validation to raise or log if ALLOWED_ORIGINS contains "*" while
allow_credentials is True and add tests to cover both the explicit-origins and
wildcard-without-credentials scenarios.
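One way to express the rule this comment describes; the env var name OVOS_ALLOWED_ORIGINS is a hypothetical choice, not an existing setting:

```python
import os
from typing import List, Optional, Tuple


def resolve_cors(env_value: Optional[str] = None) -> Tuple[List[str], bool]:
    """Derive (allow_origins, allow_credentials) so wildcard never pairs with credentials."""
    raw = env_value if env_value is not None else os.getenv("OVOS_ALLOWED_ORIGINS", "*")
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    # Browsers reject Access-Control-Allow-Origin: * together with credentials.
    allow_credentials = origins != ["*"]
    return origins, allow_credentials
```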

Comment on lines +22 to +29
     parser.add_argument("--persona", help="Path to persona .json file", default=None, type=str)
     parser.add_argument("--host", help="Host address to bind to", default="0.0.0.0", type=str)
     parser.add_argument("--port", help="Port to run server on", default=8337, type=int)
     args: Any = parser.parse_args()  # Using Any for args as argparse.Namespace is dynamic

-    app = get_app(os.path.expanduser(args.persona))
+    app = create_persona_app(args.persona)

-    app.run(host=args.host, port=args.port, debug=False)
+    uvicorn.run(app, port=args.port, host=args.host, log_level="debug")

⚠️ Potential issue | 🔴 Critical

Make --persona required to avoid runtime crash.

create_persona_app opens the given path unconditionally; passing None (current default) will crash. Make the CLI arg required.

Apply:

-    parser.add_argument("--persona", help="Path to persona .json file", default=None, type=str)
+    parser.add_argument("--persona", help="Path to persona .json file", required=True, type=str)

Optional:

  • Default host to 127.0.0.1 for dev and let users opt-in to 0.0.0.0.
  • Make --log-level configurable (default info) instead of hardcoded debug.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    parser.add_argument("--persona", help="Path to persona .json file", default=None, type=str)
+    parser.add_argument("--persona", help="Path to persona .json file", required=True, type=str)
     parser.add_argument("--host", help="Host address to bind to", default="0.0.0.0", type=str)
     parser.add_argument("--port", help="Port to run server on", default=8337, type=int)
     args: Any = parser.parse_args()  # Using Any for args as argparse.Namespace is dynamic
     app = create_persona_app(args.persona)
     uvicorn.run(app, port=args.port, host=args.host, log_level="debug")
🧰 Tools
🪛 Ruff (0.14.0)

23-23: Possible binding to all interfaces

(S104)

🤖 Prompt for AI Agents
In ovos_persona_server/__main__.py around lines 22 to 29, the CLI currently
allows --persona to be omitted which causes create_persona_app to crash when
passed None; change the parser.add_argument for --persona to make it required
(remove default and add required=True) so the program fails fast with argparse
if not provided. While editing, optionally change the --host default to
"127.0.0.1" and add a new --log-level argument (default "info") that is passed
into uvicorn.run instead of the hardcoded "debug" to make logging configurable.
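Putting the required flag and the optional suggestions together (127.0.0.1 as default host and a --log-level flag are the reviewer's proposals, not the PR's current behavior):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ovos-persona-server")
    parser.add_argument("--persona", help="Path to persona .json file",
                        required=True, type=str)
    parser.add_argument("--host", help="Host address to bind to",
                        default="127.0.0.1", type=str)
    parser.add_argument("--port", help="Port to run server on",
                        default=8337, type=int)
    parser.add_argument("--log-level", help="uvicorn log level", default="info",
                        choices=["critical", "error", "warning", "info", "debug", "trace"])
    return parser
```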

Comment on lines +230 to +235
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=f"Failed to generate completion: token_ids support not yet implemented")
    elif isinstance(prompt, list) and all(isinstance(p, list) and all(isinstance(i, int) for i in p) for p in prompt):
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=f"Failed to generate completion: token_ids support not yet implemented")
    else:

⚠️ Potential issue | 🟡 Minor

Use 501 for unimplemented token_id prompts and fix f-strings.

These paths aren’t implemented; return Not Implemented and remove extraneous f-strings.

Apply:

-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
@@
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
📝 Committable suggestion


Suggested change
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
     elif isinstance(prompt, list) and all(isinstance(p, list) and all(isinstance(i, int) for i in p) for p in prompt):
-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to generate completion: token_ids support not yet implemented")
+        raise HTTPException(
+            status_code=status.HTTP_501_NOT_IMPLEMENTED,
+            detail="Failed to generate completion: token_ids support not yet implemented"
+        )
     else:
🧰 Tools
🪛 Ruff (0.14.0)

231-231: f-string without any placeholders

Remove extraneous f prefix

(F541)


234-234: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In ovos_persona_server/chat.py around lines 230 to 235, the branches raising
HTTPException for token_id prompts should return HTTP 501 Not Implemented and
not use unnecessary f-strings; replace
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR with
status.HTTP_501_NOT_IMPLEMENTED (or the corresponding numeric 501 constant) on
both raises and change detail=f"...token_ids support not yet implemented" to a
plain string detail="token_ids support not yet implemented" (remove the
f-prefix) so the responses correctly indicate unimplemented functionality.

Comment on lines +135 to +187
    async def streaming_ollama_chat_response() -> AsyncGenerator[str, None]:
        """
        Asynchronously streams Ollama-compatible chat completion chunks.
        """
        start_time: float = time.time()
        # Placeholders for metrics in streaming chunks
        streaming_load_duration: int = 0
        streaming_prompt_eval_count: int = 0
        streaming_prompt_eval_duration: int = 0
        streaming_eval_count: int = 0
        streaming_eval_duration: int = 0

        try:
            for chunk in persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit):
                if chunk:
                    # Increment eval_count for each chunk (approximate)
                    streaming_eval_count += len(chunk.split())  # Simple token count approximation
                    yield json.dumps({
                        "model": persona.name,
                        "created_at": ts,
                        "message": {"role": "assistant", "content": chunk},
                        "done": False,
                        # Update total_duration dynamically for streaming chunks
                        "total_duration": int((time.time() - start_time) * 1_000_000_000),
                        "load_duration": streaming_load_duration,
                        "prompt_eval_count": streaming_prompt_eval_count,
                        "prompt_eval_duration": streaming_prompt_eval_duration,
                        "eval_count": streaming_eval_count,
                        "eval_duration": streaming_eval_duration  # This would be cumulative in a real scenario
                    }) + "\n"
        except Exception as e:
            # Handle streaming errors gracefully
            yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
            return  # Stop the generator

        end_time: float = time.time()
        streaming_total_duration: int = int((end_time - start_time) * 1_000_000_000)

        yield json.dumps({
            "model": persona.name,
            "created_at": ts,
            "message": {"role": "assistant", "content": ""},  # Empty content for final done chunk
            "done": True,
            "total_duration": streaming_total_duration,
            "load_duration": streaming_load_duration,
            "prompt_eval_count": streaming_prompt_eval_count,
            "prompt_eval_duration": streaming_prompt_eval_duration,
            "eval_count": streaming_eval_count,
            "eval_duration": streaming_eval_duration,
            "done_reason": done_reason
        }) + "\n"

    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/json")

🛠️ Refactor suggestion | 🟠 Major

Streaming content-type should be NDJSON and avoid blocking the event loop

  • Ollama streams line-delimited JSON; use application/x-ndjson.
  • Iterating a sync generator in an async def can block the loop. Use Starlette’s iterate_in_threadpool.
  • Handle asyncio.CancelledError separately to avoid masking client disconnects.
+from starlette.concurrency import iterate_in_threadpool
+import asyncio
@@
-    async def streaming_ollama_chat_response() -> AsyncGenerator[str, None]:
+    async def streaming_ollama_chat_response() -> AsyncGenerator[str, None]:
@@
-        try:
-            for chunk in persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit):
+        try:
+            async for chunk in iterate_in_threadpool(persona.stream(persona_messages, lang=sess.lang, units=sess.system_unit)):
                 if chunk:
@@
-        except Exception as e:
+        except asyncio.CancelledError:
+            raise
+        except Exception as e:
             # Handle streaming errors gracefully
             yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
             return  # Stop the generator
@@
-    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/json")
+    return StreamingResponse(streaming_ollama_chat_response(), media_type="application/x-ndjson")

As per FastAPI/Starlette best practices for streaming. [Based on learnings]

🧰 Tools
🪛 Ruff (0.14.0)

165-165: Do not catch blind exception: Exception

(BLE001)
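The offloading pattern the review suggests can be sketched without Starlette at all. Below is a minimal equivalent using `asyncio.to_thread` to drain a blocking sync generator while keeping the event loop responsive; `sync_chunks` and `stream_ndjson` are illustrative names, not code from the PR:

```python
import asyncio
import json

def sync_chunks():
    # Stand-in for persona.stream(): a blocking, synchronous generator
    yield "hello"
    yield "world"

async def stream_ndjson():
    """Drain a sync generator without blocking the event loop,
    emitting one JSON object per line (NDJSON)."""
    gen = sync_chunks()
    lines = []
    while True:
        # next() runs in a worker thread, so slow persona backends
        # don't stall other requests on the same loop
        chunk = await asyncio.to_thread(next, gen, None)
        if chunk is None:
            break
        lines.append(json.dumps({"message": {"content": chunk}, "done": False}) + "\n")
    # Final chunk mirrors Ollama's terminating "done": true object
    lines.append(json.dumps({"message": {"content": ""}, "done": True}) + "\n")
    return lines

lines = asyncio.run(stream_ndjson())
```

Starlette's `iterate_in_threadpool` wraps essentially this loop for you and is the cleaner choice inside a `StreamingResponse`; the sketch only shows why the offload matters.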

Comment on lines +273 to +324
async def streaming_ollama_generate_response() -> AsyncGenerator[str, None]:
"""
Asynchronously streams Ollama-compatible generation chunks.
"""
start_time: float = time.time()
# Placeholders for metrics in streaming chunks
streaming_load_duration: int = 0
streaming_prompt_eval_count: int = 0
streaming_prompt_eval_duration: int = 0
streaming_eval_count: int = 0
streaming_eval_duration: int = 0

try:
# Use persona.stream for streaming generation
for chunk in persona.stream(messages, lang=sess.lang, units=sess.system_unit):
if chunk:
# Increment eval_count for each chunk (approximate)
streaming_eval_count += len(chunk.split()) # Simple token count approximation
yield json.dumps({
"model": persona.name,
"created_at": ts,
"response": chunk, # Ollama /generate uses "response" key for content
"done": False,
"total_duration": int((time.time() - start_time) * 1_000_000_000),
"load_duration": streaming_load_duration,
"prompt_eval_count": streaming_prompt_eval_count,
"prompt_eval_duration": streaming_prompt_eval_duration,
"eval_count": streaming_eval_count,
"eval_duration": streaming_eval_duration # This would be cumulative in a real scenario
}) + "\n"
except Exception as e:
yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
return

end_time: float = time.time()
streaming_total_duration: int = int((end_time - start_time) * 1_000_000_000)

yield json.dumps({
"model": persona.name,
"created_at": ts,
"response": "", # Empty content for final done chunk in /generate
"done": True,
"total_duration": streaming_total_duration,
"load_duration": streaming_load_duration,
"prompt_eval_count": streaming_prompt_eval_count,
"prompt_eval_duration": streaming_prompt_eval_duration,
"eval_count": streaming_eval_count,
"eval_duration": streaming_eval_duration,
"done_reason": done_reason
}) + "\n"

return StreamingResponse(streaming_ollama_generate_response(), media_type="application/json")

🛠️ Refactor suggestion | 🟠 Major

/generate streaming: same NDJSON + threadpool + CancelledError pattern

Apply the same changes as in /chat streaming:

+from starlette.concurrency import iterate_in_threadpool
+import asyncio
@@
-        try:
-            # Use persona.stream for streaming generation
-            for chunk in persona.stream(messages, lang=sess.lang, units=sess.system_unit):
+        try:
+            async for chunk in iterate_in_threadpool(persona.stream(messages, lang=sess.lang, units=sess.system_unit)):
                 if chunk:
@@
-        except Exception as e:
+        except asyncio.CancelledError:
+            raise
+        except Exception as e:
             yield json.dumps({"error": str(e), "done": True, "done_reason": "error"}) + "\n"
             return
🧰 Tools
🪛 Ruff (0.14.0)

303-303: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
ovos_persona_server/ollama.py lines 273-324: the /generate streaming handler
currently iterates persona.stream directly and returns JSON/plain application,
but needs the same NDJSON, threadpool offload, and CancelledError handling used
in /chat streaming; change the generator to run persona.stream in an executor
(e.g., asyncio.to_thread or loop.run_in_executor) so blocking IO doesn't block
the event loop, yield each chunk as NDJSON (one JSON object per line) with
"done": False, catch asyncio.CancelledError separately to stop streaming cleanly
without emitting an error chunk, catch other Exceptions to emit an error NDJSON
chunk and return, and finally emit the final "done": True NDJSON chunk; also
return StreamingResponse with media_type "application/x-ndjson".

Comment on lines +34 to +35
raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to load persona")

⚠️ Potential issue | 🟡 Minor

Remove unnecessary f-string.

No placeholders used.

Apply:

-        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-                            detail=f"Failed to load persona")
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail="Failed to load persona"
+        )
🧰 Tools
🪛 Ruff (0.14.0)

35-35: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In ovos_persona_server/persona.py around lines 34 to 35, the HTTPException
detail uses an unnecessary f-string with no placeholders; replace the f-string
with a plain string literal "Failed to load persona" (remove the leading f) so
the detail is a normal string.

Comment on lines +27 to +38
model: Optional[str] = Field(..., description="The name of the model that generated the response.")
created_at: str = Field(..., description="The timestamp when the response was created (ISO 8601 format).")
message: Dict[str, str] = Field(..., description="The message object containing the role and content of the assistant's reply.")
done: bool = Field(..., description="Indicates if the response is the final one in a stream.")
context: Optional[List[int]] = Field(None, description="An encoding of the conversation used, can be sent in the next request to keep conversational memory (list of token IDs).")
total_duration: Optional[int] = Field(None, description="Time spent generating the response in nanoseconds.")
load_duration: Optional[int] = Field(None, description="Time spent loading the model into memory in nanoseconds.")
prompt_eval_count: Optional[int] = Field(None, description="Number of tokens in the prompt.")
prompt_eval_duration: Optional[int] = Field(None, description="Time spent evaluating the prompt in nanoseconds.")
eval_count: Optional[int] = Field(None, description="Number of tokens in the response.")
eval_duration: Optional[int] = Field(None, description="Time spent generating the response in nanoseconds.")
done_reason: Optional[str] = Field(None, description="The reason the model stopped generating (e.g., 'stop', 'unload').")

🛠️ Refactor suggestion | 🟠 Major

Required fields typed as Optional with Field(...)

Several fields are marked Optional[...] but declared required with Field(...). This confuses validation and OpenAPI. Make them required (drop Optional) or truly optional (default None).

Examples to fix:

-    model: Optional[str] = Field(..., description="The name of the model ...")
+    model: str = Field(..., description="The name of the model ...")

-    model: Optional[str] = Field(..., description="The model to use for the chat.")
+    model: str = Field(..., description="The model to use for the chat.")

-    model: Optional[str] = Field(..., description="The model name to use for generation.")
+    model: str = Field(..., description="The model name to use for generation.")

-    model: Optional[str] = Field(..., description="The base model name.")
+    model: str = Field(..., description="The base model name.")

-    model: Optional[str] = Field(..., description="The model name to use for generating embeddings.")
+    model: str = Field(..., description="The model name to use for generating embeddings.")

-    model: Optional[str] = Field(..., description="The name of the model that generated the embeddings.")
+    model: str = Field(..., description="The name of the model that generated the embeddings.")

As per coding guidelines for Pydantic v2. [Based on learnings]

Also applies to: 108-116, 139-151, 186-191, 201-201, 218-223, 236-241

🤖 Prompt for AI Agents
In ovos_persona_server/schemas/ollama.py around lines 27 to 38 (and similarly at
108-116, 139-151, 186-191, 201, 218-223, 236-241), several fields are declared
as Optional[...] but use Field(...) which makes them required in Pydantic v2;
update each field to either be truly optional by setting Optional[...] with a
default of None (e.g., Field(None, ...)) or make them required by removing
Optional[...] and keeping Field(...), ensuring the type and default align so
OpenAPI/validation correctly reflects required vs optional.
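The required-vs-optional distinction can be verified directly. A small sketch under Pydantic v2, with an illustrative model name rather than the PR's actual schema:

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class ChatChunk(BaseModel):
    # Required: no Optional, and Field(...) means the caller must supply it
    model: str = Field(..., description="Name of the model that generated the response.")
    # Truly optional: Optional[...] paired with a None default
    total_duration: Optional[int] = Field(None, description="Generation time in nanoseconds.")

ok = ChatChunk(model="persona-1")

try:
    ChatChunk()  # missing the required "model" field
    missing_model_raises = False
except ValidationError:
    missing_model_raises = True
```

Mixing `Optional[str]` with `Field(...)` produces a field that OpenAPI renders as nullable yet validation treats as required, which is the inconsistency flagged above.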

Comment on lines +336 to +349
class CreateChatCompletionRequest(BaseModel):
"""
Request body for creating a chat completion.
"""
messages: List[
Union[
ChatCompletionRequestSystemMessage,
ChatCompletionRequestUserMessage,
ChatCompletionRequestAssistantMessage, # This one is not explicitly defined in the OpenAPI, but used in examples. Adding for completeness.
ChatCompletionRequestToolMessage,
ChatCompletionRequestFunctionMessage, # Deprecated
ChatCompletionRequestDeveloperMessage
]
] = Field([ChatCompletionRequestMessage(role="user", content="hello world")], description="A list of messages comprising the conversation so far. [Example Python code](https://cookbook.openai.com/examples/how_to_format_messages_for_chat_completions).")

⚠️ Potential issue | 🔴 Critical

Default for messages uses wrong model and a mutable value

  • Union excludes ChatCompletionRequestMessage; default should be ChatCompletionRequestUserMessage.
  • Avoid mutable defaults; use default_factory.
-    ] = Field([ChatCompletionRequestMessage(role="user", content="hello world")], description="A list of messages comprising the conversation so far. ...")
+    ] = Field(
+        default_factory=lambda: [ChatCompletionRequestUserMessage(content="hello world")],
+        description="A list of messages comprising the conversation so far. ..."
+    )

As per Pydantic v2 best practices. [Based on learnings]

🤖 Prompt for AI Agents
In ovos_persona_server/schemas/openai_chat.py around lines 336-349, the messages
field currently uses a mutable list literal and the wrong default model; change
the Field to use default_factory returning a list with a
ChatCompletionRequestUserMessage instance (e.g. default_factory=lambda:
[ChatCompletionRequestUserMessage(role="user", content="hello world")]) instead
of the inline list, and ensure the Union of allowed message types remains
correct (use ChatCompletionRequestUserMessage as the default element type). This
removes the mutable default and fixes the default message model.
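A minimal sketch of the `default_factory` fix; `UserMessage` and `ChatRequest` are simplified stand-ins for the PR's message models:

```python
from pydantic import BaseModel, Field

class UserMessage(BaseModel):
    role: str = "user"
    content: str

class ChatRequest(BaseModel):
    # default_factory builds a fresh list for every instance,
    # so no mutable default is shared between requests
    messages: list[UserMessage] = Field(
        default_factory=lambda: [UserMessage(content="hello world")]
    )

a = ChatRequest()
b = ChatRequest()
a.messages.append(UserMessage(content="second"))  # must not leak into b
```

Pydantic v2 does deep-copy plain mutable defaults, but `default_factory` stays the idiomatic form and also makes it easy to default to the correct model type from the `Union`.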

Comment on lines +1 to +3
fastapi
ovos-persona
pydantic

⚠️ Potential issue | 🟠 Major

Add uvicorn and pin runtime deps for reproducibility.

uvicorn is required by __main__.py but not declared; installs will fail to run the server. Also consider pinning versions to stable ranges.

Apply:

 fastapi
 ovos-persona
 pydantic
+uvicorn[standard]

Optionally pin:

-fastapi
-pydantic
+fastapi>=0.119,<0.120
+pydantic>=2.11,<3
+# keep ovos-persona as constrained by your release process
+# and consider pinning uvicorn too
+uvicorn[standard]>=0.30,<0.31

Based on learnings.

🤖 Prompt for AI Agents
In requirements.txt lines 1-3, uvicorn is missing and runtime deps are unpinned
which hurts reproducibility; update the file to add uvicorn (required by
__main__.py) and pin the runtime packages to stable version ranges (e.g.,
specific major.minor or caret-style ranges) so installs are reproducible — add a
uvicorn entry and replace bare package names with pinned versions or ranges for
fastapi, ovos-persona, pydantic (and uvicorn) consistent with tested
compatibility.
