feat(runtime): add standalone edge runtime for Pi/Jetson deployment #799
rachmlenig wants to merge 30 commits into main
Conversation
Break the edge runtime's dependency on runtimes/universal/ by copying and trimming the needed files into runtimes/edge/. The edge runtime is now fully self-contained with its own pyproject.toml, Dockerfile, and zero imports from universal.

Key changes from universal:
- models/__init__.py exports only 4 model types (was 12)
- vision router includes only detection/classification/streaming (no training, evaluation, tracking, OCR, or document extraction)
- chat_completions/service.py makes heavy utils optional (context summarizer, history compressor, tool calling, thinking)
- file_handler.py rewritten without PyMuPDF (no PDF processing)
- context_calculator.py makes the torch import lazy for GGUF-only deploys
✅ All E2E Tests Passed!

This comment was automatically generated by the E2E Tests workflow.
Move 4 identical utility modules from both runtimes into llamafarm_common so bugs fixed in one place apply everywhere:
- utils/safe_home.py → llamafarm_common/safe_home.py
- utils/device.py → llamafarm_common/device.py
- utils/model_cache.py → llamafarm_common/model_cache.py
- utils/model_format.py → llamafarm_common/model_format.py

Both runtimes now have thin re-export shims that import from llamafarm_common, so all internal `from utils.X import Y` statements continue to work unchanged.

Also:
- Add cachetools dep to llamafarm_common (needed by model_cache)
- Consolidate pidfile.py to use safe_home instead of duplicating home directory resolution logic
- Fix model_format.py internal import to use a relative import within the common package

Removes ~1,100 lines of duplicated code across the two runtimes.

Candidates identified but not moved (would add heavy deps to common):
- core/logging.py (needs structlog)
- services/error_handler.py (needs fastapi)
- models/base.py, vision_base.py (architectural scope change)
- All router files (deeply coupled to FastAPI app structure)
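The re-export shim pattern can be demonstrated self-contained as below. The stand-in module and the `get_safe_home` name are illustrative; in the repo, the shim is simply a file like `runtimes/edge/utils/safe_home.py` whose body is the explicit re-import shown in the `exec` string.

```python
import sys
import types

# Stand-in for the shared package, so this sketch runs without the real
# llamafarm_common install; the function name is an assumption.
_common = types.ModuleType("llamafarm_common.safe_home")
_common.get_safe_home = lambda: "/home/llamafarm"
sys.modules["llamafarm_common"] = types.ModuleType("llamafarm_common")
sys.modules["llamafarm_common.safe_home"] = _common
sys.modules["utils"] = types.ModuleType("utils")

# The shim module body: explicit re-exports only (no `import *`), so
# existing `from utils.safe_home import Y` call sites keep working.
_shim = types.ModuleType("utils.safe_home")
exec(
    "from llamafarm_common.safe_home import get_safe_home\n"
    "__all__ = ['get_safe_home']",
    _shim.__dict__,
)
sys.modules["utils.safe_home"] = _shim

# Call sites are unchanged:
from utils.safe_home import get_safe_home
print(get_safe_home())
```

Because the shim holds only names, a fix in llamafarm_common is picked up by both runtimes without touching their internal imports.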
Add HailoYOLOModel that runs YOLO inference on the Hailo-10H AI accelerator using pre-compiled .hef models from the Hailo Model Zoo.

The server auto-detects Hailo hardware at startup:
- Checks for the hailo_platform package
- Checks for PCI device ID 1e60 (Hailo-10H) via lspci
- Falls back to the /dev/hailo0 device node
- Falls back to CPU/ultralytics if Hailo is not available

New file: models/hailo_model.py
- HailoYOLOModel with the same interface as YOLOModel
- Letterbox preprocessing for aspect-ratio-preserving resize
- NMS output parsing (Hailo .hef models include built-in NMS)
- COCO 80-class label mapping
- Configurable .hef directory via HAILO_HEF_DIR env var

Server changes:
- load_detection_model() selects a backend based on hardware detection
- FORCE_CPU_VISION=1 env var to skip Hailo and force CPU
- hailo_platform import is fully optional (try/except)
Build from repo root with -f flag so COPY can reach common/ and packages/. Use --no-sources to skip [tool.uv.sources] relative paths that don't apply inside the container. Usage: docker build -t edge-runtime -f runtimes/edge/Dockerfile .
Two bugs prevented llama.cpp from loading on ARM64 Linux (Pi/Jetson):

1. Version mismatch: _get_llamafarm_release_version() read the llamafarm-llama package version (0.1.0), but the ARM64 binary is published under the main monorepo release tag (v0.0.28). These versions are decoupled. Now queries the GitHub API for the latest release, with an LLAMAFARM_RELEASE_VERSION env var override and a hardcoded v0.0.28 fallback.

2. Extension mismatch: the manifest template used .tar.gz, but the actual published asset is .zip. Fixed to match.
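The version-resolution order described in point 1 can be sketched like this. The repo slug is a placeholder, not the project's actual slug; only the env var name and the v0.0.28 fallback come from the commit message.

```python
import json
import os
import urllib.request

FALLBACK_VERSION = "v0.0.28"  # known-good release tag from the commit message


def get_release_version(repo: str = "OWNER/REPO") -> str:
    """Resolve the monorepo release tag: env override first, then the
    GitHub API, then a hardcoded fallback. Sketch of the described logic."""
    override = os.environ.get("LLAMAFARM_RELEASE_VERSION")
    if override:
        return override
    try:
        url = f"https://api.github.com/repos/{repo}/releases/latest"
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)["tag_name"]
    except Exception:
        # Network failure, rate limit, or missing release: use the pin.
        return FALLBACK_VERSION
```

The env var gives operators an escape hatch when the latest tag and the published ARM64 asset drift apart, which is exactly the failure mode the review later flags.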
The pre-built llama.cpp ARM64 binary requires GLIBC 2.38+ but python:3.12-slim-bookworm only has GLIBC 2.36. Switch to ubuntu:24.04 (GLIBC 2.39) and install Python via apt. Also add --break-system-packages to uv pip install since Ubuntu 24.04 marks system Python as externally managed (PEP 668). This is safe inside a container.
Address CodeQL review comments:
- Remove `from module import *` from all 8 re-export shims (edge + universal). The explicit imports already cover everything needed.
- Remove unused `get_file_images` import from edge server.py
Dockerfile:
- Install the vision extra (ultralytics, transformers) and pi-heif
- Add system libs for OpenCV (libgl1, libglib2.0-0, libxcb1)
- Set YOLO_AUTOINSTALL=false to prevent runtime pip installs

Vision hardening:
- Strip whitespace from base64 input before decoding (fixes newlines from curl piping with jq/base64 tools)
- Wrap PIL Image.open() with proper error handling in vision_base and detect_classify; returns a clear error instead of a raw traceback
- Pre-register the HEIF plugin in yolo_model.py to prevent import errors on some ultralytics builds
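The base64 hardening can be sketched as follows (the function name is illustrative; the PIL Image.open() wrapping is not shown here since it needs Pillow installed):

```python
import base64
import binascii


def decode_base64_image(data: str) -> bytes:
    """Strip all whitespace (including newlines introduced by piping
    `base64` output through curl/jq) before decoding, and raise a clear
    error instead of a raw traceback on bad input."""
    cleaned = "".join(data.split())
    try:
        return base64.b64decode(cleaned, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"invalid base64 image payload: {exc}") from None
```

`validate=True` makes non-alphabet characters an error rather than silently dropping them, which is why the whitespace must be stripped first.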
…edge

- Re-export HfApi and _check_local_cache_for_model from the universal model_format shim so tests that mock utils.model_format.HfApi continue to work
- Add project.json for the edge runtime so Nx can process the project graph without failing on the unnamed directory
…/rag

The model_format tests mocked utils.model_format.HfApi, but after the refactor to a re-export shim, detect_model_format lives in llamafarm_common.model_format and uses its own HfApi reference. Fix the mock targets to patch at the source module.

The E2E source tests failed because UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY leaked from the CI environment into server/rag processes via os.Environ(). The PyTorch CPU index only has cp314 wheels for markupsafe, causing install failures on Python 3.12. Strip these vars from the base process environment so only services that explicitly declare them (universal-runtime) receive them.
Review Summary by Qodo

Add standalone edge runtime for Pi/Jetson deployment with GGUF inference and KV cache management

Walkthrough

- Introduces a fully self-contained edge runtime for Pi/Jetson deployment with zero dependencies on runtimes/universal/
- Implements GGUF language model inference via llama-cpp with memory-optimized quantized model support for edge devices
- Adds multi-tier KV cache management (VRAM → RAM → disk) with segment-level validation for efficient prefix caching
- Provides an OpenAI-compatible chat completions service with optional heavy utilities (context summarizer, history compressor, tool calling)
- Implements vision routers for detection, classification, and streaming with Hailo-10H accelerator support
- Adds comprehensive context management with multiple truncation strategies (sliding_window, keep_system, middle_out, summarize)
- Introduces thinking/reasoning model support with budget allocation and chain-of-thought utilities
- Consolidates device detection and model format utilities into the common llamafarm_common package for code reuse
- Includes GPU allocation with VRAM estimation and SSRF-safe remote cascade support
- Provides GGML logging integration and metadata caching to optimize performance on constrained hardware

Diagram

```mermaid
flowchart LR
    A["Edge Runtime<br/>FastAPI Server"] --> B["GGUF Language<br/>Model"]
    A --> C["Vision Models<br/>Detection/Classification"]
    A --> D["KV Cache<br/>Manager"]
    B --> E["llama-cpp<br/>Inference"]
    B --> F["Context<br/>Calculator"]
    D --> G["Multi-tier Cache<br/>VRAM/RAM/Disk"]
    A --> H["Chat Completions<br/>Service"]
    H --> I["Optional Utils<br/>Summarizer/Compressor"]
    H --> J["Tool Calling<br/>& Thinking"]
    C --> K["Hailo-10H<br/>Accelerator"]
    L["Common Package"] -.->|Device Detection| A
    L -.->|Model Format| A
```

File Changes

1. runtimes/edge/models/gguf_language_model.py
Code Review by Qodo
1. Undocumented UV_INDEX_STRATEGY env var
40 issues found across 66 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="runtimes/edge/routers/chat_completions/service.py">
<violation number="1" location="runtimes/edge/routers/chat_completions/service.py:46">
P0: The fallback stub returns a plain tuple `(text, None)`, but callers access `.content` and `.thinking` attributes (line 1223+). This will raise `AttributeError` for every non-streaming request when `utils.thinking` is absent — which is the expected edge deployment case.</violation>
<violation number="2" location="runtimes/edge/routers/chat_completions/service.py:398">
P1: `HistoryCompressor` can be `None` (line 39) but is called unconditionally here. This raises `TypeError` on every GGUF request with a context manager when the module is absent. Guard with a `None` check.</violation>
</file>
<file name="runtimes/edge/utils/context_calculator.py">
<violation number="1" location="runtimes/edge/utils/context_calculator.py:87">
P1: This branch uses system RAM as the CUDA memory budget when PyTorch is missing, so GGUF-only CUDA deployments can overestimate available memory and pick an unsafe context size.</violation>
<violation number="2" location="runtimes/edge/utils/context_calculator.py:445">
P1: When memory limits the safe context below 2048, this fallback still forces `2048`, which can overshoot the computed limit and trigger OOM on low-memory devices.</violation>
</file>
<file name="runtimes/edge/routers/cache.py">
<violation number="1" location="runtimes/edge/routers/cache.py:94">
P1: This router is never included in the FastAPI app, so all of the new `/v1/cache/*` endpoints are dead code.</violation>
</file>
<file name="runtimes/edge/server.py">
<violation number="1" location="runtimes/edge/server.py:271">
P1: Path traversal bypass: `Path('..').name` returns `'..'`, so `model_id='..'` passes the basename check and causes `VISION_MODELS_DIR / '..' / 'current.pt'` to escape the vision models directory. Reject `.` and `..` explicitly, and add a resolved-path containment check.
(Based on your team's feedback about combining basename checks with resolved-path containment.) [FEEDBACK_USED]</violation>
</file>
<file name="runtimes/edge/models/yolo_model.py">
<violation number="1" location="runtimes/edge/models/yolo_model.py:59">
P1: Replace the prefix check with a real resolved-path containment check; `startswith()` still accepts sibling directories outside the approved roots.
(Based on your team's feedback about validating identifiers used in filesystem paths with resolved-path containment.) [FEEDBACK_USED]</violation>
<violation number="2" location="runtimes/edge/models/yolo_model.py:96">
P2: Check `classes` explicitly. An empty list currently falls through as `None`, so detection runs without any class filter and returns classes the caller did not request.
(Based on your team's feedback about distinguishing `classes=None` from an explicit empty list.) [FEEDBACK_USED]</violation>
</file>
<file name="runtimes/edge/utils/context_manager.py">
<violation number="1" location="runtimes/edge/utils/context_manager.py:344">
P2: `middle_out` only keeps the first non-system message, so it can drop the assistant half of the initial exchange the strategy promises to preserve.</violation>
<violation number="2" location="runtimes/edge/utils/context_manager.py:413">
P1: Handle multimodal `content` parts here instead of passing list-valued message content to string-only token truncation.</violation>
</file>
<file name="runtimes/edge/services/error_handler.py">
<violation number="1" location="runtimes/edge/services/error_handler.py:118">
P1: Do not return raw exception text from the catch-all handler; it leaks internal server details to clients.
(Based on your team's feedback about avoiding raw exception text in client-facing errors.) [FEEDBACK_USED]</violation>
</file>
<file name="runtimes/edge/models/clip_model.py">
<violation number="1" location="runtimes/edge/models/clip_model.py:77">
P1: Guard the class cache with `_class_embeddings is not None`; after `unload()`, the stale cache key can skip re-encoding and make `classify()` fail on `self._class_embeddings.T`.</violation>
<violation number="2" location="runtimes/edge/models/clip_model.py:80">
P1: Avoid storing per-request classes in shared instance state. Concurrent `classify()` calls can overwrite `self.class_names` and `_class_embeddings`, returning scores for the wrong labels.</violation>
</file>
<file name="runtimes/edge/routers/chat_completions/router.py">
<violation number="1" location="runtimes/edge/routers/chat_completions/router.py:21">
P1: Avoid logging the full chat request body here; this endpoint accepts raw message contents and audio/tool payloads, so DEBUG logs would capture user data and large request bodies.</violation>
</file>
<file name="runtimes/edge/routers/vision/streaming.py">
<violation number="1" location="runtimes/edge/routers/vision/streaming.py:212">
P1: Remote cascade results bypass the confidence threshold check. `_RemoteResult` is always truthy, so `if result:` accepts any remote response regardless of confidence — breaking the cascade escalation logic. The local model path correctly gates on `det_result.confidence >= session.cascade.confidence_threshold`; the remote path should do the same.</violation>
</file>
<file name="packages/llamafarm-llama/src/llamafarm_llama/_binary.py">
<violation number="1" location="packages/llamafarm-llama/src/llamafarm_llama/_binary.py:77">
P1: Selecting the latest monorepo tag without verifying the matching `llama-{version}-bin-linux-arm64.zip` asset exists can make ARM64 downloads 404 when the release tag and llama.cpp version drift.</violation>
</file>
<file name="runtimes/edge/utils/kv_cache_manager.py">
<violation number="1" location="runtimes/edge/utils/kv_cache_manager.py:683">
P1: `gc()` mutates `_entries` and `_content_index` without holding `self._lock`, while all other async mutators acquire it. The `_gc_loop` background task should acquire the lock before calling `gc()`, otherwise concurrent `prepare()`/`save_after_generation()` calls can observe inconsistent state (e.g., a `_content_index` pointing to an already-evicted `_entries` key).</violation>
</file>
<file name="runtimes/edge/utils/history_compressor.py">
<violation number="1" location="runtimes/edge/utils/history_compressor.py:139">
P1: Skip deduplication for `tool` messages. Removing a repeated tool result by content can leave an unmatched `tool_call_id` in history and make the next chat request fail protocol validation.</violation>
</file>
<file name="runtimes/edge/routers/vision/detect_classify.py">
<violation number="1" location="runtimes/edge/routers/vision/detect_classify.py:60">
P2: Reject an explicit empty `detection_classes` list here. Right now `[]` falls through as “no filter” in both detection backends, so this endpoint can return all detections instead of none/an error.
(Based on your team's feedback about differentiating an explicit empty classes list from a missing one.) [FEEDBACK_USED]</violation>
<violation number="2" location="runtimes/edge/routers/vision/detect_classify.py:118">
P1: Validate `classification_model` before loading it. `load_classification_model` passes this value straight into `from_pretrained(...)`, which accepts local directory paths, so this endpoint currently allows path-like model ids to reach arbitrary filesystem locations.</violation>
</file>
<file name="runtimes/edge/utils/thinking.py">
<violation number="1" location="runtimes/edge/utils/thinking.py:136">
P1: This fallback drops the original user prompt when content can't be converted to a list.</violation>
<violation number="2" location="runtimes/edge/utils/thinking.py:248">
P2: This budget check is off by one and starts forcing `</think>` one token too early.</violation>
</file>
<file name="common/llamafarm_common/model_format.py">
<violation number="1" location="common/llamafarm_common/model_format.py:128">
P1: Don't treat a cache with no `.gguf` files as proof the repo is transformers-only. `scan_cache_dir()` only sees files already downloaded locally, so partially cached GGUF repos will be misclassified and loaded with the wrong model class.</violation>
</file>
<file name="common/llamafarm_common/model_cache.py">
<violation number="1" location="common/llamafarm_common/model_cache.py:56">
P1: `TTLCache(ttl=ttl * 10)` still gives each entry a hard internal expiry, so frequently accessed models can disappear after ~10×ttl anyway.</violation>
</file>
<file name="runtimes/edge/models/language_model.py">
<violation number="1" location="runtimes/edge/models/language_model.py:142">
P2: `generate()` ignores the caller's `stop` sequences and can return text past the requested boundary.</violation>
</file>
<file name="runtimes/edge/core/logging.py">
<violation number="1" location="runtimes/edge/core/logging.py:125">
P1: Enable propagation after clearing Uvicorn handlers, or `uvicorn.error` logs can be dropped instead of reaching the root structlog handler.</violation>
</file>
<file name="runtimes/edge/utils/jinja_tools.py">
<violation number="1" location="runtimes/edge/utils/jinja_tools.py:116">
P2: This substring check can misclassify templates as tool-aware and skip the fallback injector, which silently drops tool definitions for affected models.</violation>
<violation number="2" location="runtimes/edge/utils/jinja_tools.py:128">
P1: Use `ImmutableSandboxedEnvironment` here; the regular sandbox still lets untrusted GGUF templates mutate caller-owned `messages`/`tools` lists and dicts during rendering.</violation>
</file>
<file name="runtimes/edge/utils/tool_calling.py">
<violation number="1" location="runtimes/edge/utils/tool_calling.py:19">
P1: Regex tag parsing breaks valid tool calls when an argument string contains `</tool_call>`.</violation>
</file>
<file name="runtimes/edge/routers/chat_completions/types.py">
<violation number="1" location="runtimes/edge/routers/chat_completions/types.py:227">
P2: Multiple audio parts in one message are collapsed into a single transcription key, so STT fallback overwrites earlier segments and reuses the last transcript for every audio part.</violation>
</file>
<file name="runtimes/edge/utils/gguf_metadata_cache.py">
<violation number="1" location="runtimes/edge/utils/gguf_metadata_cache.py:69">
P2: `_cache_lock` is held during a slow `GGUFReader` I/O call in the monkey-patch fallback path, blocking all cache reads/writes for the entire retry duration (potentially seconds). This contradicts the explicit comment on line 76: "outside lock to avoid blocking". Use a separate lock to serialize the monkey-patch so cache lookups for already-cached paths aren't stalled.</violation>
</file>
<file name="runtimes/edge/routers/vision/detection.py">
<violation number="1" location="runtimes/edge/routers/vision/detection.py:58">
P2: Decode and validate the image before calling `_load_fn()`. Right now malformed requests can still trigger a full YOLO load before the endpoint returns 400.</violation>
<violation number="2" location="runtimes/edge/routers/vision/detection.py:61">
P2: Reject an explicit empty `classes` list before calling `model.detect()`. `[]` currently falls through as "no filter" and returns detections for every class.
(Based on your team's feedback about validating empty `classes` lists explicitly.) [FEEDBACK_USED]</violation>
</file>
<file name="runtimes/edge/pyproject.toml">
<violation number="1" location="runtimes/edge/pyproject.toml:67">
P2: This wheel config omits the non-Python context defaults file that `utils/context_calculator.py` expects, so installed edge runtimes will always fall back to the generic 2048-token context calculation path.</violation>
</file>
<file name="runtimes/edge/utils/context_summarizer.py">
<violation number="1" location="runtimes/edge/utils/context_summarizer.py:144">
P2: Count recent conversation turns by turn boundaries, not `keep_recent * 2` raw messages, or tool-using chats will summarize part of the latest turns.</violation>
</file>
<file name="runtimes/edge/Dockerfile">
<violation number="1" location="runtimes/edge/Dockerfile:39">
P2: Copy the edge runtime sources before installing the local `.[vision]` package.</violation>
<violation number="2" location="runtimes/edge/Dockerfile:56">
P2: Use `LF_RUNTIME_PORT` in the health check instead of hard-coding 11540.</violation>
</file>
<file name="cli/cmd/orchestrator/python_env.go">
<violation number="1" location="cli/cmd/orchestrator/python_env.go:167">
P2: This filter still lets other uv index env vars leak through. `UV_INDEX_URL`/`UV_DEFAULT_INDEX` and `UV_INDEX` also affect package resolution, so a user-exported index can still override installs for services that were supposed to run without the PyTorch index.</violation>
</file>
<file name="cli/cmd/orchestrator/services.go">
<violation number="1" location="cli/cmd/orchestrator/services.go:154">
P2: This now forces `UV_INDEX_STRATEGY` into the child env even when it is unset, which can override uv's default with an empty invalid value.</violation>
</file>
<file name="runtimes/edge/config/model_context_defaults.yaml">
<violation number="1" location="runtimes/edge/config/model_context_defaults.yaml:24">
P2: This `*Llama-3*` fallback also catches Llama 3.1 models and downgrades them to 8K context. Add a more specific Llama 3.1 rule before the generic Llama 3 entry.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
lspci -d 1e60: already filters by vendor ID, but the old check looked for "1e60" in the output text. lspci resolves vendor IDs to names, so the output contains "Hailo Technologies Ltd." instead of "1e60", causing detection to always fail. Check for any non-empty output instead.
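The fixed check can be sketched as below. This is an illustrative sketch of the detection helper, not the actual server code; only the `lspci -d 1e60:` vendor-filter invocation and the non-empty-output test come from the commit message.

```python
import shutil
import subprocess


def hailo_on_pci_bus() -> bool:
    """`lspci -d 1e60:` already filters to the Hailo vendor ID, so any
    non-empty output means a device is present. Don't grep the output for
    "1e60": lspci resolves vendor IDs to names like
    "Hailo Technologies Ltd.", so that substring never appears."""
    if shutil.which("lspci") is None:
        return False
    try:
        result = subprocess.run(
            ["lspci", "-d", "1e60:"],
            capture_output=True, text=True, timeout=5,
        )
    except (subprocess.SubprocessError, OSError):
        return False
    return bool(result.stdout.strip())
```

On machines without lspci or without the accelerator this simply returns False, matching the CPU-fallback path.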
ConfiguredInferModel doesn't expose .output() in HailoRT 5.2.0. Use self._infer_model.output().shape instead of self._configured.output().shape to get the output buffer dimensions.
ConfiguredInferModel.run() does not accept timeout_ms in HailoRT 5.2.0.
HailoRT 5.2.0 accepts timeout as a positional arg, not keyword.
… bots

- Log a warning on failed CPU offload instead of a silent pass (base.py)
- Log at debug level on unified memory detection failure instead of a silent pass (gguf_language_model.py)
- Remove unused _timing_start variable (gguf_language_model.py)
- Add path traversal validation in load_language() (server.py)
- Add path traversal validation in _read_gguf_metadata() (gguf_metadata_cache.py)
2 issues found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="runtimes/edge/server.py">
<violation number="1" location="runtimes/edge/server.py:218">
P2: Path-traversal validation is incomplete: it misses Windows absolute/UNC path forms, so crafted `model_id` values can bypass this guard.</violation>
</file>
<file name="runtimes/edge/utils/gguf_metadata_cache.py">
<violation number="1" location="runtimes/edge/utils/gguf_metadata_cache.py:91">
P2: Substring matching ".." will reject valid GGUF paths that contain ".." in a filename or directory name. Check path segments for ".." instead of the raw string so legitimate paths aren’t blocked.</violation>
</file>
Hailo NMS outputs normalized coordinates (0.0–1.0) relative to the letterboxed input, but the rescaling math treated them as pixel values. This caused coordinates like (0.5 - 134px) = -133.5, producing garbage bounding boxes. Convert normalized coords to pixel space first (multiply by input dimensions), then subtract padding and divide by scale.
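The corrected rescaling math can be written as a small helper. This is a sketch of the described fix; the tuple ordering and parameter names are illustrative.

```python
def unmap_box(norm_box, input_size, scale, pad):
    """Map a normalized (0.0-1.0) box from the letterboxed model input back
    to original-image pixels: scale to input pixels FIRST, then subtract
    the letterbox padding and divide by the letterbox scale.

    norm_box: (x1, y1, x2, y2) in 0-1; input_size: (w, h) of the model
    input; pad: (pad_x, pad_y) letterbox padding in input pixels."""
    in_w, in_h = input_size
    pad_x, pad_y = pad
    x1, y1, x2, y2 = norm_box
    return tuple(
        (coord * dim - p) / scale
        for coord, dim, p in zip(
            (x1, y1, x2, y2),
            (in_w, in_h, in_w, in_h),
            (pad_x, pad_y, pad_x, pad_y),
        )
    )
```

Worked example: a 1280x720 frame letterboxed into a 640x640 input has scale 0.5 and pad (0, 140); a normalized y of 0.5 maps to (0.5 * 640 - 140) / 0.5 = 360, the true image center, instead of the garbage value produced by subtracting pixel padding from a 0-1 coordinate.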
Log raw detection values and mapped coordinates to diagnose coordinate order mismatch (inverted boxes suggest y1/x1/y2/x2 order may be wrong).
The Hailo Model Zoo NMS postprocess outputs a per-class tensor of shape (num_classes, max_det, 5) where each detection is [y_min, x_min, y_max, x_max, score]. The class ID is implicit from the first dimension index. The old parser assumed a flat (N, 6) layout with [y1, x1, y2, x2, conf, class_id] which misaligned the data, producing inverted/out-of-bounds bounding boxes (e.g. det[0]=4.0 from misaligned class boundaries). Also adds debug logging for output tensor shapes and per-detection coordinate mapping to aid further diagnosis.
The Hailo NMS output for YOLO models is a flat buffer of size num_classes × (1 + max_det × 5). For 80 COCO classes with 100 max detections this is 40080 floats. Per-class layout: [count, y1, x1, y2, x2, score, y1, x1, ...]. The count field gives the number of valid detections for that class. The previous parser expected (num_classes, max_det, 5) which never matched the actual flat (40080,) shape, returning zero detections.
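The flat-buffer layout described above can be parsed as follows. This is a sketch operating on a plain float sequence; the real parser works on the numpy output buffer, and the return shape here is illustrative.

```python
def parse_hailo_nms_flat(buf, num_classes=80, max_det=100):
    """Parse the flat Hailo NMS buffer. Per-class layout:
    [count, y1, x1, y2, x2, score, y1, x1, ...]; the class id is implicit
    from the block index. Returns (class_id, score, (y1, x1, y2, x2))."""
    stride = 1 + max_det * 5  # 501 floats per class; 80 * 501 = 40080 total
    if len(buf) != num_classes * stride:
        raise ValueError(f"expected {num_classes * stride} floats, got {len(buf)}")
    detections = []
    for cls in range(num_classes):
        block = buf[cls * stride:(cls + 1) * stride]
        count = int(block[0])  # number of valid detections for this class
        for i in range(count):
            y1, x1, y2, x2, score = block[1 + i * 5:6 + i * 5]
            detections.append((cls, score, (y1, x1, y2, x2)))
    return detections
```

The strict length check is what makes the shape mismatch loud: the old (num_classes, max_det, 5) assumption never matched the actual (40080,) buffer and silently returned zero detections.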
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="runtimes/edge/models/hailo_model.py">
<violation number="1" location="runtimes/edge/models/hailo_model.py:142">
P1: The `nc=1` fallback can mis-parse valid multi-class NMS buffers as single-class output, leading to incorrect detections.</violation>
</file>
- Harden the path traversal check to reject Windows absolute/UNC paths (backslash, drive letter) in model_id (server.py)
- Check path segments for ".." instead of a raw substring match so legitimate paths with ".." in filenames aren't rejected (gguf_metadata_cache.py)
- Remove the nc=1 fallback in the Hailo NMS parser to prevent mis-parsing multi-class buffers as single-class (hailo_model.py)
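The first two fixes combine into a validator like the one below. This is a minimal sketch of the described checks, not the actual server.py code; the function name is an assumption.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_safe_model_id(model_id: str) -> bool:
    """Reject traversal and absolute-path forms without blocking legitimate
    names that merely contain ".." as a substring."""
    if not model_id:
        return False
    # Windows absolute/UNC forms: backslashes or a drive letter like C:
    if "\\" in model_id or PureWindowsPath(model_id).drive:
        return False
    posix = PurePosixPath(model_id)
    if posix.is_absolute():  # also catches //server/share UNC-style paths
        return False
    # Per-segment check: "v2..final.gguf" is fine, "../etc" is not.
    return all(part not in ("..", ".") for part in posix.parts)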
Allow other drone services (vision, comms, flight-control) to request LLM inference over the Zenoh pub/sub bus instead of HTTP, matching the IPC pattern used across all other on-drone services.

- Subscribe to local/llm/request for inference requests
- Publish results to local/llm/response
- Publish a heartbeat to local/llm/status
- Graceful degradation: log a warning and continue HTTP-only if the Zenoh socket is unavailable or eclipse-zenoh is not installed
- Configurable via ZENOH_ENDPOINT and ZENOH_ENABLED env vars
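The graceful-degradation startup can be sketched as below. This is an assumption-laden sketch: the logger name, default-enabled behaviour, and the `connect/endpoints` config key are illustrative; only the env var names and the HTTP-only fallback come from the commit message.

```python
import logging
import os

logger = logging.getLogger("edge.zenoh_ipc")


def init_zenoh_session():
    """Open a Zenoh session for the local/llm/* topics, degrading to
    HTTP-only when disabled or when eclipse-zenoh is missing. The real
    service also wires up the request subscriber, response publisher,
    and status heartbeat."""
    if os.environ.get("ZENOH_ENABLED", "true").lower() in ("0", "false", "no"):
        logger.info("Zenoh IPC disabled via ZENOH_ENABLED")
        return None
    try:
        import zenoh  # optional dependency
    except ImportError:
        logger.warning("eclipse-zenoh not installed; continuing HTTP-only")
        return None
    conf = zenoh.Config()
    endpoint = os.environ.get("ZENOH_ENDPOINT")
    if endpoint:
        # JSON5 config key per the zenoh config schema (assumption).
        conf.insert_json5("connect/endpoints", f'["{endpoint}"]')
    try:
        return zenoh.open(conf)  # synchronous in the Python API
    except Exception:
        logger.warning("Zenoh socket unavailable; continuing HTTP-only")
        return None
```

Every failure path returns None rather than raising, so the FastAPI server still starts with HTTP serving intact.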
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="runtimes/edge/services/zenoh_ipc.py">
<violation number="1" location="runtimes/edge/services/zenoh_ipc.py:152">
P2: Return a generic error string instead of `str(exc)` to avoid exposing internal exception details to IPC clients.
(Based on your team's feedback about avoiding raw exception text in client-facing responses.) [FEEDBACK_USED]</violation>
</file>
The eclipse-zenoh Python API's zenoh.open() returns a Session synchronously. Using await caused a TypeError at runtime.
The eclipse-zenoh Python package exposes a synchronous API, not async. Replace all awaited Zenoh calls with synchronous equivalents and switch the subscriber from an async iterator to a callback pattern using asyncio.run_coroutine_threadsafe to bridge back into the event loop.
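The callback-to-event-loop bridge can be sketched like this (names are illustrative; the real subscriber passes Zenoh samples rather than strings):

```python
import asyncio


def make_sample_callback(loop: asyncio.AbstractEventLoop, handler):
    """Bridge Zenoh's synchronous subscriber callback, which runs on a
    Zenoh-owned thread, back into the asyncio event loop. The returned
    futures should be tracked so shutdown can await or cancel them."""
    def on_sample(sample):
        return asyncio.run_coroutine_threadsafe(handler(sample), loop)
    return on_sample
```

`run_coroutine_threadsafe` is the only safe way to schedule a coroutine on a loop from a foreign thread; calling `loop.create_task` there would not be thread-safe.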
get_optimal_device() was called every 30s by the health check, logging "Using CPU (no GPU acceleration)" each time. Cache the result so detection and its log messages only run once on startup.
Add edge-runtime service to all Docker workflow stages: build-amd64, build-arm64, create-manifest, test-compose, security-scan, and the release retagging workflow. Uses repo root as build context with runtimes/edge/Dockerfile. No special runners, disk cleanup, or PyTorch handling needed — it's a lightweight image.
2 issues found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="runtimes/edge/services/zenoh_ipc.py">
<violation number="1" location="runtimes/edge/services/zenoh_ipc.py:128">
P1: Track and await/cancel Futures returned by `run_coroutine_threadsafe`; otherwise in-flight request handlers can outlive shutdown and fail after the session is closed.</violation>
</file>
<file name=".github/workflows/docker.yml">
<violation number="1" location=".github/workflows/docker.yml:379">
P1: Adding `edge-runtime` makes the current `grep "$SERVICE"` image lookup ambiguous for `runtime`, so the workflow can tag the wrong image and validate the wrong container.</violation>
</file>
The lockfile was stale, causing `uv sync --locked` to fail in CI.
Adds an OpenAI-compatible text completions endpoint that accepts a raw prompt string without applying any chat template. Needed for fine-tuned models whose GGUF chat template doesn't match their training format.
2 issues found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="runtimes/edge/routers/completions.py">
<violation number="1" location="runtimes/edge/routers/completions.py:55">
P2: `max_tokens` defaulting uses a falsy check, so an explicit `0` is overwritten to `512` instead of being preserved.</violation>
<violation number="2" location="runtimes/edge/routers/completions.py:63">
P2: `temperature`/`top_p` use falsy-default logic, which overrides explicit numeric values like `0` and changes generation behavior.</violation>
</file>
Excerpts from the Hailo detection logic shown in the diff:

```python
if os.getenv("FORCE_CPU_VISION", "").lower() in ("1", "true", "yes"):
    logger.info("Hailo detection skipped (FORCE_CPU_VISION=1)")
    _use_hailo = False
# …
    import hailo_platform  # noqa: F401
except ImportError:
    logger.info("hailo_platform not installed, using CPU backend for vision")
    _use_hailo = False
# …
)
if result.stdout.strip():
    logger.info("Hailo-10H detected, using Hailo backend for vision")
    _use_hailo = True
# …
# Fallback: check for /dev/hailo0
if os.path.exists("/dev/hailo0"):
    logger.info("Hailo device found at /dev/hailo0, using Hailo backend")
    _use_hailo = True
# …
return True
# …
logger.info("Hailo not detected, using CPU backend for vision")
_use_hailo = False
```
Fixes for Qodo Code Review findings

Bugs:

Rule violations:
Fixes for cubic code review findings

Addressed the following issues from the 40-issue review:
CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure: