feat(host-agent): add per-container stats endpoint and extract dir_size_gb#804
Conversation
…ze_gb Add GET /v1/service/stats to the host agent returning CPU, memory, and PID metrics for all dream-* containers via docker stats --no-stream. Extract dir_size_gb() from nested function in main.py to module-level in helpers.py for reuse by the upcoming resources endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightheartdevs
left a comment
There was a problem hiding this comment.
Audit: APPROVE — solid foundational PR
Clean implementation. docker stats --no-stream with --format template avoids parsing ambiguity. _parse_mem_value() handles Docker IEC units correctly with longest-suffix-first matching. The dir_size_gb extraction to module-level improves reuse, and the new symlink-skip (not f.is_symlink()) is a good security improvement preventing symlink-based traversal.
Auth enforced, read-only, dream-* filter prevents exposing non-DreamServer containers. No shell=True.
Minor notes (non-blocking):
- Lazy import of
datetimeinside_iso_now()is unconventional — consider module-level removeprefix("dream-")requires Python 3.9+ (fine since container runs 3.11)_parse_mem_value()returns 0.0 for unrecognized units silently — consider logging a warning
This is a dependency for #806, #808, #810, #812. Should merge first.
Lightheartdevs
left a comment
There was a problem hiding this comment.
Revised Audit: REQUEST CHANGES — crash bug on restarting containers
Withdrawing my earlier approval after deeper review.
Bug 1 (MEDIUM — runtime crash): int("--") ValueError
Docker returns "--" for PIDs when a container is in Created/Restarting/Paused state. The code does int(raw.get("pids", "0") or "0") — the or "0" guard only handles empty string and None, not "--". This throws an unhandled ValueError, crashing the entire request.
Fix:
try:
pids = int(raw.get("pids", "0") or "0")
except (ValueError, TypeError):
pids = 0Bug 2 (LOW-MEDIUM — traceback leakage): Missing broad exception handler
The existing _handle_logs() has except Exception as exc: but the new _handle_service_stats only catches TimeoutExpired. Any unexpected error (e.g., FileNotFoundError if Docker binary disappears, UnicodeDecodeError from malformed output) leaks a raw Python traceback in the HTTP response.
Fix: Add except Exception as exc: matching the existing endpoint pattern.
Bug 3 (LOW): No result.returncode check on docker stats
If Docker returns non-zero with diagnostic text on stdout, it could trigger json.JSONDecodeError in the loop — which IS caught, so this degrades gracefully. But logging the error would help debugging.
Everything else confirmed solid: auth enforced, no injection vectors, _parse_mem_value() handles IEC units correctly, dir_size_gb symlink-skip is a good improvement, dream-* filter is acceptable convention-based scoping.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addressing review feedbackAll 3 bugs fixed: Bug 1 ( try:
pids = int(raw.get("pids", "0") or "0")
except (ValueError, TypeError):
pids = 0Docker returns Bug 2 (Missing broad exception handler) — Fixed: Bug 3 (No returncode check) — Fixed: |
Lightheartdevs
left a comment
There was a problem hiding this comment.
Re-audit: APPROVE — all 3 fixes verified
int("--")crash: try/except (ValueError, TypeError) with fallback to 0 ✅- Broad exception handler:
except Exception as excmatching existing endpoint pattern ✅ - Returncode logging: non-zero docker stats logged as warning ✅
CI all green. This is the foundational PR for the observability chain — merge first.
|
Note: The Rust dashboard-api rewrite (#821) merged and deleted the Python files this PR modifies. The Please rebase or rewrite against the current |
What
GET /v1/service/statsto the host agent returning per-container CPU, memory, and PID metricsdir_size_gb()from nested function inmain.pyto module-level inhelpers.py_compute_storage()to use the imported versionWhy
dir_size_gb()is needed by both the resources endpoint and data lifecycle features — extraction enables reuseHow
docker stats --no-stream --format '...'returns one JSON object per line. Parse CPU%, memory usage/limit/percent, PIDs. Filter todream-*containers only. Auth required via bearer token_parse_mem_value()handles Docker's IEC units (B, KiB, MiB, GiB, TiB) with longest-suffix-first matching to avoid falseendswith("B")matchesnot f.is_symlink()) to prevent double-counting and avoid following links outside DATA_DIRThree Pillars Impact
docker stats --formatavailable since Docker 1.13+. Python 3.10+removeprefixsafe. IEC units only (Docker standard)Modified Files
dream-server/bin/dream-host-agent.py—_parse_mem_value(),_iso_now(),_handle_service_stats(), route indo_GETdream-server/extensions/services/dashboard-api/helpers.py—dir_size_gb()module-level functiondream-server/extensions/services/dashboard-api/main.py— import change, nested function removedTesting
Automated
Manual
curl -H "Authorization: Bearer $KEY" http://localhost:$PORT/v1/service/stats— verify containers array with CPU/memory datamemory_used_mb > 0for running servicesdream-*containers in response/api/storageendpoint returns same values as before (regression test for_compute_storagerefactor)Review
_iso_now, returncode unguarded, PIDs int cast, no unit tests for_parse_mem_value)Platform Impact
Sequence
PR 3 of 7 (Phase 2 — shared infrastructure)