feat(host-agent): add per-container stats endpoint and extract dir_size_gb #804

Merged
Lightheartdevs merged 2 commits into Light-Heart-Labs:main from
yasinBursali:feat/service-stats-agent
Apr 6, 2026

@yasinBursali
Contributor

What

  • Add GET /v1/service/stats to the host agent returning per-container CPU, memory, and PID metrics
  • Extract dir_size_gb() from nested function in main.py to module-level in helpers.py
  • Refactor _compute_storage() to use the imported version

Why

  • Dashboard needs per-service resource metrics for the upcoming resources page
  • dir_size_gb() is needed by both the resources endpoint and data lifecycle features — extraction enables reuse
  • The nested function prevented importing from other modules

How

  • Host agent: docker stats --no-stream --format '...' returns one JSON object per line. Parse CPU%, memory usage/limit/percent, PIDs. Filter to dream-* containers only. Auth required via bearer token
  • Memory parsing: _parse_mem_value() handles Docker's IEC units (B, KiB, MiB, GiB, TiB) with longest-suffix-first matching to avoid false endswith("B") matches
  • dir_size_gb extraction: Identical logic plus symlink-skip improvement (not f.is_symlink()) to prevent double-counting and avoid following links outside DATA_DIR
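
The longest-suffix-first matching described above can be sketched roughly as follows. This is an illustration, not the PR's code: the names `parse_mem_value` and `parse_mem_usage` are hypothetical, and returning MiB is an assumption based on the `memory_used_mb` field mentioned under Testing.

```python
# Hypothetical sketch of longest-suffix-first IEC parsing; not the PR's code.
# Ordered so that "B" is tried last, because every other suffix also ends in "B".
_UNITS_TO_MIB = [
    ("TiB", 1024.0 * 1024.0),
    ("GiB", 1024.0),
    ("MiB", 1.0),
    ("KiB", 1.0 / 1024.0),
    ("B", 1.0 / (1024.0 * 1024.0)),
]


def parse_mem_value(text: str) -> float:
    """Convert a Docker memory string such as '512MiB' to MiB.

    Returns 0.0 for anything unparseable (an assumed fallback).
    """
    text = text.strip()
    for suffix, factor in _UNITS_TO_MIB:
        if text.endswith(suffix):
            try:
                return float(text[: -len(suffix)]) * factor
            except ValueError:
                return 0.0
    return 0.0


def parse_mem_usage(field: str) -> tuple[float, float]:
    """Split a 'used / limit' field such as '1.5GiB / 8GiB' into MiB values."""
    used, _, limit = field.partition("/")
    return parse_mem_value(used), parse_mem_value(limit)
```

Trying `"B"` first would match `"512MiB"` and then fail to convert `"512Mi"` to a float, which is the false match the ordering avoids.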

Three Pillars Impact

  • Install Reliability: No installer code touched — runtime API only
  • Broad Compatibility: docker stats --format has been available since Docker 1.13. str.removeprefix requires Python 3.9+, so it is safe on Python 3.10+. IEC units only (Docker standard)
  • Extension Coherence: No extension system changes

Modified Files

  • dream-server/bin/dream-host-agent.py: _parse_mem_value(), _iso_now(), _handle_service_stats(), route in do_GET
  • dream-server/extensions/services/dashboard-api/helpers.py: dir_size_gb() module-level function
  • dream-server/extensions/services/dashboard-api/main.py: import change, nested function removed

Testing

Automated

  • Python compile: PASS (all 3 files)
  • Existing tests: 173 pass, 1 pre-existing failure (Python 3.14 asyncio — not introduced by this PR)

Manual

  • curl -H "Authorization: Bearer $KEY" http://localhost:$PORT/v1/service/stats — verify containers array with CPU/memory data
  • Verify memory_used_mb > 0 for running services
  • Verify only dream-* containers in response
  • /api/storage endpoint returns same values as before (regression test for _compute_storage refactor)
  • No auth → 401
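
The manual checks above could also be scripted. The sketch below is a hedged illustration: `build_stats_request` and `check_payload` are hypothetical helper names, the base URL is a placeholder, and only the `containers` array and `memory_used_mb` field come from this description. The rest of the response shape is assumed.

```python
import urllib.request


def build_stats_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for /v1/service/stats."""
    url = base_url.rstrip("/") + "/v1/service/stats"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def check_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a decoded stats response.

    Assumes every listed container is running, so memory_used_mb should be > 0.
    """
    containers = payload.get("containers")
    if not isinstance(containers, list):
        return ["missing 'containers' array"]
    problems = []
    for entry in containers:
        if entry.get("memory_used_mb", 0) <= 0:
            problems.append(f"non-positive memory_used_mb: {entry!r}")
    return problems
```

Passing the request to `urllib.request.urlopen` and the decoded JSON body to `check_payload` would cover the first two manual checks. The 401 case surfaces as `urllib.error.HTTPError`.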

Review

  • CG: ⚠️ APPROVED WITH WARNINGS (4 non-blocking advisories: lazy import in _iso_now, returncode unguarded, PIDs int cast, no unit tests for _parse_mem_value)
  • Compatibility: PASS
  • Security: PASS (auth enforced, read-only, dream-* filter, no shell=True)

Platform Impact

  • macOS: Supported (Docker Desktop VM-scoped metrics)
  • Linux: Supported (native Docker, accurate metrics)
  • Windows (WSL2): No host agent — consumers degrade gracefully with empty stats

Sequence

PR 3 of 7 (Phase 2 — shared infrastructure)

…ze_gb

Add GET /v1/service/stats to the host agent returning CPU, memory, and
PID metrics for all dream-* containers via docker stats --no-stream.

Extract dir_size_gb() from nested function in main.py to module-level
in helpers.py for reuse by the upcoming resources endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@Lightheartdevs Lightheartdevs left a comment

Audit: APPROVE — solid foundational PR

Clean implementation. docker stats --no-stream with --format template avoids parsing ambiguity. _parse_mem_value() handles Docker IEC units correctly with longest-suffix-first matching. The dir_size_gb extraction to module-level improves reuse, and the new symlink-skip (not f.is_symlink()) is a good security improvement preventing symlink-based traversal.

Auth enforced, read-only, dream-* filter prevents exposing non-DreamServer containers. No shell=True.

Minor notes (non-blocking):

  • Lazy import of datetime inside _iso_now() is unconventional — consider module-level
  • removeprefix("dream-") requires Python 3.9+ (fine since container runs 3.11)
  • _parse_mem_value() returns 0.0 for unrecognized units silently — consider logging a warning

This is a dependency for #806, #808, #810, #812. Should merge first.

Collaborator

@Lightheartdevs Lightheartdevs left a comment

Revised Audit: REQUEST CHANGES — crash bug on restarting containers

Withdrawing my earlier approval after deeper review.

Bug 1 (MEDIUM — runtime crash): int("--") ValueError
Docker returns "--" for PIDs when a container is in Created/Restarting/Paused state. The code does int(raw.get("pids", "0") or "0") — the or "0" guard only handles empty string and None, not "--". This throws an unhandled ValueError, crashing the entire request.

Fix:

try:
    pids = int(raw.get("pids", "0") or "0")
except (ValueError, TypeError):
    pids = 0

Bug 2 (LOW-MEDIUM — traceback leakage): Missing broad exception handler
The existing _handle_logs() has except Exception as exc: but the new _handle_service_stats only catches TimeoutExpired. Any unexpected error (e.g., FileNotFoundError if Docker binary disappears, UnicodeDecodeError from malformed output) leaks a raw Python traceback in the HTTP response.

Fix: Add except Exception as exc: matching the existing endpoint pattern.
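
A rough sketch of that pattern, with the handler reduced to a function returning a status code and a JSON body so it can be exercised without an HTTP server. `run_guarded` is a hypothetical name, and the 504 status for timeouts is an assumption; the PR may use a different code.

```python
import json
import logging
import subprocess

logger = logging.getLogger("host-agent-sketch")


def run_guarded(collect):
    """Call collect() and map any failure to a (status, JSON body) pair."""
    try:
        return 200, json.dumps(collect())
    except subprocess.TimeoutExpired:
        # Specific case first; the status code here is an assumption.
        return 504, json.dumps({"error": "docker stats timed out"})
    except Exception as exc:
        # Broad on purpose: log the traceback server-side and return
        # only the exception type to the client.
        logger.exception("stats collection failed")
        return 500, json.dumps({"error": type(exc).__name__})
```

Keeping the traceback in the log and out of the response body is the point of the broad clause.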

Bug 3 (LOW): No result.returncode check on docker stats
If Docker returns non-zero with diagnostic text on stdout, it could trigger json.JSONDecodeError in the loop — which IS caught, so this degrades gracefully. But logging the error would help debugging.

Everything else confirmed solid: auth enforced, no injection vectors, _parse_mem_value() handles IEC units correctly, dir_size_gb symlink-skip is a good improvement, dream-* filter is acceptable convention-based scoping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yasinBursali
Contributor Author

Addressing review feedback

All 3 bugs fixed:

Bug 1 (int("--") ValueError) — Fixed:

try:
    pids = int(raw.get("pids", "0") or "0")
except (ValueError, TypeError):
    pids = 0

Docker returns "--" for PIDs when containers are in Created/Restarting/Paused state. Now handled gracefully.

Bug 2 (Missing broad exception handler) — Fixed:
Added except Exception as exc: after TimeoutExpired, matching the existing _handle_logs() pattern. Returns JSON 500 instead of uncontrolled HTML from BaseHTTPRequestHandler.

Bug 3 (No returncode check) — Fixed:
Added if result.returncode != 0: warning log after docker stats call. Logs stderr (truncated to 200 chars) but continues processing — docker stats may produce partial valid output even on non-zero exit.
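
As an illustration of the log-and-continue behaviour, a minimal sketch might look like this. `parse_stats_output` and the logger name are hypothetical, and the `name` key in the usage example is made up for the demonstration.

```python
import json
import logging
import subprocess

logger = logging.getLogger("host-agent-sketch")


def parse_stats_output(result: subprocess.CompletedProcess) -> list:
    """Parse one JSON object per stdout line, tolerating a non-zero exit."""
    if result.returncode != 0:
        # Warn with truncated stderr, then keep going: stdout may still
        # contain valid lines.
        logger.warning(
            "docker stats exited %d: %s",
            result.returncode,
            (result.stderr or "")[:200],
        )
    rows = []
    for line in (result.stdout or "").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            # Diagnostic text mixed into stdout is skipped, not fatal.
            continue
    return rows
```

Feeding it a `subprocess.CompletedProcess` with `returncode=1` and a mix of garbage and valid JSON lines returns only the valid rows, alongside one warning in the log.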

Collaborator

@Lightheartdevs Lightheartdevs left a comment

Re-audit: APPROVE — all 3 fixes verified

  1. int("--") crash: try/except (ValueError, TypeError) with fallback to 0 ✅
  2. Broad exception handler: except Exception as exc matching existing endpoint pattern ✅
  3. Returncode logging: non-zero docker stats logged as warning ✅

CI all green. This is the foundational PR for the observability chain — merge first.

@Lightheartdevs
Collaborator

Note: The Rust dashboard-api rewrite (#821) merged and deleted the Python files this PR modifies.

The dream-host-agent.py changes in this PR are still valid, but the helpers.py and main.py changes target deleted files. This PR needs a rebase that drops the Python dashboard-api changes and keeps only the host-agent changes.

Please rebase or rewrite against the current main branch.

@Lightheartdevs Lightheartdevs merged commit 484cd19 into Light-Heart-Labs:main Apr 6, 2026
28 checks passed