
feat(dashboard-api): add per-service resource metrics endpoint #810

Merged
Lightheartdevs merged 1 commit into Light-Heart-Labs:main from yasinBursali:feat/service-resources-dashboard on Apr 6, 2026

@yasinBursali
Contributor

What

  • Add GET /api/services/resources returning per-service CPU, RAM, and disk metrics
  • Create new routers/resources.py with parallel data fetching
  • Register resources router in main.py

Why

  • Dashboard needs per-service resource breakdown for monitoring and settings pages
  • Container stats (CPU/RAM) are available from the host agent, but nothing in dashboard-api consumes them yet
  • Disk usage per service helps users understand storage consumption

How

  • Parallel fetch: the host agent HTTP call (container stats) and the local disk scan run concurrently, each via asyncio.to_thread
  • Split cache TTLs: container stats cached 20s (live data), disk scan cached 60s (slow I/O)
  • Container name mapping: Builds a reverse map from the SERVICES dict (container_name → service_id), correctly handling mismatched names (dream-webui → open-webui)
  • Disk mapping: _DATA_DIR_MAP maps data directory names to service IDs (e.g., models → llama-server)
  • Graceful degradation: When the host agent is unavailable, returns disk-only data (container stats null)
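A minimal sketch of the How points above, not the code in routers/resources.py. All names are hypothetical: TTLCache, ResourceCollector, the fetch_stats/scan_disk callables, and the response fields cpu_percent, memory_mb and disk_gb. Both blocking calls run concurrently via asyncio.to_thread under asyncio.gather, each behind its own TTL (20s for stats, 60s for disk), and a failing host agent call degrades to null container fields. Caching a failed agent call for the full 20s, like a success, is an assumption consistent with the review note that the cache shields subsequent requests.

```python
import asyncio
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical shapes: stats = {service_id: {"cpu_percent": float, "memory_mb": float}}
#                      disk  = {service_id: size_in_gb}
Stats = Dict[str, Dict[str, float]]
Disk = Dict[str, float]


class TTLCache:
    """Holds one value that expires `ttl` seconds after it was stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._value: Any = None
        self._stored_at: Optional[float] = None

    def get(self) -> Tuple[bool, Any]:
        """Return (hit, value); a cached None still counts as a hit."""
        if self._stored_at is not None and self._clock() - self._stored_at < self.ttl:
            return True, self._value
        return False, None

    def put(self, value: Any) -> None:
        self._value = value
        self._stored_at = self._clock()


class ResourceCollector:
    def __init__(self, fetch_stats: Callable[[], Stats],
                 scan_disk: Callable[[], Disk],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_stats = fetch_stats            # blocking HTTP call to the host agent
        self._scan_disk = scan_disk                # blocking local directory walk
        self._stats_cache = TTLCache(20.0, clock)  # live data: short TTL
        self._disk_cache = TTLCache(60.0, clock)   # slow I/O: longer TTL

    def _stats_or_none(self) -> Optional[Stats]:
        # Graceful degradation: an unreachable agent yields None, not an error.
        try:
            return self._fetch_stats()
        except Exception:
            return None

    async def _cached(self, cache: TTLCache, fetch: Callable[[], Any]) -> Any:
        hit, value = cache.get()
        if hit:
            return value
        value = await asyncio.to_thread(fetch)  # keep the blocking call off the event loop
        cache.put(value)
        return value

    async def collect(self) -> Dict[str, Dict[str, Optional[float]]]:
        # Both blocking fetches run concurrently in worker threads.
        stats, disk = await asyncio.gather(
            self._cached(self._stats_cache, self._stats_or_none),
            self._cached(self._disk_cache, self._scan_disk),
        )
        stats = stats or {}
        result: Dict[str, Dict[str, Optional[float]]] = {}
        for sid in sorted(set(stats) | set(disk)):
            s = stats.get(sid)
            result[sid] = {
                "cpu_percent": s["cpu_percent"] if s else None,
                "memory_mb": s["memory_mb"] if s else None,
                "disk_gb": disk.get(sid),
            }
        return result
```

Injecting the clock keeps the TTL logic testable without sleeping, which also covers the manual "second request within 20s returns cached container stats" check.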

Three Pillars Impact

  • Install Reliability: No installer changes
  • Broad Compatibility: asyncio.to_thread requires Python 3.9+ (the container runs 3.11). Negligible resource usage
  • Extension Coherence: Reports orphaned data directories not belonging to known services

New Files

  • dashboard-api/routers/resources.py (137 lines)

Modified Files

  • dashboard-api/main.py — router import + registration (2 lines)

Testing

Automated

  • Python compile: PASS
  • Existing tests: 173 pass

Manual

  • GET /api/services/resources returns services with CPU/RAM/disk data
  • Host agent down → returns disk data only, container fields null
  • Verify cache: second request within 20s returns cached container stats
  • No auth → 401

Review

  • CG: ⚠️ APPROVED WITH WARNINGS (recommend adding test file — non-blocking)
  • Security: PASS (auth enforced, read-only, no file mutations)

Platform Impact

  • All platforms supported. WSL2 returns disk-only data (no host agent)

Sequence

PR 6 of 7 (Phase 3). Depends on PR 3 (#804 — host agent stats + dir_size_gb) and PR 5 (#808 — container_name in SERVICES)

Add GET /api/services/resources returning per-service CPU, RAM, and
disk metrics. Container stats fetched from host agent with 20s cache,
disk usage scanned locally with 60s cache. Includes container name
reverse mapping and Docker Desktop memory caveat flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@Lightheartdevs Lightheartdevs left a comment

Audit: APPROVE

Good architecture — parallel fetch with asyncio.to_thread and split cache TTLs (20s for container stats, 60s for disk) is well-designed. The container name reverse-mapping handles naming mismatches correctly.

Auth enforced, read-only. Clean.

Minor notes (non-blocking):

  • _DATA_DIR_MAP is hardcoded — new services require manual updates. Consider deriving from the SERVICES dict or manifests.
  • The 10s timeout on the host agent fetch means the entire endpoint can stall for up to 10s when the agent is down. The 20s cache mitigates this for subsequent requests.
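One way to act on the timeout note, sketched under assumptions and not part of this PR: wrap the host agent call in asyncio.wait_for with a tighter budget than the HTTP client's 10s timeout. fetch_with_budget and budget_s are hypothetical names. Note that wait_for only stops the request from waiting; the worker thread started by asyncio.to_thread is not killed and runs until the blocking call returns, so the HTTP client's own timeout still bounds thread lifetime.

```python
import asyncio
from typing import Any, Callable, Optional


async def fetch_with_budget(fetch: Callable[[], Any], budget_s: float) -> Optional[Any]:
    """Run a blocking fetch in a worker thread; give up waiting after `budget_s` seconds.

    On timeout the caller gets None right away (treated like "agent unavailable"),
    but the worker thread keeps running until the blocking call finishes on its own.
    """
    try:
        return await asyncio.wait_for(asyncio.to_thread(fetch), timeout=budget_s)
    except asyncio.TimeoutError:
        return None
```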

Depends on #804. Should merge after it.

Collaborator

@Lightheartdevs Lightheartdevs left a comment

Revised Audit: REQUEST CHANGES — fatal import crash

Withdrawing my earlier approval after deeper review.

BLOCKER: from helpers import dir_size_gb crashes without PR #804
dir_size_gb does not exist in helpers.py on main — it's a nested function inside main.py:_compute_storage(). PR #804 extracts it. If #810 merges first, the module-level import fails and the entire dashboard-api refuses to start — all endpoints go down, not just this one.

Soft dependency on #808: Without it, SERVICES[sid] lacks container_name. The fallback f"dream-{sid}" works for most services but breaks for open-webui (container is dream-webui, fallback generates dream-open-webui) and langfuse (dream-langfuse-web vs dream-langfuse). These services show null container stats even when running.
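To make the fallback failure concrete, here is a hedged sketch of the reverse mapping; build_container_map and stats_by_service are hypothetical names, while the dream-webui and dream-langfuse-web container names come from the review above. Treating a present-but-empty container_name as "no container" (rather than falling back to the convention) is an assumption, chosen so the opencode case noted later in this review cannot produce an empty-string key.

```python
from typing import Any, Dict


def build_container_map(services: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Reverse map: container name -> service id."""
    mapping: Dict[str, str] = {}
    for sid, cfg in services.items():
        if "container_name" in cfg:
            name = cfg["container_name"]   # explicit name (added to SERVICES by #808)
        else:
            name = f"dream-{sid}"          # naming-convention fallback
        if not name:
            continue                       # skip "" (e.g. opencode) so no empty-string key
        mapping[name] = sid
    return mapping


def stats_by_service(container_stats: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Re-key host agent stats from container names to service ids; unknown containers are dropped."""
    return {mapping[c]: s for c, s in container_stats.items() if c in mapping}
```

Without container_name, open-webui maps to dream-open-webui, so stats reported for dream-webui are dropped and the service shows null, exactly as described.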

Other findings (non-blocking):

  • _DATA_DIR_MAP is hardcoded and incomplete (~10 services missing) — unmapped services fall back to dir name convention
  • Cache thundering herd on simultaneous requests with expired TTL — two fetches dispatched, second write wins. Acceptable for dashboard.
  • opencode has container_name: "" creating an empty-string key in the reverse map (cosmetic)
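The thundering-herd note describes two concurrent requests both seeing an expired entry and both dispatching a fetch. The reviewer judged this acceptable for a dashboard; if it ever mattered, one common mitigation is a single-flight cache that re-checks freshness under an asyncio.Lock, sketched below. SingleFlightTTLCache is a hypothetical name and this is not part of the PR.

```python
import asyncio
import time
from typing import Any, Callable, Optional


class SingleFlightTTLCache:
    """TTL cache where concurrent misses share one fetch instead of each starting one."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._lock = asyncio.Lock()
        self._value: Any = None
        self._stored_at: Optional[float] = None

    def _fresh(self) -> bool:
        return self._stored_at is not None and self._clock() - self._stored_at < self.ttl

    async def get(self, fetch: Callable[[], Any]) -> Any:
        if self._fresh():
            return self._value
        async with self._lock:
            # Re-check: another request may have refilled the cache while we waited.
            if not self._fresh():
                self._value = await asyncio.to_thread(fetch)
                self._stored_at = self._clock()
            return self._value
```

On Python 3.9, create the cache from inside the running event loop; from 3.10 on, asyncio.Lock binds to a loop on first use, so module-level construction is fine.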

What's good: Auth enforced, no SSRF (URL from server config), no path traversal (child.name only), no info leaks (relative paths), bounded resource usage (always 2 threads max regardless of service count), cache design with split TTLs is well-thought-out.
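The disk side can be sketched the same way, assuming a flat data root with one directory per service. Here _DATA_DIR_MAP holds only the models → llama-server example from the description; _dir_size_gb is a stand-in written for this sketch, not the dir_size_gb that #804 moves into helpers.py, and scan_data_dirs is hypothetical. Mapped names win, anything else falls back to the directory-name convention, and unmatched directories are reported as orphans by bare child.name, which keeps absolute paths out of the response.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Hypothetical excerpt; the real _DATA_DIR_MAP lives in routers/resources.py.
_DATA_DIR_MAP = {"models": "llama-server"}


def _dir_size_gb(path: Path) -> float:
    """Stand-in for helpers.dir_size_gb: total size of regular files under `path`, in GB."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow directory symlinks
        for f in files:
            fp = os.path.join(root, f)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total / 1024**3


def scan_data_dirs(data_root: Path, known_services: Iterable[str]) -> Tuple[Dict[str, float], List[str]]:
    known = set(known_services)
    usage: Dict[str, float] = {}
    orphans: List[str] = []
    for child in sorted(data_root.iterdir()):
        if not child.is_dir():
            continue
        # Explicit mapping first, else assume the directory is named after the service.
        sid = _DATA_DIR_MAP.get(child.name, child.name)
        if sid in known:
            usage[sid] = usage.get(sid, 0.0) + _dir_size_gb(child)
        else:
            orphans.append(child.name)  # bare name only, never an absolute path
    return usage, orphans
```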

Required merge order: #804 → #808 → #810

@yasinBursali
Contributor Author

Addressing review feedback

Re: from helpers import dir_size_gb BLOCKER — Acknowledged. This PR requires #804 to merge first. The function is extracted to module-level in #804.

Re: container_name soft dependency — Acknowledged. Without #808, open-webui and langfuse show null container stats due to non-standard naming. #808 adds container_name to the SERVICES dict which fixes this.

Required merge order: #804 → #808 → #810

No code changes needed in this PR — all issues are addressed by the dependency chain.

@Lightheartdevs
Collaborator

Note: The Rust dashboard-api rewrite (#821) merged and deleted the Python files this PR modifies.

This PR added a Python endpoint and router that no longer exist. The per-service resource metrics feature needs to be reimplemented as a Rust endpoint in the dashboard-api crate.

Please rebase or rewrite against the current main branch.

@Lightheartdevs Lightheartdevs merged commit 71ded15 into Light-Heart-Labs:main Apr 6, 2026
27 of 28 checks passed

2 participants