fix(extensions): health-based status for user extensions + restart for stopped services#825
Conversation
Lightheartdevs
left a comment
There was a problem hiding this comment.
Audit: REQUEST CHANGES — CI failing, good PR otherwise
This fixes a real UX bug — crashed/stopped extensions showing false "enabled" (green) status. The health-based status approach is correct and the security model is sound.
Security verified:
- SSRF mitigated: health-check hosts derived from directory names, never from manifest input ✓
- Health paths validated with regex + reject list (no
..,@,?,#, protocol prefixes) ✓ - Symlink checks on stopped-enable path under file lock ✓
- Compose content re-scanned before restart ✓
- Auth enforced on all endpoints ✓
- No shell injection (agent call via HTTP + JSON, agent uses list args) ✓
- YAML safe_load only ✓
2 CI failures need fixing:
-
Ruff lint: Unused
import osandMagicMockintest_extensions.py— trivial, just remove them. -
2 cache tests failing:
TestCaching::test_cache_returns_same_resultandTestCaching::test_reset_cacheboth getAssertionError: assert 'my-ext' in {}. The scan function works correctly in other tests (TestScanUserExtensionsall pass), so this appears to be a cache implementation or test isolation bug. The first call toget_user_services_cached()returns empty when it should find the extension.
Minor notes (non-blocking):
- The user extension health-check block is duplicated identically in
extensions_catalog()andextension_detail()— consider extracting to a helper - If a health check times out, the extension shows "stopped" even if the container is running — acceptable trade-off vs the old false "enabled", but worth documenting
- The stopped-enable path calls
_call_agentoutside the lock — matches existing pattern, intentional
…r stopped services User extensions now get real health checking instead of file-based status. "stopped" status (red badge) replaces the misleading "enabled" for crashed containers. Start button allows restarting stopped extensions without the disable-enable workaround. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused os and MagicMock imports (ruff F401)
- Use float('-inf') as cache timestamp sentinel instead of 0.0 to
prevent false cache hits on fresh CI VMs where time.monotonic()
may be smaller than the TTL value
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d9d96ec to
fc6806e
Compare
|
Both CI failures fixed:
Rebased on latest main, all checks green. |
|
Merge conflict with main — When rebasing, the key mapping is:
For any NEW UI elements you're adding (stopped badge, start button), please use theme variables instead of hardcoded colors. Semantic status colors (green/red/orange/amber) stay as-is. CI was all green before the conflict — once rebased this should be ready to merge. |
What
Fix user extension status reporting and add restart capability for stopped extensions.
Why
User extensions compute status from file existence (
compose.yamlpresent = "enabled"), not container health. A crashed container still shows "enabled" with a green badge — misleading users into thinking the service is running. This erodes trust in the Extensions portal.How
user_extensions.pyscanner module with 30s TTL cachecompose.yamlexists but the container is unhealthy/down, status is "stopped" (red badge) instead of the previous false "enabled"enable_extension()now handles two cases — disabled (rename + start) and stopped (scan + start) — both under the extensions lock with symlink guardsThree Pillars Impact
dream enable/dream disableunchanged, mod pattern preserved,resolve-compose-stack.shunaffectedSecurity
service_id(Docker DNS), NEVER from manifestdefault_host/host_env. Regex-validated[a-z0-9][a-z0-9_-]*^/[A-Za-z0-9/_\-.]*$+ explicit reject of..,@,?,#, scheme prefixesos.lstat+S_ISLNKyaml.safe_load()everywhere — no unsafeyaml.load()try/except (TypeError, ValueError)prevents one broken manifest from crashing catalog for all users_extensions_lock()around symlink check + compose scan. Agent call outside lock (matches codebase pattern)_scan_compose_content()runs on stopped-enable path (not just disabled-enable), preventing bypass of security scannerTest Coverage
New:
test_user_extensions.py— 14 testsNew/Updated:
test_extensions.py— 9 new/updated testsFrontend
vite build: cleaneslint: 0 errors (5 pre-existing warnings in unrelated files)Manual test checklist (all platforms)
New Files
extensions/services/dashboard-api/user_extensions.pyextensions/services/dashboard-api/tests/test_user_extensions.pyModified Files
extensions/services/dashboard-api/routers/extensions.py_compute_extension_status()health-based for user extensions; catalog/detail include user ext health viaasyncio.gather;enable_extension()handles stopped case with lock + symlink guard + compose scanextensions/services/dashboard-api/tests/test_extensions.pyextensions/services/dashboard/src/pages/Extensions.jsxPlatform Impact
os.renameatomic on ext4Sequence
PR 1 of 3 — Extension Lifecycle States
🤖 Generated with Claude Code