Operations Health Monitoring
ReadyStackGo provides comprehensive health monitoring for all deployed stacks, giving you real-time visibility into the status of your containers, services, and infrastructure.
The Health Monitoring system provides:
- Real-time status updates via SignalR
- Container health tracking (running, stopped, restarting)
- Service health checks via HTTP endpoints
- Aggregated health views at environment and organization levels
- Health history for trend analysis
Each stack reports one of four health statuses:
| Status | Icon | Description |
|---|---|---|
| Healthy | 🟢 | All services running normally |
| Degraded | 🟡 | Some services experiencing issues |
| Unhealthy | 🔴 | Critical services down or failing |
| Unknown | ⚪ | Unable to determine status |
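
This page does not spell out how ReadyStackGo combines individual service states into one stack status. The Python sketch below assumes a simple rule. The stack is Unknown when no services report. It is Unhealthy when any service flagged as critical is not healthy, or when no service is healthy. It is Healthy when every service is healthy, and Degraded otherwise. `Status`, `aggregate`, `summary_line` and the `critical` set are hypothetical names, not part of ReadyStackGo.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNHEALTHY = "Unhealthy"
    UNKNOWN = "Unknown"


def aggregate(services: dict[str, Status],
              critical: frozenset[str] = frozenset()) -> Status:
    """Roll per-service statuses up into one stack status (assumed rule)."""
    if not services:
        return Status.UNKNOWN  # nothing reported, so the status cannot be determined
    # A critical service that is missing or not healthy makes the whole stack unhealthy.
    if any(services.get(name) is not Status.HEALTHY for name in critical):
        return Status.UNHEALTHY
    healthy = sum(1 for s in services.values() if s is Status.HEALTHY)
    if healthy == len(services):
        return Status.HEALTHY
    if healthy == 0:
        return Status.UNHEALTHY
    return Status.DEGRADED  # some, but not all, services are healthy


def summary_line(services: dict[str, Status]) -> str:
    """Text such as '2/3 services healthy', as shown on the dashboard overview."""
    healthy = sum(1 for s in services.values() if s is Status.HEALTHY)
    return f"{healthy}/{len(services)} services healthy"
```

For example, `{"api": Status.HEALTHY, "db": Status.HEALTHY, "worker": Status.UNHEALTHY}` aggregates to Degraded with the text `2/3 services healthy`. Passing `critical=frozenset({"worker"})` turns the same input into Unhealthy.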
The main dashboard shows a compact summary of all deployments with their current health:
```
┌─────────────────────────────────────────┐
│ Health Overview                         │
├─────────────────────────────────────────┤
│ 🟢 Production  │ 3/3 services healthy   │
│ 🟡 Staging     │ 2/3 services healthy   │
│ 🔴 Development │ 1/3 services healthy   │
└─────────────────────────────────────────┘
```
For a more detailed view, navigate to /health to open the dedicated Health Dashboard. This full-screen view is organized as follows.
At the top, four cards show the count of stacks by status:
- Healthy - All services running normally
- Degraded - Some services experiencing issues
- Unhealthy - Critical services down or failing
- Total - Total number of monitored stacks

Two controls above the stack list narrow the view:
- Status Filter - Show only stacks with a specific status (All/Healthy/Degraded/Unhealthy)
- Search - Find stacks by name or version
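
The summary cards, the status filter and the search box can all be derived from a flat list of stacks. This sketch makes two assumptions: Total also counts stacks in the Unknown state, and search is a case-insensitive substring match on name or version. `StackCard`, `card_counts` and `visible` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StackCard:
    name: str
    version: str
    status: str  # "Healthy", "Degraded", "Unhealthy" or "Unknown"


def card_counts(stacks: list[StackCard]) -> dict[str, int]:
    """Values for the four summary cards at the top of the dashboard."""
    by_status = Counter(s.status for s in stacks)
    return {
        "Healthy": by_status["Healthy"],
        "Degraded": by_status["Degraded"],
        "Unhealthy": by_status["Unhealthy"],
        "Total": len(stacks),  # assumption: includes stacks whose status is Unknown
    }


def visible(stacks: list[StackCard], status_filter: str = "All",
            query: str = "") -> list[StackCard]:
    """Apply the status filter, then a case-insensitive search on name or version."""
    q = query.strip().lower()
    return [
        s for s in stacks
        if (status_filter == "All" or s.status == status_filter)
        and (not q or q in s.name.lower() or q in s.version.lower())
    ]
```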
Each stack is displayed as a card showing:
- Stack name and version
- Last health check timestamp
- Service count (healthy/total)
- Operation mode badge (if not Normal)
- Overall health status badge
Click on a card to expand it and see:
- Individual service status with container details
- Restart counts per service
- Link to the full deployment detail page
The Health Dashboard receives live updates via SignalR:
- Status changes appear immediately without refresh
- A "Live" indicator shows the connection status
- Automatic reconnection if connection is lost
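
On the client, each pushed message only needs to replace the state held for one stack. The sketch below is not the SignalR client itself. It assumes a hypothetical message shape that mirrors the health API response (`deploymentId`, `overallStatus`, `capturedAtUtc`). It shows one way to keep the "Live" indicator and the stack state consistent by ignoring messages older than the state already held. `LiveHealthStore` is a hypothetical name.

```python
from dataclasses import dataclass, field
from datetime import datetime


def _utc(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


@dataclass
class LiveHealthStore:
    """Client-side dashboard state fed by pushed health messages (sketch)."""
    connected: bool = False
    stacks: dict[str, dict] = field(default_factory=dict)

    def on_connection_change(self, connected: bool) -> None:
        self.connected = connected  # drives the "Live" indicator

    def apply(self, update: dict) -> bool:
        """Store the update unless it is older than (or as old as) what is held."""
        key = update["deploymentId"]
        current = self.stacks.get(key)
        if current is not None and _utc(update["capturedAtUtc"]) <= _utc(current["capturedAtUtc"]):
            return False  # stale or duplicate message
        self.stacks[key] = update
        return True
```

Comparing timestamps, rather than trusting arrival order, keeps a message that arrives late (for example, one delivered around a reconnect) from overwriting newer state.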
The deployment detail page (/deployments/{stackName}) shows:
- Overall stack health status badge
- Operation mode (Normal, Maintenance, Migrating, etc.)
- Health summary card with service counts
- Health History Chart - Visual timeline of health status over time
- Individual service status with container details
- Restart counts per service
The deployment detail page includes a chart showing health trends:
- Displays the last 100 health check results
- X-axis shows time, Y-axis shows health percentage (0-100%)
- Color-coded based on current status (green/yellow/red)
- Hover over data points to see exact values
- Helps identify patterns and recurring issues
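
This page does not define how the health percentage is computed. Assuming it is the share of healthy services in each snapshot, a bounded deque is a natural way to hold the last 100 chart points: appending the 101st point silently drops the oldest. `Snapshot`, `health_percentage` and `HistorySeries` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    captured_at: str  # timestamp plotted on the x-axis
    healthy_services: int
    total_services: int


def health_percentage(s: Snapshot) -> float:
    """Y-axis value: share of healthy services, 0-100 (assumed definition)."""
    if s.total_services == 0:
        return 0.0  # assumption: a snapshot without services plots as 0%
    return 100.0 * s.healthy_services / s.total_services


class HistorySeries:
    """Keeps only the most recent `max_points` chart points."""

    def __init__(self, max_points: int = 100):
        self._points: deque = deque(maxlen=max_points)  # oldest point drops out automatically

    def add(self, s: Snapshot) -> None:
        self._points.append((s.captured_at, health_percentage(s)))

    def points(self) -> list[tuple[str, float]]:
        return list(self._points)
```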
Init containers (rsgo.lifecycle=init) are automatically excluded from health monitoring. They are run-once containers (e.g., database migrators) that exit after completion and are cleaned up automatically. Only regular service containers (rsgo.lifecycle=service) appear in health snapshots and the Health Dashboard.
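
Excluding init containers amounts to a label filter over the container list. This sketch assumes entries shaped like the Docker Engine API's container list output, with a `Labels` map that may be missing or null. It treats containers without an `rsgo.lifecycle` label as services. A stricter variant would keep only containers labeled `rsgo.lifecycle=service`. `monitored_containers` is a hypothetical name.

```python
def monitored_containers(containers: list[dict]) -> list[dict]:
    """Drop init containers and keep everything else for health snapshots."""
    monitored = []
    for c in containers:
        labels = c.get("Labels") or {}  # Labels may be absent or null
        # Assumption: a container without the label is treated as a service.
        if labels.get("rsgo.lifecycle", "service") == "init":
            continue  # run-once containers such as migrators are never monitored
        monitored.append(c)
    return monitored
```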
ReadyStackGo monitors container status via the Docker API:
- Container state (running, stopped, restarting, exited)
- Restart count (only fetched for unhealthy containers to minimize API calls)
- Exit codes for stopped containers
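
The mapping from Docker states to health statuses below is an assumption, and so is treating `restarting` as Degraded. The `state` argument follows the shape of the `State` object in a Docker container inspect response (`Status`, `ExitCode`). `fetch_restart_count` is a hypothetical callable that stands in for the extra API call that reads the restart count. The sketch makes that call only when the container is not healthy.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed mapping from Docker's State.Status values to health statuses.
_STATUS_MAP = {
    "running": "Healthy",
    "restarting": "Degraded",
    "exited": "Unhealthy",
    "dead": "Unhealthy",
}


@dataclass
class ContainerHealth:
    status: str
    exit_code: Optional[int]      # only set for stopped containers
    restart_count: Optional[int]  # None when it was not fetched


def assess(state: dict, fetch_restart_count: Callable[[], int]) -> ContainerHealth:
    """Turn a container's State object into a health record."""
    docker_status = state.get("Status", "")
    status = _STATUS_MAP.get(docker_status, "Unknown")  # created, paused, removing, ...
    stopped = docker_status in ("exited", "dead")
    exit_code = state.get("ExitCode") if stopped else None
    # The extra API call is made only when the container is not healthy.
    restarts = None if status == "Healthy" else fetch_restart_count()
    return ContainerHealth(status, exit_code, restarts)
```

Passing the fetch as a callable makes the lazy behavior explicit: a running container costs no additional call.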
For services exposing health endpoints, configure checks in the manifest:
```yaml
services:
  api:
    image: myapp/api:latest
    health:
      type: http
      url: http://api:8080/health
      interval: 30s
      timeout: 5s
      retries: 3
```

| Parameter | Default | Description |
|---|---|---|
| type | - | Check type: http, tcp, command |
| url | - | URL for HTTP checks |
| interval | 30s | Time between checks |
| timeout | 5s | Maximum wait time |
| retries | 3 | Failures before unhealthy |
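
This sketch covers only the `http` check type. It assumes that `retries` counts consecutive failures, that any 2xx response counts as success, and that durations use the `30s` style shown in the manifest. `parse_duration`, `http_probe` and `HealthCheck` are hypothetical names, not ReadyStackGo code.

```python
import http.client
import re
import urllib.request
from typing import Callable

_MS_PER_UNIT = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}


def parse_duration(text: str) -> float:
    """Convert manifest durations such as '30s', '5s', '2m' or '500ms' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _MS_PER_UNIT[match.group(2)] / 1000


def http_probe(url: str, timeout: float) -> bool:
    """A single HTTP check: a 2xx response within `timeout` seconds is a success."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts derive from OSError;
        # malformed responses raise HTTPException.
        return False


class HealthCheck:
    """Tracks consecutive failures and flips to Unhealthy after `retries` of them."""

    def __init__(self, probe: Callable[[], bool], retries: int = 3):
        self.probe = probe
        self.retries = retries
        self.failures = 0
        self.status = "Unknown"  # no check has run yet

    def run_once(self) -> str:
        if self.probe():
            self.failures = 0
            self.status = "Healthy"
        else:
            self.failures += 1
            if self.failures >= self.retries:
                self.status = "Unhealthy"
            # Below the threshold the previous status is kept.
        return self.status
```

To mirror the manifest above, build the probe with `functools.partial(http_probe, "http://api:8080/health", parse_duration("5s"))` and pass it to `HealthCheck(..., retries=3)`. Then call `run_once()` every `parse_duration("30s")` seconds. Requiring several consecutive failures keeps a single slow response from flipping the service to Unhealthy.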
Health status updates are pushed to the UI via SignalR:
- Immediate notification when services start/stop
- Operation mode changes reflected instantly
- No manual refresh required
ReadyStackGo stores health snapshots for trend analysis:
- View health over time (last hour, day, week)
- Identify patterns (e.g., nightly restarts)
- Correlate issues with deployments
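
Trend analysis over stored snapshots comes down to filtering by capture time and grouping the results. This sketch assumes snapshots shaped like the health API response (`capturedAtUtc`, `overallStatus`). `WINDOWS`, `within_window` and `non_healthy_by_hour` are hypothetical names. Counting non-healthy snapshots per UTC hour is one simple way to surface a pattern such as nightly restarts.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical presets for the "last hour, day, week" views, in hours.
WINDOWS = {"hour": 1, "day": 24, "week": 24 * 7}


def parse_utc(ts: str) -> datetime:
    """Parse '2024-01-15T10:30:00Z' into an aware datetime (works before Python 3.11)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def within_window(snapshots: list[dict], hours: int, now: datetime) -> list[dict]:
    """Snapshots captured in the last `hours` hours, oldest first."""
    cutoff = now - timedelta(hours=hours)
    recent = [s for s in snapshots if parse_utc(s["capturedAtUtc"]) >= cutoff]
    return sorted(recent, key=lambda s: parse_utc(s["capturedAtUtc"]))


def non_healthy_by_hour(snapshots: list[dict]) -> dict[int, int]:
    """Count non-Healthy snapshots per UTC hour of day to expose recurring patterns."""
    counts = Counter(
        parse_utc(s["capturedAtUtc"]).hour
        for s in snapshots
        if s["overallStatus"] != "Healthy"
    )
    return dict(sorted(counts.items()))
```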

```http
GET /api/deployments/{deploymentId}/health
```

Response:

```json
{
  "deploymentId": "abc123",
  "stackName": "my-app",
  "overallStatus": "Healthy",
  "operationMode": "Normal",
  "services": [
    {
      "name": "api",
      "status": "Healthy",
      "containerId": "a1b2c3",
      "restartCount": 0
    }
  ],
  "capturedAtUtc": "2024-01-15T10:30:00Z"
}
```

```http
GET /api/environments/{environmentId}/health-summary
```

Returns aggregated health for all stacks in an environment.

```http
GET /api/deployments/{deploymentId}/health/history?hours=24
```

Returns health snapshots for the specified time period.
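
The following is a small standard-library Python client for the three endpoints. The base URL and any authentication headers depend on your installation and are assumptions here. The names `health_url`, `summary_url`, `history_url`, `fetch`, `parse_health` and the dataclasses are also hypothetical. The JSON field names come from the response example above.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceHealth:
    name: str
    status: str
    container_id: str
    restart_count: int


@dataclass
class DeploymentHealth:
    deployment_id: str
    stack_name: str
    overall_status: str
    operation_mode: str
    services: list[ServiceHealth]
    captured_at_utc: str


def _path_id(value: str) -> str:
    return urllib.parse.quote(value, safe="")  # escape IDs used as path segments


def health_url(base: str, deployment_id: str) -> str:
    return f"{base.rstrip('/')}/api/deployments/{_path_id(deployment_id)}/health"


def summary_url(base: str, environment_id: str) -> str:
    return f"{base.rstrip('/')}/api/environments/{_path_id(environment_id)}/health-summary"


def history_url(base: str, deployment_id: str, hours: int = 24) -> str:
    query = urllib.parse.urlencode({"hours": hours})
    return f"{health_url(base, deployment_id)}/history?{query}"


def fetch(url: str, headers: Optional[dict[str, str]] = None, timeout: float = 10.0) -> str:
    """GET a URL and return the body; pass whatever auth headers your setup needs."""
    req = urllib.request.Request(url, headers={"Accept": "application/json", **(headers or {})})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_health(body: str) -> DeploymentHealth:
    """Map the JSON returned by the deployment health endpoint onto typed objects."""
    d = json.loads(body)
    return DeploymentHealth(
        deployment_id=d["deploymentId"],
        stack_name=d["stackName"],
        overall_status=d["overallStatus"],
        operation_mode=d["operationMode"],
        services=[
            ServiceHealth(s["name"], s["status"], s["containerId"], s.get("restartCount", 0))
            for s in d["services"]
        ],
        captured_at_utc=d["capturedAtUtc"],
    )
```

For example, `parse_health(fetch(health_url("http://localhost:8080", "abc123")))` would return a `DeploymentHealth` object. For the sample response, its `services[0].container_id` is `"a1b2c3"`.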
- Configure health checks for all production services
- Set appropriate intervals - too frequent checks add load
- Use meaningful endpoints - health endpoints should test real dependencies
- Monitor restart counts - frequent restarts indicate problems
- Review health history before deployments
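
Restart counts are most telling as deltas between snapshots. This sketch compares two snapshots shaped like the health API response and reports services whose restart count rose by at least a threshold. The default threshold of 3 and the name `restart_spikes` are arbitrary, illustrative choices.

```python
def restart_spikes(previous: dict, current: dict, threshold: int = 3) -> dict[str, int]:
    """Services whose restart count rose by at least `threshold` between two snapshots.

    A missing or null restartCount is treated as 0. A negative delta (for example
    after a container was recreated and its counter reset) is ignored.
    """
    before = {s["name"]: s.get("restartCount") or 0 for s in previous["services"]}
    spikes = {}
    for s in current["services"]:
        delta = (s.get("restartCount") or 0) - before.get(s["name"], 0)
        if delta >= threshold:
            spikes[s["name"]] = delta
    return spikes
```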

If a stack shows Unknown status:
- Verify the environment is connected
- Check whether the containers exist
- Ensure the Docker API is accessible

If a health check keeps failing:
- Verify the health endpoint URL is correct
- Check network connectivity between containers
- Review container logs for errors

If a service restarts frequently:
- Check container logs for crash causes
- Verify resource limits (memory, CPU)
- Review dependency availability (databases, APIs)

Related pages:
- Operation Mode - Managing maintenance and migration modes
- Troubleshooting - General troubleshooting guide
- API Reference - Full API documentation