This document describes the monitoring dashboard and health check system for OutlookBookingSync.
The monitoring system provides health checks, alerting, and a real-time dashboard for tracking sync service operations.
- Quick Health Check: /health - Basic connectivity test
- Comprehensive Health: /health/system - Full system status
- Dashboard Data: /health/dashboard - Aggregated monitoring data
- Alert Checks: /alerts/check - Run health checks and trigger alerts
- Alert History: /alerts - View recent alerts
- Alert Stats: /alerts/stats - Alert statistics and summaries
- Alert Management: /alerts/{id}/acknowledge - Acknowledge alerts
- Web Dashboard: /dashboard - Updated HTML monitoring interface with sync status
- Auto-refresh: Updates every 30 seconds with real-time sync metrics
- Sync Management: Built-in controls for processing pending syncs and re-enabling failed events
- Error Analysis: Detailed retry analysis and cancellation statistics
- Sync Status Overview: /health/sync-status - Comprehensive sync health monitoring
- Sync Statistics: /bridges/sync-stats - Detailed sync statistics
- Cancelled Events: /bridges/cancelled-events - Cancelled event tracking
- Pending Events: /bridges/{bridge}/pending-events - Pending sync operations
Stores system alerts and notifications:
CREATE TABLE outlook_sync_alerts (
id SERIAL PRIMARY KEY,
alert_type VARCHAR(100) NOT NULL,
severity VARCHAR(20) NOT NULL CHECK (severity IN ('info', 'warning', 'critical')),
message TEXT NOT NULL,
alert_data JSONB,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
acknowledged_at TIMESTAMP WITH TIME ZONE,
acknowledged_by VARCHAR(255)
);

The system monitors:
- Connection response time
- Active queries count
- Total mappings
- Recent sync activity
- API credentials status
- Connectivity proxy
- Daemon running status
- Recent automated sync activity
- Job execution logs
- Memory usage
- Disk space
- CPU utilization
- Error rates
- Pending operations
- Recent sync statistics
- high_error_rate: >25% error rate (Critical)
- elevated_error_rate: >10% error rate (Warning)
- stalled_syncs: Operations pending >2 hours (Warning)
- no_cron_activity: No automated activity >30 minutes (Warning)
- slow_database: Response time >2s (Warning) or >5s (Critical)
- database_connectivity: Connection failures (Critical)
- high_pending_rate: >80% pending rate (Warning) / >95% (Critical)
- stuck_syncs: Operations pending >2 hours (Warning) / >6 hours (Critical)
- high_retry_rate: >50% events requiring retries (Warning)
- sync_stall: No sync activity >1 hour (Warning) / >3 hours (Critical)
- bridge_connectivity: Bridge communication failures (Critical)
- mapping_failures: Event mapping errors >10% (Warning) / >25% (Critical)
- cancellation_surge: Unusual cancellation patterns (Warning)
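The numeric thresholds listed above can be read as a lookup from a measured value to a severity. The following is a minimal Python sketch of that mapping, not the service's actual implementation: the `THRESHOLDS` table, its metric names, and the `classify` function are hypothetical, and only alert types with explicit numeric limits are included.

```python
from typing import Optional

# (warning_threshold, critical_threshold); None means that level is not defined.
# The numbers come from the alert list above; the metric names are illustrative.
THRESHOLDS = {
    "error_rate_pct": (10.0, 25.0),       # elevated_error_rate / high_error_rate
    "db_response_seconds": (2.0, 5.0),    # slow_database
    "pending_rate_pct": (80.0, 95.0),     # high_pending_rate
    "stuck_sync_hours": (2.0, 6.0),       # stuck_syncs
    "retry_rate_pct": (50.0, None),       # high_retry_rate (warning only)
    "sync_idle_hours": (1.0, 3.0),        # sync_stall
    "mapping_failure_pct": (10.0, 25.0),  # mapping_failures
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return 'critical', 'warning', or None when the value is within limits."""
    warning, critical = THRESHOLDS[metric]
    # Check the stricter limit first so a critical value is not reported as a warning.
    if critical is not None and value > critical:
        return "critical"
    if warning is not None and value > warning:
        return "warning"
    return None


if __name__ == "__main__":
    for metric, value in [("error_rate_pct", 26.5), ("db_response_seconds", 0.3)]:
        print(metric, value, classify(metric, value))
```

Thresholds are treated as strict ("greater than"), matching the `>` notation in the list, so a value exactly on a limit does not escalate.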
- Total mappings count
- Synced items count
- Pending operations
- Error count
- Database health with response times
- Cron job status with recent activity
- System resources (memory/disk usage)
- Outlook connectivity status
- Recent sync operations
- Error summaries
- Performance metrics
- Throughput statistics
The enhanced dashboard includes comprehensive sync status monitoring:
- Overall sync health status with color-coded indicators
- Error rate tracking with percentage breakdowns
- Pending rate monitoring for sync queue management
- Stuck sync detection for operations requiring intervention
- Per-bridge sync breakdowns showing individual bridge performance
- Retry analysis with average retry counts and patterns
- Cancellation tracking for deleted/cancelled events
- Last activity timestamps for each bridge pair
- Process Pending Syncs - Execute pending sync operations
- Re-enable Failed Events - Recover from sync failures
- View Cancelled Events - Display cancelled event details
- View Sync Statistics - Comprehensive sync metrics
- Sync throughput metrics - Events processed per time period
- Error trending - Historical error rate analysis
- Resource utilization - Bridge system performance metrics
# Optional: Alert webhook for notifications
ALERT_WEBHOOK_URL=https://your-webhook-endpoint.com/alerts
# Database configuration (required)
DB_HOST=localhost
DB_PORT=5432
DB_NAME=your_database
DB_USER=your_user
DB_PASS=your_password

When configured, alerts are sent to the webhook URL:
{
"service": "OutlookBookingSync",
"alert_type": "high_error_rate",
"severity": "critical",
"urgency": "critical",
"message": "High error rate detected: 26.5%",
"timestamp": "2025-06-13 13:53:07",
"data": {
"error_rate": 26.5,
"error_count": 15,
"total_operations": 57
}
}

# Create the alerts table
cat database/outlook_sync_alerts.sql | docker exec -i portico_outlook psql -h $DB_HOST -U $DB_USER -d $DB_NAME

Navigate to: http://localhost:8082/dashboard
- Quick check: curl http://localhost:8082/health
- Full status: curl http://localhost:8082/health/system
- Configure webhook URL in environment
- Run periodic alert checks:
curl -X POST http://localhost:8082/alerts/check
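On the receiving side, a webhook consumer has to decode the JSON payload shown earlier before it can route the alert. The sketch below is one possible way to do that in Python. The `Alert` dataclass and `parse_alert` function are hypothetical names, and the required-field list is an assumption based on the single example payload in this document, not a documented schema.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Assumed from the example payload; the service may send additional fields.
REQUIRED_FIELDS = ("service", "alert_type", "severity", "message", "timestamp")
# Same values as the CHECK constraint on outlook_sync_alerts.severity.
KNOWN_SEVERITIES = {"info", "warning", "critical"}


@dataclass
class Alert:
    service: str
    alert_type: str
    severity: str
    message: str
    timestamp: str
    data: Dict[str, Any] = field(default_factory=dict)


def parse_alert(body: str) -> Alert:
    """Decode a webhook request body and reject payloads that look malformed."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("alert payload must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if payload["severity"] not in KNOWN_SEVERITIES:
        raise ValueError("unknown severity: %r" % payload["severity"])
    return Alert(
        service=payload["service"],
        alert_type=payload["alert_type"],
        severity=payload["severity"],
        message=payload["message"],
        timestamp=payload["timestamp"],
        data=payload.get("data") or {},  # "data" is treated as optional
    )
```

Rejecting unknown severities early keeps downstream routing simple, but a more tolerant consumer might prefer to log and accept them.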
Add to existing cron jobs for automated monitoring:
# Check for alerts every 15 minutes
*/15 * * * * curl -s -X POST "http://localhost/alerts/check" > /dev/null 2>&1
# Clean up old alerts weekly
0 2 * * 0 curl -s -X DELETE "http://localhost/alerts/old?days=7" > /dev/null 2>&1

Add to docker-compose.yml:
services:
  portico_outlook:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

- Verify container is running: docker ps -f name=portico_outlook
- Check logs: docker logs portico_outlook
- Test health endpoint: curl http://localhost:8082/health
- Verify table exists: Check outlook_sync_alerts table
- Check database connectivity in health status
- Review alert service logs
- Check recent activity in dashboard
- Review error summaries
- Investigate specific error messages
- Application Logs: docker logs portico_outlook
- Alert Logs: Stored in application logs with alert context
- Cron Logs: Container cron execution logs
- Dashboard auto-refresh: 30-second intervals
- Health checks: Lightweight database queries
- Alert checks: Run on-demand or via cron
- Alert table cleanup: Automatic via API endpoint
- Database indexing: Optimized for time-based queries
- Webhook timeouts: 10-second limit
- Dashboard: No built-in authentication (place it behind an authenticating reverse proxy)
- API endpoints: Protected by optional API key middleware
- Database: Uses application database credentials
- Alerts: Configurable retention (default 7 days)
- Health data: Real-time only, not stored
- Dashboard: No persistent storage
The monitoring system provides comprehensive tracking of the composite ID system and priority filtering operations.
The bridge system maintains detailed tracking of composite ID usage:
-- Bridge mappings with composite ID information
SELECT
source_bridge,
target_bridge,
COUNT(*) as total_mappings,
COUNT(CASE WHEN source_id ~ '_[0-9]' THEN 1 END) as composite_id_mappings, -- regex match: LIKE does not support [0-9] character classes in PostgreSQL
COUNT(CASE WHEN source_id LIKE 'event\_%' THEN 1 END) as event_mappings,
COUNT(CASE WHEN source_id LIKE 'booking\_%' THEN 1 END) as booking_mappings,
COUNT(CASE WHEN source_id LIKE 'allocation\_%' THEN 1 END) as allocation_mappings
FROM bridge_mappings
GROUP BY source_bridge, target_bridge;
-- Priority filtering statistics
SELECT
DATE(created_at) as sync_date,
COUNT(*) as total_sync_operations,
COUNT(CASE WHEN operation_data->>'priority_filtered' = 'true' THEN 1 END) as priority_filtered_operations,
AVG((operation_data->>'conflicts_resolved')::int) as avg_conflicts_per_sync
FROM bridge_sync_logs
WHERE operation = 'sync'
AND created_at >= NOW() - INTERVAL '7 days'
GROUP BY DATE(created_at)
ORDER BY sync_date DESC;

# Get composite ID system statistics
curl -X GET "http://your-bridge/health/composite-ids"
# Response includes detailed breakdown
{
"success": true,
"composite_id_stats": {
"total_mappings": 1245,
"composite_id_mappings": 1198,
"breakdown_by_type": {
"event": 789,
"booking": 312,
"allocation": 97,
"meeting": 43,
"appointment": 4
},
"health_status": "healthy",
"malformed_ids": 0,
"last_updated": "2025-06-18T14:30:00Z"
}
}
# Get priority filtering statistics
curl -X GET "http://your-bridge/health/priority-filtering"
# Response includes filtering effectiveness
{
"success": true,
"priority_filtering_stats": {
"total_sync_operations_24h": 48,
"operations_with_conflicts": 12,
"conflicts_resolved": 37,
"filtering_effectiveness": "92.5%",
"priority_breakdown": {
"priority_1_selected": 25,
"priority_2_selected": 8,
"priority_3_selected": 3
},
"most_common_conflicts": [
{
"conflict_type": "event_vs_booking",
"occurrences": 15,
"resolution": "event_selected"
},
{
"conflict_type": "booking_vs_allocation",
"occurrences": 8,
"resolution": "booking_selected"
}
]
}
}

The monitoring system tracks priority filtering operations in real-time:
# Get current priority conflicts
curl -X GET "http://your-bridge/monitoring/priority-conflicts"
# Response shows active conflicts
{
"success": true,
"active_conflicts": [
{
"resource_id": "room_123",
"time_slot": "2025-06-18T14:00:00Z to 2025-06-18T15:00:00Z",
"conflicting_events": [
{
"composite_id": "event_78269",
"priority": 1,
"status": "selected_for_sync"
},
{
"composite_id": "booking_456",
"priority": 2,
"status": "filtered_out"
}
],
"resolution_time": "2025-06-18T13:45:22Z"
}
],
"conflict_summary": {
"total_conflicts_today": 5,
"resolved_conflicts": 5,
"pending_conflicts": 0
}
}
# Get priority filtering performance metrics
curl -X GET "http://your-bridge/monitoring/filtering-performance"
# Response includes performance data
{
"success": true,
"performance_metrics": {
"avg_filtering_time_ms": 12.3,
"max_filtering_time_ms": 45.6,
"filtering_operations_per_hour": 127,
"efficiency_rating": "excellent",
"resource_usage": {
"cpu_overhead": "0.2%",
"memory_overhead": "1.1MB"
}
}
}

The monitoring system includes specialized alerts for composite ID and priority filtering issues:
- malformed_composite_ids: Detects invalid composite ID formats (Critical)
- composite_id_mapping_failures: ID resolution failures (Warning)
- orphaned_composite_mappings: Mappings without valid composite IDs (Warning)
- excessive_conflicts: >50% of sync operations have conflicts (Warning)
- priority_filtering_failures: Filter logic errors (Critical)
- unresolved_conflicts: Conflicts pending >1 hour (Warning)
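To make the `malformed_composite_ids` check more concrete, here is a small Python sketch of how an ID could be validated. It assumes that a composite ID has the form `<type>_<numeric id>`, as in the `event_78269` and `booking_456` examples above, and that the valid types are the ones listed under `breakdown_by_type`. Both are inferences from this document's examples, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed set of type prefixes, taken from the breakdown_by_type example.
KNOWN_TYPES = ("event", "booking", "allocation", "meeting", "appointment")
_COMPOSITE_ID = re.compile(r"(%s)_([0-9]+)" % "|".join(KNOWN_TYPES))


def parse_composite_id(value: str) -> Optional[Tuple[str, int]]:
    """Split 'event_78269' into ('event', 78269); return None if malformed."""
    match = _COMPOSITE_ID.fullmatch(value)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def find_malformed(ids: Iterable[str]) -> List[str]:
    """Return the IDs that do not follow the assumed <type>_<number> format."""
    return [value for value in ids if parse_composite_id(value) is None]


if __name__ == "__main__":
    print(parse_composite_id("event_78269"))
    print(find_malformed(["booking_456", "room_123", "booking_"]))
```

A non-empty result from `find_malformed` would correspond to a non-zero `malformed_ids` count in the health response.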
# Trigger composite ID health check
curl -X POST "http://your-bridge/alerts/check-composite-ids"
# Trigger priority filtering health check
curl -X POST "http://your-bridge/alerts/check-priority-filtering"
# Response includes alert details
{
"success": true,
"alerts_generated": [
{
"alert_type": "excessive_conflicts",
"severity": "warning",
"message": "High conflict rate detected: 65% of sync operations had priority conflicts",
"data": {
"conflict_rate": 65.2,
"operations_checked": 46,
"conflicts_found": 30
}
}
]
}
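The `excessive_conflicts` alert in the response above follows from the ratio of conflicting operations to operations checked. The sketch below reproduces that arithmetic in Python, assuming the rate is simply `conflicts_found / operations_checked` and that the warning fires above the 50% limit listed earlier. The function names and the exact shape of the returned dictionary are hypothetical.

```python
from typing import Any, Dict, Optional


def conflict_rate(conflicts_found: int, operations_checked: int) -> float:
    """Percentage of checked operations that had conflicts, to one decimal place."""
    if operations_checked <= 0:
        return 0.0  # nothing checked, so there is nothing to report
    return round(100.0 * conflicts_found / operations_checked, 1)


def excessive_conflicts_alert(
    conflicts_found: int,
    operations_checked: int,
    threshold_pct: float = 50.0,
) -> Optional[Dict[str, Any]]:
    """Build a warning alert when the conflict rate exceeds the threshold."""
    rate = conflict_rate(conflicts_found, operations_checked)
    if rate <= threshold_pct:
        return None
    return {
        "alert_type": "excessive_conflicts",
        "severity": "warning",
        "data": {
            "conflict_rate": rate,
            "operations_checked": operations_checked,
            "conflicts_found": conflicts_found,
        },
    }


if __name__ == "__main__":
    print(excessive_conflicts_alert(30, 46))
```

With the figures from the example response (30 conflicts in 46 operations), the computed rate is 65.2, which matches the `conflict_rate` field shown there.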