WebSocket Connection Health Monitoring & Auto-Reconnection #38

@bchou9

Feature Description

Implement robust WebSocket connection health monitoring, automatic reconnection, and graceful degradation to ensure reliable real-time collaboration even with unstable network conditions.

Current Issues

The Socket.IO implementation (frontend/src/services/socket.js) has basic connectivity but lacks:

  • Connection health monitoring
  • Heartbeat/ping mechanism visibility
  • Automatic retry with exponential backoff
  • Queue for missed events during disconnect
  • User notification of connection status
  • Graceful degradation to polling
  • Connection quality metrics

Problem Scenarios

  1. Network Interruption: User's WiFi drops → no indication → strokes lost
  2. Server Restart: Backend restarts → users don't auto-reconnect cleanly
  3. Slow Network: High latency → unclear if connected or stalled
  4. Mobile Switch: 4G ↔ WiFi transition → connection drops silently
  5. Background Tab: Browser throttles WebSocket → delayed updates

Proposed Solution

1. Connection Status Indicator

Visual feedback for users:

<ConnectionStatus>
  {status === 'connected' && <Icon color="green">✓ Connected</Icon>}
  {status === 'connecting' && <Icon color="yellow">↻ Connecting...</Icon>}
  {status === 'disconnected' && <Icon color="red">✗ Disconnected</Icon>}
  {status === 'degraded' && <Icon color="orange">⚠ Slow Connection</Icon>}
</ConnectionStatus>

Location: Top-right corner of canvas (non-intrusive)

2. Enhanced Socket Service

Frontend Implementation:

// frontend/src/services/socket.js
class EnhancedSocketService {
  constructor() {
    this.socket = null;
    this.status = 'disconnected';
    this.reconnectAttempts = 0;
    this.maxReconnectAttempts = 10;
    this.reconnectDelay = 1000; // Start at 1s
    this.maxReconnectDelay = 30000; // Max 30s
    this.eventQueue = [];
    this.healthCheckInterval = null;
    this.lastPongTime = null;
    this.statusListeners = [];
  }

  connect(token) {
    this.socket = io(SOCKET_URL, {
      auth: { token },
      reconnection: false, // Handle manually
      timeout: 10000,
      transports: ['websocket', 'polling'] // Try WebSocket first
    });

    this.setupEventHandlers();
    this.startHealthCheck();
  }

  setupEventHandlers() {
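    this.socket.on('connect', () => {
      this.updateStatus('connected');
      this.reconnectAttempts = 0;
      this.reconnectDelay = 1000;
      // handleDisconnect() stops the health check, so restart it after a reconnect
      clearInterval(this.healthCheckInterval);
      this.startHealthCheck();
      this.flushEventQueue();
    });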

    this.socket.on('disconnect', (reason) => {
      this.updateStatus('disconnected');
      this.handleDisconnect(reason);
    });

    this.socket.on('pong', (latency) => {
      this.lastPongTime = Date.now();
      this.updateConnectionQuality(latency);
    });

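    // With reconnection: false, the built-in 'reconnect_failed' event never fires.
    // A failed manual attempt surfaces as 'connect_error', so retry from there;
    // attemptReconnect() reports the final failure once attempts are exhausted.
    this.socket.on('connect_error', () => {
      this.attemptReconnect();
    });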
  }

  startHealthCheck() {
    this.healthCheckInterval = setInterval(() => {
      // Skip the ping while the socket is down
      if (!this.socket?.connected) {
        this.updateStatus('disconnected');
        return;
      }

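      // Send ping and expect pong within 5s
      const pingSentAt = Date.now();
      this.socket.emit('ping');

      setTimeout(() => {
        // Degraded if still connected but no pong has arrived since this ping
        const pongMissing = this.lastPongTime === null || this.lastPongTime < pingSentAt;
        if (this.socket?.connected && pongMissing) {
          this.updateStatus('degraded');
        }
      }, 5000);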
    }, 15000); // Check every 15s
  }

  handleDisconnect(reason) {
    clearInterval(this.healthCheckInterval);

    // Auto-reconnect for non-intentional disconnects
    if (reason !== 'io client disconnect') {
      this.attemptReconnect();
    }
  }

  attemptReconnect() {
    if (this.reconnectAttempts >= this.maxReconnectAttempts) {
      this.updateStatus('failed');
      this.notifyUser('Connection lost. Please refresh to reconnect.');
      return;
    }

    this.reconnectAttempts++;
    this.updateStatus('connecting');

    const delay = Math.min(
      this.reconnectDelay * Math.pow(2, this.reconnectAttempts - 1),
      this.maxReconnectDelay
    );

    setTimeout(() => {
      console.log(`Reconnect attempt ${this.reconnectAttempts}/${this.maxReconnectAttempts} after ${delay}ms`);
      this.socket.connect();
    }, delay);
  }

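  updateConnectionQuality(latency) {
    // Classify connection quality; thresholds are in milliseconds
    let quality;
    if (latency < 300) {
      quality = 'connected'; // Excellent (<100) or good (<300)
    } else if (latency < 1000) {
      quality = 'degraded';
    } else {
      quality = 'poor';
    }
    // Notify only on change, so a recovered link can leave 'degraded'
    if (quality !== this.status) this.updateStatus(quality);
  }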

  emit(event, data) {
    if (this.socket?.connected) {
      this.socket.emit(event, data);
    } else {
      // Queue event for later
      this.eventQueue.push({ event, data, timestamp: Date.now() });
      console.warn(`Socket not connected; queued "${event}" (${this.eventQueue.length} pending)`);
    }
  }

  flushEventQueue() {
    console.log(`Flushing ${this.eventQueue.length} queued event(s)`);
    while (this.eventQueue.length > 0) {
      const { event, data } = this.eventQueue.shift();
      this.socket.emit(event, data);
    }
  }

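  updateStatus(newStatus) {
    // Ignore repeats so listeners (e.g. toasts) fire only on real transitions
    if (newStatus === this.status) return;
    this.status = newStatus;
    this.statusListeners.forEach(listener => listener(newStatus));
  }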

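  onStatusChange(callback) {
    this.statusListeners.push(callback);
    // Return an unsubscribe function so callers can clean up
    return () => {
      this.statusListeners = this.statusListeners.filter(l => l !== callback);
    };
  }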
}
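
As a rough illustration of the retry timing in attemptReconnect(), the delay calculation can be pulled out into a pure function. The sketch below assumes the constructor's values (1 s base delay, 30 s cap, 10 attempts). The name `backoffDelay` and the optional `jitter` parameter are hypothetical and not part of the existing service.

```javascript
// Hypothetical helper: delay before the nth reconnect attempt (1-based).
// Mirrors Math.min(base * 2^(attempt - 1), max) from attemptReconnect().
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000, jitter = 0, random = Math.random) {
  const exponential = Math.min(baseMs * Math.pow(2, attempt - 1), maxMs);
  // Optional jitter (an addition, not in the original proposal): spreads clients
  // out so they do not all retry at the same instant after a server restart.
  const spread = exponential * jitter * random();
  return Math.round(exponential - spread);
}

const schedule = [];
for (let attempt = 1; attempt <= 10; attempt++) {
  schedule.push(backoffDelay(attempt));
}
console.log(schedule);
// [1000, 2000, 4000, 8000, 16000, 30000, 30000, 30000, 30000, 30000]
```

With these defaults the ten attempts add up to 181 s of waiting (about three minutes, not counting connection timeouts) before the service reports 'failed'.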

3. Backend Health Monitoring

Add Ping/Pong Handlers:

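# backend/routes/socketio_handlers.py
import time

from flask_socketio import emit

@socketio.on('ping')
def handle_ping(data=None):
    # The client's timestamp (ms since epoch) arrives in the event payload,
    # not in request.args, which only holds the handshake query string.
    start_time = (data or {}).get('timestamp')
    # Note: this compares client and server clocks, so clock skew is included.
    latency = int(time.time() * 1000) - int(start_time) if start_time else 0
    emit('pong', latency)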

@socketio.on('health_check')
def handle_health_check():
    emit('health_response', {
        'server_time': int(time.time() * 1000),
        'active_connections': len(socketio.server.manager.rooms.get('/', {}).keys()),
        'status': 'healthy'
    })
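
Because the ping handler subtracts a client timestamp from server time, its result is sensitive to clock differences between the two machines. One alternative, sketched here under assumptions, is to measure round-trip time entirely on the client with a Socket.IO acknowledgement callback. The event name `latency_probe` and the helper `probeLatency` are hypothetical; the server would need a handler for that event that acknowledges it (in Flask-SocketIO, a handler's return value is sent back as the acknowledgement). A stand-in socket and clock are used so the snippet runs without a server.

```javascript
// Hypothetical helper: measures round-trip time using an acknowledgement.
// `socket` only needs an emit(event, data, ack) method, as Socket.IO clients have.
function probeLatency(socket, onResult, { timeoutMs = 5000, now = Date.now } = {}) {
  const sentAt = now();
  let settled = false;

  // Report null if no acknowledgement arrives in time
  const timer = setTimeout(() => {
    if (settled) return;
    settled = true;
    onResult(null);
  }, timeoutMs);

  socket.emit('latency_probe', { sentAt }, () => {
    if (settled) return; // late ack after the timeout already fired
    settled = true;
    clearTimeout(timer);
    onResult(now() - sentAt); // both readings come from the client clock
  });
}

// Stand-in socket and clock (assumptions for the demo only)
let fakeTime = 0;
const fakeSocket = {
  emit(event, data, ack) {
    fakeTime += 42; // pretend the round trip took 42 ms
    ack();
  }
};

let measured;
probeLatency(fakeSocket, (rtt) => { measured = rtt; }, { now: () => fakeTime });
console.log(measured); // 42
```

The measured value could then be passed to updateConnectionQuality() in place of the server-computed latency.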

4. Connection Quality UI

Status Bar Component:

// frontend/src/components/ConnectionStatus.jsx
import { Chip, Tooltip } from '@mui/material';
import WifiIcon from '@mui/icons-material/Wifi';
import WifiOffIcon from '@mui/icons-material/WifiOff';
import SyncIcon from '@mui/icons-material/Sync';

export function ConnectionStatus() {
  const { status, latency, lastUpdate } = useSocket();

  const getStatusConfig = () => {
    switch (status) {
      case 'connected':
        return { icon: <WifiIcon />, color: 'success', label: 'Connected' };
      case 'connecting':
        return { icon: <SyncIcon className="rotating" />, color: 'warning', label: 'Connecting...' };
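      case 'degraded':
      case 'poor': // set by updateConnectionQuality() for very high latency
        return { icon: <WifiIcon />, color: 'warning', label: 'Slow Connection' };
      case 'disconnected':
      case 'failed': // set once reconnect attempts are exhausted
        return { icon: <WifiOffIcon />, color: 'error', label: 'Disconnected' };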
      default:
        return { icon: <WifiOffIcon />, color: 'default', label: 'Unknown' };
    }
  };

  const config = getStatusConfig();

  return (
    <Tooltip title={latency != null ? `Latency: ${latency} ms` : config.label}>
      <Chip
        icon={config.icon}
        label={config.label}
        color={config.color}
        size="small"
        sx={{ position: 'fixed', top: 16, right: 16, zIndex: 1000 }}
      />
    </Tooltip>
  );
}

5. Reconnection Toast Notifications

// In Canvas.js
useEffect(() => {
  const unsubscribe = socket.onStatusChange((status) => {
    if (status === 'connected') {
      showToast('Connected to server', 'success');
    } else if (status === 'disconnected') {
      showToast('Connection lost. Attempting to reconnect...', 'warning');
    } else if (status === 'failed') {
      showToast('Connection failed. Please refresh the page.', 'error');
    }
  });
  // Remove the listener on unmount if the service returns an unsubscribe function
  return typeof unsubscribe === 'function' ? unsubscribe : undefined;
}, []);

Event Queue Management

Priority Queue for Critical Events:

class EventQueue {
  constructor() {
    this.queue = [];
    this.maxAge = 60000; // 1 minute
    this.maxSize = 100;
  }

  add(event, data, priority = 'normal') {
    // When the queue is full, drop the oldest event to make room
    if (this.queue.length >= this.maxSize) {
      console.warn('Event queue full, dropping oldest event');
      this.queue.shift();
    }

    this.queue.push({
      event,
      data,
      priority,
      timestamp: Date.now()
    });
  }

  flush(socket) {
    // Remove stale events
    const now = Date.now();
    this.queue = this.queue.filter(item => 
      (now - item.timestamp) < this.maxAge
    );

    // Sort by priority (high first)
    this.queue.sort((a, b) => {
      const priorityOrder = { high: 0, normal: 1, low: 2 };
      return priorityOrder[a.priority] - priorityOrder[b.priority];
    });

    // Emit all queued events
    this.queue.forEach(({ event, data }) => {
      socket.emit(event, data);
    });

    this.queue = [];
  }
}
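
To make the flush behaviour easier to unit test, the filtering and ordering steps can be expressed as a pure function with no socket involved. This is only a sketch: `orderForFlush` and the event names in the sample data are hypothetical, and it assumes an unknown priority should be treated as 'normal'.

```javascript
const PRIORITY_ORDER = { high: 0, normal: 1, low: 2 };

// Hypothetical helper: returns the events that flush() would emit, in order.
function orderForFlush(queue, now, maxAge = 60000) {
  const rank = (item) => PRIORITY_ORDER[item.priority] ?? PRIORITY_ORDER.normal;
  return queue
    .filter((item) => now - item.timestamp < maxAge) // drop stale events
    .sort((a, b) => rank(a) - rank(b)); // stable sort: ties keep insertion order
}

const now = 100000;
const queued = [
  { event: 'cursor_move', priority: 'low', timestamp: 99000 },
  { event: 'stroke', priority: 'high', timestamp: 30000 }, // 70 s old: stale
  { event: 'chat', priority: 'normal', timestamp: 99500 },
  { event: 'stroke', priority: 'high', timestamp: 99900 },
  { event: 'undo', priority: 'normal', timestamp: 99950 }
];

const ordered = orderForFlush(queued, now).map((item) => `${item.event}@${item.timestamp}`);
console.log(ordered);
// [ 'stroke@99900', 'chat@99500', 'undo@99950', 'cursor_move@99000' ]
```

Note that sorting by priority reorders events relative to when they were drawn or sent; if the server depends on strict ordering across event types, that trade-off needs to be considered.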

Metrics & Monitoring

Track Connection Metrics:

class ConnectionMetrics {
  constructor() {
    this.metrics = {
      totalConnects: 0,
      totalDisconnects: 0,
      averageLatency: 0,
      latencyHistory: [],
      connectionUptime: 0,
      lastConnectTime: null,
      reconnectAttempts: 0
    };
  }

  recordConnect() {
    this.metrics.totalConnects++;
    this.metrics.lastConnectTime = Date.now();
  }

  recordLatency(latency) {
    this.metrics.latencyHistory.push(latency);
    if (this.metrics.latencyHistory.length > 20) {
      this.metrics.latencyHistory.shift();
    }
    this.metrics.averageLatency = 
      this.metrics.latencyHistory.reduce((a, b) => a + b, 0) / 
      this.metrics.latencyHistory.length;
  }

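  getReport() {
    const { lastConnectTime } = this.metrics;
    return {
      ...this.metrics,
      // Time since the most recent connect; 0 if never connected
      uptime: lastConnectTime === null ? 0 : Date.now() - lastConnectTime
    };
  }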
}

Files to Create/Modify

Frontend:

  • frontend/src/services/socket.js - Enhanced connection logic
  • frontend/src/components/ConnectionStatus.jsx ⭐ (NEW)
  • frontend/src/hooks/useSocket.js ⭐ (NEW)
  • frontend/src/components/Canvas.js - Integrate status indicator
  • frontend/src/utils/eventQueue.js ⭐ (NEW)

Backend:

  • backend/routes/socketio_handlers.py - Add ping/pong handlers
  • backend/services/socketio_service.py - Connection tracking

Benefits

  • Reliability: Clients reconnect automatically after network interruptions
  • Transparency: Clear visibility into connection status
  • Resilience: Graceful handling of disconnections
  • User Experience: Strokes drawn during brief outages are queued and replayed on reconnect
  • Debugging: Connection metrics help diagnose issues
  • Mobile-Friendly: Better handling of network transitions

Testing Requirements

  • Unit tests for reconnection logic
  • Integration tests for event queueing
  • E2E tests simulating network failures
  • Performance tests for high-latency scenarios
  • Mobile testing for network transitions

Configuration Options

// config/socket.config.js
export const SOCKET_CONFIG = {
  reconnectAttempts: 10,
  reconnectDelayMin: 1000,
  reconnectDelayMax: 30000,
  healthCheckInterval: 15000,
  pongTimeout: 5000,
  eventQueueMax: 100,
  eventMaxAge: 60000,
  transports: ['websocket', 'polling']
};
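
If these options become overridable per environment, a small merge-and-validate step could catch inconsistent settings early. The sketch below is one possible shape; `createSocketConfig` and its two checks are assumptions, not existing code. The defaults are repeated so the snippet runs on its own.

```javascript
const DEFAULTS = {
  reconnectAttempts: 10,
  reconnectDelayMin: 1000,
  reconnectDelayMax: 30000,
  healthCheckInterval: 15000,
  pongTimeout: 5000,
  eventQueueMax: 100,
  eventMaxAge: 60000,
  transports: ['websocket', 'polling']
};

// Hypothetical helper: merge overrides into the defaults and sanity-check them.
function createSocketConfig(overrides = {}) {
  const config = { ...DEFAULTS, ...overrides };
  if (config.reconnectDelayMin > config.reconnectDelayMax) {
    throw new RangeError('reconnectDelayMin must not exceed reconnectDelayMax');
  }
  // A pong timeout longer than the check interval would let checks overlap
  if (config.pongTimeout >= config.healthCheckInterval) {
    throw new RangeError('pongTimeout must be shorter than healthCheckInterval');
  }
  return Object.freeze(config);
}

const mobileConfig = createSocketConfig({ reconnectAttempts: 20, pongTimeout: 8000 });
console.log(mobileConfig.reconnectAttempts, mobileConfig.pongTimeout, mobileConfig.eventQueueMax);
// 20 8000 100
```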

Future Enhancements

  • Adaptive quality (reduce stroke resolution on slow connections)
  • Network type detection (WiFi vs. 4G)
  • Bandwidth usage monitoring
  • Connection analytics dashboard
  • Predictive reconnection (detect issues before disconnect)
