| 1 | +# Production Reliability and Performance Guide |
| 2 | + |
| 3 | +This document outlines the reliability, performance, and security considerations for running the Hooks webhook server framework in production environments. |
| 4 | + |
| 5 | +## 🔍 Security Considerations |
| 6 | + |
| 7 | +### Dynamic Plugin Loading Security |
| 8 | + |
| 9 | +The framework applies the following checks when loading plugins dynamically: |
| 10 | + |
| 11 | +- **Class Name Validation**: All plugin class names are validated against safe patterns (`/\A[A-Z][a-zA-Z0-9_]*\z/`) |
| 12 | +- **Dangerous Class Blacklist**: System classes like `File`, `Dir`, `Kernel`, `Object`, `Process`, etc. are blocked from being loaded as plugins |
| 13 | +- **Path Traversal Protection**: Plugin file paths are normalized and validated to prevent loading files outside designated directories |
| 14 | +- **Safe Constant Resolution**: Uses `Object.const_get` only after thorough validation |
| 15 | + |
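The checks listed above could be combined roughly as follows. This is a minimal sketch, not the framework's actual code: the method names (`valid_plugin_class_name?`, `inside_plugin_dir?`, `resolve_plugin_class`) and the constant names are hypothetical, and the blocklist holds only the classes the list names explicitly.

```ruby
# Pattern quoted in the list above.
SAFE_CLASS_NAME = /\A[A-Z][a-zA-Z0-9_]*\z/

# Partial blocklist: only the classes named above (the guide's list ends with "etc.").
BLOCKED_CLASS_NAMES = %w[File Dir Kernel Object Process].freeze

# True when the name matches the safe pattern and is not a blocked system class.
def valid_plugin_class_name?(name)
  name.is_a?(String) &&
    name.match?(SAFE_CLASS_NAME) &&
    !BLOCKED_CLASS_NAMES.include?(name)
end

# True when file_path, resolved relative to plugin_dir, stays inside plugin_dir.
def inside_plugin_dir?(file_path, plugin_dir)
  root = File.expand_path(plugin_dir)
  full = File.expand_path(file_path, root)
  full.start_with?(root + File::SEPARATOR)
end

# Resolve the constant only after the name has passed validation.
def resolve_plugin_class(name)
  unless valid_plugin_class_name?(name)
    raise ArgumentError, "unsafe plugin class name: #{name.inspect}"
  end
  Object.const_get(name)
end
```

Running validation before `Object.const_get` means an attacker-controlled string such as `"File"` or `"../evil"` is rejected before any constant lookup happens.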
| 16 | +### Request Processing Security |
| 17 | + |
| 18 | +- **Request Size Limits**: Request body size is capped by the configurable `request_limit` setting |
| 19 | +- **JSON Parsing Protection**: JSON parsing includes security limits to prevent JSON bombs: |
| 20 | + - Maximum nesting depth (configurable via `JSON_MAX_NESTING`, default: 20) |
| 21 | + - Maximum payload size before parsing (configurable via `JSON_MAX_SIZE`, default: 10MB) |
| 22 | + - Disabled object creation from JSON (`create_additions: false`) |
| 23 | + - Uses plain Hash/Array classes to prevent object injection |
| 24 | +- **Header Validation**: Multiple header format handling with safe fallbacks and optimized lookup order |
| 25 | + |
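The JSON limits above map onto options that Ruby's standard `json` library already accepts. The sketch below illustrates one way to apply them; `safe_parse_json` and `PayloadTooLargeError` are hypothetical names, and the framework's own parser may be structured differently.

```ruby
require "json"

# Defaults mirror the values quoted above (nesting depth 20, 10MB payload).
DEFAULT_MAX_NESTING = Integer(ENV.fetch("JSON_MAX_NESTING", "20"))
DEFAULT_MAX_SIZE = Integer(ENV.fetch("JSON_MAX_SIZE", "10485760"))

# Hypothetical error class for oversized payloads.
class PayloadTooLargeError < StandardError; end

def safe_parse_json(body, max_nesting: DEFAULT_MAX_NESTING, max_size: DEFAULT_MAX_SIZE)
  # Check the size before parsing so an oversized payload is never tokenized.
  if body.bytesize > max_size
    raise PayloadTooLargeError, "payload is #{body.bytesize} bytes, limit is #{max_size}"
  end

  JSON.parse(
    body,
    max_nesting: max_nesting, # raises JSON::NestingError when exceeded
    create_additions: false,  # never instantiate classes named in the payload
    object_class: Hash,       # plain containers only
    array_class: Array
  )
end
```

With `create_additions: false`, a payload containing a `json_class` key is returned as an ordinary Hash rather than being turned into an object.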
| 26 | +## ⚡ Performance Optimizations |
| 27 | + |
| 28 | +### Startup Performance |
| 29 | + |
| 30 | +The framework uses several strategies to optimize startup time: |
| 31 | + |
| 32 | +- **Explicit Module Loading**: Core modules are loaded explicitly rather than using `Dir.glob` patterns for better performance and security |
| 33 | +- **Boot-time Plugin Loading**: All plugins are loaded once at startup rather than per-request |
| 34 | +- **Plugin Caching**: Loaded plugins are cached in class-level registries for fast access |
| 35 | +- **Sorted Directory Loading**: Plugin directories are processed in sorted order for consistent behavior |
| 36 | + |
| 37 | +### Runtime Performance |
| 38 | + |
| 39 | +- **Per-request Optimizations**: |
| 40 | + - Plugin instances are reused across requests |
| 41 | + - Request contexts use thread-local storage for efficient access |
| 42 | + - Handler instances are created per-request but classes are cached |
| 43 | + - Optimized header processing with common cases checked first |
| 44 | + |
| 45 | +- **Memory Management**: |
| 46 | + - Plugin registries use hash-based lookups for O(1) access |
| 47 | + - Thread-local contexts are properly cleaned up after requests |
| 48 | + - Clear plugin loading separates concerns efficiently |
| 49 | + |
| 50 | +- **Security Limits**: |
| 51 | + - Retry configuration includes bounds checking to prevent resource exhaustion |
| 52 | + - JSON parsing has built-in limits to prevent JSON bombs and memory attacks |
| 53 | + |
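The thread-local context cleanup mentioned above can be illustrated with an `ensure` block. This is a sketch under assumptions: the `RequestContext` module and its storage key are hypothetical and are not taken from the framework's source.

```ruby
# Hypothetical request-context helper.
module RequestContext
  KEY = :hooks_request_context # assumed key name

  # Store the context for the duration of the block and always restore the
  # previous value afterwards, even when the block raises.
  def self.with(context)
    previous = Thread.current[KEY]
    Thread.current[KEY] = context
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Context visible to code running inside the current `with` block, or nil.
  # Note: Thread.current[] storage is scoped to the current fiber, which for a
  # threaded server such as Puma means one slot per request thread.
  def self.current
    Thread.current[KEY]
  end
end
```

Restoring the previous value in `ensure` matters because server threads are reused: without it, a failed request could leak its context into the next request handled by the same thread.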
| 54 | +### Recommended Production Configuration |
| 55 | + |
| 56 | +```yaml |
| 57 | +# Example production configuration |
| 58 | +log_level: "info" # Reduces debug overhead |
| 59 | +request_limit: 1048576 # 1MB limit (adjust based on needs) |
| 60 | +request_timeout: 30 # 30 second timeout |
| 61 | +environment: "production" # Disables debug features like backtraces |
| 62 | +normalize_headers: true # Consistent header processing |
| 63 | +symbolize_payload: false # Reduced memory usage for large payloads |
| 64 | +``` |
| 65 | + |
| 66 | +### Security Environment Variables |
| 67 | + |
| 68 | +Additional security limits can be configured via environment variables: |
| 69 | + |
| 70 | +```bash |
| 71 | +# JSON Security Limits |
| 72 | +JSON_MAX_NESTING=20 # Maximum JSON nesting depth (default: 20) |
| 73 | +JSON_MAX_SIZE=10485760 # Maximum JSON size before parsing (default: 10MB) |
| 74 | + |
| 75 | +# Retry Safety Limits |
| 76 | +DEFAULT_RETRY_SLEEP=1 # Sleep between retries 0-300 seconds (default: 1) |
| 77 | +DEFAULT_RETRY_TRIES=10 # Number of retry attempts 1-50 (default: 10) |
| 78 | +RETRY_LOG_RETRIES=false # Disable retry logging in production (default: true) |
| 79 | +``` |
| 80 | + |
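The retry bounds quoted above (0-300 seconds of sleep, 1-50 tries) could be enforced when the variables are read. The guide does not say whether out-of-range values are clamped or rejected; the sketch below clamps them, and `bounded_env_int` is a hypothetical helper name.

```ruby
# Read an integer setting, fall back to the default when it is missing or not
# a number, and clamp it into the allowed range.
# Assumption: clamping is this sketch's choice, not documented framework behavior.
def bounded_env_int(name, default:, range:, env: ENV)
  value = Integer(env.fetch(name, default.to_s), 10)
  value.clamp(range)
rescue ArgumentError
  default
end

# Bounds and defaults as listed in the environment variable block above.
RETRY_SLEEP = bounded_env_int("DEFAULT_RETRY_SLEEP", default: 1, range: 0..300)
RETRY_TRIES = bounded_env_int("DEFAULT_RETRY_TRIES", default: 10, range: 1..50)
```

For example, `DEFAULT_RETRY_TRIES=500` would yield 50 tries under this approach instead of allowing 500 attempts per failing request.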
| 81 | +## 🔧 Monitoring and Observability |
| 82 | + |
| 83 | +### Health Check Endpoint |
| 84 | + |
| 85 | +The built-in health endpoint (`/health`) returns a JSON status document: |
| 86 | + |
| 87 | +```json |
| 88 | +{ |
| 89 | + "status": "healthy", |
| 90 | + "timestamp": "2025-01-01T12:00:00Z", |
| 91 | + "version": "1.0.0", |
| 92 | + "uptime_seconds": 3600, |
| 93 | + "config_checksum": "abc123", |
| 94 | + "endpoints_loaded": 5, |
| 95 | + "plugins_loaded": 3 |
| 96 | +} |
| 97 | +``` |
| 98 | + |
| 99 | +### Lifecycle Hooks for Monitoring |
| 100 | + |
| 101 | +Use lifecycle plugins to add comprehensive monitoring: |
| 102 | + |
| 103 | +- **Request Metrics**: Track request counts, timing, and error rates |
| 104 | +- **Error Reporting**: Capture and report exceptions with full context |
| 105 | +- **Resource Monitoring**: Track memory usage, plugin load times, etc. |
| 106 | + |
| 107 | +### Recommended Instrumentation |
| 108 | + |
| 109 | +```ruby |
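| 110 | +# Example monitoring lifecycle plugin. Assumes `require "time"` (for Time.parse) and that `stats` and `failbot` are available in plugin scope. |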
| 111 | +class MonitoringLifecycle < Hooks::Plugins::Lifecycle |
| 112 | + def on_request(env) |
| 113 | + stats.increment("webhook.requests", { |
| 114 | + handler: env["hooks.handler"], |
| 115 | + endpoint: env["PATH_INFO"] |
| 116 | + }) |
| 117 | + end |
| 118 | + |
| 119 | + def on_response(env, response) |
| 120 | + processing_time = Time.now - Time.parse(env["hooks.start_time"]) |
| 121 | + stats.timing("webhook.processing_time", processing_time * 1000, { |
| 122 | + handler: env["hooks.handler"] |
| 123 | + }) |
| 124 | + end |
| 125 | + |
| 126 | + def on_error(exception, env) |
| 127 | + stats.increment("webhook.errors", { |
| 128 | + error_type: exception.class.name, |
| 129 | + handler: env["hooks.handler"] |
| 130 | + }) |
| 131 | + |
| 132 | + failbot.report(exception, { |
| 133 | + request_id: env["hooks.request_id"], |
| 134 | + handler: env["hooks.handler"], |
| 135 | + endpoint: env["PATH_INFO"] |
| 136 | + }) |
| 137 | + end |
| 138 | +end |
| 139 | +``` |
| 140 | + |
| 141 | +## 🚀 Production Deployment Best Practices |
| 142 | + |
| 143 | +### Server Configuration |
| 144 | + |
| 145 | +1. **Use Puma in Cluster Mode** for production: |
| 146 | +```ruby |
| 147 | +# config/puma.rb |
| 148 | +workers Integer(ENV.fetch("WEB_CONCURRENCY", 2)) |
| 149 | +threads_count = Integer(ENV.fetch("MAX_THREADS", 5)) |
| 150 | +threads threads_count, threads_count |
| 151 | +preload_app! |
| 152 | +``` |
| 153 | + |
| 154 | +2. **Configure Resource Limits**: |
| 155 | + - Set appropriate worker memory limits |
| 156 | + - Configure worker restart thresholds |
| 157 | + - Set connection pool sizes appropriately |
| 158 | + |
| 159 | +3. **Environment Variables**: |
| 160 | +```bash |
| 161 | +# Retry configuration |
| 162 | +DEFAULT_RETRY_TRIES=3 # Reduced from default 10 |
| 163 | +DEFAULT_RETRY_SLEEP=1 # 1 second between retries |
| 164 | +RETRY_LOG_RETRIES=false # Reduce log noise in production |
| 165 | + |
| 166 | +# Logging |
| 167 | +LOG_LEVEL=info # Reduce debug overhead |
| 168 | +``` |
| 169 | + |
| 170 | +### Container Considerations |
| 171 | + |
| 172 | +```dockerfile |
| 173 | +# Optimized production Dockerfile |
| 174 | +FROM ruby:3.2-alpine AS builder |
| 175 | +WORKDIR /app |
| 176 | +COPY Gemfile* ./ |
| 177 | +RUN bundle install --deployment --without development test |
| 178 | + |
| 179 | +FROM ruby:3.2-alpine |
| 180 | +WORKDIR /app |
| 181 | +COPY --from=builder /app/vendor ./vendor |
| 182 | +COPY . . |
| 183 | + |
| 184 | +# Security: Run as non-root user |
| 185 | +RUN addgroup -g 1001 -S appuser && \ |
| 186 | + adduser -S appuser -u 1001 -G appuser |
| 187 | +USER appuser |
| 188 | + |
| 189 | +# Health check |
| 190 | +HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \ |
| 191 | + CMD wget -q --spider http://localhost:3000/health || exit 1 |
| 192 | + |
| 193 | +EXPOSE 3000 |
| 194 | +CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"] |
| 195 | +``` |
| 196 | + |
| 197 | +## 🛡️ Security Hardening |
| 198 | + |
| 199 | +### Input Validation |
| 200 | + |
| 201 | +- **Payload Size Limits**: Always configure `request_limit` appropriate for your use case |
| 202 | +- **Timeout Configuration**: Set reasonable `request_timeout` values |
| 203 | +- **Content Type Validation**: Implement strict content type checking if needed |
| 204 | + |
| 205 | +### Authentication |
| 206 | + |
| 207 | +- **HMAC Validation**: Always enable authentication for production endpoints |
| 208 | +- **Secret Management**: Store webhook secrets in environment variables or secure secret management systems |
| 209 | +- **Signature Validation**: Include a timestamp in the signed data and reject requests outside a short tolerance window to prevent replay attacks |
| 210 | + |
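To make the HMAC and replay-protection points concrete, here is a minimal sketch using Ruby's bundled `openssl` library. The signing scheme (hex HMAC-SHA256 over `"<timestamp>.<body>"`), the 300-second tolerance, and the method names are assumptions for illustration; real webhook providers each define their own format.

```ruby
require "openssl"

# Assumed tolerance window for timestamp freshness.
MAX_SKEW_SECONDS = 300

# Hypothetical scheme: hex HMAC-SHA256 over "<timestamp>.<body>".
def sign(secret, timestamp, body)
  OpenSSL::HMAC.hexdigest("SHA256", secret, "#{timestamp}.#{body}")
end

def valid_signature?(secret:, body:, timestamp:, signature:, now: Time.now.to_i)
  # Reject malformed or stale timestamps first to limit replay attacks.
  return false unless timestamp.to_s.match?(/\A\d+\z/)
  return false if (now - timestamp.to_i).abs > MAX_SKEW_SECONDS

  expected = sign(secret, timestamp, body)
  # Constant-time comparison avoids leaking how many characters matched.
  OpenSSL.secure_compare(expected, signature.to_s)
end
```

Because the timestamp is part of the signed string, an attacker cannot refresh an old request by changing only the timestamp header: the signature would no longer match.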
| 211 | +### Network Security |
| 212 | + |
| 213 | +- **TLS Termination**: Always terminate TLS/SSL at the load balancer or reverse proxy |
| 214 | +- **IP Whitelisting**: Implement IP restrictions at the network level when possible |
| 215 | +- **Rate Limiting**: Implement rate limiting at the reverse proxy or load balancer |
| 216 | + |
| 217 | +## 📊 Performance Benchmarking |
| 218 | + |
| 219 | +### Load Testing Recommendations |
| 220 | + |
| 221 | +1. **Baseline Testing**: Test with minimal handlers and no lifecycle plugins |
| 222 | +2. **Plugin Impact**: Measure performance impact of each lifecycle plugin |
| 223 | +3. **Memory Profiling**: Monitor memory usage over extended periods |
| 224 | +4. **Concurrency Testing**: Test with realistic concurrent webhook loads |
| 225 | + |
| 226 | +### Key Metrics to Monitor |
| 227 | + |
| 228 | +- **Request Processing Time**: P50, P95, P99 response times |
| 229 | +- **Memory Usage**: RSS, heap size, GC frequency |
| 230 | +- **Error Rates**: 4xx and 5xx response rates |
| 231 | +- **Plugin Performance**: Individual plugin execution times |
| 232 | +- **Resource Utilization**: CPU, memory, network I/O |
| 233 | + |
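When a metrics backend does not compute percentiles for you, P50/P95/P99 can be derived from raw timing samples. The sketch below uses the nearest-rank method, which is one of several common percentile definitions; `percentile` is a hypothetical helper.

```ruby
# Nearest-rank percentile: the smallest sample such that at least pct percent
# of all samples are less than or equal to it.
def percentile(samples, pct)
  raise ArgumentError, "samples must not be empty" if samples.empty?
  raise ArgumentError, "pct must be in (0, 100]" unless pct > 0 && pct <= 100

  sorted = samples.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[rank - 1]
end

# Example with made-up processing times in milliseconds.
timings_ms = [12, 15, 11, 240, 18, 14, 13, 16, 17, 19]
p50 = percentile(timings_ms, 50)
p95 = percentile(timings_ms, 95)
p99 = percentile(timings_ms, 99)
```

With small sample counts the high percentiles collapse onto the slowest request, so collect enough samples per reporting window before alerting on P99.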
| 234 | +## 🔧 Troubleshooting |
| 235 | + |
| 236 | +### Common Performance Issues |
| 237 | + |
| 238 | +1. **High Memory Usage**: |
| 239 | + - Check for plugin memory leaks |
| 240 | + - Monitor payload sizes |
| 241 | + - Review lifecycle plugin efficiency |
| 242 | + |
| 243 | +2. **Slow Request Processing**: |
| 244 | + - Profile individual plugins |
| 245 | + - Check JSON parsing performance |
| 246 | + - Review handler implementation efficiency |
| 247 | + |
| 248 | +3. **Plugin Loading Issues**: |
| 249 | + - Verify plugin directory permissions |
| 250 | + - Check plugin class name formatting |
| 251 | + - Review security validation errors |
| 252 | + |
| 253 | +### Debug Configuration |
| 254 | + |
| 255 | +For troubleshooting, temporarily enable debug logging: |
| 256 | + |
| 257 | +```yaml |
| 258 | +log_level: "debug" |
| 259 | +environment: "development" # Enables error backtraces |
| 260 | +``` |
| 261 | + |
| 262 | +**Important**: Do not leave debug logging enabled in production long-term: it adds logging overhead, and the development environment setting exposes error backtraces. |