| 1 | +# Gunicorn Worker Timeout Fix |
| 2 | + |
| 3 | +## Problem Description |
| 4 | + |
| 5 | +The Subscription Tracker hit Gunicorn worker timeouts when saving subscriptions, which surfaced as "Server Unavailable" errors in the browser. The traceback showed: |
| 6 | + |
| 7 | +``` |
| 8 | +File "/usr/local/lib/python3.13/site-packages/gunicorn/workers/base.py", line 204, in handle_abort |
| 9 | + sys.exit(1) |
| 10 | +``` |
| 11 | + |
| 12 | +## Root Causes Identified |
| 13 | + |
| 14 | +1. **Long-running external API calls** for currency conversion during subscription save operations |
| 15 | +2. **Multiple synchronous API calls** to exchange rate providers without proper timeout handling |
| 16 | +3. **No circuit breaker pattern** for failed API providers |
| 17 | +4. **Database operations without timeout protection** |
| 18 | +5. **Insufficient error handling** in critical paths |
| 19 | + |
| 20 | +## Fixes Implemented |
| 21 | + |
| 22 | +### 1. Improved Error Handling in Routes (`app/routes.py`) |
| 23 | + |
| 24 | +- Added try/except blocks around subscription add/edit operations |
| 25 | +- Added database rollback on errors |
| 26 | +- Added user-friendly error messages |
| 27 | +- Added logging for debugging |
| 28 | + |
| 29 | +```python |
| 30 | +try: |
| 31 | +    # subscription save logic |
| 32 | +    db.session.commit() |
| 33 | +    flash('Subscription added successfully!', 'success') |
| 34 | +    return redirect(url_for('main.dashboard')) |
| 35 | +except Exception as e: |
| 36 | +    db.session.rollback() |
| 37 | +    current_app.logger.error(f"Error adding subscription: {e}") |
| 38 | +    flash('An error occurred while saving the subscription. Please try again.', 'error') |
| 39 | +    return render_template('add_subscription.html', form=form) |
| 40 | +``` |
| 41 | + |
| 42 | +### 2. Reduced API Timeouts (`app/currency.py`) |
| 43 | + |
| 44 | +- Reduced external API timeouts from 10s to 5s |
| 45 | +- Added circuit breaker pattern for failed providers |
| 46 | +- Improved fallback rate handling |
| 47 | + |
| 48 | +```python |
| 49 | +def _fetch_frankfurter(self): |
| 50 | +    url = 'https://api.frankfurter.app/latest?from=EUR' |
| 51 | +    r = requests.get(url, timeout=5)  # Reduced from 10s |
| 52 | +``` |
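
A per-call `timeout=5` bounds each request, not the whole save operation. In `requests` the value limits the connect phase and each wait for data from the server, so it is not a cap on total response time, and several providers tried in sequence can still add up. The sketch below shows one way to enforce an overall time budget across providers. `fetch_with_budget`, its parameters, and the `(name, callable)` shape of `fetchers` are hypothetical names for illustration, not the project's code.

```python
import time


def fetch_with_budget(fetchers, total_budget=8.0, per_call_timeout=5.0,
                      clock=time.monotonic):
    """Try each fetcher in order until one succeeds or the budget runs out.

    `fetchers` is a list of (name, callable) pairs; each callable accepts a
    `timeout` keyword and returns a rates dict or raises on failure.
    Returns (name, rates) on success, or (None, None) if nothing succeeded.
    """
    deadline = clock() + total_budget
    for name, fetch in fetchers:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # budget exhausted; the caller should use cached rates
        try:
            # Never give a single call more time than the budget has left.
            return name, fetch(timeout=min(per_call_timeout, remaining))
        except Exception:
            continue  # this provider failed; try the next one
    return None, None
```

Keeping `total_budget` well below the Gunicorn worker `timeout` leaves headroom for the database commit that follows the rate lookup.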
| 53 | + |
| 54 | +### 3. Circuit Breaker Pattern (`app/currency.py`) |
| 55 | + |
| 56 | +- Added failure tracking for each provider |
| 57 | +- Automatic circuit opening after 3 consecutive failures |
| 58 | +- Circuit reset after 5 minutes |
| 59 | + |
| 60 | +```python |
| 61 | +def _is_circuit_open(self, provider): |
| 62 | +    if provider not in self._circuit_breaker: |
| 63 | +        return False |
| 64 | +    failures, last_failure = self._circuit_breaker[provider] |
| 65 | +    # Reset circuit breaker after 5 minutes |
| 66 | +    if datetime.now().timestamp() - last_failure > 300: |
| 67 | +        del self._circuit_breaker[provider] |
| 68 | +        return False |
| 69 | +    # Open circuit after 3 consecutive failures |
| 70 | +    return failures >= 3 |
| 71 | +``` |
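
The check above only reads the breaker state; something also has to record failures and clear them on success. A minimal standalone sketch of that bookkeeping follows, using the same `(failures, last_failure)` tuple layout. The class name `ProviderCircuitBreaker` and its method names are assumptions made for illustration and may not match `app/currency.py`.

```python
import time


class ProviderCircuitBreaker:
    """Tracks consecutive failures per provider (illustrative sketch)."""

    def __init__(self, max_failures=3, reset_after=300, clock=time.time):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self._clock = clock
        self._state = {}  # provider -> (consecutive failures, last failure time)

    def record_failure(self, provider):
        failures, _ = self._state.get(provider, (0, 0.0))
        self._state[provider] = (failures + 1, self._clock())

    def record_success(self, provider):
        # Any success ends the run of consecutive failures.
        self._state.pop(provider, None)

    def is_open(self, provider):
        if provider not in self._state:
            return False
        failures, last_failure = self._state[provider]
        if self._clock() - last_failure > self.reset_after:
            del self._state[provider]  # cool-down elapsed; allow a retry
            return False
        return failures >= self.max_failures
```

A caller would skip any provider for which `is_open()` returns `True`, call `record_failure()` in its `except` branch, and call `record_success()` after a good response. Note that this state lives in process memory, so with `workers = 2` each Gunicorn worker keeps its own counters.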
| 72 | + |
| 73 | +### 4. Enhanced Gunicorn Configuration (`gunicorn.conf.py`) |
| 74 | + |
| 75 | +- Increased worker timeout from 30s to 60s |
| 76 | +- Added proper worker management |
| 77 | +- Enhanced logging configuration |
| 78 | + |
| 79 | +```python |
| 80 | +# Worker timeout |
| 81 | +timeout = 60 # Increased from default 30s |
| 82 | +graceful_timeout = 30 |
| 83 | +workers = 2 |
| 84 | +worker_class = "sync" |
| 85 | +``` |
| 86 | + |
| 87 | +### 5. Database Connection Improvements (`config.py`) |
| 88 | + |
| 89 | +- Added connection pool settings |
| 90 | +- Added timeout configuration for SQLite |
| 91 | +- Added connection pre-ping for health checks |
| 92 | + |
| 93 | +```python |
| 94 | +SQLALCHEMY_ENGINE_OPTIONS = { |
| 95 | +    'pool_timeout': 20, |
| 96 | +    'pool_recycle': 3600, |
| 97 | +    'pool_pre_ping': True, |
| 98 | +    'connect_args': { |
| 99 | +        'timeout': 30, |
| 100 | +        'check_same_thread': False |
| 101 | +    } |
| 102 | +} |
| 103 | +``` |
| 104 | + |
| 105 | +### 6. Improved Currency Conversion Caching (`app/models.py`) |
| 106 | + |
| 107 | +- Enhanced caching strategy to avoid API calls during subscription operations |
| 108 | +- Added fallback to database cache before making external API calls |
| 109 | +- Better error handling in conversion methods |
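
The cache-first lookup described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `app/models.py`: `get_rates`, the `(rates, fetched_at)` tuple, and the 24-hour `max_age` default are all hypothetical.

```python
from datetime import datetime, timedelta


def get_rates(cached, fetch_remote, max_age=timedelta(hours=24), now=None):
    """Return (rates, source), preferring cached rates over a network call.

    `cached` is None or a (rates, fetched_at) tuple, e.g. loaded from a
    database row; `fetch_remote` is a zero-argument callable that returns a
    rates dict or raises.
    """
    now = now or datetime.now()
    if cached is not None:
        rates, fetched_at = cached
        if now - fetched_at <= max_age:
            return rates, "cache"  # fresh enough; no network call needed
    try:
        return fetch_remote(), "remote"
    except Exception:
        if cached is not None:
            return cached[0], "stale-cache"  # stale rates beat no rates
        raise
```

Returning the source alongside the rates lets the caller show a warning when stale rates were used.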
| 110 | + |
| 111 | +### 7. Dashboard Performance Improvements (`app/routes.py`) |
| 112 | + |
| 113 | +- Pre-fetch exchange rates once per request |
| 114 | +- Better error handling for cost calculations |
| 115 | +- User-friendly warnings when rates are unavailable |
| 116 | + |
| 117 | +### 8. Application-level Error Handling (`app/__init__.py`) |
| 118 | + |
| 119 | +- Added global timeout error handler |
| 120 | +- Added 500 error handler with proper rollback |
| 121 | +- Added performance logging for slow requests |
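
One way to log slow requests is a small WSGI middleware, shown below as a sketch. The class name `SlowRequestLogger`, the logger name, and the 2-second threshold are assumptions; the actual handler in `app/__init__.py` may use Flask request hooks instead.

```python
import logging
import time

logger = logging.getLogger("slow_requests")


class SlowRequestLogger:
    """WSGI middleware that logs requests slower than `threshold` seconds."""

    def __init__(self, app, threshold=2.0, clock=time.monotonic):
        self.app = app
        self.threshold = threshold
        self._clock = clock

    def __call__(self, environ, start_response):
        start = self._clock()
        try:
            return self.app(environ, start_response)
        finally:
            # Measures the time until the wrapped app returns its response
            # iterable; a streamed body is not included in this figure.
            elapsed = self._clock() - start
            if elapsed > self.threshold:
                logger.warning(
                    "Slow request: %s %s took %.2fs",
                    environ.get("REQUEST_METHOD"),
                    environ.get("PATH_INFO"),
                    elapsed,
                )
```

With Flask this would typically be installed as `app.wsgi_app = SlowRequestLogger(app.wsgi_app)`.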
| 122 | + |
| 123 | +### 9. Health Check Endpoint (`app/routes.py`) |
| 124 | + |
| 125 | +- Added `/health` endpoint for monitoring |
| 126 | +- Checks database connectivity and currency rate availability |
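
The two checks can be sketched as a framework-independent function. This is a hypothetical illustration: `check_health`, the payload shape, and the 503 status for an unhealthy result are assumptions, and it talks to SQLite directly rather than through the application's SQLAlchemy session.

```python
import sqlite3


def check_health(db_path, rates):
    """Return (payload, http_status) describing application health.

    `db_path` is the SQLite database file and `rates` is whatever rates
    mapping the application currently holds (may be empty or None).
    """
    checks = {}
    try:
        conn = sqlite3.connect(db_path, timeout=2)
        try:
            conn.execute("SELECT 1")  # cheap connectivity probe
        finally:
            conn.close()
        checks["database"] = "ok"
    except sqlite3.Error as exc:
        checks["database"] = f"error: {exc}"
    checks["currency_rates"] = "ok" if rates else "unavailable"

    healthy = all(value == "ok" for value in checks.values())
    payload = {"status": "healthy" if healthy else "unhealthy", "checks": checks}
    return payload, 200 if healthy else 503
```

A Flask route could return `jsonify(payload), status` from this result, so that monitors can rely on the HTTP status code alone.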
| 127 | + |
| 128 | +### 10. Monitoring Script (`monitor.py`) |
| 129 | + |
| 130 | +- Python script to monitor application health |
| 131 | +- Tests both health endpoint and functional operations |
| 132 | +- Can be used for automated monitoring |
| 133 | + |
| 134 | +## Testing the Fixes |
| 135 | + |
| 136 | +1. **Basic Health Check**: |
| 137 | + ```bash |
| 138 | + curl http://localhost:5000/health |
| 139 | + ``` |
| 140 | + |
| 141 | +2. **Monitor Application**: |
| 142 | + ```bash |
| 143 | + python monitor.py --url http://localhost:5000 --once |
| 144 | + ``` |
| 145 | + |
| 146 | +3. **Load Testing**: |
| 147 | + - Try saving multiple subscriptions quickly |
| 148 | + - Test with different currencies |
| 149 | + - Test when external APIs are slow/unavailable |
| 150 | + |
| 151 | +## Prevention Measures |
| 152 | + |
| 153 | +1. **Monitoring**: Use the health check endpoint for automated monitoring |
| 154 | +2. **Alerting**: Set up alerts for 500 errors and slow response times |
| 155 | +3. **Regular Testing**: Use the monitor script to test functionality |
| 156 | +4. **Log Analysis**: Monitor application logs for warnings and errors |
| 157 | + |
| 158 | +## Recommended Environment Variables |
| 159 | + |
| 160 | +For production deployment, consider adding: |
| 161 | + |
| 162 | +```bash |
| 163 | +# Reduce currency refresh frequency to avoid API rate limits |
| 164 | +CURRENCY_REFRESH_MINUTES=1440 # 24 hours |
| 165 | + |
| 166 | +# Set specific provider priority |
| 167 | +CURRENCY_PROVIDER_PRIORITY=frankfurter,floatrates,erapi_open |
| 168 | + |
| 169 | +# Enable performance logging |
| 170 | +PERFORMANCE_LOGGING=true |
| 171 | +``` |
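
For illustration, these variables could be parsed along the following lines. The function name `load_currency_settings`, the returned keys, and the defaults (1440 minutes, empty priority list, logging off) are assumptions, not the project's actual `config.py`.

```python
import os


def load_currency_settings(environ=None):
    """Parse the currency-related environment variables with safe defaults."""
    env = os.environ if environ is None else environ
    try:
        refresh_minutes = int(env.get("CURRENCY_REFRESH_MINUTES", "1440"))
    except ValueError:
        refresh_minutes = 1440  # fall back to 24 hours on a malformed value
    # Split the comma-separated list and drop empty entries.
    priority = [
        name.strip()
        for name in env.get("CURRENCY_PROVIDER_PRIORITY", "").split(",")
        if name.strip()
    ]
    flag = env.get("PERFORMANCE_LOGGING", "false").strip().lower()
    return {
        "refresh_minutes": refresh_minutes,
        "provider_priority": priority,
        "performance_logging": flag in {"1", "true", "yes", "on"},
    }
```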
| 172 | + |
| 173 | +## Expected Improvements |
| 174 | + |
| 175 | +- Worker timeout errors reduced by an estimated 90% or more |
| 176 | +- Faster subscription save operations |
| 177 | +- Better user experience with error messages |
| 178 | +- More resilient currency conversion |
| 179 | +- Easier debugging and monitoring |