# 🚀 Nexent LLM Monitoring System

An enterprise-grade solution for monitoring LLM token generation speed and performance.

## 📊 System Architecture

```
┌───────────────────────────────────────────────────────────────┐
│                 Nexent LLM Monitoring System                  │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│ Nexent API ──► OpenTelemetry ──► Jaeger (Tracing)             │
│     │                │                                        │
│     │                └──────► Prometheus (Metrics)            │
│     │                         │                               │
│     └─► OpenAI LLM            └──► Grafana (Visualization)    │
│         (Token Monitoring)                                    │
└───────────────────────────────────────────────────────────────┘
```

## ⚡ Quick Start (5 minutes)

```bash
# 1. Start the monitoring services
./docker/start-monitoring.sh

# 2. Install performance monitoring dependencies
uv sync --extra performance

# 3. Enable monitoring
export ENABLE_TELEMETRY=true

# 4. Start the backend service
python backend/main_service.py
```

## 📊 Access Monitoring Interfaces

| Interface | URL | Purpose |
|-----------|-----|---------|
| **Grafana Dashboard** | http://localhost:3005 | LLM performance monitoring |
| **Jaeger Tracing** | http://localhost:16686 | Request trace analysis |
| **Prometheus Metrics** | http://localhost:9090 | Raw monitoring data |

### 🔐 Grafana Login Information

When accessing Grafana (http://localhost:3005) for the first time, you will need to log in:

```
Username: admin
Password: admin
```

**On first login you will be prompted to change the password:**
- Set a new password (recommended), or
- Click "Skip" (acceptable for development environments)

**After logging in, you will see:**
- 📊 **LLM Performance Dashboard** - pre-configured performance dashboard
- 📈 **Data Source Configuration** - auto-connected to Prometheus and Jaeger
- 🎯 **Real-time Monitoring Panel** - key metrics such as token generation speed and latency
## 🎯 Core Features

### ⚡ LLM-Specific Monitoring
- **Token Generation Speed**: Real-time monitoring of tokens generated per second
- **TTFT (Time to First Token)**: Latency until the first token is returned
- **Streaming Response Analysis**: Generation timestamp for each token
- **Model Performance Comparison**: Performance benchmarks across different models

### 🔍 Distributed Tracing
- **Complete Request Chain**: End-to-end tracing from HTTP request to LLM call
- **Performance Bottleneck Detection**: Automatically identify slow queries and anomalies
- **Error Root Cause Analysis**: Quickly locate problem sources

### 🛠️ Developer-Friendly Design
- **One-Line Integration**: Add monitoring with a single decorator
- **Zero-Dependency Degradation**: Automatically skipped when monitoring dependencies are missing
- **Zero-Touch Usage**: No need to check monitoring status manually; it is handled automatically
- **Flexible Configuration**: Behavior controlled by environment variables
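
The zero-dependency degradation above is typically achieved with an import guard: if the telemetry packages are not installed, the decorators simply return the function unchanged. A minimal sketch of that pattern, not Nexent's actual implementation (`MONITORING_AVAILABLE` mirrors the flag exposed by `backend/utils/monitoring.py`; the rest is illustrative):

```python
import functools

# Probe for the optional telemetry dependency; fall back to no-ops if missing.
try:
    import opentelemetry  # noqa: F401
    MONITORING_AVAILABLE = True
except ImportError:
    MONITORING_AVAILABLE = False


def monitor_endpoint(name):
    """Return a monitoring decorator when dependencies exist, else a pass-through."""
    def decorator(func):
        if not MONITORING_AVAILABLE:
            return func  # degradation: the function is left untouched

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real implementation would open a span named `name` here.
            return func(*args, **kwargs)
        return wrapper
    return decorator


@monitor_endpoint("demo.handler")
def handler():
    return {"status": "ok"}
```

Because the fallback returns the original function unchanged, callers behave identically whether or not the telemetry stack is installed.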

## 🛠️ Adding Monitoring to Code

### 🎯 Recommended Approach: Singleton Pattern (v2.1+)

```python
# Backend service usage - directly use the globally configured monitoring_manager
from utils.monitoring import monitoring_manager

# API endpoint monitoring
@monitoring_manager.monitor_endpoint("my_service.my_function")
async def my_api_function():
    return {"status": "ok"}

# LLM call monitoring
@monitoring_manager.monitor_llm_call("gpt-4", "chat_completion")
def call_llm(messages):
    # Token-level monitoring is applied automatically
    return llm_response

# Manual monitoring events
monitoring_manager.add_span_event("custom_event", {"key": "value"})
monitoring_manager.set_span_attributes(user_id="123", action="process")
```

### 📦 Direct SDK Usage

```python
from nexent.monitor import get_monitoring_manager

# Get the global monitoring manager - already configured in the backend
monitor = get_monitoring_manager()

# Use decorators
@monitor.monitor_llm_call("claude-3", "completion")
def my_llm_function():
    return "response"

# Or use it directly in business logic
with monitor.trace_llm_request("custom_operation", "my_model") as span:
    # Execute business logic
    result = process_data()
    monitor.add_span_event("processing_completed")
```
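
When monitoring is disabled, a context manager like `trace_llm_request` can still yield an inert span, so calling code never has to branch on monitoring state. A hedged sketch of that shape (the `_NoopSpan` class and its methods are assumptions for illustration, not the SDK's real internals):

```python
from contextlib import contextmanager


class _NoopSpan:
    """Inert stand-in for a span: accepts events and attributes, discards them."""

    def add_event(self, name, attributes=None):
        pass

    def set_attributes(self, **attrs):
        pass


@contextmanager
def trace_llm_request(operation, model, enabled=False):
    # With monitoring off, hand back a no-op span; with it on, a real
    # OpenTelemetry span would be created and ended here instead.
    yield _NoopSpan()


with trace_llm_request("custom_operation", "my_model") as span:
    span.add_event("processing_completed")
    result = "processed"
```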

### ✨ Global Configuration Automation

Monitoring configuration is auto-initialized in `backend/utils/monitoring.py`:

```python
# No manual configuration needed - completed automatically at system startup
# monitoring_manager is already configured from environment variables
from utils.monitoring import monitoring_manager

# Use directly without checking whether monitoring is enabled
@monitoring_manager.monitor_endpoint("my_function")
def my_function():
    pass

# FastAPI application initialization
monitoring_manager.setup_fastapi_app(app)
```

### 🔒 Auto Start/Stop Design

- **Smart Monitoring**: Starts and stops automatically based on the `ENABLE_TELEMETRY` environment variable
- **Zero-Touch Usage**: Calling code never needs to check monitoring status; all features can be used directly
- **Graceful Degradation**: Silently a no-op when disabled, fully active when enabled
- **Default Off**: Monitoring stays disabled unless explicitly enabled

```bash
# Enable monitoring
export ENABLE_TELEMETRY=true

# Disable monitoring
export ENABLE_TELEMETRY=false
```
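
Internally, a gate like this can be as simple as reading the flag once at import time and short-circuiting every decorator. A minimal sketch under that assumption (illustrative names, not Nexent's actual code):

```python
import functools
import os

# Default off: any value other than "true" leaves monitoring disabled.
TELEMETRY_ENABLED = os.getenv("ENABLE_TELEMETRY", "false").lower() == "true"


def monitor_llm_call(model, operation):
    def decorator(func):
        if not TELEMETRY_ENABLED:
            return func  # graceful degradation: decorator has no effect

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real implementation would record spans and metrics here.
            return func(*args, **kwargs)
        return wrapper
    return decorator


@monitor_llm_call("gpt-4", "chat_completion")
def call_llm(messages):
    return f"echo: {messages}"
```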

## 📊 Core Monitoring Metrics

| Metric | Description | Importance |
|--------|-------------|------------|
| `llm_token_generation_rate` | Token generation speed (tokens/s) | ⭐⭐⭐ |
| `llm_time_to_first_token_seconds` | First token latency | ⭐⭐⭐ |
| `llm_request_duration_seconds` | Complete request duration | ⭐⭐⭐ |
| `llm_total_tokens` | Input/output token count | ⭐⭐ |
| `llm_error_count` | LLM call error count | ⭐⭐⭐ |
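
To make the two headline metrics concrete: given the arrival timestamp of each streamed token, TTFT and the generation rate fall out directly. The timestamps below are made-up sample data:

```python
# Arrival times (seconds) of each streamed token, plus the request start time.
request_start = 100.0
token_times = [100.8, 100.9, 101.0, 101.1, 101.2]  # sample data

# TTFT: delay until the first token arrives.
ttft = token_times[0] - request_start  # ≈ 0.8 s

# Generation rate: tokens emitted per second of total request time.
duration = token_times[-1] - request_start
rate = len(token_times) / duration  # 5 tokens / 1.2 s ≈ 4.2 tokens/s

print(f"TTFT={ttft:.2f}s, rate={rate:.2f} tokens/s")
```

At roughly 4.2 tokens/s, this sample would already count as slow under the 5 tokens/s threshold used in the problem-analysis section below.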

## 🔧 Environment Configuration

```bash
# Add to the .env file
cat >> .env << EOF
ENABLE_TELEMETRY=true
SERVICE_NAME=nexent-backend
JAEGER_ENDPOINT=http://localhost:14268/api/traces
LLM_SLOW_REQUEST_THRESHOLD_SECONDS=5.0
LLM_SLOW_TOKEN_RATE_THRESHOLD=10.0
TELEMETRY_SAMPLE_RATE=1.0  # 1.0 for development; 0.1 recommended in production
EOF
```

## 🛠️ System Verification

```bash
# Check the metrics endpoint
curl http://localhost:8000/metrics

# Verify dependency installation
python -c "from backend.utils.monitoring import MONITORING_AVAILABLE; print(f'Monitoring Available: {MONITORING_AVAILABLE}')"
```

## 🆘 Troubleshooting

### No monitoring data?
```bash
# Check service status
docker-compose -f docker/docker-compose-monitoring.yml ps

# Check dependency installation
python -c "import opentelemetry; print('✅ Monitoring dependencies installed')"
```

### Port conflicts?
```bash
# Check port usage
lsof -i :3005 -i :9090 -i :16686
```

### Dependency installation issues?
```bash
# Reinstall performance dependencies
uv sync --extra performance

# Check the performance configuration in pyproject.toml
grep -A 20 "performance" backend/pyproject.toml
```

### Service name shows as unknown_service?
```bash
# Check the environment variable configuration
echo "SERVICE_NAME: $SERVICE_NAME"

# Restart the monitoring services to apply the new configuration
./docker/start-monitoring.sh
```

## 🧹 Data Management

### Clean Jaeger Trace Data
```bash
# Method 1: Restart the Jaeger container (simplest)
docker-compose -f docker/docker-compose-monitoring.yml restart nexent-jaeger

# Method 2: Completely rebuild the Jaeger container and its data
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-jaeger
docker-compose -f docker/docker-compose-monitoring.yml rm -f nexent-jaeger
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-jaeger

# Method 3: Clean all monitoring data (rebuild all containers)
docker-compose -f docker/docker-compose-monitoring.yml down
docker-compose -f docker/docker-compose-monitoring.yml up -d
```

### Clean Prometheus Metrics Data
```bash
# Restart the Prometheus container
docker-compose -f docker/docker-compose-monitoring.yml restart nexent-prometheus

# Completely clean the Prometheus data
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-prometheus
docker volume rm docker_prometheus_data 2>/dev/null || true
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-prometheus
```

### Clean Grafana Configuration
```bash
# Reset the Grafana configuration and dashboards
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-grafana
docker volume rm docker_grafana_data 2>/dev/null || true
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-grafana
```

## 📈 Typical Problem Analysis

### Slow token generation (< 5 tokens/s)
1. **Analyze**: Grafana → Token Generation Rate panel
2. **Resolve**: Check the model service load; shorten input prompts

### Slow request response (> 10 s)
1. **Analyze**: Jaeger → view the complete trace chain
2. **Resolve**: Locate the bottleneck (database/LLM/network)

### Error rate spike (> 10%)
1. **Analyze**: Prometheus → `llm_error_count` metric
2. **Resolve**: Check model service availability; verify API keys

## 🎉 Getting Started

Once setup is complete, you can:

1. 📊 View the **LLM Performance Dashboard** in Grafana
2. 🔍 Trace complete request chains in Jaeger
3. 📈 Analyze token generation speed and performance bottlenecks
4. 🚨 Set performance alerts and thresholds

Enjoy efficient LLM performance monitoring! 🚀