
Commit 6b01e89

✨ Add performance monitor module
1 parent 87bf4fa

3 files changed: +585 −2 lines changed

doc/docs/.vitepress/config.mts

Lines changed: 9 additions & 2 deletions

```diff
@@ -13,8 +13,13 @@ export default defineConfig({

   // Ignore localhost links as they are meant for local deployment access
   ignoreDeadLinks: [
-    // Ignore localhost links
-    /^http:\/\/localhost:3000/
+    // Ignore localhost links for main app
+    /^http:\/\/localhost:3000/,
+    // Ignore localhost links for monitoring services
+    /^http:\/\/localhost:3005/,  // Grafana
+    /^http:\/\/localhost:9090/,  // Prometheus
+    /^http:\/\/localhost:16686/, // Jaeger
+    /^http:\/\/localhost:8000/   // Metrics endpoint
   ],

   locales: {
@@ -73,6 +78,7 @@ export default defineConfig({
           { text: 'Models', link: '/en/sdk/core/models' }
         ]
       },
+      { text: 'Performance Monitoring', link: '/en/sdk/monitoring' },
       { text: 'Vector Database', link: '/en/sdk/vector-database' },
       { text: 'Data Processing', link: '/en/sdk/data-process' }
     ]
@@ -200,6 +206,7 @@ export default defineConfig({
           { text: '模型模块', link: '/zh/sdk/core/models' }
         ]
       },
+      { text: '性能监控', link: '/zh/sdk/monitoring' },
       { text: '向量数据库', link: '/zh/sdk/vector-database' },
       { text: '数据处理', link: '/zh/sdk/data-process' }
     ]
```

doc/docs/en/sdk/monitoring.md

Lines changed: 288 additions & 0 deletions

# 🚀 Nexent LLM Monitoring System

An enterprise-grade monitoring solution designed specifically to track LLM token generation speed and performance.

## 📊 System Architecture

```
┌─────────────────────────────────────────────────────────┐
│              Nexent LLM Monitoring System               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Nexent API ──► OpenTelemetry ──► Jaeger (Tracing)      │
│      │            │                                     │
│      │            └──► Prometheus (Metrics)             │
│      │                    │                             │
│      └─► OpenAI LLM       └──► Grafana (Visualization)  │
│          (Token Monitoring)                             │
└─────────────────────────────────────────────────────────┘
```

## ⚡ Quick Start (5 minutes)

```bash
# 1. Start monitoring services
./docker/start-monitoring.sh

# 2. Install performance monitoring dependencies
uv sync --extra performance

# 3. Enable monitoring
export ENABLE_TELEMETRY=true

# 4. Start backend service
python backend/main_service.py
```

## 📊 Access Monitoring Interfaces

| Interface | URL | Purpose |
|-----------|-----|---------|
| **Grafana Dashboard** | http://localhost:3005 | LLM Performance Monitoring |
| **Jaeger Tracing** | http://localhost:16686 | Request Trace Analysis |
| **Prometheus Metrics** | http://localhost:9090 | Raw Monitoring Data |

### 🔐 Grafana Login Information

When first accessing Grafana (http://localhost:3005), you need to log in:

```
Username: admin
Password: admin
```

**After the first login, you'll be prompted to change the password:**
- Set a new password (recommended)
- Click "Skip" to keep the default (acceptable for development environments)

**After logging in, you can see:**
- 📊 **LLM Performance Dashboard** - a pre-configured performance dashboard
- 📈 **Data Source Configuration** - auto-connected to Prometheus and Jaeger
- 🎯 **Real-time Monitoring Panel** - key metrics such as token generation speed and latency

## 🎯 Core Features

### ⚡ LLM-Specific Monitoring
- **Token Generation Speed**: Real-time monitoring of tokens generated per second
- **TTFT (Time to First Token)**: First-token return latency
- **Streaming Response Analysis**: A generation timestamp for each token (see the sketch after this list)
- **Model Performance Comparison**: Performance benchmarks across different models
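
As an illustration of how the two headline numbers relate, here is a minimal sketch that derives TTFT and tokens/s from per-token timestamps; `measure_stream` is a hypothetical helper for this doc, not part of the SDK:

```python
import time

def measure_stream(stream):
    """Consume a token stream and report TTFT and generation rate.

    `stream` is any iterable yielding tokens (a stand-in for an
    OpenAI-style streaming response); this helper is illustrative only.
    """
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _token in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time-to-first-token mark
        n_tokens += 1
    elapsed = time.perf_counter() - start
    return {
        "ttft_seconds": (first_token_at - start) if first_token_at else None,
        "tokens_per_second": n_tokens / elapsed if elapsed > 0 else 0.0,
    }
```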

### 🔍 Distributed Tracing
- **Complete Request Chain**: End-to-end tracing from HTTP to LLM
- **Performance Bottleneck Detection**: Automatically identify slow queries and anomalies
- **Error Root Cause Analysis**: Quickly locate problem sources

### 🛠️ Developer-Friendly Design
- **One-Line Integration**: Quick monitoring with decorators
- **Zero-Dependency Degradation**: Auto-skip when monitoring dependencies are missing
- **Zero-Touch Usage**: No need to manually check monitoring status; it is handled automatically
- **Flexible Configuration**: Behavior controlled by environment variables

## 🛠️ Adding Monitoring to Code

### 🎯 Recommended Approach: Singleton Pattern (v2.1+)

```python
# Backend service usage - directly use the globally configured monitoring_manager
from utils.monitoring import monitoring_manager

# API endpoint monitoring
@monitoring_manager.monitor_endpoint("my_service.my_function")
async def my_api_function():
    return {"status": "ok"}

# LLM call monitoring
@monitoring_manager.monitor_llm_call("gpt-4", "chat_completion")
def call_llm(messages):
    # Token-level monitoring is applied automatically
    return llm_response  # placeholder for the actual LLM call result

# Manual monitoring events
monitoring_manager.add_span_event("custom_event", {"key": "value"})
monitoring_manager.set_span_attributes(user_id="123", action="process")
```

### 📦 Direct SDK Usage

```python
from nexent.monitor import get_monitoring_manager

# Get the global monitoring manager - already configured in the backend
monitor = get_monitoring_manager()

# Use decorators
@monitor.monitor_llm_call("claude-3", "completion")
def my_llm_function():
    return "response"

# Or use it directly in business logic
def handle_request():
    with monitor.trace_llm_request("custom_operation", "my_model") as span:
        # Execute business logic
        result = process_data()
        monitor.add_span_event("processing_completed")
        return result
```

### ✨ Global Configuration Automation

Monitoring configuration is auto-initialized in `backend/utils/monitoring.py`:

```python
# No manual configuration needed - completed automatically at system startup;
# monitoring_manager is already configured from environment variables
from utils.monitoring import monitoring_manager

# Direct usage without checking whether monitoring is enabled
@monitoring_manager.monitor_endpoint("my_function")
def my_function():
    pass

# FastAPI application initialization
monitoring_manager.setup_fastapi_app(app)
```

### 🔒 Auto Start/Stop Design

- **Smart Monitoring**: Starts and stops automatically based on the `ENABLE_TELEMETRY` environment variable
- **Zero-Touch Usage**: External code never needs to check monitoring status; all features can be used directly
- **Graceful Degradation**: Silently becomes a no-op when disabled, operates normally when enabled (see the sketch after this code block)
- **Default Off**: Automatically disabled when not configured

```bash
# Enable monitoring
export ENABLE_TELEMETRY=true

# Disable monitoring
export ENABLE_TELEMETRY=false
```
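
To make the graceful-degradation contract concrete, here is a minimal sketch of the pattern, assuming a decorator keyed on `ENABLE_TELEMETRY`; the actual implementation in `backend/utils/monitoring.py` may differ:

```python
import functools
import os

TELEMETRY_ENABLED = os.getenv("ENABLE_TELEMETRY", "false").lower() == "true"

def monitor_endpoint(name):
    """Illustrative decorator: records spans when telemetry is on,
    returns the function untouched (zero overhead) when it is off."""
    def decorator(func):
        if not TELEMETRY_ENABLED:
            return func  # graceful degradation: silent no-op
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # ...open a span named `name`, time the call, record errors...
            return func(*args, **kwargs)
        return wrapper
    return decorator
```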

## 📊 Core Monitoring Metrics

| Metric | Description | Importance |
|--------|-------------|------------|
| `llm_token_generation_rate` | Token generation speed (tokens/s) | ⭐⭐⭐ |
| `llm_time_to_first_token_seconds` | First token latency | ⭐⭐⭐ |
| `llm_request_duration_seconds` | Complete request duration | ⭐⭐⭐ |
| `llm_total_tokens` | Input/output token count | ⭐⭐ |
| `llm_error_count` | LLM call error count | ⭐⭐⭐ |
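
For illustration, the table's metric names map naturally onto counter and histogram instruments. The sketch below registers them with `prometheus_client`; the label sets are assumptions, and the backend may instead emit them through OpenTelemetry exporters:

```python
from prometheus_client import Counter, Histogram

# Metric names mirror the table above; the label names are assumptions.
TOKEN_RATE = Histogram(
    "llm_token_generation_rate", "Token generation speed (tokens/s)", ["model"])
TTFT = Histogram(
    "llm_time_to_first_token_seconds", "First token latency", ["model"])
REQUEST_DURATION = Histogram(
    "llm_request_duration_seconds", "Complete request duration", ["model"])
TOTAL_TOKENS = Counter(
    "llm_total_tokens", "Input/output token count", ["model", "direction"])
ERRORS = Counter(
    "llm_error_count", "LLM call error count", ["model"])

# Example: record one observation for a gpt-4 call
TTFT.labels(model="gpt-4").observe(0.42)
ERRORS.labels(model="gpt-4").inc()
```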

## 🔧 Environment Configuration

```bash
# Add to .env file
cat >> .env << EOF
ENABLE_TELEMETRY=true
SERVICE_NAME=nexent-backend
JAEGER_ENDPOINT=http://localhost:14268/api/traces
LLM_SLOW_REQUEST_THRESHOLD_SECONDS=5.0
LLM_SLOW_TOKEN_RATE_THRESHOLD=10.0
TELEMETRY_SAMPLE_RATE=1.0  # 1.0 for development; 0.1 recommended in production
EOF
```
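
As a sketch of how the threshold variables might be consumed on the Python side (the names match the `.env` keys above; the defaults and the `is_slow` helper are assumptions, not the backend's actual code):

```python
import os

SLOW_REQUEST_S = float(os.getenv("LLM_SLOW_REQUEST_THRESHOLD_SECONDS", "5.0"))
SLOW_TOKEN_RATE = float(os.getenv("LLM_SLOW_TOKEN_RATE_THRESHOLD", "10.0"))
SAMPLE_RATE = float(os.getenv("TELEMETRY_SAMPLE_RATE", "1.0"))

def is_slow(duration_s: float, tokens_per_s: float) -> bool:
    """Flag a request that breaches either slow-request threshold."""
    return duration_s > SLOW_REQUEST_S or tokens_per_s < SLOW_TOKEN_RATE
```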

## 🛠️ System Verification

```bash
# Check the metrics endpoint
curl http://localhost:8000/metrics

# Verify dependency installation
python -c "from backend.utils.monitoring import MONITORING_AVAILABLE; print(f'Monitoring Available: {MONITORING_AVAILABLE}')"
```

## 🆘 Troubleshooting

### No monitoring data?
```bash
# Check service status
docker-compose -f docker/docker-compose-monitoring.yml ps

# Check dependency installation
python -c "import opentelemetry; print('✅ Monitoring dependencies installed')"
```

### Port conflicts?
```bash
# Check port usage
lsof -i :3005 -i :9090 -i :16686
```

### Dependency installation issues?
```bash
# Reinstall performance dependencies
uv sync --extra performance

# Check the performance configuration in pyproject.toml
grep -A 20 "performance" backend/pyproject.toml
```

### Service name shows as unknown_service?
```bash
# Check the environment variable configuration
echo "SERVICE_NAME: $SERVICE_NAME"

# Restart the monitoring service to apply the new configuration
./docker/start-monitoring.sh
```

## 🧹 Data Management

### Clean Jaeger Trace Data
```bash
# Method 1: Restart the Jaeger container (simplest)
docker-compose -f docker/docker-compose-monitoring.yml restart nexent-jaeger

# Method 2: Completely rebuild the Jaeger container and its data
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-jaeger
docker-compose -f docker/docker-compose-monitoring.yml rm -f nexent-jaeger
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-jaeger

# Method 3: Clean all monitoring data (rebuild all containers)
docker-compose -f docker/docker-compose-monitoring.yml down
docker-compose -f docker/docker-compose-monitoring.yml up -d
```

### Clean Prometheus Metrics Data
```bash
# Restart the Prometheus container
docker-compose -f docker/docker-compose-monitoring.yml restart nexent-prometheus

# Completely clean Prometheus data
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-prometheus
docker volume rm docker_prometheus_data 2>/dev/null || true
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-prometheus
```

### Clean Grafana Configuration
```bash
# Reset Grafana configuration and dashboards
docker-compose -f docker/docker-compose-monitoring.yml stop nexent-grafana
docker volume rm docker_grafana_data 2>/dev/null || true
docker-compose -f docker/docker-compose-monitoring.yml up -d nexent-grafana
```

## 📈 Typical Problem Analysis

### Slow token generation (< 5 tokens/s)
1. **Analysis**: Grafana → Token Generation Rate panel
2. **Solution**: Check the model service load; optimize input prompt length

### Slow request response (> 10 s)
1. **Analysis**: Jaeger → view the complete trace chain
2. **Solution**: Locate the bottleneck (database/LLM/network)

### Error rate spike (> 10%)
1. **Analysis**: Prometheus → `llm_error_count` metric (see the query sketch below)
2. **Solution**: Check model service availability; verify API keys
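
When a dashboard isn't handy, the same check can be scripted against the standard Prometheus HTTP API. A minimal sketch (the metric name follows the table above; note that counters may be exposed with a `_total` suffix depending on the client library):

```python
import requests

# Per-second LLM error rate over the last 5 minutes
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "rate(llm_error_count[5m])"},
    timeout=10,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], series["value"])
```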

## 🎉 Getting Started

After setup is complete, you can:

1. 📊 View the **LLM Performance Dashboard** in Grafana
2. 🔍 Trace complete request chains in Jaeger
3. 📈 Analyze token generation speed and performance bottlenecks
4. 🚨 Set performance alerts and thresholds

Enjoy efficient LLM performance monitoring! 🚀
