
Commit 05f4060

Kamal Sai Devarapalli authored and committed
Clean up documentation: reduce emojis and update service references
- Remove checkmark emojis from LOAD_TEST_RESULTS.md
- Update Redis documentation (booking → taskprocessing references)
- Update Vault setup paths (booking → taskprocessing)
- Update log monitoring docs with correct service names
- Add legacy class name notes where appropriate
1 parent 5ad055a commit 05f4060

File tree

5 files changed: +24 −25 lines


LOAD_TEST_RESULTS.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -81,13 +81,13 @@

 ## Key Observations

-### ✅ Excellent Performance
+### Excellent Performance
 1. **Response Times:** All services show sub-15ms average response times
 2. **Throughput:** All services handle **1,200-1,450 req/sec** easily
 3. **Success Rate:** 100% success rate for all working endpoints
 4. **Stability:** No errors or timeouts during testing

-### ✅ Gunicorn Configuration Working
+### Gunicorn Configuration Working
 - **4 workers confirmed** running in each service
 - Workers are handling concurrent requests efficiently
 - No worker exhaustion or queuing delays observed
@@ -111,7 +111,7 @@

 ### After (Gunicorn - 4 Workers × 2 Threads)
 - **Concurrent requests:** 8 simultaneous
-- **Actual throughput:** ✅ **1,200-1,450 req/sec**
+- **Actual throughput:** **1,200-1,450 req/sec**
 - **Response times:** 12-15ms average (excellent)

 **Improvement:** ~12-25x increase in throughput!
@@ -123,7 +123,7 @@
 ### Current Configuration (4 workers × 2 threads)
 - **Theoretical max concurrent:** 8 requests per instance
 - **Actual measured throughput:** ~1,300 req/sec per service
-- **Target capacity (1000-2000 req/sec):** ✅ **ACHIEVED**
+- **Target capacity (1000-2000 req/sec):** **ACHIEVED**

 ### Scaling Recommendations

@@ -150,7 +150,7 @@ For **higher loads** (2000+ req/sec), you can:

 ## Conclusion

-**Gunicorn configuration is working perfectly!** ✅
+**Gunicorn configuration is working perfectly!**

 - All services are handling **1,200-1,450 requests/second**
 - Response times are excellent (12-15ms average)
````
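For reference, a `gunicorn.conf.py` matching the setup these results describe (4 workers × 2 threads) might look like the sketch below — a minimal illustration, not the repo's actual config; the bind port and timeout are assumptions:

```python
# Hypothetical gunicorn.conf.py for the tested setup (4 workers × 2 threads).
bind = "0.0.0.0:5000"     # assumed port; each service binds its own
workers = 4               # four worker processes, as confirmed in the load test
threads = 2               # two threads per worker -> 8 concurrent requests per instance
worker_class = "gthread"  # threaded worker; gunicorn uses this when threads > 1
timeout = 30              # assumed; tune for the slowest expected request
```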

REDIS_SETUP.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -68,7 +68,7 @@ user = redis_helper.get_cached_user(123)
 | Service | Redis DB | Purpose |
 |---------|----------|---------|
 | User Management | 0 | User cache, sessions, rate limiting |
-| Booking | 1 | Booking cache, flight availability |
+| Task Processing | 1 | Task cache, task data |
 | Notification | 2 | Notification cache |

 ## Common Use Cases
@@ -176,7 +176,7 @@ KEYS user:*

 # Booking keys (DB 1)
 docker-compose exec redis redis-cli -n 1
-KEYS booking:*
+KEYS booking:*  # Legacy key pattern (task processing uses this pattern)
 ```

 ## Next Steps
````
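As context for the DB-per-service split above, a minimal redis-py sketch of how a service could pin itself to its assigned database — the factory name and environment defaults are assumptions, not code from this repo:

```python
import os

import redis

# Hypothetical factory: each service sets REDIS_DB to its assigned database
# (0 = User Management, 1 = Task Processing, 2 = Notification).
def get_redis_client() -> redis.Redis:
    return redis.Redis(
        host=os.getenv("REDIS_HOST", "redis"),
        port=int(os.getenv("REDIS_PORT", "6379")),
        db=int(os.getenv("REDIS_DB", "1")),  # 1 = Task Processing in this layout
        password=os.getenv("REDIS_PASSWORD") or None,
        decode_responses=True,
    )

client = get_redis_client()
client.set("booking:456", '{"taskId": 456}')  # legacy "booking:" key pattern noted above
```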

VAULT_SETUP.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -112,7 +112,7 @@ secret/
   db              # Database credentials
   jwt             # JWT secret key
   kafka           # Kafka credentials
-  booking/
+  taskprocessing/
     db
     external-api
   notification/
````
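To illustrate the renamed path, a sketch of reading the secret with the hvac client — the client code, mount point, and KV v2 engine are assumptions; the repo's actual Vault helper may differ:

```python
import hvac

# Hypothetical read of the renamed secret path (KV v2 assumed).
client = hvac.Client(url="http://localhost:8200", token="dev-root-token")
secret = client.secrets.kv.v2.read_secret_version(
    mount_point="secret",
    path="taskprocessing/db",  # was "booking/db" before this commit
)
db_creds = secret["data"]["data"]  # KV v2 nests the payload under data.data
```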

docs/log_monitoring_system.md

Lines changed: 1 addition & 1 deletion
The file is stored on a single line, so the diff replaces the whole line; the only change is the service list in the Architecture diagram:

````diff
@@ -1 +1 @@
-# Real-Time Log Monitoring System ... Microservices (User Mgmt, Booking, etc) ...
+# Real-Time Log Monitoring System ... Microservices (User Mgmt, Task Processing, Notification, etc) ...
````

The updated document, reflowed for readability:

# Real-Time Log Monitoring System

## Overview

A production-ready real-time log monitoring system that:

- **Collects logs** from multiple microservices
- **Streams logs** to Apache Kafka
- **Filters errors** in real-time
- **Provides dashboard** for visualization
- **Integrates with Grafana** for advanced monitoring

## Architecture

```
Microservices (User Mgmt, Task Processing, Notification, etc)
        | Logs via Kafka Handler
        v
Apache Kafka
  Topics:
    - application-logs
    - application-logs-errors
        | Consumer filters errors
        v
Log Monitor Service
  - Error Store
  - API Endpoints
  - Dashboard
        |
        v
Dashboard (HTML)    Grafana Integration    REST API
```

## Components

### 1. Kafka Log Handler

**Location**: `common/pyportal_common/logging_handlers/kafka_log_handler.py`

- Custom Python logging handler
- Automatically sends logs to Kafka
- Separates errors into `application-logs-errors` topic
- Includes metadata: service name, host, timestamp, etc.

### 2. Log Monitor Service

**Location**: `services/logmonitor/`

- Consumes logs from Kafka
- Filters ERROR and CRITICAL level logs
- Stores errors in-memory (can be extended to a database)
- Provides REST API for dashboard
- Grafana-compatible endpoints

### 3. Dashboard

**Location**: `services/logmonitor/app/dashboard.html`

- Real-time error visualization
- Auto-refresh capability
- Statistics display
- Error details view

## Setup

### 1. Start Services

```bash
# Start all services including log monitor
docker-compose up -d

# Or start just log monitor
docker-compose up -d logmonitor-service
```

### 2. Access Dashboard

Open browser: http://localhost:5004

### 3. Verify Log Collection

```bash
# Check if logs are being sent to Kafka
docker-compose exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic application-logs \
  --from-beginning

# Check error logs
docker-compose exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic application-logs-errors \
  --from-beginning
```

## API Endpoints

### Get Error Logs

```bash
GET /api/v1/logs/errors?limit=100&service=usermanagement
```

### Get Statistics

```bash
GET /api/v1/logs/stats
```

### Get Services

```bash
GET /api/v1/logs/services
```

### Get Service-Specific Errors

```bash
GET /api/v1/logs/errors/usermanagement?limit=50
```

## Grafana Integration

### Setup Grafana Data Source

1. **Add Data Source in Grafana:**
   - Type: JSON API
   - URL: http://localhost:5004/api/v1/grafana
   - Access: Server (default)
2. **Query Endpoints:**
   - Search: `/api/v1/grafana/search`
   - Query: `/api/v1/grafana/query`
   - Annotations: `/api/v1/grafana/annotations`

### Example Grafana Queries

**Error Count:**

```json
{ "target": "error_count" }
```

**Errors by Service:**

```json
{ "target": "error_by_service" }
```

**Errors by Level:**

```json
{ "target": "error_by_level" }
```

## How It Works

### 1. Log Collection

Each microservice automatically sends logs to Kafka:

```python
# In service __init__.py
from common.pyportal_common.logging_handlers.base_logger import LogMonitor

# Logger automatically configured with Kafka handler
logger = LogMonitor("usermanagement").logger

# All log calls go to Kafka
logger.info("User created")
logger.error("Database connection failed")  # Goes to error topic
```

### 2. Error Filtering

Log Monitor Service consumes from Kafka:

```python
# Consumes from:
# - application-logs-errors (pre-filtered)
# - application-logs (filters for ERROR/CRITICAL)

# Stores errors in ErrorLogStore
error_store.add_error(log_data)
```

### 3. Dashboard Display

Dashboard polls the API every 5 seconds:

```javascript
// Auto-refresh
setInterval(() => {
  fetch('/api/v1/logs/errors?limit=50')
    .then(res => res.json())
    .then(data => updateErrors(data.errors));
}, 5000);
```

## Configuration

### Environment Variables

**Log Monitor Service:**

```bash
LOG_MONITOR_SERVER_PORT=9094
KAFKA_BOOTSTRAP_SERVERS=kafka:29092
SERVICE_NAME=logmonitor
```

**Microservices (for Kafka logging):**

```bash
KAFKA_BOOTSTRAP_SERVERS=kafka:29092
SERVICE_NAME=usermanagement  # or booking, notification, etc.
HOSTNAME=usermanagement-service
```

## Features

### Real-Time Monitoring

- Logs streamed to Kafka in real-time
- Dashboard auto-refreshes every 5 seconds
- No polling delays

### Error Filtering

- Automatic filtering of ERROR and CRITICAL logs
- Separate Kafka topic for errors
- Efficient processing

### Multi-Service Support

- Collects logs from all microservices
- Service identification in logs
- Per-service error statistics

### Grafana Ready

- Compatible with Grafana JSON API data source
- Time series data support
- Annotation support for events

### Production Ready

- Error handling and resilience
- Connection pooling
- Graceful degradation

## Extending the System

### Store Errors in Database

Replace the in-memory store with a database:

```python
# In kafka_consumer.py
from app.models.error_log import ErrorLog

def add_error_to_db(error_log):
    error = ErrorLog(
        timestamp=error_log['timestamp'],
        level=error_log['level'],
        service=error_log['service'],
        message=error_log['message']
    )
    db.session.add(error)
    db.session.commit()
```

### Add Alerting

```python
# In kafka_consumer.py
def check_and_alert(error_log):
    if error_log['level'] == 'CRITICAL':
        send_alert_email(error_log)
        send_slack_notification(error_log)
```

### Add Log Retention

```python
# Clean old errors
def cleanup_old_errors():
    cutoff = datetime.utcnow() - timedelta(days=7)
    ErrorLog.query.filter(ErrorLog.timestamp < cutoff).delete()
    db.session.commit()
```

## Monitoring Best Practices

1. **Set Appropriate Log Levels**
   - Use ERROR for recoverable errors
   - Use CRITICAL for system failures
2. **Include Context**
   - Service name
   - Request ID
   - User ID (if applicable)
3. **Monitor Dashboard Regularly**
   - Check for error spikes
   - Identify problematic services
   - Track error trends
4. **Set Up Alerts**
   - Critical error threshold
   - Error rate threshold
   - Service-specific alerts

## Troubleshooting

### Logs Not Appearing

1. Check Kafka connection:
   ```bash
   docker-compose exec kafka kafka-topics --list --bootstrap-server localhost:9092
   ```
2. Check log monitor service:
   ```bash
   docker-compose logs logmonitor-service
   ```
3. Verify environment variables:
   ```bash
   docker-compose exec usermanagement-service env | grep KAFKA
   ```

### Dashboard Not Loading

1. Check the service is running:
   ```bash
   docker-compose ps logmonitor-service
   ```
2. Check the API endpoint:
   ```bash
   curl http://localhost:5004/api/v1/logs/stats
   ```

## Performance Considerations

- **Kafka Topics**: Separate topics for errors improve filtering
- **Batch Processing**: Consider batching log writes
- **Storage**: In-memory store is fast but limited; use a DB for production
- **Rate Limiting**: Monitor Kafka throughput

## Security

- **Authentication**: Add authentication to API endpoints
- **Authorization**: Restrict dashboard access
- **Encryption**: Use TLS for Kafka in production
- **Log Sanitization**: Remove sensitive data from logs

## Next Steps

1. System is implemented and ready
2. Add database persistence for errors
3. Implement alerting (email/Slack)
4. Add log retention policies
5. Set up Grafana dashboards
6. Add authentication to dashboard
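The diff doesn't show `kafka_log_handler.py` itself; as a rough sketch of the behavior the Components section describes (every record to `application-logs`, ERROR/CRITICAL mirrored to `application-logs-errors`), using the kafka-python client — the class shape and payload fields are assumptions:

```python
import json
import logging
import socket
from datetime import datetime, timezone

from kafka import KafkaProducer  # kafka-python; the repo's handler may use a different client


class KafkaLogHandler(logging.Handler):
    """Hypothetical handler: ships log records to Kafka with service metadata."""

    def __init__(self, bootstrap_servers: str, service_name: str):
        super().__init__()
        self.service_name = service_name
        self.producer = KafkaProducer(
            bootstrap_servers=bootstrap_servers,
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )

    def emit(self, record: logging.LogRecord) -> None:
        payload = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service_name,
            "host": socket.gethostname(),
            "message": record.getMessage(),
        }
        self.producer.send("application-logs", payload)
        if record.levelno >= logging.ERROR:  # mirror errors to the error topic
            self.producer.send("application-logs-errors", payload)
```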

docs/redis_integration.md

Lines changed: 15 additions & 16 deletions
````diff
@@ -25,7 +25,7 @@ Each service uses a separate Redis database to avoid key conflicts:
 | Service | Redis DB | Usage |
 |---------|----------|-------|
 | User Management | 0 | User cache, sessions, rate limiting |
-| Booking | 1 | Booking cache, flight availability |
+| Task Processing | 1 | Task cache, task data (legacy: BookingRedisHelper) |
 | Notification | 2 | Notification cache, delivery status |

 ## Usage Examples
@@ -77,28 +77,27 @@ is_allowed, remaining = redis_helper.check_rate_limit(
 )
 ```

-### Booking Service
+### Task Processing Service

 ```python
-from app.redis_helper import BookingRedisHelper
+from app.redis_helper import BookingRedisHelper  # Legacy class name, used for task processing

 # Initialize helper
 redis_helper = BookingRedisHelper()

-# Cache booking
-booking_data = {
-    'bookingId': 456,
+# Cache task data
+task_data = {
+    'taskId': 456,
     'userId': 123,
-    'flightId': 789,
-    'numberOfSeats': 2,
-    'status': 'confirmed'
+    'status': 'processing',
+    'details': {...}
 }
-redis_helper.cache_booking(456, booking_data, ttl=3600)
+redis_helper.cache_booking(456, task_data, ttl=3600)  # Legacy method name

-# Get cached booking
-cached_booking = redis_helper.get_cached_booking(456)
+# Get cached task
+cached_task = redis_helper.get_cached_booking(456)  # Legacy method name

-# Cache flight availability
+# Cache task-related data
 redis_helper.cache_flight_availability(
     flight_id=789,
     available_seats=10,
@@ -152,7 +151,7 @@ REDIS_PASSWORD= # Redis password (optional)
 Each service uses its own Redis database:

 - **User Management**: `REDIS_DB=0`
-- **Booking**: `REDIS_DB=1`
+- **Task Processing**: `REDIS_DB=1`
 - **Notification**: `REDIS_DB=2`

 ## Caching Strategies
@@ -290,8 +289,8 @@ Use consistent key naming for easier management:
 - **Users**: `user:{user_id}`
 - **User Lookup**: `user:lookup:username:{username}`, `user:lookup:email:{email}`
 - **Sessions**: `session:{session_id}`
-- **Bookings**: `booking:{booking_id}`
-- **User Bookings**: `user:bookings:{user_id}`
+- **Tasks**: `booking:{task_id}` (legacy key pattern, used for task processing)
+- **User Tasks**: `user:bookings:{user_id}` (legacy key pattern)
 - **Flight Availability**: `flight:availability:{flight_id}`
 - **Rate Limits**: `rate_limit:{service}:{identifier}`
````
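Given the legacy names called out above, a thin adapter could let task-processing call sites use task-flavored names before the helper itself is renamed — a hypothetical sketch, not code from this repo:

```python
from app.redis_helper import BookingRedisHelper  # legacy class name


class TaskRedisHelper(BookingRedisHelper):
    """Hypothetical adapter: task-flavored names delegating to the legacy methods."""

    def cache_task(self, task_id, task_data, ttl=3600):
        return self.cache_booking(task_id, task_data, ttl=ttl)

    def get_cached_task(self, task_id):
        return self.get_cached_booking(task_id)
```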

0 commit comments