This document describes the caching layer and performance monitoring system implemented in the monitoring dashboard.
The application uses a dual-mode caching system:
- Redis (production): High-performance, distributed caching
- In-Memory (fallback): Automatic fallback when Redis is unavailable
Location: lib/cache/cache-service.ts
Features:
- Automatic TTL (Time To Live) management
- Tag-based invalidation for bulk operations
- Hit rate statistics and performance metrics
- Graceful degradation (Redis → in-memory)
- Thread-safe singleton pattern
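The Redis → in-memory degradation can be pictured as a small backend factory: if `REDIS_URL` is configured a Redis-backed store is used, otherwise an in-memory `Map` store takes over. The sketch below is illustrative only, not the actual cache-service.ts implementation; `CacheBackend`, `MemoryStore`, and `pickBackend` are hypothetical names.

```typescript
// Hypothetical sketch of the Redis → in-memory graceful degradation.
interface CacheBackend {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory fallback: a Map with per-entry expiry timestamps.
class MemoryStore implements CacheBackend {
  private store = new Map<string, { value: string; expires: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (Date.now() > entry.expires) {
      this.store.delete(key); // lazy expiry on read
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expires: Date.now() + ttlSeconds * 1000 });
  }
}

// Choose Redis when configured; otherwise degrade gracefully to memory.
function pickBackend(redisUrl: string | undefined): CacheBackend {
  if (redisUrl) {
    // The real service would construct a Redis-backed store here.
  }
  return new MemoryStore();
}
```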
```typescript
import { CacheService, CacheKeys } from '@/lib/cache/cache-service';

// Get value from cache
const value = await CacheService.get<MyType>('my-key');

// Set value with TTL
await CacheService.set('my-key', myData, {
  ttl: 300, // 5 minutes
  tags: ['metrics', 'overview']
});

// Delete specific key
await CacheService.delete('my-key');

// Get-or-compute pattern
const data = await CacheService.getOrSet(
  'expensive-operation',
  async () => {
    // Expensive computation or API call
    return await fetchData();
  },
  { ttl: 600, tags: ['api-data'] }
);
```

```typescript
// Invalidate all metrics caches
await CacheService.invalidateByTag('metrics');

// Invalidate specific category
await CacheService.invalidateByTag('sentry');

// Clear all cache
await CacheService.clear();
```

The hit rate reported by `getStats()` is computed as hits / (hits + misses) × 100:

```typescript
const stats = CacheService.getStats();
console.log(`Hit rate: ${stats.hitRate.toFixed(2)}%`);
console.log(`Total hits: ${stats.hits}`);
console.log(`Total misses: ${stats.misses}`);
```

Location: lib/cache/cache-service.ts (CacheKeys object)
```typescript
// Metrics cache keys
CacheKeys.metrics.overview()             // 'metrics:overview'
CacheKeys.metrics.sentry('7d')           // 'metrics:sentry:7d'
CacheKeys.metrics.cicd('24h')            // 'metrics:cicd:24h'
CacheKeys.metrics.systemHealth()         // 'metrics:system-health'

// Reports cache keys
CacheKeys.reports.history(1, 'all')      // 'reports:history:1:all'
CacheKeys.reports.scheduled()            // 'reports:scheduled'
CacheKeys.reports.report('report-123')   // 'reports:report-123'

// User cache keys
CacheKeys.user.notifications('user-id')  // 'user:user-id:notifications'
CacheKeys.user.preferences('user-id')    // 'user:user-id:preferences'
```

| Data Type | TTL | Reasoning |
|---|---|---|
| Overview metrics | 2 minutes | Balance between freshness and API load |
| Sentry data | 3 minutes | External API rate limiting |
| CI/CD data | 2 minutes | Build status changes frequently |
| Report history | 5 minutes | Changes infrequently |
| User notifications | 30 seconds | High priority for real-time feel |
| System health | 1 minute | Important for monitoring |
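The TTLs in the table could be centralized as named constants. This is an illustrative sketch; `CacheTTL` is a hypothetical name, and the actual service may define these values elsewhere.

```typescript
// Illustrative TTL constants (in seconds) mirroring the table above.
const CacheTTL = {
  overviewMetrics: 2 * 60,   // balance between freshness and API load
  sentryData: 3 * 60,        // external API rate limiting
  cicdData: 2 * 60,          // build status changes frequently
  reportHistory: 5 * 60,     // changes infrequently
  userNotifications: 30,     // high priority for real-time feel
  systemHealth: 60,          // important for monitoring
} as const;
```

Using shared constants keeps TTL policy in one place instead of scattering magic numbers across routes.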
```bash
# .env
REDIS_URL=redis://localhost:6379
# or for Redis Cloud
REDIS_URL=rediss://username:password@host:port
```

When Redis is not configured or unavailable, the system automatically falls back to an in-memory cache with:
- Automatic cleanup every 5 minutes
- Same TTL support
- Limited to single server instance
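The fallback's periodic cleanup can be sketched as a sweep that drops expired entries. The names below are illustrative; the real logic lives in cache-service.ts.

```typescript
// Illustrative sketch of the in-memory cleanup sweep.
type Entry = { value: unknown; expires: number };

function sweepExpired(cache: Map<string, Entry>, now: number = Date.now()): number {
  let removed = 0;
  for (const [key, entry] of cache) {
    if (entry.expires <= now) {
      cache.delete(key); // drop entries whose TTL has elapsed
      removed++;
    }
  }
  return removed;
}

// In the real service this would run on an interval, e.g.:
// setInterval(() => sweepExpired(cache), 5 * 60 * 1000); // every 5 minutes
```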
Location: lib/monitoring/performance.ts
The performance monitoring service tracks:
- API response times
- Database query performance
- External API call durations
- Success/failure rates
```typescript
import { PerformanceMonitor } from '@/lib/monitoring/performance';

export async function GET() {
  const endTiming = PerformanceMonitor.start('api:my-endpoint');
  try {
    const data = await fetchData();
    endTiming({ endpoint: 'my-endpoint' }, { success: true });
    return NextResponse.json(data);
  } catch (error) {
    endTiming({ endpoint: 'my-endpoint' }, { success: false, error: String(error) });
    throw error;
  }
}
```

```typescript
import { measureAsync, measureSync } from '@/lib/monitoring/performance';

// Measure async operations
const result = await measureAsync(
  'database:query:users',
  async () => await prisma.user.findMany(),
  { operation: 'findMany' }
);

// Measure sync operations
const computed = measureSync(
  'compute:complex-calculation',
  () => performCalculation(),
  { input: 'large-dataset' }
);
```

```typescript
// Get overall stats
const stats = PerformanceMonitor.getStats();

// Get stats for a specific operation
const apiStats = PerformanceMonitor.getStats('api:metrics:overview');

console.log(`Total requests: ${stats.totalRequests}`);
console.log(`Average duration: ${stats.averageDuration.toFixed(2)}ms`);
console.log(`P95: ${stats.p95.toFixed(2)}ms`);
console.log(`P99: ${stats.p99.toFixed(2)}ms`);
```

Requests taking longer than 2 seconds automatically log a warning:
```
⚠️ Slow request detected: api:metrics:sentry took 2341.52ms
{
  tags: { endpoint: 'sentry' },
  metadata: { success: true }
}
```
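The P95/P99 figures are derived from the recorded durations. A common nearest-rank approach is sketched below; this is illustrative and not necessarily the exact method used in performance.ts.

```typescript
// Nearest-rank percentile over recorded durations (illustrative).
function percentile(durations: number[], p: number): number {
  if (durations.length === 0) return 0;
  const sorted = [...durations].sort((a, b) => a - b);
  // Rank of the p-th percentile, 1-based; clamp to the array bounds.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(rank, sorted.length) - 1];
}
```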
GET /api/cache

Response:
```json
{
  "stats": {
    "hits": 1250,
    "misses": 180,
    "sets": 95,
    "deletes": 12,
    "hitRate": 87.41
  },
  "uptime": 3600,
  "nodeVersion": "v20.10.0"
}
```

DELETE /api/cache?tag=metrics

Response:
```json
{
  "success": true,
  "message": "Cache invalidated for tag: metrics"
}
```

DELETE /api/cache?all=true

Response:
```json
{
  "success": true,
  "message": "All cache cleared"
}
```

POST /api/cache/reset-stats

GET /api/monitoring/performance
GET /api/monitoring/performance?name=api:metrics:overview

Response:
```json
{
  "stats": {
    "totalRequests": 342,
    "averageDuration": 125.43,
    "p50": 98.21,
    "p95": 456.78,
    "p99": 892.34,
    "slowestRequests": [
      {
        "name": "api:metrics:sentry",
        "duration": 2341.52,
        "timestamp": "2025-01-16T12:34:56.789Z",
        "tags": { "endpoint": "sentry" },
        "metadata": { "success": true }
      }
    ]
  },
  "timestamp": "2025-01-16T12:45:00.000Z"
}
```

DELETE /api/monitoring/performance
- Use appropriate TTLs
  - Short TTL (30s-1m): High-priority, frequently changing data
  - Medium TTL (2-5m): Dashboard metrics, API aggregations
  - Long TTL (10-60m): Configuration, rarely changing data
- Always use tags
  ```typescript
  await CacheService.set(key, data, {
    ttl: 300,
    tags: ['category', 'subcategory'] // Enables bulk invalidation
  });
  ```
- Use predefined cache keys
  ```typescript
  // ✅ Good: Use CacheKeys helpers
  const key = CacheKeys.metrics.sentry('7d');

  // ❌ Bad: Manual key construction
  const badKey = 'metrics:sentry:7d';
  ```
- Handle cache misses gracefully
  ```typescript
  const data = await CacheService.getOrSet(
    key,
    async () => {
      // Fallback computation
      return await fetchFromDatabase();
    },
    options
  );
  ```
- Invalidate on data changes
  ```typescript
  // After updating data
  await updateDatabase();
  await CacheService.invalidateByTag('metrics');
  ```
- Name operations consistently
  ```typescript
  // Pattern: category:resource:action
  PerformanceMonitor.start('api:metrics:overview');
  PerformanceMonitor.start('database:user:create');
  PerformanceMonitor.start('external:sentry:fetch');
  ```
- Always include tags
  ```typescript
  endTiming(
    { endpoint: 'overview', method: 'GET' },
    { success: true, cached: false }
  );
  ```
- Monitor critical paths
  - API endpoints
  - Database queries
  - External API calls
  - Complex computations
- Set performance budgets
  - API responses: < 200ms (P95)
  - Database queries: < 100ms (P95)
  - External APIs: < 1000ms (P95)
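The budgets above can also be checked in code, for example as a small helper run in CI. This is an illustrative sketch; `budgets` and `violations` are hypothetical names, and the category keys are assumptions.

```typescript
// Illustrative budget check: compare measured P95s against the budgets above (ms).
const budgets: Record<string, number> = {
  api: 200,       // API responses < 200ms (P95)
  database: 100,  // database queries < 100ms (P95)
  external: 1000, // external APIs < 1000ms (P95)
};

function violations(p95ByCategory: Record<string, number>): string[] {
  const out: string[] = [];
  for (const [category, p95] of Object.entries(p95ByCategory)) {
    const budget = budgets[category];
    if (budget !== undefined && p95 > budget) {
      out.push(`${category}: P95 ${p95}ms exceeds budget ${budget}ms`);
    }
  }
  return out;
}
```

A CI step could fail the build when `violations(...)` returns a non-empty list.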
Symptom: Redis connection warning in the logs

Solution:
- Verify Redis is running: `redis-cli ping`
- Check `REDIS_URL` in `.env`
- Verify network connectivity
- Check Redis logs for errors

Temporary Workaround: The application continues with the in-memory cache.
Symptom: Cache hit rate < 50%
Possible Causes:
- TTL too short for data access patterns
- Frequent cache invalidations
- High traffic with cold cache
- Incorrect cache key generation
Solutions:
- Analyze access patterns: `GET /api/cache`
- Increase TTL for stable data
- Pre-warm the cache on deployment
- Review the invalidation strategy
Symptom: High memory usage in development
Solution:

```typescript
// Manually trigger cleanup
CacheService.cleanup();
```

Or adjust the maximum entries in cache-service.ts:

```typescript
private cache: Map<string, { value: any; expires: number; tags: string[] }> = new Map();
```

Symptom: Slow requests not appearing in warnings
Cause: Threshold set to 2000ms
Solution: Adjust threshold in lib/monitoring/performance.ts:
```typescript
// Log slow requests (> 2 seconds)
if (metric.duration > 2000) {
  // ...
}
```

Symptom: GET /api/monitoring/performance returns zero requests
Cause: Metrics not persisted across server restarts
Solution: Metrics are in-memory only. For persistent metrics, integrate with:
- Prometheus
- Datadog
- New Relic
- CloudWatch
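For Prometheus specifically, the in-memory stats could be exposed in its plain-text exposition format. The sketch below is illustrative; `toPrometheus` and the metric names are assumptions, not part of the current codebase.

```typescript
// Illustrative export of the in-memory stats in Prometheus text format.
interface PerfStats {
  totalRequests: number;
  averageDuration: number;
  p95: number;
  p99: number;
}

function toPrometheus(stats: PerfStats): string {
  return [
    '# TYPE app_requests_total counter',
    `app_requests_total ${stats.totalRequests}`,
    '# TYPE app_request_duration_ms_p95 gauge',
    `app_request_duration_ms_p95 ${stats.p95}`,
    '# TYPE app_request_duration_ms_p99 gauge',
    `app_request_duration_ms_p99 ${stats.p99}`,
  ].join('\n');
}
```

A `/metrics` route returning this string would let a Prometheus scraper persist the data across restarts.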
Investigation Steps:
- Check the slowest requests: `GET /api/monitoring/performance`
- Identify patterns in tags/metadata
- Review external API performance
- Check database query performance
- Verify cache effectiveness
```typescript
import { NextResponse } from 'next/server';
import { CacheService, CacheKeys } from '@/lib/cache/cache-service';
import { PerformanceMonitor } from '@/lib/monitoring/performance';

export async function GET() {
  const endTiming = PerformanceMonitor.start('api:my-route');
  try {
    const data = await CacheService.getOrSet(
      'my-custom-key',
      async () => {
        // Expensive operation
        const result = await fetchExpensiveData();
        return result;
      },
      { ttl: 300, tags: ['my-category'] }
    );
    endTiming({ endpoint: 'my-route' }, { success: true });
    return NextResponse.json(data);
  } catch (error) {
    endTiming({ endpoint: 'my-route' }, { success: false });
    throw error;
  }
}
```

```typescript
// On application startup or deployment
async function warmCache() {
  console.log('Warming cache...');
  await Promise.all([
    // Pre-fetch commonly accessed data
    CacheService.getOrSet(
      CacheKeys.metrics.overview(),
      async () => await fetchOverviewMetrics(),
      { ttl: 120 }
    ),
    CacheService.getOrSet(
      CacheKeys.metrics.systemHealth(),
      async () => await fetchSystemHealth(),
      { ttl: 60 }
    ),
  ]);
  console.log('✅ Cache warmed');
}
```

```typescript
// Invalidate cache on a schedule (e.g., every hour)
setInterval(async () => {
  console.log('Scheduled cache invalidation...');
  await CacheService.invalidateByTag('metrics');
}, 60 * 60 * 1000); // 1 hour
```
- Distributed Cache Invalidation
  - Redis Pub/Sub for multi-instance cache invalidation
  - Webhook-based invalidation from external systems
- Advanced Performance Analytics
  - Percentile histograms
  - Time-series performance data
  - Anomaly detection
- Cache Warming
  - Automatic cache pre-loading on deployment
  - Predictive cache warming based on access patterns
- Performance Budgets
  - Automated alerts when budgets are exceeded
  - CI/CD integration for performance regression detection
- Observability Integration
  - Prometheus metrics export
  - OpenTelemetry traces
  - Datadog/New Relic integration
Last Updated: 2025-01-16
Version: 1.0.0