Caching and Performance Monitoring

This document describes the caching layer and performance monitoring system implemented in the monitoring dashboard.

Caching Strategy
Performance Monitoring
API Reference
Best Practices
Troubleshooting

Caching Strategy

Overview

The application uses a dual-mode caching system:

Redis (production): High-performance, distributed caching
In-Memory (fallback): Automatic fallback when Redis is unavailable

Cache Service

Location: lib/cache/cache-service.ts

Features:

Automatic TTL (Time To Live) management
Tag-based invalidation for bulk operations
Hit rate statistics and performance metrics
Graceful degradation (Redis → in-memory)
Thread-safe singleton pattern

Usage Examples

Basic Operations

import { CacheService, CacheKeys } from '@/lib/cache/cache-service';

// Get value from cache
const value = await CacheService.get<MyType>('my-key');

// Set value with TTL
await CacheService.set('my-key', myData, {
  ttl: 300, // 5 minutes
  tags: ['metrics', 'overview']
});

// Delete specific key
await CacheService.delete('my-key');

// Get or compute pattern
const data = await CacheService.getOrSet(
  'expensive-operation',
  async () => {
    // Expensive computation or API call
    return await fetchData();
  },
  { ttl: 600, tags: ['api-data'] }
);

Tag-Based Invalidation

// Invalidate all metrics caches
await CacheService.invalidateByTag('metrics');

// Invalidate specific category
await CacheService.invalidateByTag('sentry');

// Clear all cache
await CacheService.clear();

Cache Statistics

const stats = CacheService.getStats();
console.log(`Hit rate: ${stats.hitRate.toFixed(2)}%`);
console.log(`Total hits: ${stats.hits}`);
console.log(`Total misses: ${stats.misses}`);

Predefined Cache Keys

Location: lib/cache/cache-service.ts (CacheKeys object)

// Metrics cache keys
CacheKeys.metrics.overview()                 // 'metrics:overview'
CacheKeys.metrics.sentry('7d')              // 'metrics:sentry:7d'
CacheKeys.metrics.cicd('24h')               // 'metrics:cicd:24h'
CacheKeys.metrics.systemHealth()            // 'metrics:system-health'

// Reports cache keys
CacheKeys.reports.history(1, 'all')         // 'reports:history:1:all'
CacheKeys.reports.scheduled()               // 'reports:scheduled'
CacheKeys.reports.report('report-123')      // 'reports:report-123'

// User cache keys
CacheKeys.user.notifications('user-id')     // 'user:user-id:notifications'
CacheKeys.user.preferences('user-id')       // 'user:user-id:preferences'

Cache TTL Guidelines

Data Type	TTL	Reasoning
Overview metrics	2 minutes	Balance between freshness and API load
Sentry data	3 minutes	External API rate limiting
CI/CD data	2 minutes	Build status changes frequently
Report history	5 minutes	Changes infrequently
User notifications	30 seconds	High priority for real-time feel
System health	1 minute	Important for monitoring

Configuration

Redis Configuration

# .env
REDIS_URL=redis://localhost:6379
# or for Redis Cloud
REDIS_URL=rediss://username:password@host:port

In-Memory Fallback

When Redis is not configured or unavailable, the system automatically falls back to an in-memory cache with:

Automatic cleanup every 5 minutes
Same TTL support
Limited to single server instance

Performance Monitoring

Overview

Location: lib/monitoring/performance.ts

The performance monitoring service tracks:

API response times
Database query performance
External API call durations
Success/failure rates

Usage Examples

Automatic Timing

import { PerformanceMonitor } from '@/lib/monitoring/performance';

export async function GET() {
  const endTiming = PerformanceMonitor.start('api:my-endpoint');

  try {
    const data = await fetchData();
    endTiming({ endpoint: 'my-endpoint' }, { success: true });
    return NextResponse.json(data);
  } catch (error) {
    endTiming({ endpoint: 'my-endpoint' }, { success: false, error: String(error) });
    throw error;
  }
}

Helper Functions

import { measureAsync, measureSync } from '@/lib/monitoring/performance';

// Measure async operations
const result = await measureAsync(
  'database:query:users',
  async () => await prisma.user.findMany(),
  { operation: 'findMany' }
);

// Measure sync operations
const computed = measureSync(
  'compute:complex-calculation',
  () => performCalculation(),
  { input: 'large-dataset' }
);

Performance Statistics

// Get overall stats
const stats = PerformanceMonitor.getStats();

// Get stats for specific operation
const apiStats = PerformanceMonitor.getStats('api:metrics:overview');

console.log(`Total requests: ${stats.totalRequests}`);
console.log(`Average duration: ${stats.averageDuration.toFixed(2)}ms`);
console.log(`P95: ${stats.p95.toFixed(2)}ms`);
console.log(`P99: ${stats.p99.toFixed(2)}ms`);

Slow Request Alerts

Requests taking longer than 2 seconds automatically log a warning:

⚠️ Slow request detected: api:metrics:sentry took 2341.52ms
{
  tags: { endpoint: 'sentry' },
  metadata: { success: true }
}

API Reference

Cache Management API

Get Cache Statistics

GET /api/cache

Response:

{
  "stats": {
    "hits": 1250,
    "misses": 180,
    "sets": 95,
    "deletes": 12,
    "hitRate": 87.41
  },
  "uptime": 3600,
  "nodeVersion": "v20.10.0"
}

Invalidate Cache by Tag

DELETE /api/cache?tag=metrics

Response:

{
  "success": true,
  "message": "Cache invalidated for tag: metrics"
}

Clear All Cache

DELETE /api/cache?all=true

Response:

{
  "success": true,
  "message": "All cache cleared"
}

Reset Cache Statistics

POST /api/cache/reset-stats

Performance Monitoring API

Get Performance Statistics

GET /api/monitoring/performance
GET /api/monitoring/performance?name=api:metrics:overview

Response:

{
  "stats": {
    "totalRequests": 342,
    "averageDuration": 125.43,
    "p50": 98.21,
    "p95": 456.78,
    "p99": 892.34,
    "slowestRequests": [
      {
        "name": "api:metrics:sentry",
        "duration": 2341.52,
        "timestamp": "2025-01-16T12:34:56.789Z",
        "tags": { "endpoint": "sentry" },
        "metadata": { "success": true }
      }
    ]
  },
  "timestamp": "2025-01-16T12:45:00.000Z"
}

Clear Performance Metrics

DELETE /api/monitoring/performance

Best Practices

Caching Best Practices

Use appropriate TTLs
- Short TTL (30s-1m): High-priority, frequently changing data
- Medium TTL (2-5m): Dashboard metrics, API aggregations
- Long TTL (10-60m): Configuration, rarely changing data

Always use tags

await CacheService.set(key, data, {
  ttl: 300,
  tags: ['category', 'subcategory'] // Enables bulk invalidation
});

Use predefined cache keys

// ✅ Good: Use CacheKeys helpers
const key = CacheKeys.metrics.sentry('7d');

// ❌ Bad: Manual key construction
const key = 'metrics:sentry:7d';

Handle cache misses gracefully

const data = await CacheService.getOrSet(
  key,
  async () => {
    // Fallback computation
    return await fetchFromDatabase();
  },
  options
);

Invalidate on data changes

// After updating data
await updateDatabase();
await CacheService.invalidateByTag('metrics');

Performance Monitoring Best Practices

Name operations consistently

// Pattern: category:resource:action
PerformanceMonitor.start('api:metrics:overview');
PerformanceMonitor.start('database:user:create');
PerformanceMonitor.start('external:sentry:fetch');

Always include tags

endTiming(
  { endpoint: 'overview', method: 'GET' },
  { success: true, cached: false }
);

Monitor critical paths
- API endpoints
- Database queries
- External API calls
- Complex computations
Set performance budgets
- API responses: < 200ms (P95)
- Database queries: < 100ms (P95)
- External APIs: < 1000ms (P95)

Troubleshooting

Cache Issues

Redis Connection Failures

Symptom: Warning in logs: "⚠️ Redis not available, using in-memory cache"

Solution:

Verify Redis is running: redis-cli ping
Check REDIS_URL in .env
Verify network connectivity
Check Redis logs for errors

Temporary Workaround: Application continues with in-memory cache

Low Hit Rate

Symptom: Cache hit rate < 50%

Possible Causes:

TTL too short for data access patterns
Frequent cache invalidations
High traffic with cold cache
Incorrect cache key generation

Solutions:

Analyze access patterns: GET /api/cache
Increase TTL for stable data
Pre-warm cache on deployment
Review invalidation strategy

Memory Issues (In-Memory Mode)

Symptom: High memory usage in development

Solution:

// Manually trigger cleanup
CacheService.cleanup();

// Or adjust max entries in cache-service.ts
private cache: Map<string, { value: any; expires: number; tags: string[] }> = new Map();

Performance Issues

Slow Requests Not Logged

Symptom: Slow requests not appearing in warnings

Cause: Threshold set to 2000ms

Solution: Adjust threshold in lib/monitoring/performance.ts:

// Log slow requests (> 2 seconds)
if (metric.duration > 2000) {

Missing Metrics

Symptom: GET /api/monitoring/performance returns zero requests

Cause: Metrics not persisted across server restarts

Solution: Metrics are in-memory only. For persistent metrics, integrate with:

Prometheus
Datadog
New Relic
CloudWatch

High P99 Latency

Investigation Steps:

Check slowest requests: GET /api/monitoring/performance
Identify patterns in tags/metadata
Review external API performance
Check database query performance
Verify cache effectiveness

Integration Examples

Integrating Cache in New API Route

import { NextResponse } from 'next/server';
import { CacheService, CacheKeys } from '@/lib/cache/cache-service';
import { PerformanceMonitor } from '@/lib/monitoring/performance';

export async function GET() {
  const endTiming = PerformanceMonitor.start('api:my-route');

  try {
    const data = await CacheService.getOrSet(
      'my-custom-key',
      async () => {
        // Expensive operation
        const result = await fetchExpensiveData();
        return result;
      },
      { ttl: 300, tags: ['my-category'] }
    );

    endTiming({ endpoint: 'my-route' }, { success: true });
    return NextResponse.json(data);
  } catch (error) {
    endTiming({ endpoint: 'my-route' }, { success: false });
    throw error;
  }
}

Cache Warming Strategy

// On application startup or deployment
async function warmCache() {
  console.log('Warming cache...');

  await Promise.all([
    // Pre-fetch commonly accessed data
    CacheService.getOrSet(
      CacheKeys.metrics.overview(),
      async () => await fetchOverviewMetrics(),
      { ttl: 120 }
    ),

    CacheService.getOrSet(
      CacheKeys.metrics.systemHealth(),
      async () => await fetchSystemHealth(),
      { ttl: 60 }
    ),
  ]);

  console.log('✅ Cache warmed');
}

Scheduled Cache Invalidation

// Invalidate cache on a schedule (e.g., every hour)
setInterval(async () => {
  console.log('Scheduled cache invalidation...');
  await CacheService.invalidateByTag('metrics');
}, 60 * 60 * 1000); // 1 hour

Future Enhancements

Planned Features

Distributed Cache Invalidation
- Redis Pub/Sub for multi-instance cache invalidation
- Webhook-based invalidation from external systems
Advanced Performance Analytics
- Percentile histograms
- Time-series performance data
- Anomaly detection
Cache Warming
- Automatic cache pre-loading on deployment
- Predictive cache warming based on access patterns
Performance Budgets
- Automated alerts when budgets exceeded
- CI/CD integration for performance regression detection
Observability Integration
- Prometheus metrics export
- OpenTelemetry traces
- Datadog/New Relic integration

Last Updated: 2025-01-16 Version: 1.0.0

FilesExpand file tree

CACHING_AND_PERFORMANCE.md

Latest commit

History

CACHING_AND_PERFORMANCE.md

File metadata and controls

Caching and Performance Monitoring

Table of Contents

Caching Strategy

Overview

Cache Service

Usage Examples

Basic Operations

Tag-Based Invalidation

Cache Statistics

Predefined Cache Keys

Cache TTL Guidelines

Configuration

Redis Configuration

In-Memory Fallback

Performance Monitoring

Overview

Usage Examples

Automatic Timing

Helper Functions

Performance Statistics

Slow Request Alerts

API Reference

Cache Management API

Get Cache Statistics

Invalidate Cache by Tag

Clear All Cache

Reset Cache Statistics

Performance Monitoring API

Get Performance Statistics

Clear Performance Metrics

Best Practices

Caching Best Practices

Performance Monitoring Best Practices

Troubleshooting

Cache Issues

Redis Connection Failures

Low Hit Rate

Memory Issues (In-Memory Mode)

Performance Issues

Slow Requests Not Logged

Missing Metrics

High P99 Latency

Integration Examples

Integrating Cache in New API Route

Cache Warming Strategy

Scheduled Cache Invalidation

Future Enhancements

Planned Features