Performance Optimization Guide

Overview

Agentic-Synth is optimized for high-performance synthetic data generation with the following targets:

  • Sub-second response times for cached requests
  • 100+ concurrent generations supported
  • Memory-efficient data handling (< 400MB)
  • 50%+ cache hit rate for typical workloads

Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| P99 Latency | < 1000ms | For cached requests: < 100ms |
| Throughput | > 10 req/s | Scales with concurrency |
| Memory Usage | < 400MB | With 1000-item cache |
| Cache Hit Rate | > 50% | Depends on workload patterns |
| Error Rate | < 1% | With retry logic |

Optimization Strategies

1. Context Caching

Configuration:

import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  enableCache: true,
  cacheSize: 1000,      // Adjust based on memory
  cacheTTL: 3600000,    // 1 hour in milliseconds
});

Benefits:

  • Reduces API calls by 50-80%
  • Sub-100ms latency for cache hits
  • Automatic LRU eviction

Best Practices:

  • Use consistent prompts for better cache hits
  • Increase cache size for repetitive workloads
  • Monitor cache hit rate with synth.getMetrics()
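
The behaviour described above (a bounded cache with a TTL and LRU eviction) can be sketched in a few lines. This is an illustrative stand-in, not Agentic-Synth's actual cache: the class name `LruTtlCache` and the injectable clock are assumptions made for the example.

```typescript
// Illustrative LRU cache with a per-entry TTL (hypothetical, not the library's internals).
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock so expiry can be tested without waiting
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // entry has outlived its TTL
      return undefined;
    }
    // Re-insert so the key becomes the most recently used
    // (a Map iterates in insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With `maxSize` of 1000 and `ttlMs` of 3600000 this mirrors the configuration shown above; reading a key refreshes its recency but not its expiry time.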

2. Model Routing

Configuration:

const synth = new AgenticSynth({
  modelPreference: [
    'claude-sonnet-4-5-20250929',
    'claude-3-5-sonnet-20241022'
  ],
});

Features:

  • Automatic load balancing
  • Performance-based routing
  • Error handling and fallback
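
As a rough illustration of the fallback part of routing, the sketch below tries each model in preference order and moves on when a call fails. The `Generate` callback and `generateWithFallback` are hypothetical names for this example; the library's real router (including its load balancing and performance-based selection) is not shown here.

```typescript
// Hypothetical signature for "call one model with one prompt".
type Generate = (model: string, prompt: string) => Promise<string>;

// Try models in preference order; return the first successful result.
async function generateWithFallback(
  models: string[],
  prompt: string,
  generate: Generate,
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      const text = await generate(model, prompt);
      return { model, text };
    } catch (err) {
      // Record the failure and fall through to the next model.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${model}: ${message}`);
    }
  }
  throw new Error(`All models failed: ${errors.join("; ")}`);
}
```

Passing the `modelPreference` array from the configuration above as `models` gives the same ordering: the first entry is preferred and later entries are only used after a failure.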

3. Concurrent Generation

Configuration:

const synth = new AgenticSynth({
  maxConcurrency: 100,  // Adjust based on API limits
});

Usage:

const prompts = [/* ... */]; // 100+ prompts
const results = await synth.generateBatch(prompts, {
  maxTokens: 500
});

Performance:

  • 2-3x faster than sequential
  • Respects concurrency limits
  • Automatic batching
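
To make "respects concurrency limits" concrete, here is a minimal sketch of a bounded-concurrency map. It is not how `generateBatch` is implemented; `mapWithConcurrency` is a hypothetical helper that only shows the general technique of running a fixed number of worker lanes over a shared queue.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runLane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim an item (safe: JavaScript is single-threaded)
      results[i] = await worker(items[i] as T, i);
    }
  }

  const laneCount = Math.min(limit, items.length);
  const lanes: Promise<void>[] = [];
  for (let n = 0; n < laneCount; n++) lanes.push(runLane());
  await Promise.all(lanes);
  return results;
}
```

If any worker call rejects, the returned promise rejects as well; a production version would likely add per-item retries or collect errors instead.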

4. Memory Management

Configuration:

const synth = new AgenticSynth({
  memoryLimit: 512 * 1024 * 1024,  // 512MB
});

Features:

  • Automatic memory tracking
  • LRU eviction when over limit
  • Periodic cleanup with synth.optimize()
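
The idea of evicting least recently used entries once a memory limit is exceeded can be sketched as follows. This is an assumption-laden illustration: `ByteBudgetCache` is a hypothetical class, and the size estimate (2 bytes per UTF-16 code unit) is a rough approximation rather than a measurement of real heap usage.

```typescript
// Illustrative cache that evicts least recently used entries to stay under a byte budget.
class ByteBudgetCache {
  private entries = new Map<string, string>();
  private limitBytes: number;
  private used = 0;

  constructor(limitBytes: number) {
    this.limitBytes = limitBytes;
  }

  // Rough estimate: JavaScript strings take about 2 bytes per UTF-16 code unit.
  private static sizeOf(key: string, value: string): number {
    return 2 * (key.length + value.length);
  }

  get usedBytes(): number {
    return this.used;
  }

  get(key: string): string | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: string): void {
    this.remove(key);
    this.entries.set(key, value);
    this.used += ByteBudgetCache.sizeOf(key, value);
    // Evict from the least recently used end until back under the limit.
    while (this.used > this.limitBytes && this.entries.size > 0) {
      const oldest = this.entries.keys().next().value as string;
      this.remove(oldest);
    }
  }

  private remove(key: string): void {
    const existing = this.entries.get(key);
    if (existing === undefined) return;
    this.entries.delete(key);
    this.used -= ByteBudgetCache.sizeOf(key, existing);
  }
}
```

Note that a single entry larger than the whole budget is evicted immediately after insertion, which keeps the accounting simple.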

5. Streaming for Large Outputs

Usage:

const stream = synth.generateStream(prompt, {
  maxTokens: 4096
});

for await (const chunk of stream) {
  // Process chunk immediately
  processChunk(chunk);
}

Benefits:

  • Lower time-to-first-byte
  • Reduced memory usage
  • Better user experience
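
The memory benefit comes from handling each chunk as it arrives instead of buffering the full output. The sketch below uses a stand-in async generator (`chunkText`, hypothetical) in place of `generateStream` and keeps only a running word count, so memory use stays constant regardless of output length.

```typescript
// Stand-in for a streaming API: yields `text` in fixed-size chunks.
async function* chunkText(text: string, chunkSize: number): AsyncGenerator<string> {
  for (let i = 0; i < text.length; i += chunkSize) {
    yield text.slice(i, i + chunkSize);
  }
}

// Counts words across chunk boundaries without storing the chunks.
async function countWords(stream: AsyncIterable<string>): Promise<number> {
  let words = 0;
  let inWord = false; // carries over between chunks
  for await (const chunk of stream) {
    for (let i = 0; i < chunk.length; i++) {
      const isSpace = /\s/.test(chunk.charAt(i));
      if (!isSpace && !inWord) words++; // a new word starts here
      inWord = !isSpace;
    }
  }
  return words;
}
```

Any state that must survive a chunk boundary (here, `inWord`) has to live outside the inner loop, since a word or token can be split across two chunks.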

Benchmarking

Running Benchmarks

# Run all benchmarks
npm run benchmark

# Run specific suite
npm run benchmark -- --suite "Throughput Test"

# With custom settings
npm run benchmark -- --iterations 20 --concurrency 200

# Generate report
npm run benchmark -- --output benchmarks/report.md

Benchmark Suites

  1. Throughput Test: Measures requests per second
  2. Latency Test: Measures P50/P95/P99 latencies
  3. Memory Test: Measures memory usage and leaks
  4. Cache Test: Measures cache effectiveness
  5. Concurrency Test: Tests concurrent request handling
  6. Streaming Test: Measures streaming performance
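
For readers interpreting the latency suite, the following sketch shows one common way to compute P50/P95/P99 from raw samples, using the nearest-rank method. Whether the benchmark runner uses this exact definition is an assumption; other tools interpolate between samples and can report slightly different values.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. `p` is in (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b); // copy, then numeric sort
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// Convenience summary for a list of latencies in milliseconds.
function latencySummary(latenciesMs: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
    p99: percentile(latenciesMs, 99),
  };
}
```

With few samples the high percentiles collapse onto the maximum, so P99 is only meaningful once a run collects on the order of hundreds of requests.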

Analyzing Results

# Analyze performance
npm run perf:analyze

# Generate detailed report
npm run perf:report

Bottleneck Detection

The built-in bottleneck analyzer automatically detects:

1. Latency Bottlenecks

  • Cause: Slow API responses, network issues
  • Solution: Increase cache size, optimize prompts
  • Impact: 30-50% latency reduction

2. Throughput Bottlenecks

  • Cause: Low concurrency, sequential processing
  • Solution: Increase maxConcurrency, use batch API
  • Impact: 2-3x throughput increase

3. Memory Bottlenecks

  • Cause: Large cache, memory leaks
  • Solution: Reduce cache size, call optimize()
  • Impact: 40-60% memory reduction

4. Cache Bottlenecks

  • Cause: Low hit rate, small cache
  • Solution: Increase cache size, optimize keys
  • Impact: 20-40% improvement in cache hit rate

CI/CD Integration

Performance Regression Detection

# Run in CI
npm run benchmark:ci

Features:

  • Automatic threshold checking
  • Fails build on regression
  • Generates reports for artifacts
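
Threshold checking of this kind can be sketched as a pure function over a benchmark summary. The thresholds below are the targets listed earlier in this guide; the `BenchmarkSummary` shape and its field names are assumptions for illustration and may not match what `benchmark:ci` actually produces.

```typescript
// Hypothetical summary shape; field names are assumptions, not the library's schema.
interface BenchmarkSummary {
  p99LatencyMs: number;
  throughputRps: number;
  memoryMb: number;
  cacheHitRate: number; // fraction in [0, 1]
  errorRate: number; // fraction in [0, 1]
}

// Returns one message per violated target; an empty array means all targets are met.
function checkTargets(s: BenchmarkSummary): string[] {
  const failures: string[] = [];
  if (s.p99LatencyMs >= 1000) failures.push(`P99 latency ${s.p99LatencyMs}ms is not < 1000ms`);
  if (s.throughputRps <= 10) failures.push(`Throughput ${s.throughputRps} req/s is not > 10 req/s`);
  if (s.memoryMb >= 400) failures.push(`Memory ${s.memoryMb}MB is not < 400MB`);
  if (s.cacheHitRate <= 0.5) failures.push(`Cache hit rate ${s.cacheHitRate} is not > 0.5`);
  if (s.errorRate >= 0.01) failures.push(`Error rate ${s.errorRate} is not < 0.01`);
  return failures;
}
```

In a CI script, a non-empty result would typically be printed and followed by a non-zero exit code so that the build fails.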

GitHub Actions Example

- name: Performance Benchmarks
  run: npm run benchmark:ci

- name: Upload Report
  uses: actions/upload-artifact@v4
  with:
    name: performance-report
    path: benchmarks/performance-report.md

Profiling

CPU Profiling

npm run benchmark:profile
node --prof-process isolate-*.log > profile.txt

Memory Profiling

node --expose-gc --max-old-space-size=512 dist/benchmarks/runner.js

Chrome DevTools

node --inspect-brk dist/benchmarks/runner.js
# Open chrome://inspect

Optimization Checklist

  • Enable caching for repetitive workloads
  • Set appropriate cache size (1000+ items)
  • Configure concurrency based on API limits
  • Use batch API for multiple generations
  • Implement streaming for large outputs
  • Monitor memory usage regularly
  • Run benchmarks before releases
  • Set up CI/CD performance tests
  • Profile bottlenecks periodically
  • Optimize prompt patterns for cache hits

Performance Monitoring

Runtime Metrics

// Get current metrics
const metrics = synth.getMetrics();
console.log('Cache:', metrics.cache);
console.log('Memory:', metrics.memory);
console.log('Router:', metrics.router);

Performance Monitor

import { PerformanceMonitor } from '@ruvector/agentic-synth';

const monitor = new PerformanceMonitor();
monitor.start();

// ... run workload ...

const metrics = monitor.getMetrics();
console.log('Throughput:', metrics.throughput);
console.log('P99 Latency:', metrics.p99LatencyMs);

Bottleneck Analysis

import { BottleneckAnalyzer } from '@ruvector/agentic-synth';

const analyzer = new BottleneckAnalyzer();
// `metrics` is assumed here to be the object returned by monitor.getMetrics()
const report = analyzer.analyze(metrics);

if (report.detected) {
  console.log('Bottlenecks:', report.bottlenecks);
  console.log('Recommendations:', report.recommendations);
}

Best Practices

  1. Cache Strategy: Use prompts as cache keys, normalize formatting
  2. Concurrency: Start with 100, then adjust to stay within API rate limits
  3. Memory: Monitor with getMetrics(), call optimize() periodically
  4. Streaming: Use for outputs > 1000 tokens
  5. Benchmarking: Run before releases, track trends
  6. Monitoring: Enable in production, set up alerts
  7. Optimization: Profile first, optimize bottlenecks
  8. Testing: Include performance tests in CI/CD

Troubleshooting

High Latency

  • Check cache hit rate
  • Increase cache size
  • Optimize prompt patterns
  • Check network connectivity

Low Throughput

  • Increase maxConcurrency
  • Use batch API
  • Reduce maxTokens
  • Check API rate limits

High Memory Usage

  • Reduce cache size
  • Call optimize() regularly
  • Use streaming for large outputs
  • Check for memory leaks

Low Cache Hit Rate

  • Normalize prompt formatting
  • Increase cache size
  • Increase TTL
  • Review workload patterns
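
Normalizing prompt formatting can be as simple as collapsing whitespace before the prompt is used as a cache key. The helpers below are a hypothetical sketch (`normalizePrompt` and `cacheKey` are not library functions), and they deliberately leave letter case untouched because case can change a prompt's meaning.

```typescript
// Canonicalize a prompt so cosmetic differences do not cause cache misses.
function normalizePrompt(prompt: string): string {
  return prompt
    .normalize("NFC") // unify equivalent Unicode representations
    .trim() // drop leading and trailing whitespace
    .replace(/\s+/g, " "); // collapse runs of spaces, tabs, and newlines
}

// Build a key from the normalized prompt plus any option that affects the output.
function cacheKey(prompt: string, maxTokens: number): string {
  return JSON.stringify([normalizePrompt(prompt), maxTokens]);
}
```

Options such as `maxTokens` belong in the key because two requests with the same prompt but different limits can legitimately produce different outputs.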

Additional Resources