Agentic-Synth is optimized for high-performance synthetic data generation with the following targets:
- Sub-second P99 latency (sub-100ms for cache hits)
- 100+ concurrent generations supported
- Memory-efficient data handling (< 400MB)
- 50%+ cache hit rate for typical workloads
| Metric | Target | Notes |
|---|---|---|
| P99 Latency | < 1000ms | < 100ms for cached requests |
| Throughput | > 10 req/s | Scales with concurrency |
| Memory Usage | < 400MB | With 1000-item cache |
| Cache Hit Rate | > 50% | Depends on workload patterns |
| Error Rate | < 1% | With retry logic |
Configuration:

```typescript
const synth = new AgenticSynth({
  enableCache: true,
  cacheSize: 1000,   // Adjust based on memory
  cacheTTL: 3600000, // 1 hour in milliseconds
});
```

Benefits:
- Reduces API calls by 50-80%
- Sub-100ms latency for cache hits
- Automatic LRU eviction
Best Practices:
- Use consistent prompts for better cache hits
- Increase cache size for repetitive workloads
- Monitor the cache hit rate with `synth.getMetrics()`
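Since cache hits require byte-identical keys, a small normalization step before each request can turn near-duplicate prompts into hits. A sketch (`normalizePrompt` is a hypothetical helper, not part of the library):

```typescript
// Normalize a prompt before sending it, so semantically identical
// requests map to the same cache key.
function normalizePrompt(prompt: string): string {
  return prompt
    .trim()                // drop leading/trailing whitespace
    .replace(/\s+/g, ' '); // collapse internal runs of whitespace
}

console.log(normalizePrompt('  Generate   a user\nrecord  '));
// 'Generate a user record'
```
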
Configuration:

```typescript
const synth = new AgenticSynth({
  modelPreference: [
    'claude-sonnet-4-5-20250929',
    'claude-3-5-sonnet-20241022',
  ],
});
```

Features:
- Automatic load balancing
- Performance-based routing
- Error handling and fallback
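The fallback behavior can be pictured as a preference-ordered retry loop: try each model in `modelPreference` until one succeeds. A minimal sketch (illustrative only; the library's internal router also does load balancing and performance-based routing):

```typescript
// Try each model in preference order; return the first success,
// or rethrow the last error if every model fails.
async function generateWithFallback(
  models: string[],
  generate: (model: string) => Promise<string>,
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await generate(model);
    } catch (err) {
      lastError = err; // fall through to the next model
    }
  }
  throw lastError;
}
```
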
Configuration:

```typescript
const synth = new AgenticSynth({
  maxConcurrency: 100, // Adjust based on API limits
});
```

Usage:

```typescript
const prompts = [...]; // 100+ prompts
const results = await synth.generateBatch(prompts, {
  maxTokens: 500,
});
```

Performance:
- 2-3x faster than sequential
- Respects concurrency limits
- Automatic batching
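A batch API like this typically caps in-flight requests with a small worker pool rather than firing everything at once. A simplified sketch of bounded-concurrency mapping (not the library's actual implementation):

```typescript
// Run fn over items with at most `limit` promises in flight,
// preserving result order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index
      results[i] = await fn(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```
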
Configuration:

```typescript
const synth = new AgenticSynth({
  memoryLimit: 512 * 1024 * 1024, // 512MB
});
```

Features:
- Automatic memory tracking
- LRU eviction when over the limit
- Periodic cleanup with `synth.optimize()`
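The LRU eviction described above can be sketched with a `Map`, whose insertion order tracks recency (illustrative only; not the library's implementation):

```typescript
// Tiny LRU cache: Map iteration order is insertion order, so the
// first key is always the least recently used.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private readonly maxSize: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // Evict the least recently used entry
      this.map.delete(this.map.keys().next().value as K);
    }
  }
}
```
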
Usage:

```typescript
const stream = synth.generateStream(prompt, {
  maxTokens: 4096,
});

for await (const chunk of stream) {
  processChunk(chunk); // Process each chunk as it arrives
}
```

Benefits:
- Lower time-to-first-byte
- Reduced memory usage
- Better user experience
```bash
# Run all benchmarks
npm run benchmark

# Run a specific suite
npm run benchmark -- --suite "Throughput Test"

# With custom settings
npm run benchmark -- --iterations 20 --concurrency 200

# Generate a report
npm run benchmark -- --output benchmarks/report.md
```

- Throughput Test: Measures requests per second
- Latency Test: Measures P50/P95/P99 latencies
- Memory Test: Measures memory usage and leaks
- Cache Test: Measures cache effectiveness
- Concurrency Test: Tests concurrent request handling
- Streaming Test: Measures streaming performance
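For reference, percentile latencies such as P50/P95/P99 are computed from sorted per-request timings. A simple nearest-rank sketch:

```typescript
// Nearest-rank percentile: sort the samples and pick the value at
// rank ceil(p/100 * n), clamped to valid indices.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const index = Math.min(
    sorted.length - 1,
    Math.ceil((p / 100) * sorted.length) - 1,
  );
  return sorted[Math.max(0, index)];
}
```

So P99 over 100 samples picks the 99th-smallest timing, which is why a handful of slow outliers dominate the P99 number while barely moving P50.
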
```bash
# Analyze performance
npm run perf:analyze

# Generate a detailed report
npm run perf:report
```

The built-in bottleneck analyzer automatically detects:
High latency:
- Cause: Slow API responses, network issues
- Solution: Increase cache size, optimize prompts
- Impact: 30-50% latency reduction

Low throughput:
- Cause: Low concurrency, sequential processing
- Solution: Increase maxConcurrency, use the batch API
- Impact: 2-3x throughput increase

High memory usage:
- Cause: Large cache, memory leaks
- Solution: Reduce cache size, call optimize()
- Impact: 40-60% memory reduction

Low cache hit rate:
- Cause: Small cache, inconsistent prompts
- Solution: Increase cache size, optimize cache keys
- Impact: 20-40% cache improvement
```bash
# Run in CI
npm run benchmark:ci
```

Features:
- Automatic threshold checking
- Fails build on regression
- Generates reports for artifacts
```yaml
- name: Performance Benchmarks
  run: npm run benchmark:ci

- name: Upload Report
  uses: actions/upload-artifact@v3
  with:
    name: performance-report
    path: benchmarks/performance-report.md
```

```bash
# CPU profiling
npm run benchmark:profile
node --prof-process isolate-*.log > profile.txt

# Memory profiling
node --expose-gc --max-old-space-size=512 dist/benchmarks/runner.js

# Interactive debugging
node --inspect-brk dist/benchmarks/runner.js
# Open chrome://inspect
```

- Enable caching for repetitive workloads
- Set appropriate cache size (1000+ items)
- Configure concurrency based on API limits
- Use batch API for multiple generations
- Implement streaming for large outputs
- Monitor memory usage regularly
- Run benchmarks before releases
- Set up CI/CD performance tests
- Profile bottlenecks periodically
- Optimize prompt patterns for cache hits
Current metrics:

```typescript
// Get current metrics
const metrics = synth.getMetrics();
console.log('Cache:', metrics.cache);
console.log('Memory:', metrics.memory);
console.log('Router:', metrics.router);
```

Continuous monitoring:

```typescript
import { PerformanceMonitor } from '@ruvector/agentic-synth';

const monitor = new PerformanceMonitor();
monitor.start();

// ... run workload ...

const metrics = monitor.getMetrics();
console.log('Throughput:', metrics.throughput);
console.log('P99 Latency:', metrics.p99LatencyMs);
```

Bottleneck analysis:

```typescript
import { BottleneckAnalyzer } from '@ruvector/agentic-synth';

const analyzer = new BottleneckAnalyzer();
const report = analyzer.analyze(metrics);

if (report.detected) {
  console.log('Bottlenecks:', report.bottlenecks);
  console.log('Recommendations:', report.recommendations);
}
```

- Cache Strategy: Use prompts as cache keys, normalize formatting
- Concurrency: Start with 100, increase based on API limits
- Memory: Monitor with getMetrics(), call optimize() periodically
- Streaming: Use for outputs > 1000 tokens
- Benchmarking: Run before releases, track trends
- Monitoring: Enable in production, set up alerts
- Optimization: Profile first, optimize bottlenecks
- Testing: Include performance tests in CI/CD
High latency:
- Check the cache hit rate
- Increase cache size
- Optimize prompt patterns
- Check network connectivity

Low throughput:
- Increase maxConcurrency
- Use the batch API
- Reduce maxTokens
- Check API rate limits

High memory usage:
- Reduce cache size
- Call optimize() regularly
- Use streaming for large outputs
- Check for memory leaks

Low cache hit rate:
- Normalize prompt formatting
- Increase cache size
- Increase the TTL
- Review workload patterns
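To confirm whether cache tuning is actually helping, track the hit rate over time. The `hits`/`misses` field names below are assumptions about the metrics shape, not the documented API:

```typescript
// Hypothetical shape of the cache metrics; field names are assumptions.
interface CacheMetrics {
  hits: number;
  misses: number;
}

// Fraction of requests served from cache (0 when there is no traffic).
function hitRate(cache: CacheMetrics): number {
  const total = cache.hits + cache.misses;
  return total === 0 ? 0 : cache.hits / total;
}

console.log(hitRate({ hits: 75, misses: 25 })); // 0.75
```
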