Comprehensive benchmarking tool for testing the globally distributed RuVector vector search system at scale (500M+ concurrent connections).
- Overview
- Features
- Prerequisites
- Installation
- Quick Start
- Benchmark Scenarios
- Running Benchmarks
- Understanding Results
- Best Practices
- Cost Estimation
- Troubleshooting
- Advanced Usage
This benchmarking suite provides enterprise-grade load testing capabilities for RuVector, supporting:
- Massive Scale: Test up to 25B concurrent connections
- Multi-Region: Distributed load generation across 11 GCP regions
- Comprehensive Metrics: Latency, throughput, errors, resource utilization, costs
- SLA Validation: Automated checking against 99.99% availability, <50ms p99 latency targets
- Advanced Analysis: Statistical analysis, bottleneck identification, recommendations
- Multi-protocol support (HTTP, HTTP/2, WebSocket, gRPC)
- Realistic query patterns (uniform, hotspot, Zipfian, burst)
- Configurable ramp-up/down rates
- Connection lifecycle management
- Geographic distribution
- Latency distribution (p50, p90, p95, p99, p99.9)
- Throughput tracking (QPS, bandwidth)
- Error analysis by type and region
- Resource utilization (CPU, memory, network)
- Cost per million queries
- Regional performance comparison
- Statistical analysis with anomaly detection
- SLA compliance checking
- Bottleneck identification
- Performance score calculation
- Actionable recommendations
- Interactive visualization dashboard
- Markdown and JSON reports
- CSV export for further analysis
- Node.js: v18+ (for TypeScript execution)
- k6: Latest version (installation guide)
- Access: RuVector cluster endpoint
- Claude Flow: For hooks integration (`npm install -g claude-flow@alpha`)
- Docker: For containerized execution
- GCP Account: For multi-region load generation
- Clone Repository

  ```bash
  cd /home/user/ruvector/benchmarks
  ```

- Install Dependencies

  ```bash
  npm install -g typescript ts-node
  npm install k6 @types/k6
  ```

- Verify Installation

  ```bash
  k6 version
  ts-node --version
  ```

- Configure Environment

  ```bash
  export BASE_URL="https://your-ruvector-cluster.example.com"
  export PARALLEL=2  # Number of parallel scenarios
  ```
Run a single scenario:

```bash
# Quick validation (100M connections, 45 minutes)
ts-node benchmark-runner.ts run baseline_100m

# Full baseline test (500M connections, 3+ hours)
ts-node benchmark-runner.ts run baseline_500m

# Burst test (10x spike to 5B connections)
ts-node benchmark-runner.ts run burst_10x
```

Run a scenario group:

```bash
# Quick validation suite (~1 hour)
ts-node benchmark-runner.ts group quick_validation

# Standard test suite (~6 hours)
ts-node benchmark-runner.ts group standard_suite

# Full stress testing suite (~10 hours)
ts-node benchmark-runner.ts group stress_suite

# All scenarios (~48 hours)
ts-node benchmark-runner.ts group full_suite
```

List available scenarios:

```bash
ts-node benchmark-runner.ts list
```

**`baseline_500m`**
- Description: Steady-state operation with 500M concurrent connections
- Duration: 3h 15m
- Target: P99 < 50ms, 99.99% availability
- Use Case: Production capacity validation
**`baseline_100m`**
- Description: Smaller baseline for quick validation
- Duration: 45m
- Target: P99 < 50ms, 99.99% availability
- Use Case: CI/CD integration, quick regression tests
**`burst_10x`**
- Description: Sudden spike to 5B concurrent (10x baseline)
- Duration: 20m
- Target: P99 < 100ms, 99.9% availability
- Use Case: Flash sale, viral event simulation
**`burst_25x`**
- Description: Extreme spike to 12.5B concurrent (25x baseline)
- Duration: 35m
- Target: P99 < 150ms, 99.5% availability
- Use Case: Major global event (Olympics, elections)
**`burst_50x`**
- Description: Maximum spike to 25B concurrent (50x baseline)
- Duration: 50m
- Target: P99 < 200ms, 99% availability
- Use Case: Stress testing absolute limits
**Single-Region Failover**
- Description: Test recovery when one region fails
- Duration: 45m
- Target: <10% throughput degradation, <1% errors
- Use Case: Disaster recovery validation
**Multi-Region Failover**
- Description: Test recovery when multiple regions fail
- Duration: 55m
- Target: <20% throughput degradation, <2% errors
- Use Case: Multi-region outage preparation
**`read_heavy`**
- Description: 95% reads, 5% writes (typical production workload)
- Duration: 1h 50m
- Target: P99 < 50ms, 99.99% availability
- Use Case: Production simulation
**Write-Heavy**
- Description: 70% writes, 30% reads (batch indexing scenario)
- Duration: 1h 50m
- Target: P99 < 80ms, 99.95% availability
- Use Case: Bulk data ingestion
**Balanced Read/Write**
- Description: 50% reads, 50% writes
- Duration: 1h 50m
- Target: P99 < 60ms, 99.98% availability
- Use Case: Mixed workload validation
**`world_cup`**
- Description: Predictable spike with geographic concentration (Europe)
- Duration: 3h
- Target: P99 < 100ms during matches
- Use Case: Major sporting event
**`black_friday`**
- Description: Sustained high load with periodic spikes
- Duration: 14h
- Target: P99 < 80ms, 99.95% availability
- Use Case: E-commerce peak period
```bash
# Set environment variables
export BASE_URL="https://ruvector.example.com"
export REGION="us-east1"

# Run single test
ts-node benchmark-runner.ts run baseline_500m

# Run with custom config
BASE_URL="https://staging.example.com" \
PARALLEL=3 \
ts-node benchmark-runner.ts group standard_suite
```

```bash
# Enable hooks (default)
export ENABLE_HOOKS=true

# Disable hooks
export ENABLE_HOOKS=false

ts-node benchmark-runner.ts run baseline_500m
```

Hooks will automatically:
- Execute `npx claude-flow@alpha hooks pre-task` before each test
- Store results in swarm memory
- Execute `npx claude-flow@alpha hooks post-task` after completion
To distribute load across regions:
```bash
# Deploy load generators to GCP regions
for region in us-east1 us-west1 europe-west1 asia-east1; do
  gcloud compute instances create "k6-${region}" \
    --zone="${region}-a" \
    --machine-type="n2-standard-32" \
    --image-family="ubuntu-2004-lts" \
    --image-project="ubuntu-os-cloud" \
    --metadata-from-file=startup-script=setup-k6.sh
done

# Run distributed test
ts-node benchmark-runner.ts run baseline_500m
```

```bash
# Build container
docker build -t ruvector-benchmark .

# Run test
docker run \
  -e BASE_URL="https://ruvector.example.com" \
  -v $(pwd)/results:/results \
  ruvector-benchmark run baseline_500m
```

Results are written to the following layout:

```
results/
  run-{timestamp}/
    {scenario}-{timestamp}-raw.json       # Raw k6 metrics
    {scenario}-{timestamp}-metrics.json   # Processed metrics
    {scenario}-{timestamp}-metrics.csv    # CSV export
    {scenario}-{timestamp}-analysis.json  # Analysis report
    {scenario}-{timestamp}-report.md      # Markdown report
  SUMMARY.md                              # Multi-scenario summary
```
- P50 (Median): 50% of requests faster than this
- P90: 90% of requests faster than this
- P95: 95% of requests faster than this
- P99: 99% of requests faster than this (SLA target)
- P99.9: 99.9% of requests faster than this
Target: P99 < 50ms for baseline, <100ms for burst
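For reference, these percentiles can be computed from raw latency samples with a nearest-rank calculation; a minimal TypeScript sketch (the `percentile` helper and the sample data are illustrative, not part of the tool):

```typescript
// Nearest-rank percentile over a list of latency samples (in ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  // Rank of the smallest value such that at least p% of samples are <= it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Illustrative samples: 1..100 ms.
const latencies = Array.from({ length: 100 }, (_, i) => i + 1);
const p50 = percentile(latencies, 50); // 50
const p99 = percentile(latencies, 99); // 99
```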
- QPS: Queries per second
- Peak QPS: Maximum sustained throughput
- Average QPS: Mean throughput over test duration
Target: 50M QPS for 500M baseline connections
- Total Errors: Count of failed requests
- Error Rate %: Percentage of requests that failed
- By Type: Breakdown (timeout, connection, server, client)
- By Region: Geographic distribution
Target: < 0.01% error rate (99.99% success)
- Uptime %: Percentage of time system was available
- Downtime: Total milliseconds of unavailability
- MTBF: Mean time between failures
- MTTR: Mean time to recovery
Target: 99.99% availability (52 minutes/year downtime)
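The availability arithmetic is straightforward; a small sketch (helper names are illustrative) showing where the ~52 minutes/year budget for 99.99% comes from:

```typescript
// Availability as a percentage, from observed downtime over a window.
function availabilityPct(downtimeMs: number, windowMs: number): number {
  return (1 - downtimeMs / windowMs) * 100;
}

// Allowed downtime per year for a given availability target.
function allowedDowntimeMinutesPerYear(targetPct: number): number {
  const minutesPerYear = 365 * 24 * 60; // 525,600
  return ((100 - targetPct) / 100) * minutesPerYear;
}

// 99.99% leaves ~52.56 minutes of downtime per year.
const budget = allowedDowntimeMinutesPerYear(99.99);
```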
- CPU %: Average and peak CPU usage
- Memory %: Average and peak memory usage
- Network: Bandwidth, ingress/egress bytes
- Per Region: Resource usage by geographic location
Alert Thresholds: CPU > 80%, Memory > 85%
- Total Cost: Compute + network + storage
- Cost Per Million: Cost per one million queries served
- Per Region: Cost breakdown by location
Target: < $0.50 per million queries
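The cost-per-million metric is total cost divided by query volume in millions; a sketch (function name illustrative). For example, $1,200 spent serving 2.4B queries lands exactly at the $0.50/million target:

```typescript
// Cost per one million queries.
function costPerMillionQueries(totalCostUsd: number, totalQueries: number): number {
  return totalCostUsd / (totalQueries / 1_000_000);
}

// Illustrative: one hour of testing (~$1,200) at 2.4B total queries.
const cpm = costPerMillionQueries(1200, 2_400_000_000); // 0.5
```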
Overall score (0-100) calculated from:
- Performance (35%): Latency and throughput
- Reliability (35%): Availability and error rate
- Scalability (20%): Resource utilization efficiency
- Efficiency (10%): Cost effectiveness
Grades:
- 90-100: Excellent
- 80-89: Good
- 70-79: Fair
- 60-69: Needs Improvement
- <60: Poor
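The weighting above can be sketched as follows, assuming each category score is already normalized to 0-100 (interface and function names are illustrative):

```typescript
interface CategoryScores {
  performance: number; // 0-100
  reliability: number; // 0-100
  scalability: number; // 0-100
  efficiency: number;  // 0-100
}

// Weighted overall score using the percentages listed above.
function overallScore(s: CategoryScores): number {
  return (
    s.performance * 0.35 +
    s.reliability * 0.35 +
    s.scalability * 0.2 +
    s.efficiency * 0.1
  );
}

// Map a score to its grade band.
function grade(score: number): string {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Good';
  if (score >= 70) return 'Fair';
  if (score >= 60) return 'Needs Improvement';
  return 'Poor';
}
```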
✅ PASSED if all criteria met:
- P99 latency < 50ms (baseline) or scenario target
- Availability >= 99.99%
- Error rate < 0.01%
❌ FAILED if any criterion violated
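The pass/fail check reduces to a simple predicate over the three criteria; a sketch with hypothetical field names (per-scenario targets would be passed in for burst or failover runs):

```typescript
interface SlaResult {
  p99LatencyMs: number;
  availabilityPct: number;
  errorRatePct: number;
}

// Baseline SLA targets from this section; all three must hold to pass.
function meetsSla(
  r: SlaResult,
  targets = { p99LatencyMs: 50, availabilityPct: 99.99, errorRatePct: 0.01 },
): boolean {
  return (
    r.p99LatencyMs < targets.p99LatencyMs &&
    r.availabilityPct >= targets.availabilityPct &&
    r.errorRatePct < targets.errorRatePct
  );
}
```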
Each test generates an analysis report with:
- Statistical Analysis
  - Summary statistics
  - Distribution histograms
  - Time series charts
  - Anomaly detection
- SLA Compliance
  - Pass/fail status
  - Violation details
  - Duration and severity
- Bottlenecks
  - Identified constraints
  - Current vs. threshold values
  - Impact assessment
  - Recommendations
- Recommendations
  - Prioritized action items
  - Implementation guidance
  - Estimated impact and cost
Open visualization-dashboard.html in a browser to view:
- Real-time metrics
- Interactive charts
- Geographic heat maps
- Historical comparisons
- Cost analysis
- Baseline Environment
  - Ensure the cluster is healthy
  - No active deployments or maintenance
  - Stable configuration
- Resource Allocation
  - Sufficient load generator capacity
  - Network bandwidth provisioned
  - Monitoring systems ready
- Communication
  - Notify the team of the upcoming test
  - Schedule during low-traffic periods
  - Have a rollback plan ready
- Monitoring
  - Watch real-time metrics
  - Check for anomalies
  - Monitor costs
- Safety
  - Start with smaller tests (baseline_100m)
  - Gradually increase load
  - Be ready to abort if issues are detected
- Documentation
  - Note any unusual events
  - Document configuration changes
  - Record observations
- Analysis
  - Review all metrics
  - Identify bottlenecks
  - Compare to previous runs
- Reporting
  - Share results with the team
  - Document findings
  - Create action items
- Follow-Up
  - Implement recommendations
  - Re-test after changes
  - Track improvements over time
- Quick Validation: Daily (CI/CD)
- Standard Suite: Weekly
- Stress Testing: Monthly
- Full Suite: Quarterly
Per hour of testing:
- Compute: ~$1,000/hour (distributed load generators)
- Network: ~$200/hour (egress traffic)
- Storage: ~$10/hour (results storage)
Total: ~$1,200/hour
| Scenario | Duration | Estimated Cost |
|---|---|---|
| baseline_100m | 45m | $900 |
| baseline_500m | 3h 15m | $3,900 |
| burst_10x | 20m | $400 |
| burst_25x | 35m | $700 |
| burst_50x | 50m | $1,000 |
| read_heavy | 1h 50m | $2,200 |
| world_cup | 3h | $3,600 |
| black_friday | 14h | $16,800 |
| Full Suite | ~48h | ~$57,600 |
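The table values follow directly from the hourly estimate; a quick sketch (constant and function names are illustrative):

```typescript
// Component estimates from above: compute + network + storage.
// $1,000 + $200 + $10 = $1,210/hour, rounded to ~$1,200/hour in the table.
const HOURLY_RATE_USD = 1000 + 200 + 10;

// Estimated cost for a scenario of a given duration.
function estimatedCost(durationHours: number, ratePerHour = 1200): number {
  return durationHours * ratePerHour;
}

const baseline100m = estimatedCost(0.75); // 45m  -> $900
const baseline500m = estimatedCost(3.25); // 3h15m -> $3,900
```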
- Use Spot Instances: 60-80% savings on load generators
- Regional Selection: Test in fewer regions
- Shorter Duration: Reduce steady-state phase
- Parallel Execution: Minimize total runtime
```bash
# Install k6
brew install k6      # macOS
sudo apt install k6  # Linux
choco install k6     # Windows
```

```bash
# Check cluster endpoint
curl -v https://your-ruvector-cluster.example.com/health

# Verify network connectivity
ping your-ruvector-cluster.example.com
```

```bash
# Increase Node.js memory limit
export NODE_OPTIONS="--max-old-space-size=8192"

# Use smaller scenario
ts-node benchmark-runner.ts run baseline_100m
```

- Check cluster health
- Verify capacity (not overloaded)
- Review network latency
- Check authentication/authorization
- Insufficient load generator capacity
- Network bandwidth limitations
- Target cluster under-provisioned
- Configuration issues (connection limits, timeouts)
```bash
# Enable verbose logging
export DEBUG=true
export LOG_LEVEL=debug
ts-node benchmark-runner.ts run baseline_500m
```

For issues or questions:
- GitHub Issues: https://github.com/ruvnet/ruvector/issues
- Documentation: https://docs.ruvector.io
- Community: https://discord.gg/ruvector
Create custom scenario in benchmark-scenarios.ts:
```typescript
export const SCENARIOS = {
  // ...existing scenarios...
  my_custom_test: {
    name: 'My Custom Test',
    description: 'Custom workload pattern',
    config: {
      targetConnections: 1_000_000_000,
      rampUpDuration: '15m',
      steadyStateDuration: '1h',
      rampDownDuration: '10m',
      queriesPerConnection: 100,
      queryInterval: '1000',
      protocol: 'http',
      vectorDimension: 768,
      queryPattern: 'uniform',
    },
    k6Options: {
      // k6 configuration
    },
    expectedMetrics: {
      p99Latency: 50,
      errorRate: 0.01,
      throughput: 100_000_000,
      availability: 99.99,
    },
    duration: '1h25m',
    tags: ['custom'],
  },
};
```

Example GitHub Actions workflow:

```yaml
# .github/workflows/benchmark.yml
name: Benchmark
on:
  schedule:
    - cron: '0 0 * * 0' # Weekly
  workflow_dispatch:
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - name: Install k6
        run: |
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6
      - name: Run benchmark
        env:
          BASE_URL: ${{ secrets.BASE_URL }}
        run: |
          cd benchmarks
          npm install  # install ts-node and project dependencies
          npx ts-node benchmark-runner.ts run baseline_100m
      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: benchmarks/results/
```

Programmatic usage:

```typescript
import { BenchmarkRunner } from './benchmark-runner';

const runner = new BenchmarkRunner({
  baseUrl: 'https://ruvector.example.com',
  parallelScenarios: 2,
  enableHooks: true,
});

// Run single scenario
const run = await runner.runScenario('baseline_500m');
console.log(`Score: ${run.analysis?.score.overall}/100`);

// Run multiple scenarios
const results = await runner.runScenarios([
  'baseline_500m',
  'burst_10x',
  'read_heavy',
]);

// Check if all passed SLA
const allPassed = Array.from(results.values()).every(
  (r) => r.analysis?.slaCompliance.met,
);
```

Happy Benchmarking! 🚀
For questions or contributions, please visit: https://github.com/ruvnet/ruvector