RuVector Benchmarking Suite

Comprehensive benchmarking tool for testing the globally distributed RuVector vector search system at scale (500M+ concurrent connections).

Overview

This benchmarking suite provides enterprise-grade load testing capabilities for RuVector, supporting:

  • Massive Scale: Test up to 25B concurrent connections
  • Multi-Region: Distributed load generation across 11 GCP regions
  • Comprehensive Metrics: Latency, throughput, errors, resource utilization, costs
  • SLA Validation: Automated checking against 99.99% availability, <50ms p99 latency targets
  • Advanced Analysis: Statistical analysis, bottleneck identification, recommendations

Features

Load Generation

  • Multi-protocol support (HTTP, HTTP/2, WebSocket, gRPC)
  • Realistic query patterns (uniform, hotspot, Zipfian, burst)
  • Configurable ramp-up/down rates
  • Connection lifecycle management
  • Geographic distribution
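
The Zipfian pattern concentrates most queries on a small set of popular vectors. The sketch below shows one common way to draw such indices, assuming a fixed catalog of n items and a skew exponent s. The names buildZipfCdf and sampleZipf are hypothetical and are not taken from the suite, which may generate this pattern differently.

```typescript
// Hypothetical helper: cumulative distribution of a Zipf law over ranks 1..n.
function buildZipfCdf(n: number, s: number): number[] {
  const weights: number[] = [];
  let total = 0;
  for (let rank = 1; rank <= n; rank++) {
    const w = 1 / Math.pow(rank, s);
    weights.push(w);
    total += w;
  }
  let running = 0;
  return weights.map((w) => {
    running += w / total;
    return running;
  });
}

// Map a uniform random number u in [0, 1) to a 0-based item index
// by binary-searching the first CDF entry greater than u.
function sampleZipf(cdf: number[], u: number): number {
  let lo = 0;
  let hi = cdf.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cdf[mid] > u) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}
```

In a load script, the CDF would be built once and sampleZipf(cdf, Math.random()) called per query, so low indices (the "hot" vectors) are selected far more often than the tail.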

Metrics Collection

  • Latency distribution (p50, p90, p95, p99, p99.9)
  • Throughput tracking (QPS, bandwidth)
  • Error analysis by type and region
  • Resource utilization (CPU, memory, network)
  • Cost per million queries
  • Regional performance comparison

Analysis & Reporting

  • Statistical analysis with anomaly detection
  • SLA compliance checking
  • Bottleneck identification
  • Performance score calculation
  • Actionable recommendations
  • Interactive visualization dashboard
  • Markdown and JSON reports
  • CSV export for further analysis

Prerequisites

Required

  • Node.js: v18+ (for TypeScript execution)
  • k6: Latest version (installation guide)
  • Access: RuVector cluster endpoint

Optional

  • Claude Flow: For hooks integration
    npm install -g claude-flow@alpha
  • Docker: For containerized execution
  • GCP Account: For multi-region load generation

Installation

  1. Clone Repository

    cd /home/user/ruvector/benchmarks
  2. Install Dependencies

    npm install -g typescript ts-node
    npm install k6 @types/k6
  3. Verify Installation

    k6 version
    ts-node --version
  4. Configure Environment

    export BASE_URL="https://your-ruvector-cluster.example.com"
    export PARALLEL=2  # Number of parallel scenarios
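
For illustration, a runner could validate these variables before starting a test. This is a minimal sketch, not the suite's actual code: readRunnerEnv is a hypothetical name, and the default of 1 for an unset PARALLEL is an assumption.

```typescript
interface RunnerEnv {
  baseUrl: string;
  parallel: number;
}

// Hypothetical parser. It takes an env-like record (for example process.env)
// so it can be exercised without touching the real environment.
function readRunnerEnv(env: Record<string, string | undefined>): RunnerEnv {
  const baseUrl = env["BASE_URL"];
  if (!baseUrl || !/^https?:\/\//.test(baseUrl)) {
    throw new Error("BASE_URL must be set to an http(s) URL");
  }
  // Assumption: fall back to 1 parallel scenario when PARALLEL is unset.
  const raw = env["PARALLEL"];
  const parallel = raw === undefined ? 1 : Number(raw);
  if (!Number.isInteger(parallel) || parallel < 1) {
    throw new Error("PARALLEL must be a positive integer");
  }
  return { baseUrl, parallel };
}
```

Failing fast on a missing or malformed BASE_URL avoids discovering the problem after load generators have already been provisioned.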

Quick Start

Run a Single Scenario

# Quick validation (100M connections, 45 minutes)
ts-node benchmark-runner.ts run baseline_100m

# Full baseline test (500M connections, 3+ hours)
ts-node benchmark-runner.ts run baseline_500m

# Burst test (10x spike to 5B connections)
ts-node benchmark-runner.ts run burst_10x

Run Scenario Groups

# Quick validation suite (~1 hour)
ts-node benchmark-runner.ts group quick_validation

# Standard test suite (~6 hours)
ts-node benchmark-runner.ts group standard_suite

# Full stress testing suite (~10 hours)
ts-node benchmark-runner.ts group stress_suite

# All scenarios (~48 hours)
ts-node benchmark-runner.ts group full_suite

List Available Tests

ts-node benchmark-runner.ts list

Benchmark Scenarios

Baseline Tests

baseline_500m

  • Description: Steady-state operation with 500M concurrent connections
  • Duration: 3h 15m
  • Target: P99 < 50ms, 99.99% availability
  • Use Case: Production capacity validation

baseline_100m

  • Description: Smaller baseline for quick validation
  • Duration: 45m
  • Target: P99 < 50ms, 99.99% availability
  • Use Case: CI/CD integration, quick regression tests

Burst Tests

burst_10x

  • Description: Sudden spike to 5B concurrent (10x baseline)
  • Duration: 20m
  • Target: P99 < 100ms, 99.9% availability
  • Use Case: Flash sale, viral event simulation

burst_25x

  • Description: Extreme spike to 12.5B concurrent (25x baseline)
  • Duration: 35m
  • Target: P99 < 150ms, 99.5% availability
  • Use Case: Major global event (Olympics, elections)

burst_50x

  • Description: Maximum spike to 25B concurrent (50x baseline)
  • Duration: 50m
  • Target: P99 < 200ms, 99% availability
  • Use Case: Stress testing absolute limits

Failover Tests

regional_failover

  • Description: Test recovery when one region fails
  • Duration: 45m
  • Target: <10% throughput degradation, <1% errors
  • Use Case: Disaster recovery validation

multi_region_failover

  • Description: Test recovery when multiple regions fail
  • Duration: 55m
  • Target: <20% throughput degradation, <2% errors
  • Use Case: Multi-region outage preparation
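
The failover targets are expressed as throughput degradation relative to the pre-failure baseline plus an error-rate ceiling. A rough sketch of that check follows, assuming QPS is measured before and during the outage. All names here are hypothetical.

```typescript
interface FailoverTarget {
  maxDegradationPct: number; // e.g. 10 for regional_failover
  maxErrorPct: number;       // e.g. 1 for regional_failover
}

// Percentage of throughput lost during failover relative to the baseline.
function throughputDegradationPct(baselineQps: number, failoverQps: number): number {
  if (baselineQps <= 0) throw new Error("baselineQps must be positive");
  return ((baselineQps - failoverQps) / baselineQps) * 100;
}

function meetsFailoverTarget(
  baselineQps: number,
  failoverQps: number,
  errorPct: number,
  target: FailoverTarget,
): boolean {
  const degradation = throughputDegradationPct(baselineQps, failoverQps);
  return degradation < target.maxDegradationPct && errorPct < target.maxErrorPct;
}
```

For example, a drop from 100 to 92 QPS (in any consistent unit) is an 8% degradation, which satisfies the regional_failover target as long as errors stay under 1%.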

Workload Tests

read_heavy

  • Description: 95% reads, 5% writes (typical production workload)
  • Duration: 1h 50m
  • Target: P99 < 50ms, 99.99% availability
  • Use Case: Production simulation

write_heavy

  • Description: 70% writes, 30% reads (batch indexing scenario)
  • Duration: 1h 50m
  • Target: P99 < 80ms, 99.95% availability
  • Use Case: Bulk data ingestion

balanced_workload

  • Description: 50% reads, 50% writes
  • Duration: 1h 50m
  • Target: P99 < 60ms, 99.98% availability
  • Use Case: Mixed workload validation
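
A read/write mix like the ones above can be produced by drawing one uniform random number per request. The following is a small sketch under that assumption. READ_PCT and pickOperation are illustrative names, not identifiers from the suite.

```typescript
type Operation = "read" | "write";

// Read percentages taken from the scenario descriptions above.
const READ_PCT: Record<string, number> = {
  read_heavy: 95,
  write_heavy: 30,
  balanced_workload: 50,
};

// u is a uniform random number in [0, 1), e.g. Math.random().
function pickOperation(readPct: number, u: number): Operation {
  return u * 100 < readPct ? "read" : "write";
}
```

Calling pickOperation(READ_PCT["read_heavy"], Math.random()) for each request yields roughly 95% reads over a long run.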

Real-World Scenarios

world_cup

  • Description: Predictable spike with geographic concentration (Europe)
  • Duration: 3h
  • Target: P99 < 100ms during matches
  • Use Case: Major sporting event

black_friday

  • Description: Sustained high load with periodic spikes
  • Duration: 14h
  • Target: P99 < 80ms, 99.95% availability
  • Use Case: E-commerce peak period

Running Benchmarks

Basic Usage

# Set environment variables
export BASE_URL="https://ruvector.example.com"
export REGION="us-east1"

# Run single test
ts-node benchmark-runner.ts run baseline_500m

# Run with custom config
BASE_URL="https://staging.example.com" \
PARALLEL=3 \
ts-node benchmark-runner.ts group standard_suite

With Claude Flow Hooks

# Enable hooks (default)
export ENABLE_HOOKS=true

# Or disable hooks instead
# export ENABLE_HOOKS=false

ts-node benchmark-runner.ts run baseline_500m

Hooks will automatically:

  • Execute npx claude-flow@alpha hooks pre-task before each test
  • Store results in swarm memory
  • Execute npx claude-flow@alpha hooks post-task after completion

Multi-Region Execution

To distribute load across regions:

# Deploy load generators to GCP regions
# (zone "b" is used because us-east1 and europe-west1 have no "a" zone)
for region in us-east1 us-west1 europe-west1 asia-east1; do
  gcloud compute instances create "k6-${region}" \
    --zone="${region}-b" \
    --machine-type="n2-standard-32" \
    --image-family="ubuntu-2004-lts" \
    --image-project="ubuntu-os-cloud" \
    --metadata-from-file=startup-script=setup-k6.sh
done

# Run distributed test
ts-node benchmark-runner.ts run baseline_500m

Docker Execution

# Build container
docker build -t ruvector-benchmark .

# Run test
docker run \
  -e BASE_URL="https://ruvector.example.com" \
  -v "$(pwd)/results:/results" \
  ruvector-benchmark run baseline_500m

Understanding Results

Output Structure

results/
  run-{timestamp}/
    {scenario}-{timestamp}-raw.json       # Raw K6 metrics
    {scenario}-{timestamp}-metrics.json   # Processed metrics
    {scenario}-{timestamp}-metrics.csv    # CSV export
    {scenario}-{timestamp}-analysis.json  # Analysis report
    {scenario}-{timestamp}-report.md      # Markdown report
    SUMMARY.md                            # Multi-scenario summary
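
When post-processing results in scripts, the per-scenario file names can be derived from the layout above. This sketch assumes the run directory and the scenario files share a single timestamp, which the layout does not guarantee. resultFiles is a hypothetical helper.

```typescript
// Assumption: the run directory and the per-scenario files use the same timestamp.
function resultFiles(scenario: string, timestamp: string): string[] {
  const base = `results/run-${timestamp}/${scenario}-${timestamp}`;
  return [
    `${base}-raw.json`,      // raw k6 metrics
    `${base}-metrics.json`,  // processed metrics
    `${base}-metrics.csv`,   // CSV export
    `${base}-analysis.json`, // analysis report
    `${base}-report.md`,     // Markdown report
  ];
}
```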

Key Metrics

Latency

  • P50 (Median): 50% of requests faster than this
  • P90: 90% of requests faster than this
  • P95: 95% of requests faster than this
  • P99: 99% of requests faster than this (SLA target)
  • P99.9: 99.9% of requests faster than this

Target: P99 < 50ms for baseline; < 100ms to < 200ms for burst scenarios, depending on the spike size
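
To make the percentile definitions concrete, here is a sketch that computes them from raw latency samples using the nearest-rank method. This is one of several percentile definitions. k6 and the suite's own processing may interpolate differently, so small differences are expected. The function names are hypothetical.

```typescript
// Nearest-rank percentile over samples sorted in ascending order.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error("no latency samples");
  const rank = Math.ceil((p * sortedMs.length) / 100);
  const index = Math.min(Math.max(rank, 1), sortedMs.length) - 1;
  return sortedMs[index];
}

// Sorts a copy of the samples and reports the percentiles listed above.
function latencySummary(samplesMs: number[]): Record<string, number> {
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p90: percentile(sorted, 90),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    "p99.9": percentile(sorted, 99.9),
  };
}
```

Note that tail percentiles such as P99.9 need many samples to be meaningful: with only 100 samples, P99.9 is simply the slowest request.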

Throughput

  • QPS: Queries per second
  • Peak QPS: Maximum sustained throughput
  • Average QPS: Mean throughput over test duration

Target: 50M QPS for 500M baseline connections

Error Rate

  • Total Errors: Count of failed requests
  • Error Rate %: Percentage of requests that failed
  • By Type: Breakdown (timeout, connection, server, client)
  • By Region: Geographic distribution

Target: < 0.01% error rate (99.99% success)
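
As a rough sketch of how these error metrics relate, the snippet below tallies failures by type and derives the error rate as a percentage of all requests. FailureKind, ErrorSummary, and summarizeErrors are hypothetical names. The suite's actual report schema may differ.

```typescript
type FailureKind = "timeout" | "connection" | "server" | "client";

interface ErrorSummary {
  totalErrors: number;
  errorRatePct: number;
  byType: Record<FailureKind, number>;
}

function summarizeErrors(totalRequests: number, errors: FailureKind[]): ErrorSummary {
  if (totalRequests <= 0) throw new Error("totalRequests must be positive");
  const byType: Record<FailureKind, number> = { timeout: 0, connection: 0, server: 0, client: 0 };
  for (const kind of errors) byType[kind] += 1;
  return {
    totalErrors: errors.length,
    errorRatePct: (errors.length / totalRequests) * 100,
    byType,
  };
}
```

Against the 0.01% target, a run of one million requests can tolerate fewer than 100 failures.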

Availability

  • Uptime %: Percentage of time system was available
  • Downtime: Total milliseconds of unavailability
  • MTBF: Mean time between failures
  • MTTR: Mean time to recovery

Target: 99.99% availability (about 52.6 minutes of downtime per year)
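
The downtime budget follows directly from the availability percentage, and availability itself can be estimated from MTBF and MTTR using the standard steady-state formula MTBF / (MTBF + MTTR). A short sketch with hypothetical function names:

```typescript
// Downtime allowed per year for a given availability target.
function allowedDowntimeMinutesPerYear(availabilityPct: number): number {
  const minutesPerYear = 365 * 24 * 60; // 525,600, ignoring leap years
  return minutesPerYear * (1 - availabilityPct / 100);
}

// Steady-state availability from mean time between failures and mean time to recovery.
// Both arguments must use the same time unit.
function availabilityPctFromMtbfMttr(mtbf: number, mttr: number): number {
  if (mtbf + mttr <= 0) throw new Error("mtbf + mttr must be positive");
  return (mtbf / (mtbf + mttr)) * 100;
}
```

For 99.99% this gives roughly 52.56 minutes per year. The relaxed burst targets allow much more: 99.9% corresponds to about 525.6 minutes per year.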

Resource Utilization

  • CPU %: Average and peak CPU usage
  • Memory %: Average and peak memory usage
  • Network: Bandwidth, ingress/egress bytes
  • Per Region: Resource usage by geographic location

Alert Thresholds: CPU > 80%, Memory > 85%

Cost

  • Total Cost: Compute + network + storage
  • Cost Per Million: Cost per million queries
  • Per Region: Cost breakdown by location

Target: < $0.50 per million queries
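
Cost per million queries is total cost divided by query count, scaled to one million. The sketch below uses a hypothetical function name. The numbers in the usage note are illustrative only and are not measured results.

```typescript
// Cost in USD per one million queries.
function costPerMillionQueries(totalCostUsd: number, totalQueries: number): number {
  if (totalQueries <= 0) throw new Error("totalQueries must be positive");
  return (totalCostUsd / totalQueries) * 1_000_000;
}
```

As an illustration, a run that costs $1,200 in total and serves 3 billion queries works out to about $0.40 per million, which is under the $0.50 target.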

Performance Score

Overall score (0-100) calculated from:

  • Performance (35%): Latency and throughput
  • Reliability (35%): Availability and error rate
  • Scalability (20%): Resource utilization efficiency
  • Efficiency (10%): Cost effectiveness

Grades:

  • 90-100: Excellent
  • 80-89: Good
  • 70-79: Fair
  • 60-69: Needs Improvement
  • <60: Poor
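
The weighting and grade bands above can be sketched as follows. How each component score (0-100) is derived from raw metrics is not described here, so the sketch takes them as inputs. The names are hypothetical, and treating fractional scores such as 89.5 as belonging to the lower band is an assumption.

```typescript
interface ComponentScores {
  performance: number; // 0-100
  reliability: number; // 0-100
  scalability: number; // 0-100
  efficiency: number;  // 0-100
}

// Weighted sum using the 35/35/20/10 split listed above.
function overallScore(c: ComponentScores): number {
  return (
    c.performance * 0.35 +
    c.reliability * 0.35 +
    c.scalability * 0.2 +
    c.efficiency * 0.1
  );
}

// Assumption: band lower bounds are inclusive, so 89.5 grades as "Good".
function grade(score: number): string {
  if (score >= 90) return "Excellent";
  if (score >= 80) return "Good";
  if (score >= 70) return "Fair";
  if (score >= 60) return "Needs Improvement";
  return "Poor";
}
```

For example, component scores of 90, 80, 70, and 60 combine to 31.5 + 28 + 14 + 6 = 79.5, which grades as Fair.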

SLA Compliance

PASSED if all criteria met:

  • P99 latency < 50ms (baseline) or scenario target
  • Availability >= 99.99%
  • Error rate < 0.01%

FAILED if any criterion violated
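
The pass/fail rule can be expressed as a small check that also records which criteria were violated. This is a sketch with hypothetical names and shapes. The real analysis report may structure its slaCompliance data differently.

```typescript
interface SlaTargets {
  p99LatencyMs: number;
  availabilityPct: number;
  errorRatePct: number;
}

// Baseline targets from the list above. Scenarios may override them.
const BASELINE_SLA: SlaTargets = { p99LatencyMs: 50, availabilityPct: 99.99, errorRatePct: 0.01 };

function checkSla(
  observed: SlaTargets,
  targets: SlaTargets = BASELINE_SLA,
): { met: boolean; violations: string[] } {
  const violations: string[] = [];
  if (!(observed.p99LatencyMs < targets.p99LatencyMs)) {
    violations.push(`p99 latency ${observed.p99LatencyMs}ms is not below ${targets.p99LatencyMs}ms`);
  }
  if (!(observed.availabilityPct >= targets.availabilityPct)) {
    violations.push(`availability ${observed.availabilityPct}% is below ${targets.availabilityPct}%`);
  }
  if (!(observed.errorRatePct < targets.errorRatePct)) {
    violations.push(`error rate ${observed.errorRatePct}% is not below ${targets.errorRatePct}%`);
  }
  return { met: violations.length === 0, violations };
}
```

Returning the list of violations rather than a bare boolean makes it easier to see why a run failed.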

Analysis Report

Each test generates an analysis report with:

  1. Statistical Analysis

    • Summary statistics
    • Distribution histograms
    • Time series charts
    • Anomaly detection
  2. SLA Compliance

    • Pass/fail status
    • Violation details
    • Duration and severity
  3. Bottlenecks

    • Identified constraints
    • Current vs. threshold values
    • Impact assessment
    • Recommendations
  4. Recommendations

    • Prioritized action items
    • Implementation guidance
    • Estimated impact and cost
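
The anomaly detection method used by the suite is not specified here. As a stand-in, the sketch below flags points in a metric time series whose z-score exceeds a threshold, which is a simple and widely used approach. findAnomalies and the default threshold of 3 are assumptions.

```typescript
// Returns the indices of values more than zThreshold standard deviations from the mean.
function findAnomalies(values: number[], zThreshold: number = 3): number[] {
  const n = values.length;
  if (n < 2) return [];
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / n;
  const std = Math.sqrt(variance);
  if (std === 0) return []; // constant series: nothing stands out
  const anomalies: number[] = [];
  values.forEach((v, i) => {
    if (Math.abs(v - mean) / std > zThreshold) anomalies.push(i);
  });
  return anomalies;
}
```

Because a large outlier inflates the standard deviation it is measured against, this method needs a reasonably long series to detect anything. Robust alternatives such as median-based scores are often preferred for short windows.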

Visualization Dashboard

Open visualization-dashboard.html in a browser to view:

  • Real-time metrics
  • Interactive charts
  • Geographic heat maps
  • Historical comparisons
  • Cost analysis

Best Practices

Before Running Tests

  1. Baseline Environment

    • Ensure cluster is healthy
    • No active deployments or maintenance
    • Stable configuration
  2. Resource Allocation

    • Sufficient load generator capacity
    • Network bandwidth provisioned
    • Monitoring systems ready
  3. Communication

    • Notify team of upcoming test
    • Schedule during low-traffic periods
    • Have rollback plan ready

During Tests

  1. Monitoring

    • Watch real-time metrics
    • Check for anomalies
    • Monitor costs
  2. Safety

    • Start with smaller tests (baseline_100m)
    • Gradually increase load
    • Be ready to abort if issues detected
  3. Documentation

    • Note any unusual events
    • Document configuration changes
    • Record observations

After Tests

  1. Analysis

    • Review all metrics
    • Identify bottlenecks
    • Compare to previous runs
  2. Reporting

    • Share results with team
    • Document findings
    • Create action items
  3. Follow-Up

    • Implement recommendations
    • Re-test after changes
    • Track improvements over time

Test Frequency

  • Quick Validation: Daily (CI/CD)
  • Standard Suite: Weekly
  • Stress Testing: Monthly
  • Full Suite: Quarterly

Cost Estimation

Load Generation Costs

Per hour of testing:

  • Compute: ~$1,000/hour (distributed load generators)
  • Network: ~$200/hour (egress traffic)
  • Storage: ~$10/hour (results storage)

Total: ~$1,200/hour (rounded from ~$1,210)

Scenario Cost Estimates

Scenario        Duration   Estimated Cost
baseline_100m   45m        $900
baseline_500m   3h 15m     $3,900
burst_10x       20m        $400
burst_25x       35m        $700
burst_50x       50m        $1,000
read_heavy      1h 50m     $2,200
world_cup       3h         $3,600
black_friday    14h        $16,800
Full Suite      ~48h       ~$57,600
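
The estimates in this table are each scenario's duration multiplied by the rounded rate of ~$1,200/hour. A small sketch of that arithmetic, with a hypothetical function name:

```typescript
const HOURLY_RATE_USD = 1200; // rounded load-generation rate from the section above

// Estimated load-generation cost for a scenario of the given length.
function estimateCostUsd(durationMinutes: number, hourlyRateUsd: number = HOURLY_RATE_USD): number {
  return Math.round((durationMinutes / 60) * hourlyRateUsd);
}
```

For example, baseline_500m runs for 3h 15m (195 minutes), giving 3.25 × $1,200 = $3,900. The same function can be used to budget custom scenarios or a different hourly rate, such as one reduced by spot-instance savings.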

Cost Optimization

  1. Use Spot Instances: 60-80% savings on load generators
  2. Regional Selection: Test in fewer regions
  3. Shorter Duration: Reduce steady-state phase
  4. Parallel Execution: Minimize total runtime

Troubleshooting

Common Issues

K6 Not Found

# Install k6
brew install k6  # macOS
sudo apt install k6  # Debian/Ubuntu (requires the k6 apt repository to be added first)
choco install k6  # Windows

Connection Refused

# Check cluster endpoint
curl -v https://your-ruvector-cluster.example.com/health

# Verify network connectivity
ping your-ruvector-cluster.example.com

Out of Memory

# Increase Node.js memory limit
export NODE_OPTIONS="--max-old-space-size=8192"

# Use smaller scenario
ts-node benchmark-runner.ts run baseline_100m

High Error Rate

  • Check cluster health
  • Verify capacity (not overloaded)
  • Review network latency
  • Check authentication/authorization

Slow Performance

  • Insufficient load generator capacity
  • Network bandwidth limitations
  • Target cluster under-provisioned
  • Configuration issues (connection limits, timeouts)

Debug Mode

# Enable verbose logging
export DEBUG=true
export LOG_LEVEL=debug

ts-node benchmark-runner.ts run baseline_500m

Support

For issues or questions, see the project repository: https://github.com/ruvnet/ruvector

Advanced Usage

Custom Scenarios

Create a custom scenario by adding an entry to the existing SCENARIOS object in benchmark-scenarios.ts:

export const SCENARIOS = {
  // ...existing scenarios stay here unchanged...
  my_custom_test: {
    name: 'My Custom Test',
    description: 'Custom workload pattern',
    config: {
      targetConnections: 1000000000,
      rampUpDuration: '15m',
      steadyStateDuration: '1h',
      rampDownDuration: '10m',
      queriesPerConnection: 100,
      queryInterval: '1000',
      protocol: 'http',
      vectorDimension: 768,
      queryPattern: 'uniform',
    },
    k6Options: {
      // K6 configuration
    },
    expectedMetrics: {
      p99Latency: 50,
      errorRate: 0.01,
      throughput: 100000000,
      availability: 99.99,
    },
    duration: '1h25m',
    tags: ['custom'],
  },
};
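
In the example above, the duration field ('1h25m') equals the sum of the ramp-up, steady-state, and ramp-down phases (15m + 1h + 10m). Whether the runner enforces this is not stated, so a sanity check like the following sketch may be useful. parseDurationMinutes is a hypothetical helper that only understands the "Xh", "Ym", and "XhYm" forms used in this document.

```typescript
// Parses durations such as "15m", "1h", or "1h25m" into minutes.
function parseDurationMinutes(text: string): number {
  const match = /^(?:(\d+)h)?(?:(\d+)m)?$/.exec(text);
  if (!match || text.length === 0) {
    throw new Error(`unsupported duration: ${text}`);
  }
  const hours = match[1] ? parseInt(match[1], 10) : 0;
  const minutes = match[2] ? parseInt(match[2], 10) : 0;
  return hours * 60 + minutes;
}

// True when the declared total matches the sum of the three phases.
function durationsAreConsistent(
  rampUp: string,
  steadyState: string,
  rampDown: string,
  total: string,
): boolean {
  const phases =
    parseDurationMinutes(rampUp) +
    parseDurationMinutes(steadyState) +
    parseDurationMinutes(rampDown);
  return phases === parseDurationMinutes(total);
}
```

Calling durationsAreConsistent('15m', '1h', '10m', '1h25m') confirms that the custom scenario's phases add up to its declared duration.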

Integration with CI/CD

# .github/workflows/benchmark.yml
name: Benchmark
on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - name: Install TypeScript tooling
        run: npm install -g typescript ts-node
      - name: Install k6
        run: |
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6
      - name: Run benchmark
        env:
          BASE_URL: ${{ secrets.BASE_URL }}
        run: |
          cd benchmarks
          ts-node benchmark-runner.ts run baseline_100m
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: benchmarks/results/

Programmatic Usage

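import { BenchmarkRunner } from './benchmark-runner';

// Wrapped in an async function so it also runs where top-level await is unavailable
async function main(): Promise<void> {
  const runner = new BenchmarkRunner({
    baseUrl: 'https://ruvector.example.com',
    parallelScenarios: 2,
    enableHooks: true,
  });

  // Run single scenario
  const run = await runner.runScenario('baseline_500m');
  console.log(`Score: ${run.analysis?.score.overall}/100`);

  // Run multiple scenarios
  const results = await runner.runScenarios([
    'baseline_500m',
    'burst_10x',
    'read_heavy',
  ]);

  // Check if all passed SLA (a missing analysis counts as a failure)
  const allPassed = Array.from(results.values()).every(
    r => r.analysis?.slaCompliance.met === true
  );
  console.log(`All scenarios passed SLA: ${allPassed}`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});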

Happy Benchmarking! 🚀

For questions or contributions, please visit: https://github.com/ruvnet/ruvector