@etiennedi

Add Plan Runner for Automated Benchmarking

This PR adds plan_runner.py, a new automation tool for running benchmarks across multiple configurations and comparing control vs candidate branches.

What's New

Plan Runner (plan_runner.py) automates the pattern of:

  1. Ingesting data on a specified branch
  2. Running query benchmarks on both control and candidate branches
  3. Generating visualizations comparing the results
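The three steps above can be sketched as an ordered list of actions. This is only an illustration of the pattern, not the code in plan_runner.py: the `Step` class, the `plan_steps` function, and the choice to query the control branch before the candidate are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # "ingest", "query", or "visualize"
    branch: str  # branch the step runs on (empty for "visualize")


def plan_steps(ingest_branch: str, control: str, candidate: str) -> list[Step]:
    """Return the ordered steps for one run (hypothetical helper)."""
    return [
        Step("ingest", ingest_branch),  # 1. ingest data once
        Step("query", control),         # 2a. query benchmark on control
        Step("query", candidate),       # 2b. query benchmark on candidate
        Step("visualize", ""),          # 3. compare the two result sets
    ]


# Branch names here are placeholders.
for step in plan_steps("main", "main", "my-feature"):
    print(step.action, step.branch)
```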

All configuration is defined in a YAML plan file that specifies:

  • Control and candidate branches to compare
  • Global parameters shared across all runs
  • Individual run configurations with specific parameters
  • Optional async indexing per run
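As a rough sketch of how such a plan might be resolved once loaded, the snippet below merges global parameters into each run. The key names (`control_branch`, `global_parameters`, `runs`, `async_indexing`) and the rule that per-run values override global ones are assumptions for illustration; plan.yml.example defines the real layout.

```python
# Hypothetical plan, shown as the dict a YAML loader might produce.
plan = {
    "control_branch": "main",
    "candidate_branch": "my-feature",
    "global_parameters": {"dataset": "openai_100k", "limit": 10},
    "runs": [
        {"name": "pq_global_vector", "parameters": {"compression": "pq"}},
        {
            "name": "rq8_named_vector",
            "parameters": {"compression": "rq8"},
            "async_indexing": True,
        },
    ],
}


def resolve_runs(plan: dict) -> list[dict]:
    """Merge global parameters into every run; run values win on conflict (assumed)."""
    resolved = []
    for run in plan["runs"]:
        params = {**plan.get("global_parameters", {}), **run.get("parameters", {})}
        resolved.append(
            {
                "name": run["name"],
                "parameters": params,
                # Async indexing is optional per run; default to off when absent.
                "async_indexing": run.get("async_indexing", False),
            }
        )
    return resolved


for run in resolve_runs(plan):
    print(run["name"], run["parameters"], run["async_indexing"])
```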

Backward Compatibility

No changes to existing workflows. The benchmarker and visualizer can still be used independently for single runs. This tool is purely additive.

Getting Started

  1. Copy plan.yml.example to plan.yml
  2. Adjust branches, parameters, and runs to your needs
  3. Run: python3 plan_runner.py plan.yml

Use --dry-run to preview what will be executed without running anything.
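A dry-run mode like this is commonly implemented by printing each command instead of executing it. The sketch below assumes a positional plan path and a boolean `--dry-run` flag, matching the invocation above; `build_parser` and `run_commands` are hypothetical names, not taken from plan_runner.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="plan_runner.py")
    parser.add_argument("plan", help="path to the YAML plan file")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="print what would be executed without running anything",
    )
    return parser


def run_commands(commands, dry_run, execute):
    """Run each command via `execute`, or only print it when dry_run is set."""
    executed = []
    for cmd in commands:
        if dry_run:
            print("[dry-run] would run:", " ".join(cmd))
            continue
        execute(cmd)
        executed.append(cmd)
    return executed


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["plan.yml", "--dry-run"])
run_commands([["git", "checkout", "main"]], args.dry_run, execute=print)
```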

Features

  • Automatic branch switching and Weaviate rebuilding
  • Result archiving in results_archive/
  • Visualizations in visualizations/ with:
    • Run name as title
    • All parameters as subtitle
    • Branch information at bottom
  • Per-run async indexing configuration
  • Process cleanup - kills stale Weaviate instances
  • Graceful error handling and logging

Example Outputs

  • benchmark_openai_100k_pq_global_vector
  • benchmark_openai_100k_pq_named_vector
  • benchmark_openai_100k_rq8_dynamic_index_global_vector
  • benchmark_openai_100k_rq8_dynamic_index_named_vector
  • benchmark_openai_100k_rq8_global_vector
  • benchmark_openai_100k_rq8_named_vector

@orca-security-eu (bot) left a comment

Orca Security Scan Summary

Status   Check                     High   Medium   Low   Info
Passed   Infrastructure as Code    0      0        0     0
Passed   SAST                      0      1        0     0
Passed   Secrets                   0      0        0     0
Passed   Vulnerabilities           0      0        0     0

🛡️ The following SAST misconfigurations have been detected

Severity   Name                                             File
medium     Security Risks of Using the Subprocess Module    ...arker/plan_runner.py
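Findings of this kind usually concern commands built from untrusted strings or run through a shell. The sketch below shows one common way to reduce that risk: pass arguments as a list, avoid `shell=True`, and reject branch names that could be parsed as options. It is an illustration only and does not describe what plan_runner.py does; `run_checked` and `checkout_command` are hypothetical helpers.

```python
import subprocess
import sys


def run_checked(args: list[str]) -> str:
    """Run a command without a shell and return its stdout.

    check=True raises CalledProcessError on a non-zero exit code.
    """
    completed = subprocess.run(args, check=True, capture_output=True, text=True)
    return completed.stdout


def checkout_command(branch: str) -> list[str]:
    """Build a git checkout command, rejecting names that look like options."""
    if not branch or branch.startswith("-"):
        raise ValueError(f"refusing suspicious branch name: {branch!r}")
    return ["git", "checkout", branch]


# Use the current interpreter as a portable stand-in for a real command.
print(run_checked([sys.executable, "-c", "print('ok')"]).strip())
print(checkout_command("main"))
```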
