feat: unified benchmark runner with composable config#10

Closed
andygrove wants to merge 1 commit into main from unified-benchmark-runner
Conversation

@andygrove

Summary

  • Replace scattered benchmark scripts (10 duplicated shell scripts in dev/benchmarks/ and separate pyspark shuffle benchmarks) with a single composable framework under benchmarks/
  • Config system with merge precedence: profile (cluster shape) < engine (plugin/JARs) < CLI overrides
  • Python entry point (run.py) that builds and executes spark-submit with --dry-run support
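The layered merge described above can be sketched as a left-to-right dict merge where later layers win. This is a hypothetical illustration, assuming flat string-keyed config dicts; `merge_configs` and the example keys are not the actual run.py API:

```python
def merge_configs(*layers: dict) -> dict:
    """Merge config layers left to right; later layers win on key conflicts."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged

# profile < engine < CLI: each later layer overrides the one before it
profile = {"spark.executor.memory": "4g", "spark.executor.instances": "2"}
engine = {"spark.plugins": "org.apache.comet.CometPlugin",
          "spark.executor.memory": "8g"}   # engine overrides profile
cli = {"spark.executor.instances": "4"}    # CLI override beats both

conf = merge_configs(profile, engine, cli)
print(conf["spark.executor.memory"], conf["spark.executor.instances"])  # 8g 4
```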

What's included

  • Runner: run.py entry point, config loader, SparkSession builder, CLI with tpc and shuffle subcommands
  • Suites: TPC-H/TPC-DS (22/99 queries) and shuffle (hash/round-robin) benchmarks
  • Engines: spark, comet, comet-iceberg, gluten, blaze, plus 3 shuffle variants
  • Profiles: local, standalone-tpch, standalone-tpcds, docker, k8s
  • Profiling: Level 1 JVM metrics via Spark REST API
  • Analysis: Comparison chart generation and memory report tools
  • Infrastructure: Docker Compose (with memory-constrained overlay) and Kubernetes manifests
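For the Level 1 profiling above: Spark's monitoring REST API exposes per-executor memory at `/api/v1/applications/{appId}/executors`, and a reducer over that payload might look like the sketch below. The `memoryUsed`/`maxMemory` field names come from the Spark REST API; the helper itself and the sample data are hypothetical, not the PR's actual profiler:

```python
def summarize_executors(executors: list) -> dict:
    """Aggregate memory across executors from the Spark REST API payload,
    excluding the driver entry."""
    workers = [e for e in executors if e["id"] != "driver"]
    return {
        "executors": len(workers),
        "memoryUsed": sum(e["memoryUsed"] for e in workers),
        "maxMemory": sum(e["maxMemory"] for e in workers),
    }

# Shape of GET /api/v1/applications/{appId}/executors (fields trimmed)
sample = [
    {"id": "driver", "memoryUsed": 100, "maxMemory": 1000},
    {"id": "1", "memoryUsed": 200, "maxMemory": 2000},
    {"id": "2", "memoryUsed": 300, "maxMemory": 2000},
]
print(summarize_executors(sample))
# {'executors': 2, 'memoryUsed': 500, 'maxMemory': 4000}
```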

Usage

# TPC-H with Comet
python benchmarks/run.py \
    --engine comet --profile standalone-tpch --restart-cluster \
    -- tpc --benchmark tpch --data $TPCH_DATA --queries $TPCH_QUERIES \
       --output . --iterations 1

# Preview command without executing
python benchmarks/run.py \
    --engine comet --profile standalone-tpch --dry-run \
    -- tpc --benchmark tpch --data $TPCH_DATA --queries $TPCH_QUERIES \
       --output . --iterations 1
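Presumably --dry-run prints the assembled spark-submit invocation instead of executing it. A minimal sketch of that assembly, assuming a flat merged config dict (`build_spark_submit` is illustrative, not the actual run.py internals):

```python
import shlex

def build_spark_submit(conf: dict, app: str, app_args: list) -> list:
    """Assemble a spark-submit argv from a merged config dict."""
    cmd = ["spark-submit"]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    cmd += app_args
    return cmd

cmd = build_spark_submit(
    {"spark.plugins": "org.apache.comet.CometPlugin"},
    "benchmarks/tpc.py",
    ["--benchmark", "tpch", "--iterations", "1"],
)
# --dry-run would print the command; a real run would pass it to subprocess.run
print(shlex.join(cmd))
```

Sorting the conf keys keeps the printed command deterministic, which makes dry-run output easy to diff across engines.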

Test plan

  • Verify --dry-run produces correct spark-submit command for each engine
  • Run TPC-H locally with --engine comet --profile local
  • Run shuffle benchmark locally with --engine comet-native-shuffle --profile local
  • Verify JSON output is compatible with benchmarks/analysis/compare.py

🤖 Generated with Claude Code

Replace scattered benchmark scripts (10 duplicated shell scripts in
dev/benchmarks/ and separate pyspark shuffle benchmarks) with a single
composable framework under benchmarks/.

Key changes:
- Config system: profile (cluster shape) + engine (plugin/JARs) + CLI
  overrides with clear merge precedence
- Python entry point (run.py) that builds and executes spark-submit
- TPC-H/TPC-DS and shuffle benchmark suites
- Level 1 JVM profiling via Spark REST API
- Analysis tools for comparison charts and memory reports
- Docker and Kubernetes infrastructure
- 8 engine configs (spark, comet, comet-iceberg, gluten, blaze, plus
  3 shuffle variants) and 5 profile configs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
andygrove closed this on Feb 16, 2026
