This document provides guidance for AI assistants (Claude, GitHub Copilot, etc.) working on the lance-bench repository.
**Purpose:** Automated benchmark infrastructure for tracking Lance performance over time
**Tech Stack:** Python, GitHub Actions, LanceDB, AWS S3
**Key Dependencies:** PyGithub, LanceDB, PyArrow, pytest-benchmark
- **GitHub Actions Workflows** (`.github/workflows/`)
  - Automated scheduling (every 6 hours)
  - Reusable benchmark runners for Rust and Python
  - Orchestration workflows
- **Publishing Scripts** (`scripts/`)
  - Parse benchmark output (Criterion/pytest-benchmark)
  - Transform to common data model
  - Upload to LanceDB
- **Database Package** (`packages/lance_bench_db/`)
  - Data models (Result, TestBed, DutBuild, SummaryValues)
  - Connection utilities
  - PyArrow schema definitions
```text
New Lance Commit
    ↓
Scheduler (schedule_benchmarks.py)
    ↓
Check Database (has_results_for_commit)
    ↓
Trigger Workflow (run-benchmarks.yml)
    ↓
Run Benchmarks (Rust: Criterion, Python: pytest-benchmark)
    ↓
Publish Results (publish_criterion.py / publish_pytest.py)
    ↓
Store in LanceDB (S3: s3://lance-bench-results)
```
- Use type hints for all function signatures
- Prefer `pathlib.Path` over string paths
- Use `None` as the default for optional parameters (not `""` or `0`)
- Raise exceptions for invalid states (fail fast)
- Use f-strings for formatting
- Use emoji prefixes in logs: ✓ (success), ℹ️ (info), ⚠️ (warning), ❌ (error)
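The emoji convention can be wrapped in a small helper. This is a sketch, not an existing utility in this repo; the `log` function and its level names are illustrative:

```python
# Hypothetical logging helper following the emoji prefix convention above.
_PREFIXES = {
    "success": "✓",
    "info": "ℹ️",
    "warning": "⚠️",
    "error": "❌",
}

def log(level: str, message: str) -> str:
    """Print and return a log line with the conventional emoji prefix."""
    prefix = _PREFIXES.get(level)
    if prefix is None:
        # Fail fast on invalid states, per the style rules above
        raise ValueError(f"Unknown log level: {level}")
    line = f"{prefix} {message}"
    print(line)
    return line
```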
```python
# Good: explicit None default with validation
def func(version: str | None = None) -> str:
    if version is None:
        raise ValueError("Version is required")
    return version

# Bad: empty string default that silently fails
def func(version: str = "") -> str:
    return version  # could be empty!
```

- Extract common utilities to `publish_util.py`
- Both `publish_criterion.py` and `publish_pytest.py` should use shared functions
- Example: `get_test_bed()` is shared between both publishers
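The shared-utility pattern can be sketched like this. The signature and field names below are illustrative assumptions; the real `get_test_bed()` lives in `scripts/publish_util.py`:

```python
# Hypothetical sketch of a helper both publishers call, so test-bed
# descriptions are built identically. Field names are assumptions,
# not the real schema.
def get_test_bed(name: str, os_name: str, arch: str) -> dict:
    """Build a common test-bed description for both publishers."""
    if not name:
        raise ValueError("Test bed name is required")  # fail fast
    return {"name": name, "os": os_name, "arch": arch}

# Both publish_criterion.py and publish_pytest.py would go through
# this single code path rather than building the dict themselves.
criterion_bed = get_test_bed("gha-runner", "linux", "x86_64")
pytest_bed = get_test_bed("gha-runner", "linux", "x86_64")
assert criterion_bed == pytest_bed
```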
All results use this format:

- Version: `{VERSION}+{SHORT_SHA}` (e.g., `"0.15.0+abc1234"`)
- Timestamp: Unix timestamp of the commit
- Extract from: `Cargo.toml` (version) + `git show -s --format=%ct` (timestamp)
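A sketch of that extraction; the function names and regex-based `Cargo.toml` parsing are illustrative, not the publishers' actual implementation:

```python
import re
import subprocess

def format_dut_version(cargo_toml_text: str, commit_sha: str) -> str:
    """Build the {VERSION}+{SHORT_SHA} string from Cargo.toml contents."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', cargo_toml_text, re.MULTILINE)
    if match is None:
        raise ValueError("No version field found in Cargo.toml")
    return f"{match.group(1)}+{commit_sha[:7]}"

def commit_timestamp(commit_sha: str) -> int:
    """Unix timestamp of a commit, via `git show -s --format=%ct`."""
    out = subprocess.run(
        ["git", "show", "-s", "--format=%ct", commit_sha],
        check=True, capture_output=True, text=True,
    )
    return int(out.stdout.strip())

# Example:
# format_dut_version('version = "0.15.0"', "abc1234def")  -> "0.15.0+abc1234"
```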
Check for existing results by short SHA:

```python
short_sha = commit_sha[:7]
query = results_table.search().where(f"dut.version LIKE '%{short_sha}%'").limit(1)
results = query.to_list()
has_results = len(results) > 0
```

- Criterion: outputs nanoseconds (keep as-is)
- pytest-benchmark: outputs seconds (convert to nanoseconds)

```python
values = [v * 1_000_000_000 for v in raw_data]  # seconds -> nanoseconds
```

Cannot use the default GITHUB_TOKEN to trigger workflows:
```python
# Requires SCHEDULER_GITHUB_TOKEN secret
workflow.create_dispatch(ref="main", inputs={"git_sha": commit_sha})
```

Database connections should retry with exponential backoff:

```python
import time

for attempt in range(3):
    try:
        db = connect()
        break
    except Exception:
        if attempt == 2:
            raise
        time.sleep((attempt + 1) * 2)
```

- `schedule-benchmarks.yml` - Cron scheduler (4x daily)
- `run-benchmarks.yml` - Rust benchmark orchestrator
- `run-rust-benchmarks.yml` - Reusable Rust workflow (called per crate)
- `run-python-benchmarks.yml` - Reusable Python workflow
- `lint.yml` - Code quality checks
- `schedule_benchmarks.py` - Check latest commit, trigger if new
- `backfill_benchmarks.py` - Process historical commits
- `publish_criterion.py` - Parse Criterion JSON, publish to DB
- `publish_pytest.py` - Parse pytest-benchmark JSON, publish to DB
- `publish_util.py` - Shared utilities (get_test_bed)
- `packages/lance_bench_db/models.py` - Data models and schema
- `packages/lance_bench_db/dataset.py` - Connection and URI resolution
- Add benchmark to Lance repository using Criterion
- Edit `run-benchmarks.yml` to add a new job:

```yaml
bench-new-crate:
  uses: ./.github/workflows/run-rust-benchmarks.yml
  with:
    git_sha: ${{ inputs.git_sha }}
    crate_path: "rust/new-crate"
  secrets:
    LANCE_BENCH_DB_URI: ${{ secrets.LANCE_BENCH_DB_URI }}
    # ... other secrets
```
- Add pytest benchmark to `lance/python/python/ci_benchmarks/benchmarks/`
- No workflow changes needed (pytest auto-discovers)
- Update `packages/lance_bench_db/models.py`
- Update the PyArrow schema in `Result.to_arrow_table()`
- Consider a migration strategy for existing data
- Check GitHub Actions logs in the Actions tab
- For scheduler: Look for rate limits, auth errors
- For benchmarks: Check build logs, benchmark output
- For publishing: Verify database connection, AWS credentials
```shell
# Set local database
export LANCE_BENCH_URI="$HOME/.lance-bench"

# Test publishing (without AWS)
uv run python scripts/publish_criterion.py \
    /path/to/criterion-output.json \
    --testbed-name "local-test" \
    --dut-version "test+1234567" \
    --dut-timestamp $(date +%s)

# Test scheduler (requires GitHub token)
export GITHUB_TOKEN="your-token"
uv run python scripts/schedule_benchmarks.py
```

- Use manual workflow dispatch in GitHub Actions
- Monitor first few scheduled runs for issues
- Verify results appear in database
The default GITHUB_TOKEN cannot trigger other workflows. Always use SCHEDULER_GITHUB_TOKEN for workflow dispatch.
Python benchmarks need Rust toolchain even though they're Python tests (Lance is Rust-backed).
Python benchmarks require datasets to be generated first (`gen_all.py`). This is not idempotent: it creates new datasets each time.
- Criterion: Line-delimited JSON (JSONL)
- pytest-benchmark: Single JSON object
- Both are parsed differently in publish scripts
Both dut_version and dut_timestamp are required for publishing. The scripts will raise ValueError if either cannot be determined.
LanceDB S3 access requires:

- `s3:GetObject`
- `s3:PutObject`
- `s3:ListBucket`
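A minimal IAM policy sketch granting those permissions. The bucket name matches the `s3://lance-bench-results` URI above; adjust the resource ARNs to your deployment:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::lance-bench-results/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::lance-bench-results"
    }
  ]
}
```

Note that `ListBucket` applies to the bucket itself while the object actions apply to keys within it, hence the two statements.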
- Database stores the short SHA (7 chars) in the version string
- Queries use `LIKE '%{short_sha}%'` to match
- Collisions are possible but unlikely in practice
- Follow existing patterns (especially for publish scripts)
- Add error handling with clear messages
- Use type hints and docstrings
- Test locally before pushing
- Consider backwards compatibility with existing data
- Check environment variables are set
- Verify AWS credentials have correct permissions
- Look for rate limiting (GitHub API: 5000 req/hour)
- Confirm LanceDB connection works locally first
- Keep publish scripts similar in structure
- Extract common code to `publish_util.py`
- Maintain backwards compatibility with stored data
- Update both README.md and this file
- `LANCE_BENCH_URI` - Database location (S3 or local)
- `AWS_ACCESS_KEY_ID` - AWS credentials
- `AWS_SECRET_ACCESS_KEY` - AWS credentials
- `GITHUB_TOKEN` - For GitHub API access and workflow triggering
- `LANCE_BENCH_REPO` - Repository name (usually from `github.repository`)
- `MAX_COMMITS` - For backfill script (default: 10)
- `COMMIT_INTERVAL` - For backfill script (default: 1)
- Primary key: `id` (UUID)
- Indexed by: `dut.version` (contains commit SHA)
- Time-series: `timestamp` field
- Raw data: `values` array (all measurements)
- Aggregates: `summary` struct (min, max, mean, etc.)
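As a rough sketch of how those fields fit together; the real models live in `packages/lance_bench_db/models.py` and use PyArrow schemas, and any field or method not listed above is an illustrative assumption:

```python
# Illustrative sketch only -- not the real data model.
import statistics
import uuid
from dataclasses import dataclass, field

@dataclass
class SummaryValues:
    min: float
    max: float
    mean: float

@dataclass
class Result:
    dut_version: str        # e.g. "0.15.0+abc1234"
    timestamp: int          # Unix timestamp of the commit
    values: list[float]     # raw measurements, in nanoseconds
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def summary(self) -> SummaryValues:
        """Aggregate the raw measurements into the summary struct."""
        return SummaryValues(
            min(self.values), max(self.values), statistics.mean(self.values)
        )
```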
Always: `{VERSION}+{SHORT_SHA}`

- VERSION from Cargo.toml (e.g., "0.15.0")
- SHORT_SHA is the first 7 chars of the commit SHA
- Example: "0.15.0+abc1234"
All measurements stored in nanoseconds for consistency:
- Criterion native: nanoseconds
- pytest-benchmark: convert seconds → nanoseconds
- Adding a new dependency: Is it already available in the Lance repo?
- Changing the data model: How to handle existing data?
- New workflow: Should it be reusable (`workflow_call`) or standalone?
- Error handling: Should this fail fast or retry?
- Path references: Is this relative to lance-bench or the lance repo?
```shell
# Install dependencies
uv sync

# Run linter
ruff check scripts/

# Format code
ruff format scripts/

# Type check
mypy scripts/

# Test database connection
python -c "from lance_bench_db.dataset import connect; print(connect())"

# Query results for a commit
python -c "
from lance_bench_db.dataset import connect
from lance_bench_db.models import Result
db = connect()
table = Result.open_table(db)
results = table.search().where('dut.version LIKE \"%abc1234%\"').to_list()
print(len(results))
"
```