This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
VertiBench is a Python library for benchmarking vertical federated learning (VFL). It generates synthetic VFL datasets with tunable feature importance imbalance and inter-party correlation, then evaluates the quality of vertical data partitions along those two dimensions.
```bash
# Install from source (editable)
pip install -e .

# Install with test dependencies (adds xgboost)
pip install -e ".[test]"

# Build distribution
python -m build

# Run all tests
python -m unittest discover test/

# Run individual test files
python -m unittest test.test_splitter
python -m unittest test.test_evaluator
python -m unittest test.test_evaluate_alpha

# Run a single test case
python -m unittest test.test_splitter.TestImportanceSplitter.test_split_tabular
```

No linter or formatter is configured for this project.
The library lives in `src/vertibench/` and has two core modules: one providing splitters, one providing evaluators.

The abstract base class `Splitter` defines the interface: `split_indices()` returns per-party feature index lists, and `split()` applies them to datasets. Three implementations:
- `ImportanceSplitter` — Uses a Dirichlet distribution to assign features to parties with controllable importance imbalance. The `weights` parameter controls the expected importance per party (higher weight = more features).
- `CorrelationSplitter` — Uses the BRKGA genetic algorithm (via pymoo) to find partitions that match a target inter/intra-party correlation ratio. The parameter `beta ∈ [0, 1]` controls the balance. Requires `fit()` on data before splitting.
- `SimpleSplitter` — Uniform contiguous split of features across parties.
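The Dirichlet mechanism behind `ImportanceSplitter` can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the library's code; `dirichlet_split_indices` is a hypothetical helper:

```python
import numpy as np

def dirichlet_split_indices(n_features, weights, seed=0):
    """Hypothetical sketch: draw per-party shares from a Dirichlet
    distribution, then assign each feature to a party accordingly.
    Higher weight => larger expected share of features."""
    rng = np.random.default_rng(seed)
    shares = rng.dirichlet(weights)                     # sums to 1
    owners = rng.choice(len(weights), size=n_features, p=shares)
    return [np.flatnonzero(owners == p) for p in range(len(weights))]

# Party 1 (weight 5.0) tends to receive more features than party 0.
parts = dirichlet_split_indices(10, weights=[1.0, 5.0], seed=42)
```

Equal weights give a balanced expected split; skewed weights concentrate features (and hence importance) on high-weight parties.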
Two evaluators:

- `ImportanceEvaluator` — Computes per-party feature importance using the SHAP Permutation explainer. `evaluate_alpha()` recovers the Dirichlet concentration parameter from importance scores.
- `CorrelationEvaluator` — Computes correlation matrices and scores intra- vs. inter-party correlation. `evaluate_beta()` recovers the correlation concentration metric. Supports GPU acceleration via PyTorch (`gpu_id` parameter). Uses different SVD strategies depending on feature count (exact for fewer than 100 features, randomized for larger).
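The intra- vs. inter-party scoring idea can be sketched with plain numpy. This is a minimal illustration of the concept, not the library's `evaluate_beta()` implementation; `party_correlation_scores` is a hypothetical helper:

```python
import numpy as np

def party_correlation_scores(Xs):
    """Hypothetical sketch: mean absolute correlation among features of
    the same party (intra) vs. features of different parties (inter)."""
    corr = np.abs(np.corrcoef(np.hstack(Xs), rowvar=False))
    # Mark which party owns each column of the stacked matrix.
    owner = np.repeat(np.arange(len(Xs)), [x.shape[1] for x in Xs])
    same = owner[:, None] == owner[None, :]
    off_diag = ~np.eye(len(owner), dtype=bool)
    intra = corr[same & off_diag].mean()
    inter = corr[~same].mean()
    return intra, inter
```

A partition with high intra and low inter keeps correlated features together; the reverse spreads them across parties.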
Typical workflow:

- Generate data (e.g., `sklearn.datasets.make_classification`)
- `Splitter.split(X)` → list of per-party feature matrices `Xs`
- `Evaluator.evaluate(Xs, ...)` → quality scores
- `evaluate_alpha()` / `evaluate_beta()` → concentration metrics
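A minimal end-to-end sketch of the first two steps, using a hand-rolled uniform contiguous split (`simple_split` is a hypothetical stand-in for the library's `SimpleSplitter`, so this runs without VertiBench installed):

```python
import numpy as np
from sklearn.datasets import make_classification

def simple_split(X, num_parties):
    """Stand-in for a uniform contiguous split across parties."""
    return np.array_split(X, num_parties, axis=1)

# Step 1: generate synthetic data.
X, y = make_classification(n_samples=100, n_features=8, random_state=0)

# Step 2: split features across 4 parties -> list of per-party matrices.
Xs = simple_split(X, num_parties=4)
```

The real splitters return the same shape of result (a list of per-party feature matrices), which then feeds into the evaluators.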
- `Splitter` uses ABC + template method: concrete classes implement `split_indices()`, the base class handles the `split()` logic.
- `CorrelationSplitter` composes a `CorrelationEvaluator` internally for optimization.
- Correlation computation has multiple backends: Spearman (pandas), Pearson (numpy/torch), with CPU/GPU variants.
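The ABC + template-method structure can be sketched as follows; this is a simplified illustration, not the library's actual classes (`EvenSplitter` is hypothetical):

```python
from abc import ABC, abstractmethod
import numpy as np

class Splitter(ABC):
    """Simplified sketch of the ABC + template-method pattern."""

    @abstractmethod
    def split_indices(self, X):
        """Return a list of per-party feature-index arrays."""

    def split(self, X):
        # Template method: shared column-selection logic lives here;
        # subclasses only decide which features each party gets.
        return [X[:, idx] for idx in self.split_indices(X)]

class EvenSplitter(Splitter):
    """Hypothetical concrete splitter: even contiguous partition."""

    def __init__(self, num_parties):
        self.num_parties = num_parties

    def split_indices(self, X):
        return np.array_split(np.arange(X.shape[1]), self.num_parties)
```

New strategies only need to override `split_indices()`; the slicing in `split()` is written once in the base class.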
Tests use `unittest` with `subTest()` for parameterized variants. Test data is generated synthetically via `generate_data()` and `split_data()` helpers in each test file. The evaluator tests train actual XGBoost models, so the `[test]` extras are required.
Key dependencies: numpy, scipy, scikit-learn, torch, shap, pymoo, matplotlib. Requires Python >= 3.9.