Commit dd17d8f

Merge pull request #641 from aai-institute/feature/owen-samplers-and-more
YAMPR (Yet Another Monster PR)
2 parents 489d9ce + 257de62 commit dd17d8f

83 files changed: 4878 additions and 3924 deletions

.pre-commit-config.yaml

Lines changed: 2 additions & 3 deletions
@@ -8,11 +8,10 @@ repos:
 # HACK: ruff-pre-commit ignores pyproject.toml
 # https://github.com/astral-sh/ruff-pre-commit/issues/54
 args: [ "--extend-per-file-ignores", "tests/**/*.py:F811",
-"--extend-per-file-ignores", "tests/**/*.py:F401",
-"--fix" ]
+"--extend-per-file-ignores", "tests/**/*.py:F401" ]
 - id: ruff-format
 - repo: https://github.com/kynan/nbstripout
 rev: 0.6.1
 hooks:
 - id: nbstripout
-args: ["--keep-output", "--keep-count", "--drop-empty-cells", "--extra-keys", "metadata.pycharm cell.metadata.pycharm"]
+args: [ "--keep-output", "--keep-count", "--drop-empty-cells", "--extra-keys", "metadata.pycharm cell.metadata.pycharm" ]

.test_durations

Lines changed: 12 additions & 12 deletions
@@ -1493,22 +1493,22 @@
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[None-sampler_kwargs4-GroupTestingShapleyValuation-valuation_kwargs4-0.1-0.01-test_game1]": 3.596974375000002,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[OwenSampler-sampler_kwargs2-OwenShapleyValuation-valuation_kwargs2-0.2-0.0001-test_game0]": 2.566003500000022,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[OwenSampler-sampler_kwargs2-OwenShapleyValuation-valuation_kwargs2-0.2-0.0001-test_game1]": 3.0255352490000007,
-"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[PermutationSampler-sampler_kwargs0-DataShapleyValuation-valuation_kwargs0-0.2-0.0001-test_game0]": 2.536671957999971,
-"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[PermutationSampler-sampler_kwargs0-DataShapleyValuation-valuation_kwargs0-0.2-0.0001-test_game1]": 0.37417354199996566,
-"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[UniformSampler-sampler_kwargs1-DataShapleyValuation-valuation_kwargs1-0.2-0.0001-test_game0]": 3.23137270899997,
-"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[UniformSampler-sampler_kwargs1-DataShapleyValuation-valuation_kwargs1-0.2-0.0001-test_game1]": 3.4768419569999764,
+"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[PermutationSampler-sampler_kwargs0-ShapleyValuation-valuation_kwargs0-0.2-0.0001-test_game0]": 2.536671957999971,
+"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[PermutationSampler-sampler_kwargs0-ShapleyValuation-valuation_kwargs0-0.2-0.0001-test_game1]": 0.37417354199996566,
+"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[UniformSampler-sampler_kwargs1-ShapleyValuation-valuation_kwargs1-0.2-0.0001-test_game0]": 3.23137270899997,
+"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_games[UniformSampler-sampler_kwargs1-ShapleyValuation-valuation_kwargs1-0.2-0.0001-test_game1]": 3.4768419569999764,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_grouped_linear_montecarlo_shapley[PermutationSampler-kwargs0-2-0-21-2]": 0.7323970420000023,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_hoeffding_bound_montecarlo[PermutationSampler-6-0.1-0.1]": 22.979635875999975,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_hoeffding_bound_montecarlo[UniformSampler-6-0.1-0.1]": 26.41515983300002,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_linear_montecarlo_with_outlier[AntitheticOwenSampler-sampler_kwargs2-OwenShapleyValuation-valuation_kwargs2-2-0-21]": 14.262070917000017,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_linear_montecarlo_with_outlier[None-sampler_kwargs3-GroupTestingShapleyValuation-valuation_kwargs3-2-0-21]": 19.76072416599999,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_linear_montecarlo_with_outlier[OwenSampler-sampler_kwargs1-OwenShapleyValuation-valuation_kwargs1-2-0-21]": 9.141020416000003,
-"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_linear_montecarlo_with_outlier[PermutationSampler-sampler_kwargs0-DataShapleyValuation-valuation_kwargs0-2-0-21]": 3.0010637080000038,
+"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_linear_montecarlo_with_outlier[PermutationSampler-sampler_kwargs0-ShapleyValuation-valuation_kwargs0-2-0-21]": 3.0010637080000038,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_seed[AntitheticOwenSampler-sampler_kwargs3-OwenShapleyValuation-valuation_kwargs3-test_game0]": 0.25927716699999337,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_seed[None-sampler_kwargs4-GroupTestingShapleyValuation-valuation_kwargs4-test_game0]": 0.3608742090000021,
 "tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_seed[OwenSampler-sampler_kwargs2-OwenShapleyValuation-valuation_kwargs2-test_game0]": 0.10323104100001501,
-"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_seed[PermutationSampler-sampler_kwargs0-DataShapleyValuation-valuation_kwargs0-test_game0]": 0.006297582000001967,
-"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_seed[UniformSampler-sampler_kwargs1-DataShapleyValuation-valuation_kwargs1-test_game0]": 0.00704366599998707,
+"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_seed[PermutationSampler-sampler_kwargs0-ShapleyValuation-valuation_kwargs0-test_game0]": 0.006297582000001967,
+"tests/valuation/methods/test_montecarlo_shapley_valuations.py::test_seed[UniformSampler-sampler_kwargs1-ShapleyValuation-valuation_kwargs1-test_game0]": 0.00704366599998707,
 "tests/valuation/methods/test_semivalues.py::test_banzhaf[AntitheticPermutationSampler-5]": 3.0455862080000005,
 "tests/valuation/methods/test_semivalues.py::test_banzhaf[AntitheticSampler-5]": 0.4238251659999994,
 "tests/valuation/methods/test_semivalues.py::test_banzhaf[DeterministicPermutationSampler-5]": 0.006310583000000314,
@@ -1523,8 +1523,8 @@
 "tests/valuation/methods/test_semivalues.py::test_coefficients[BetaShapleyValuation-kwargs2-10]": 0.003863207999984297,
 "tests/valuation/methods/test_semivalues.py::test_coefficients[DataBanzhafValuation-kwargs3-100]": 0.001800666000065121,
 "tests/valuation/methods/test_semivalues.py::test_coefficients[DataBanzhafValuation-kwargs3-10]": 0.0016530420000435697,
-"tests/valuation/methods/test_semivalues.py::test_coefficients[DataShapleyValuation-kwargs4-100]": 0.0018769589999578784,
-"tests/valuation/methods/test_semivalues.py::test_coefficients[DataShapleyValuation-kwargs4-10]": 0.0016063749999375432,
+"tests/valuation/methods/test_semivalues.py::test_coefficients[ShapleyValuation-kwargs4-100]": 0.0018769589999578784,
+"tests/valuation/methods/test_semivalues.py::test_coefficients[ShapleyValuation-kwargs4-10]": 0.0016063749999375432,
 "tests/valuation/methods/test_semivalues.py::test_msr_banzhaf[5]": 9.342398666999998,
 "tests/valuation/methods/test_semivalues.py::test_shapley_batch_size[1-test_game0]": 0.07176091700006282,
 "tests/valuation/methods/test_semivalues.py::test_shapley_batch_size[2-test_game0]": 3.8395362910000586,
@@ -1640,8 +1640,8 @@
 "tests/valuation/test_interface.py::test_data_banzhaf_valuation[2]": 1.2780167490000025,
 "tests/valuation/test_interface.py::test_data_beta_shapley_valuation[1]": 4.139234666999997,
 "tests/valuation/test_interface.py::test_data_beta_shapley_valuation[2]": 3.603092916999998,
-"tests/valuation/test_interface.py::test_data_shapley_valuation[1]": 0.27120083299999465,
-"tests/valuation/test_interface.py::test_data_shapley_valuation[2]": 0.15037520699999618,
+"tests/valuation/test_interface.py::test_shapley_valuation[1]": 0.27120083299999465,
+"tests/valuation/test_interface.py::test_shapley_valuation[2]": 0.15037520699999618,
 "tests/valuation/test_interface.py::test_data_utility_learning[1]": 0.026216332999993597,
 "tests/valuation/test_interface.py::test_data_utility_learning[2]": 0.06457645800000478,
 "tests/valuation/test_interface.py::test_delta_shapley_valuation[1]": 3.562169998999977,
@@ -1941,4 +1941,4 @@
 "tests/value/test_stopping.py::test_standard_error": 0.0020545429999856424,
 "tests/value/test_stopping.py::test_stopping_criterion": 0.0016162080000015067,
 "tests/value/test_stopping.py::test_stopping_criterion_composition": 0.0024397500000077343
-}
+}

CHANGELOG.md

Lines changed: 32 additions & 7 deletions
@@ -2,8 +2,16 @@
 
 ## Unreleased
 
+
 ### Added
 
+- Introduced the concept of `ResultUpdater` in order to allow samplers to
+declare the proper strategy to use by valuations
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Added Banzhaf precomputed values to some games.
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Introduced new `IndexIterations`, for consistent usage across all
+`PowersetSamplers` [PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Added `run_removal_experiment` for easy removal experiments
 [PR #636](https://github.com/aai-institute/pyDVL/pull/636)
 - Refactor Classwise Shapley valuation with the interfaces and sampler
@@ -12,22 +20,24 @@
 [PR #610](https://github.com/aai-institute/pyDVL/pull/610)
 - Refactor MSR Banzhaf semivalues with the new sampler architecture.
 [PR #605](https://github.com/aai-institute/pyDVL/pull/605)
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Refactor group-testing shapley values with new sampler architecture
 [PR #602](https://github.com/aai-institute/pyDVL/pull/602)
 - Refactor least-core data valuation methods with more supported sampling
 methods and consistent interface.
 [PR #580](https://github.com/aai-institute/pyDVL/pull/580)
-- Refactor Owen-Shapley valuation with new sampler architecture
+- Refactor Owen-Shapley valuation with new sampler architecture. Enable use of
+`OwenSamplers` with all semi-values
 [PR #597](https://github.com/aai-institute/pyDVL/pull/597)
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - New method `InverseHarmonicMeanInfluence`, implementation for the paper
 `DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and
 Diffusion Models`
 [PR #582](https://github.com/aai-institute/pyDVL/pull/582)
-- Add new backend implementations for influence computation
-to account for block-diagonal approximations
+- Add new backend implementations for influence computation to account for
+block-diagonal approximations
 [PR #582](https://github.com/aai-institute/pyDVL/pull/582)
-- Extend `DirectInfluence` with block-diagonal and Gauss-Newton
-approximation
+- Extend `DirectInfluence` with block-diagonal and Gauss-Newton approximation
 [PR #591](https://github.com/aai-institute/pyDVL/pull/591)
 - Extend `LissaInfluence` with block-diagonal and Gauss-Newton approximation
 [PR #593](https://github.com/aai-institute/pyDVL/pull/593)
@@ -37,12 +47,19 @@
 - Extend `ArnoldiInfluence` with block-diagonal and Gauss-Newton
 approximation
 [PR #598](https://github.com/aai-institute/pyDVL/pull/598)
-- Extend `CgInfluence` with block-diagonal and Gauss-Newton
-approximation
+- Extend `CgInfluence` with block-diagonal and Gauss-Newton approximation
 [PR #601](https://github.com/aai-institute/pyDVL/pull/601)
 
 ### Fixed
 
+- Fixed several bugs in diverse stopping criteria, including: iteration counts,
+computing completion and resetting
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Fixed all weights of all samplers to ensure that mix-and-matching samplers and
+semi-value methods always works, for all possible combinations
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Fixed a bug whereby progress bars would not report the last step and remain
+incomplete [PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Fixed the analysis of the adult dataset in the Data-OOB notebook
 [PR #636](https://github.com/aai-institute/pyDVL/pull/636)
 - Replace `np.float_` with `np.float64` and `np.alltrue` with `np.all`,
@@ -59,6 +76,14 @@
 
 ### Changed
 
+- Updated and rewrote some of the MSR banzhaf notebook
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Updated Least-Core notebook
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Restructured and generalized `StratifiedSampler` to allow using heuristics,
+thus subsuming Variance-Reduced stratified sampling into a unified framework.
+Implemented the heuristics proposed in that paper
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Changed the way semi-value coefficients are composed with sampler weights in
 order to avoid `OverflowError` for very small or large values
 [PR #639](https://github.com/aai-institute/pyDVL/pull/639)
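
Reviewer note on the last CHANGELOG entry (composing semi-value coefficients with sampler weights without `OverflowError`): one common way to achieve this is to add logarithms and exponentiate only the sum. The sketch below illustrates that general idea and is not taken from pyDVL's code. The function names are hypothetical, and it assumes the Shapley coefficient 1 / (n * C(n-1, k)) paired with a uniform powerset sampler whose inverse probability is 2^(n-1).

```python
import math


def log_shapley_coefficient(n: int, k: int) -> float:
    """Log of 1 / (n * C(n-1, k)) for a subset of size k out of n-1 points.

    Hypothetical helper for illustration; not pyDVL's API.
    """
    # log C(n-1, k) computed via log-gamma, so no huge integer is converted to float
    log_binom = math.lgamma(n) - math.lgamma(k + 1) - math.lgamma(n - k)
    return -math.log(n) - log_binom


def log_uniform_powerset_weight(n: int) -> float:
    """Log of 2^(n-1), the inverse probability of one uniformly drawn subset.

    Assumed sampler weight for illustration only.
    """
    return (n - 1) * math.log(2)


def combined_factor(n: int, k: int) -> float:
    """Product of coefficient and sampler weight, composed in log space."""
    return math.exp(log_shapley_coefficient(n, k) + log_uniform_powerset_weight(n))


if __name__ == "__main__":
    # Small case: (1 / (3 * C(2, 1))) * 2**2 = 4 / 6
    print(combined_factor(3, 1))
    # Large case: float(2 ** 4999) would raise OverflowError, the log-space sum does not
    print(combined_factor(5000, 2500))
```

The individual factors are astronomically large or small for big `n`, but their product is moderate, which is why summing logs first keeps the result representable.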

docs/value/classwise-shapley.md

Lines changed: 2 additions & 1 deletion
@@ -1,8 +1,9 @@
 ---
 title: Class-wise Shapley
+alias: classwise-shapley
 ---
 
-# Class-wise Shapley
+# Class-wise Shapley { #intro-to-cw-shapley }
 
 Class-wise Shapley (CWS) [@schoch_csshapley_2022] offers a Shapley framework
 tailored for classification problems. Given a sample $x_i$ with label $y_i \in

notebooks/data_oob.ipynb

Lines changed: 15 additions & 5 deletions
@@ -127,7 +127,7 @@
 " train, test = load_adult_data(\n",
 " train_size=train_size, subsample=0.01, random_state=random_state\n",
 " )\n",
-" n_jobs = 2\n",
+" n_jobs = 1\n",
 " n_runs = 1\n",
 " n_est = 10"
 ]
@@ -536,6 +536,20 @@
 "removal_percentages = np.arange(0, 0.51, 0.02)"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"hide"
+]
+},
+"outputs": [],
+"source": [
+"if is_CI:\n",
+" removal_percentages = np.arange(0, 0.51, 0.3)"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 15,
@@ -769,10 +783,6 @@
 "source": [
 "from support.common import ConstantBinaryClassifier\n",
 "\n",
-"train, test = load_adult_data(\n",
-" train_size=train_size, subsample=0.2, random_state=random_state\n",
-")\n",
-"\n",
 "probs = [0.01, 0.5, 0.99]\n",
 "all_values = []\n",
 "for p in probs:\n",

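Reviewer note on the hidden `is_CI` cell added to `notebooks/data_oob.ipynb`: the coarser step reduces the removal grid from 26 points to 2, which is presumably what keeps the CI run short. A minimal check, using only `numpy.arange` with the two argument sets that appear in the diff (the variable names here are illustrative):

```python
import numpy as np

# Grid used in the regular notebook run (from the unchanged context line)
full_grid = np.arange(0, 0.51, 0.02)

# Grid used when is_CI is set (from the added cell)
ci_grid = np.arange(0, 0.51, 0.3)

print(len(full_grid), len(ci_grid))
print(ci_grid)
```

With a float step, `np.arange` yields `ceil((stop - start) / step)` values, so the CI grid contains only the 0% and 30% removal levels.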