Commit bdf7d9b

Merge branch 'develop' into feature/logspace-coefficients
# Conflicts:
#   tests/utils/test_numeric.py
#   tests/valuation/samplers/test_sampler.py
#   tests/valuation/test_interface.py
2 parents 5453dd4 + dd17d8f commit bdf7d9b

23 files changed (+557 -1172 lines changed)

CHANGELOG.md

Lines changed: 32 additions & 7 deletions
@@ -2,8 +2,16 @@
 
 ## Unreleased
 
+
 ### Added
 
+- Introduced the concept of `ResultUpdater` in order to allow samplers to
+declare the proper strategy to use by valuations
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Added Banzhaf precomputed values to some games.
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Introduced new `IndexIterations`, for consistent usage across all
+`PowersetSamplers` [PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Added `run_removal_experiment` for easy removal experiments
 [PR #636](https://github.com/aai-institute/pyDVL/pull/636)
 - Refactor Classwise Shapley valuation with the interfaces and sampler
@@ -12,22 +20,24 @@
 [PR #610](https://github.com/aai-institute/pyDVL/pull/610)
 - Refactor MSR Banzhaf semivalues with the new sampler architecture.
 [PR #605](https://github.com/aai-institute/pyDVL/pull/605)
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Refactor group-testing shapley values with new sampler architecture
 [PR #602](https://github.com/aai-institute/pyDVL/pull/602)
 - Refactor least-core data valuation methods with more supported sampling
 methods and consistent interface.
 [PR #580](https://github.com/aai-institute/pyDVL/pull/580)
-- Refactor Owen-Shapley valuation with new sampler architecture
+- Refactor Owen-Shapley valuation with new sampler architecture. Enable use of
+`OwenSamplers` with all semi-values
 [PR #597](https://github.com/aai-institute/pyDVL/pull/597)
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - New method `InverseHarmonicMeanInfluence`, implementation for the paper
 `DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and
 Diffusion Models`
 [PR #582](https://github.com/aai-institute/pyDVL/pull/582)
-- Add new backend implementations for influence computation
-to account for block-diagonal approximations
+- Add new backend implementations for influence computation to account for
+block-diagonal approximations
 [PR #582](https://github.com/aai-institute/pyDVL/pull/582)
-- Extend `DirectInfluence` with block-diagonal and Gauss-Newton
-approximation
+- Extend `DirectInfluence` with block-diagonal and Gauss-Newton approximation
 [PR #591](https://github.com/aai-institute/pyDVL/pull/591)
 - Extend `LissaInfluence` with block-diagonal and Gauss-Newton approximation
 [PR #593](https://github.com/aai-institute/pyDVL/pull/593)
@@ -37,12 +47,19 @@
 - Extend `ArnoldiInfluence` with block-diagonal and Gauss-Newton
 approximation
 [PR #598](https://github.com/aai-institute/pyDVL/pull/598)
-- Extend `CgInfluence` with block-diagonal and Gauss-Newton
-approximation
+- Extend `CgInfluence` with block-diagonal and Gauss-Newton approximation
 [PR #601](https://github.com/aai-institute/pyDVL/pull/601)
 
 ### Fixed
 
+- Fixed several bugs in diverse stopping criteria, including: iteration counts,
+computing completion and resetting
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Fixed all weights of all samplers to ensure that mix-and-matching samplers and
+semi-value methods always works, for all possible combinations
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Fixed a bug whereby progress bars would not report the last step and remain
+incomplete [PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Fixed the analysis of the adult dataset in the Data-OOB notebook
 [PR #636](https://github.com/aai-institute/pyDVL/pull/636)
 - Replace `np.float_` with `np.float64` and `np.alltrue` with `np.all`,
@@ -59,6 +76,14 @@
 
 ### Changed
 
+- Updated and rewrote some of the MSR banzhaf notebook
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Updated Least-Core notebook
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
+- Restructured and generalized `StratifiedSampler` to allow using heuristics,
+thus subsuming Variance-Reduced stratified sampling into a unified framework.
+Implemented the heuristics proposed in that paper
+[PR #641](https://github.com/aai-institute/pyDVL/pull/641)
 - Changed the way semi-value coefficients are composed with sampler weights in
 order to avoid `OverflowError` for very small or large values
 [PR #639](https://github.com/aai-institute/pyDVL/pull/639)
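
Note on the last changelog entry above (the `OverflowError` fix from PR #639): the sketch below illustrates the general idea of composing a semi-value coefficient with a sampler weight in log space. It is not pyDVL's implementation. All function names are hypothetical, and the choice of the Shapley coefficient 1 / (n * C(n-1, k)) together with a uniform powerset sampling probability 2^-(n-1) is an assumption made only for illustration.

```python
import math


def log_shapley_coefficient(n: int, k: int) -> float:
    """Return log(1 / (n * C(n-1, k))) without ever forming the binomial.

    Hypothetical helper: lgamma(m + 1) == log(m!), so the log-binomial is a
    sum of three lgamma terms and stays small even when C(n-1, k) is huge.
    """
    log_binom = math.lgamma(n) - math.lgamma(k + 1) - math.lgamma(n - k)
    return -math.log(n) - log_binom


def log_uniform_subset_probability(n: int) -> float:
    """Return log(2^-(n-1)): the probability of drawing one specific subset
    of the other n-1 indices uniformly from the powerset (assumed sampler)."""
    return -(n - 1) * math.log(2.0)


def corrected_coefficient(n: int, k: int) -> float:
    """Coefficient divided by sampling probability, composed as a difference
    of logs and exponentiated only once at the end."""
    return math.exp(log_shapley_coefficient(n, k) - log_uniform_subset_probability(n))


# For n = 5000, both factors are far outside float range on their own
# (2.0 ** 4999 and float(math.comb(4999, 2500)) each raise OverflowError),
# but their ratio is an ordinary number.
value = corrected_coefficient(5000, 2500)
```

The point of the design is that the two factors nearly cancel: each is astronomically large or small, while their quotient is moderate, so adding logs and calling `exp` once keeps every intermediate value representable.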

notebooks/data_oob.ipynb

Lines changed: 15 additions & 5 deletions
@@ -127,7 +127,7 @@
 " train, test = load_adult_data(\n",
 " train_size=train_size, subsample=0.01, random_state=random_state\n",
 " )\n",
-" n_jobs = 2\n",
+" n_jobs = 1\n",
 " n_runs = 1\n",
 " n_est = 10"
 ]
@@ -536,6 +536,20 @@
 "removal_percentages = np.arange(0, 0.51, 0.02)"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"tags": [
+"hide"
+]
+},
+"outputs": [],
+"source": [
+"if is_CI:\n",
+" removal_percentages = np.arange(0, 0.51, 0.3)"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 15,
@@ -769,10 +783,6 @@
 "source": [
 "from support.common import ConstantBinaryClassifier\n",
 "\n",
-"train, test = load_adult_data(\n",
-" train_size=train_size, subsample=0.2, random_state=random_state\n",
-")\n",
-"\n",
 "probs = [0.01, 0.5, 0.99]\n",
 "all_values = []\n",
 "for p in probs:\n",

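Note on the hidden cell added to `notebooks/data_oob.ipynb`: it replaces the grid of removal percentages with a much coarser one when `is_CI` is set. The standalone sketch below shows the effect of the two `np.arange` calls taken from the diff. Setting `is_CI` by hand is an assumption made here; how the notebook actually defines it is not visible in this commit.

```python
import numpy as np

# Assumption: in the notebook this flag is defined elsewhere; it is set by
# hand here so the snippet runs on its own.
is_CI = True

# Full grid used for the real experiment: 0%, 2%, ..., 50%.
full_grid = np.arange(0, 0.51, 0.02)

removal_percentages = full_grid
if is_CI:
    # Coarse grid for CI runs: only 0% and 30%.
    removal_percentages = np.arange(0, 0.51, 0.3)

print(len(full_grid), len(removal_percentages))
```

With the flag set, the removal experiment evaluates 2 removal levels instead of 26, which is presumably the reason for the change, together with the `n_jobs = 1` edit in the first hunk.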