v0.10.0 - 💥📚🐞🆕 New valuation interface, improved docs, new methods, breaking changes and tons of improvements
After lots of work, bug-fixing, bug-introducing, fixing again, and a good measure of bike shedding, we bring a major update putting us closer to the final APIs. The main goals of this release were to improve usability, documentation, and extensibility.
- We have added a new module `pydvl.valuation`. The `pydvl.value` module is deprecated and will be removed in the next release. The new interface allows for a more consistent and flexible way to define and use valuation methods. It also simplifies experimentation, manipulation of results and data, as well as parallelization.
- We have made many improvements to the `influence` module, including several new methods and approximations.
- The whole documentation has been improved and consolidated, with detailed method descriptions and examples. See pydvl.org.
Added
- Simple result serialization to resume computation of values PR #666
- Simple memory monitor / reporting PR #663
- New stopping criterion `MaxSamples` PR #661
- Introduced `UtilityModel` and two implementations, `IndicatorUtilityModel` and `DeepSetsUtilityModel`, for data utility learning PR #650
- Introduced the concept of `ResultUpdater` in order to allow samplers to declare the update strategy that valuations should use PR #641
- Added Banzhaf precomputed values to some games. PR #641
- Introduced new `IndexIteration`s for consistent usage across all `PowersetSampler`s PR #641
- Added `run_removal_experiment` for easy removal experiments PR #636
- Refactor Classwise Shapley valuation with the new interfaces and sampler architecture PR #616
- Refactor KNN Shapley values with the new interface PR #610 PR #645
- Refactor MSR Banzhaf semivalues with the new sampler architecture. PR #605 PR #641
- Refactor group-testing Shapley values with the new sampler architecture PR #602
- Refactor least-core data valuation methods with more supported sampling methods and consistent interface. PR #580
- Refactor Owen-Shapley valuation with the new sampler architecture. Enable use of `OwenSampler`s with all semi-values PR #597 PR #641
- New method `InverseHarmonicMeanInfluence`, an implementation of the paper *DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models* PR #582
- Add new backend implementations for influence computation to account for block-diagonal approximations PR #582
- Extend `DirectInfluence` with block-diagonal and Gauss-Newton approximation PR #591
- Extend `LissaInfluence` with block-diagonal and Gauss-Newton approximation PR #593
- Extend `NystroemSketchInfluence` with block-diagonal and Gauss-Newton approximation PR #596
- Extend `ArnoldiInfluence` with block-diagonal and Gauss-Newton approximation PR #598
- Extend `CgInfluence` with block-diagonal and Gauss-Newton approximation PR #601
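The DataInf idea behind `InverseHarmonicMeanInfluence` can be illustrated in a few lines. As we understand the paper, the inverse of the averaged regularized matrix (1/n) Σ (g_i g_iᵀ + λI) is replaced by the average of the per-sample inverses. In other words, the inverse of the arithmetic mean is swapped for the inverse of the harmonic mean, and every per-sample inverse has a closed form via the Sherman-Morrison formula. The following NumPy sketch covers a single parameter block only; the function name `datainf_ihvp` and its arguments are hypothetical and are not pyDVL's API.

```python
import numpy as np


def datainf_ihvp(grads: np.ndarray, v: np.ndarray, regularization: float) -> np.ndarray:
    """Approximate an inverse-Hessian-vector product as in DataInf (sketch).

    grads: per-sample gradients, shape (n_samples, n_params).
    v: vector to multiply, shape (n_params,).
    Returns (1/n) * sum_i (g_i g_i^T + lambda I)^{-1} v.
    """
    lam = regularization
    # g_i^T v and g_i^T g_i for every sample i
    gv = grads @ v
    gg = np.einsum("ij,ij->i", grads, grads)
    # Sherman-Morrison: (lam I + g g^T)^{-1} v = (v - g (g^T v) / (lam + g^T g)) / lam
    coeffs = gv / (lam + gg)
    per_sample = (v[None, :] - coeffs[:, None] * grads) / lam
    # Average of the per-sample inverses applied to v
    return per_sample.mean(axis=0)


rng = np.random.default_rng(0)
grads = rng.normal(size=(50, 8))  # hypothetical per-sample gradients
v = rng.normal(size=8)
approx = datainf_ihvp(grads, v, regularization=0.1)
print(approx.shape)
```

No matrix is ever formed or inverted, so the cost per sample is linear in the number of parameters, which is what makes the approximation attractive for large models.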
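The block-diagonal approximation treats the Hessian (or its Gauss-Newton counterpart) as if parameters in different blocks, for example different layers, did not interact. Each block's regularized linear system can then be solved on its own, with its own regularization value, which also explains why the `regularization` parameters listed under Breaking Changes were changed to allow block-wise values. A minimal NumPy sketch, assuming dense per-block matrices stored in a dict keyed by hypothetical block names; pyDVL's actual data structures and types differ.

```python
from typing import Dict

import numpy as np


def block_diagonal_solve(
    blocks: Dict[str, np.ndarray],
    rhs: Dict[str, np.ndarray],
    regularization: Dict[str, float],
) -> Dict[str, np.ndarray]:
    """Solve (H_b + lambda_b I) x_b = r_b independently for every block b.

    Cross-block curvature is ignored: each solve only involves the
    parameters of a single block.
    """
    solutions = {}
    for name, h in blocks.items():
        h_reg = h + regularization[name] * np.eye(h.shape[0])
        solutions[name] = np.linalg.solve(h_reg, rhs[name])
    return solutions


rng = np.random.default_rng(0)


def random_psd(n: int) -> np.ndarray:
    """Random symmetric positive semi-definite matrix of size n x n."""
    a = rng.normal(size=(n, n))
    return a @ a.T


# Two hypothetical "layers" with different sizes and regularizations
blocks = {"layer1": random_psd(3), "layer2": random_psd(5)}
rhs = {"layer1": rng.normal(size=3), "layer2": rng.normal(size=5)}
regularization = {"layer1": 1e-2, "layer2": 1e-1}
x = block_diagonal_solve(blocks, rhs, regularization)
```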
Fixed
- Fixed `show_warnings=False` not being respected in subprocesses. Introduced the `suppress_warnings` decorator for more flexibility PR #647 PR #662
- Fixed several bugs in various stopping criteria, including iteration counts, completion computation, resetting and nested composition PR #641 PR #650
- Fixed the weights of all samplers to ensure that mixing and matching samplers and semi-value methods always works, for all possible combinations PR #641
- Fixed a bug whereby progress bars would not report the last step and remain incomplete PR #641
- Fixed the analysis of the adult dataset in the Data-OOB notebook PR #636
- Replace `np.float_` with `np.float64` and `np.alltrue` with `np.all`, as the old aliases are removed in NumPy 2.0 PR #604
- Fix a bug in `pydvl.utils.numeric.random_subset` where `1 - q` was used instead of `q` as the probability of an element being sampled PR #597
- Fix a bug in the calculation of variance estimates for MSR Banzhaf PR #605
- Fix a bug in KNN Shapley values. See Issue 613 for details.
- Backport the KNN Shapley fix to the `value` module PR #633
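For reference, exact KNN-Shapley values for a single test point can be computed with the recursion of Jia et al. (2019) after sorting the training points by distance. The sketch below implements that textbook recursion for an unweighted K-NN classifier and checks it against brute-force Shapley values on a tiny problem. It illustrates the method only; it is not pyDVL's code and does not reproduce the specifics of the fix in Issue 613. Both function names are hypothetical.

```python
import itertools
import math

import numpy as np


def knn_shapley_single(
    x_train: np.ndarray, y_train: np.ndarray, x_test: np.ndarray, y_test: int, k: int
) -> np.ndarray:
    """Exact Shapley values of all training points for one test point.

    Utility of a subset S: (1/k) * number of the (up to) k nearest
    neighbours of x_test in S whose label equals y_test.
    """
    n = len(x_train)
    # Training indices sorted from nearest to farthest
    order = np.argsort(np.linalg.norm(x_train - x_test, axis=1), kind="stable")
    match = (y_train[order] == y_test).astype(float)

    values_sorted = np.zeros(n)
    # Farthest point: s_N = 1[y_N == y_test] / N
    values_sorted[n - 1] = match[n - 1] / n
    # Recurse towards the nearest point (i is the paper's 1-based index)
    for i in range(n - 1, 0, -1):
        values_sorted[i - 1] = values_sorted[i] + (match[i - 1] - match[i]) / k * min(k, i) / i

    # Map values back to the original order of the training points
    values = np.empty(n)
    values[order] = values_sorted
    return values


def brute_force_shapley(x_train, y_train, x_test, y_test, k):
    """Average marginal contributions over all permutations (tiny n only)."""
    n = len(x_train)
    dist = np.linalg.norm(x_train - x_test, axis=1)

    def utility(subset):
        nearest = sorted(subset, key=lambda j: dist[j])[:k]
        return sum(float(y_train[j] == y_test) for j in nearest) / k

    values = np.zeros(n)
    for perm in itertools.permutations(range(n)):
        prefix = []
        for j in perm:
            before = utility(prefix)
            prefix.append(j)
            values[j] += utility(prefix) - before
    return values / math.factorial(n)


rng = np.random.default_rng(1)
x_train = rng.normal(size=(5, 2))
y_train = rng.integers(0, 2, size=5)
x_test = rng.normal(size=2)
values = knn_shapley_single(x_train, y_train, x_test, y_test=1, k=2)
print(np.allclose(values, brute_force_shapley(x_train, y_train, x_test, 1, 2)))
```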
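The `random_subset` fix concerns the meaning of `q`: every element should be included independently with probability `q`, not `1 - q`. A minimal sketch of the intended behaviour; `random_subset_sketch` is a hypothetical stand-in whose signature is an assumption, not pyDVL's function.

```python
import numpy as np


def random_subset_sketch(s: np.ndarray, q: float, seed=None) -> np.ndarray:
    """Return a subset of s in which each element is kept with probability q."""
    rng = np.random.default_rng(seed)
    # rng.random draws uniformly from [0, 1), so "< q" holds with probability q
    mask = rng.random(len(s)) < q
    return s[mask]


elements = np.arange(10_000)
subset = random_subset_sketch(elements, q=0.2, seed=42)
# The fraction of kept elements should be close to q
print(len(subset) / len(elements))
```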
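Downstream code affected by the same NumPy 2.0 removals can be migrated mechanically. The replacements below work on both NumPy 1.x and 2.x.

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5], dtype=np.float64)  # formerly dtype=np.float_
all_positive = np.all(values > 0)  # formerly np.alltrue(values > 0)
print(values.dtype, bool(all_positive))  # float64 True
```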
Changed
- Slicing, comparing and setting of `ValuationResult` behave in a more natural and consistent way PR #660 PR #666
- Switched all semi-value coefficients and sampler weights to log-space in order to avoid overflows PR #643
- Updated and partly rewrote the MSR Banzhaf notebook PR #641
- Updated Least-Core notebook PR #641
- Updated Shapley spotify notebook PR #628
- Updated Data Utility notebook PR #650
- Restructured and generalized `StratifiedSampler` to allow using heuristics, thus subsuming Variance-Reduced stratified sampling into a unified framework. Implemented the heuristics proposed in that paper PR #641
- Uniformly distribute test points across processes for KNNShapley. Fail for `GroupedDataset` PR #632
- Introduced the concept of logical vs data indices for `Dataset` and `GroupedDataset`, fixing inconsistencies in how the latter operates on indices. Also, both now return objects of the same type when slicing. PR #631 PR #648
- Use tighter bounds for the calculation of the minimal sample size that guarantees an epsilon-delta approximation in group testing (Jia et al. 2023) PR #602
- Dropped black, isort and pylint from the CI pipeline, in favour of ruff PR #633
- Breaking Changes
  - Changed `DataOOBValuation` to only accept bagged models PR #636
  - Dropped support for Python 3.8 after EOL PR #633
  - Rename parameter `hessian_regularization` of `DirectInfluence` to `regularization` and change the type annotation to allow for block-wise regularization parameters PR #591
  - Rename parameter `hessian_regularization` of `LissaInfluence` to `regularization` and change the type annotation to allow for block-wise regularization parameters PR #593
  - Remove parameter `h0` from the init of `LissaInfluence` PR #593
  - Rename parameter `hessian_regularization` of `NystroemSketchInfluence` to `regularization` and change the type annotation to allow for block-wise regularization parameters PR #596
  - Rename parameters of `ArnoldiInfluence`: `hessian_regularization` -> `regularization` (modified type annotation), `rank_estimate` -> `rank` PR #598
  - Remove obsolete functions `lanczos_low_rank_hessian_approximation` and `model_hessian_low_rank` from `influence.torch.functional` PR #598
  - Rename parameters of `CgInfluence`: `hessian_regularization` -> `regularization` (modified type annotation), `pre_conditioner` -> `preconditioner`, `use_block_cg` -> `solve_simultaneously` PR #601
  - Remove parameter `x0` from `CgInfluence` PR #601
  - Rename module `influence.torch.pre_conditioner` -> `influence.torch.preconditioner` PR #601
  - Refactor preconditioner:
    - Changed
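Semi-value coefficients involve binomial coefficients that grow very quickly. For example, the Shapley coefficient of a subset of size k out of n players is 1 / (n · C(n-1, k)), and C(n-1, k) already exceeds the range of a float64 for n slightly above 1000. Working with logarithms, here via `math.lgamma`, keeps everything finite. A minimal sketch of the idea; the function names are hypothetical and do not correspond to pyDVL's API.

```python
import math

import numpy as np


def log_comb(n: int, k: int) -> float:
    """log C(n, k) via log-gamma, without ever forming C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_shapley_coefficient(n: int, k: int) -> float:
    """log of 1 / (n * C(n - 1, k)), the Shapley weight of a size-k subset."""
    return -math.log(n) - log_comb(n - 1, k)


# Small case: agrees with the exact formula
n_small = 4
for k in range(n_small):
    exact = 1 / (n_small * math.comb(n_small - 1, k))
    assert math.isclose(math.exp(log_shapley_coefficient(n_small, k)), exact)

# Large case: log-coefficients stay finite although C(n - 1, k) would overflow
n_large = 5000
log_coeffs = np.array([log_shapley_coefficient(n_large, k) for k in range(n_large)])
print(np.isfinite(log_coeffs).all())
```

Quantities such as weighted sums of marginal contributions can then be combined in log-space (e.g. with `np.logaddexp`) and only exponentiated at the very end.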
Full diff: v0.9.2...v0.10.0