
v0.10.0 - 💥📚🐞🆕 New valuation interface, improved docs, new methods, breaking changes and tons of improvements

After lots of work, bug-fixing, bug-introducing, fixing again, and a good measure of bike-shedding, we bring a major update that puts us closer to the final APIs. The main goals of this release were to improve usability, documentation, and extensibility.

  • We have added a new module pydvl.valuation; the pydvl.value module is deprecated and will be removed in the next release. The new interface provides a more consistent and flexible way to define and use valuation methods, and it simplifies experimentation, manipulation of results and data, and parallelization (see the sketch after this list).
  • The influence module has received many improvements, including several new methods and approximations.
  • The whole documentation has been improved and consolidated, with detailed method descriptions and examples. See pydvl.org.
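
For illustration, here is a minimal sketch of a Shapley computation with the new interface. The class names are taken from the release notes and the documentation at pydvl.org, but the exact constructors, factory helpers and accessors used below (Dataset, SupervisedScorer, valuation.values(), ...) are assumptions and may differ slightly from the released API:

```python
from joblib import parallel_config
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from pydvl.valuation import (
    Dataset,
    MaxUpdates,
    ModelUtility,
    PermutationSampler,
    ShapleyValuation,
    SupervisedScorer,
)

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=16)
# Assumed constructor: in the new interface a Dataset wraps a single split
train, test = Dataset(X_tr, y_tr), Dataset(X_te, y_te)

model = LogisticRegression(max_iter=1000)
scorer = SupervisedScorer("accuracy", test, default=0.0)  # utility is scored on the test set
utility = ModelUtility(model, scorer)                     # retrains the model on each sample

valuation = ShapleyValuation(
    utility=utility,
    sampler=PermutationSampler(),  # samplers and semi-values can now be mixed freely
    is_done=MaxUpdates(500),       # stopping criterion
)
with parallel_config(n_jobs=-1):   # parallelization is delegated to joblib
    valuation.fit(train)

result = valuation.values()        # a ValuationResult: sliceable, comparable, serializable
print(result)
```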

Added

  • Simple result serialization to resume computation of values PR #666
  • Simple memory monitor / reporting PR #663
  • New stopping criterion MaxSamples PR #661
  • Introduced UtilityModel and two implementations IndicatorUtilityModel and DeepSetsUtilityModel for data utility learning PR #650
  • Introduced the concept of ResultUpdater in order to allow samplers to declare the proper strategy to use by valuations PR #641
  • Added Banzhaf precomputed values to some games. PR #641
  • Introduced new IndexIterations, for consistent usage across all PowersetSamplers PR #641
  • Added run_removal_experiment for easy removal experiments PR #636
  • Refactor Classwise Shapley valuation with the new interfaces and sampler architecture PR #616
  • Refactor KNN Shapley values with the new interface PR #610 PR #645
  • Refactor MSR Banzhaf semivalues with the new sampler architecture. PR #605 PR #641
  • Refactor group-testing Shapley values with the new sampler architecture PR #602
  • Refactor least-core data valuation methods with more supported sampling methods and consistent interface. PR #580
  • Refactor Owen-Shapley valuation with new sampler architecture. Enable use of OwenSamplers with all semi-values PR #597 PR #641
  • New method InverseHarmonicMeanInfluence, an implementation of the paper DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (a usage sketch follows this list) PR #582
  • Add new backend implementations for influence computation to account for block-diagonal approximations PR #582
  • Extend DirectInfluence with block-diagonal and Gauss-Newton approximation PR #591
  • Extend LissaInfluence with block-diagonal and Gauss-Newton approximation PR #593
  • Extend NystroemSketchInfluence with block-diagonal and Gauss-Newton approximation PR #596
  • Extend ArnoldiInfluence with block-diagonal and Gauss-Newton approximation PR #598
  • Extend CgInfluence with block-diagonal and Gauss-Newton approximation PR #601
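
A brief usage sketch of the new DataInf method, following the fit/influences pattern of the other torch influence implementations; the constructor arguments shown (in particular regularization) are assumptions and may not match the released signature exactly:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pydvl.influence.torch import InverseHarmonicMeanInfluence

# Toy model and data, for illustration only
model = torch.nn.Sequential(torch.nn.Linear(10, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
loss = torch.nn.MSELoss()
x_train, y_train = torch.randn(100, 10), torch.randn(100, 1)
train_loader = DataLoader(TensorDataset(x_train, y_train), batch_size=32)

# Assumed constructor: a damping/regularization parameter, as for the other influence classes
infl_model = InverseHarmonicMeanInfluence(model, loss, regularization=1.0)
infl_model = infl_model.fit(train_loader)

# Pairwise influences of training points on test points
x_test, y_test = torch.randn(20, 10), torch.randn(20, 1)
scores = infl_model.influences(x_test, y_test, x_train, y_train)
print(scores.shape)  # (n_test, n_train)
```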

Fixed

  • Fixed show_warnings=False not being respected in subprocesses. Introduced suppress_warnings decorator for more flexibility PR #647 PR #662
  • Fixed several bugs in various stopping criteria, including iteration counts, completion computation, resetting, and nested composition PR #641 PR #650
  • Fixed the weights of all samplers to ensure that mixing and matching samplers and semi-value methods works for all possible combinations PR #641
  • Fixed a bug whereby progress bars would not report the last step and remain incomplete PR #641
  • Fixed the analysis of the adult dataset in the Data-OOB notebook PR #636
  • Replace np.float_ with np.float64 and np.alltrue with np.all, as the old aliases are removed in NumPy 2.0 PR #604
  • Fix a bug in pydvl.utils.numeric.random_subset where 1 - q was used instead of q as the probability of an element being sampled (see the sketch after this list) PR #597
  • Fix a bug in the calculation of variance estimates for MSR Banzhaf PR #605
  • Fix a bug in KNN Shapley values. See Issue 613 for details.
  • Backport the KNN Shapley fix to the value module PR #633
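
For clarity, the intended semantics after the fix: every element of the input is kept independently with probability q, not 1 - q. A generic sketch of that behavior (illustrative only, not pyDVL's actual implementation):

```python
from typing import Optional

import numpy as np
from numpy.typing import NDArray


def random_subset(s: NDArray, q: float, seed: Optional[int] = None) -> NDArray:
    """Return a random subset of s in which each element is included
    independently with probability q."""
    rng = np.random.default_rng(seed)
    keep = rng.uniform(size=len(s)) < q  # element i is kept iff u_i < q, i.e. with probability q
    return s[keep]


print(random_subset(np.arange(10), q=0.3, seed=42))
```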

Changed

  • Slicing, comparing and setting of ValuationResult behave in a more natural and consistent way PR #660 PR #666
  • Switched all semi-value coefficients and sampler weights to log-space in order to avoid overflows (see the sketch after this list) PR #643
  • Updated and partially rewrote the MSR Banzhaf notebook PR #641
  • Updated Least-Core notebook PR #641
  • Updated Shapley Spotify notebook PR #628
  • Updated Data Utility notebook PR #650
  • Restructured and generalized StratifiedSampler to allow using heuristics, thus subsuming Variance-Reduced stratified sampling into a unified framework, and implemented the heuristics proposed in the corresponding paper PR #641
  • Uniformly distribute test points across processes for KNNShapley. Fail for GroupedDataset PR #632
  • Introduced the concept of logical vs data indices for Dataset, and GroupedDataset, fixing inconsistencies in how the latter operates on indices. Also, both now return objects of the same type when slicing. PR #631 PR #648
  • Use tighter bounds for the calculation of the minimal sample size that guarantees an epsilon-delta approximation in group testing (Jia et al. 2023) PR #602
  • Dropped black, isort and pylint from the CI pipeline, in favour of ruff PR #633
  • Breaking Changes
    • Changed DataOOBValuation to only accept bagged models PR #636
    • Dropped support for python 3.8 after EOL PR #633
    • Rename parameter hessian_regularization of DirectInfluence to regularization and change the type annotation to allow for block-wise regularization parameters PR #591
    • Rename parameter hessian_regularization of LissaInfluence to regularization and change the type annotation to allow for block-wise regularization parameters PR #593
    • Remove parameter h0 from init of LissaInfluence PR #593
    • Rename parameter hessian_regularization of NystroemSketchInfluence to regularization and change the type annotation to allow for block-wise regularization parameters PR #596
    • Renaming of parameters of ArnoldiInfluence, hessian_regularization -> regularization (modify type annotation), rank_estimate -> rank PR #598
    • Remove obsolete functions lanczos_low_rank_hessian_approximation and model_hessian_low_rank from influence.torch.functional PR #598
    • Renaming of parameters of CgInfluence, hessian_regularization -> regularization (modify type annotation), pre_conditioner -> preconditioner, use_block_cg -> solve_simultaneously PR #601
    • Remove parameter x0 from CgInfluence PR #601
    • Rename module influence.torch.pre_conditioner -> influence.torch.preconditioner PR #601
    • Refactor preconditioner:
      • renaming PreConditioner -> Preconditioner
      • fit to TensorOperator PR #601
    • Bumped zarr dependency to v3 PR #668
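
To illustrate the rationale for the log-space change: semi-value coefficients contain binomial terms that overflow float64 once datasets reach roughly a thousand points, while their logarithms remain perfectly representable. A generic sketch of the technique (not pyDVL's internal code), using the Shapley coefficient 1 / (n * binom(n - 1, k)) as an example:

```python
from math import exp, lgamma, log


def log_binom(n: int, k: int) -> float:
    """log(n choose k), computed with log-gamma so that it never overflows."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_shapley_coefficient(n: int, k: int) -> float:
    """log of 1 / (n * binom(n - 1, k)), the Shapley semi-value coefficient."""
    return -log(n) - log_binom(n - 1, k)


n, k = 10_000, 5_000
log_w = log_shapley_coefficient(n, k)
print(log_w)       # about -6935: large and negative, but representable
print(exp(log_w))  # exponentiating directly underflows to 0.0
```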

Full diff: v0.9.2...v0.10.0