@@ -32,24 +32,28 @@ Data Valuation is the task of estimating the intrinsic value of a data point
 wrt. the training set, the model and a scoring function. We currently implement
 methods from the following papers:
 
-- Ghorbani, Amirata, and James Zou. ‘Data Shapley: Equitable Valuation of Data for
-  Machine Learning’. In International Conference on Machine Learning, 2242–51.
-  PMLR, 2019. http://proceedings.mlr.press/v97/ghorbani19c.html.
-- Wang, Tianhao, Yu Yang, and Ruoxi Jia. ‘Improving Cooperative Game Theory-Based
-  Data Valuation via Data Utility Learning’. arXiv, 2022.
-  https://doi.org/10.48550/arXiv.2107.06336.
+- Ghorbani, Amirata, and James Zou.
+  [Data Shapley: Equitable Valuation of Data for Machine Learning](http://proceedings.mlr.press/v97/ghorbani19c.html).
+  In International Conference on Machine Learning, 2242–51. PMLR, 2019.
+- Wang, Tianhao, Yu Yang, and Ruoxi Jia.
+  [Improving Cooperative Game Theory-Based Data Valuation via Data Utility Learning](https://doi.org/10.48550/arXiv.2107.06336).
+  arXiv, 2022.
 - Jia, Ruoxi, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li,
-  Ce Zhang, Costas Spanos, and Dawn Song. ‘Efficient Task-Specific Data Valuation
-  for Nearest Neighbor Algorithms’. Proceedings of the VLDB Endowment 12, no. 11 (1
-  July 2019): 1610–23. https://doi.org/10.14778/3342263.3342637.
+  Ce Zhang, Costas Spanos, and Dawn Song.
+  [Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms](https://doi.org/10.14778/3342263.3342637).
+  Proceedings of the VLDB Endowment 12, no. 11 (1 July 2019): 1610–23.
+- Okhrati, Ramin, and Aldo Lipani.
+  [A Multilinear Sampling Algorithm to Estimate Shapley Values](https://doi.org/10.1109/ICPR48806.2021.9412511).
+  In 2020 25th International Conference on Pattern Recognition (ICPR), 7992–99.
+  IEEE, 2021.
 
 Influence Functions compute the effect that single points have on an estimator /
 model. We implement methods from the following papers:
 
-- Koh, Pang Wei, and Percy Liang. ‘Understanding Black-Box Predictions via
-  Influence Functions’. In Proceedings of the 34th International Conference on
-  Machine Learning, 70:1885–94. Sydney, Australia: PMLR, 2017.
-  http://proceedings.mlr.press/v70/koh17a.html.
+- Koh, Pang Wei, and Percy Liang.
+  [Understanding Black-Box Predictions via Influence Functions](http://proceedings.mlr.press/v70/koh17a.html).
+  In Proceedings of the 34th International Conference on Machine Learning,
+  70:1885–94. Sydney, Australia: PMLR, 2017.
 
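For simple models, the influence formula from the Koh and Liang paper above has a closed form, which makes the idea easy to see outside the library. The snippet below is a minimal numpy-only sketch for ordinary least squares; the toy data, the damping term, and all names are assumptions made for illustration, not pyDVL code:

```python
# Minimal sketch of the Koh & Liang (2017) influence formula for ordinary
# least squares, where gradients and the Hessian are available in closed
# form. Everything here is illustrative, not pyDVL's API.
import numpy as np

rng = np.random.default_rng(16)
X = rng.normal(size=(50, 2))
theta_true = np.array([1.0, -2.0])
y = X @ theta_true + 0.1 * rng.normal(size=50)

# Fit by least squares; the per-point loss is 0.5 * (x @ theta - y)**2.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Hessian of the empirical risk, with a small damping term for stability.
hessian = X.T @ X / len(X) + 1e-6 * np.eye(2)

# Gradient of the loss at a held-out test point.
x_test, y_test = rng.normal(size=2), 0.5
grad_test = (x_test @ theta - y_test) * x_test

# I(z, z_test) = -grad_test^T H^{-1} grad_z for every training point z:
# a positive value means upweighting z would increase the test loss.
grads_train = (X @ theta - y)[:, None] * X
influences = -grads_train @ np.linalg.solve(hessian, grad_test)
print(influences[:5])
```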
 # Installation
 
@@ -98,18 +102,20 @@ Data Shapley values:
 ```python
 import numpy as np
 from pydvl.utils import Dataset, Utility
-from pydvl.shapley import compute_shapley_values
+from pydvl.value.shapley import compute_shapley_values
 from sklearn.linear_model import LinearRegression
 from sklearn.model_selection import train_test_split
 
 X, y = np.arange(100).reshape((50, 2)), np.arange(50)
 X_train, X_test, y_train, y_test = train_test_split(
-  X, y, test_size=0.5, random_state=16
-)
+    X, y, test_size=0.5, random_state=16
+)
 dataset = Dataset(X_train, y_train, X_test, y_test)
 model = LinearRegression()
 utility = Utility(model, dataset)
-values, errors = compute_shapley_values(u=utility, max_iterations=100)
+values = compute_shapley_values(
+    u=utility, max_iterations=100, mode="truncated_montecarlo"
+)
 ```
 
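The `mode="truncated_montecarlo"` argument introduced in this change selects the truncated Monte Carlo estimator from the Ghorbani and Zou paper listed above. The following is a from-scratch sketch of what that estimator does, under a toy additive utility; the function name, signature, and utility are hypothetical, not pyDVL internals:

```python
# Sketch of truncated Monte Carlo Shapley (Ghorbani & Zou, 2019): average
# marginal contributions over random permutations, and stop scanning a
# permutation once the running utility is close to that of the full set.
import numpy as np

def tmc_shapley(utility, n_points, n_permutations=100, tolerance=1e-3):
    rng = np.random.default_rng(16)
    full_utility = utility([])  # placeholder, replaced just below
    full_utility = utility(list(range(n_points)))
    values = np.zeros(n_points)
    for _ in range(n_permutations):
        permutation = rng.permutation(n_points)
        subset, previous = [], utility([])
        for i in permutation:
            # Truncation: remaining marginal contributions are ~0, skip them.
            if abs(full_utility - previous) < tolerance:
                break
            subset.append(i)
            current = utility(subset)
            values[i] += current - previous
            previous = current
    return values / n_permutations

# Toy utility: a coalition's score is the sum of fixed per-point scores,
# so each point's Shapley value is exactly its own score.
scores = np.linspace(0.0, 0.1, 10)
print(tmc_shapley(lambda s: float(scores[s].sum()), n_points=10))
```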
 For more instructions and information refer to [Getting