aai-institute
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 7 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 23 additions & 17 deletions
diff --git a/docs/10-getting-started.rst b/docs/10-getting-started.rst
Lines changed: 15 additions & 15 deletions
diff --git a/docs/20-install.rst b/docs/20-install.rst
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Changelog
 
-## Unreleased
+## 0.3.0 - 💥 Breaking changes
 
 - Simplified and fixed powerset sampling and testing
   [PR #181](https://github.com/appliedAI-Initiative/pyDVL/pull/181)
@@ -12,6 +12,12 @@
   [PR #185](https://github.com/appliedAI-Initiative/pyDVL/pull/185)
 - Modified Pull Request template to automatically link PR to issue
   [PR #186](https://github.com/appliedAI-Initiative/pyDVL/pull/186)
+- First implementation of Owen Sampling, squashed scores, better testing
+  [PR #194](https://github.com/appliedAI-Initiative/pyDVL/pull/194)
+- Improved documentation on caching, Shapley, caveats of values, bibtex
+  [PR #194](https://github.com/appliedAI-Initiative/pyDVL/pull/194)
+- **Breaking change:** Rearranged modules to accommodate new methods
+  [PR #194](https://github.com/appliedAI-Initiative/pyDVL/pull/194)
 
 
 ## 0.2.0 - 📚 Better docs
 
@@ -32,24 +32,28 @@ Data Valuation is the task of estimating the intrinsic value of a data point
 wrt. the training set, the model and a scoring function. We currently implement
 methods from the following papers:
 
-- Ghorbani, Amirata, and James Zou. ‘Data Shapley: Equitable Valuation of Data for
-  Machine Learning’. In International Conference on Machine Learning, 2242–51.
-  PMLR, 2019. http://proceedings.mlr.press/v97/ghorbani19c.html.
-- Wang, Tianhao, Yu Yang, and Ruoxi Jia. ‘Improving Cooperative Game Theory-Based
-  Data Valuation via Data Utility Learning’. arXiv, 2022.
-  https://doi.org/10.48550/arXiv.2107.06336.
+- Ghorbani, Amirata, and James Zou. 
+  [Data Shapley: Equitable Valuation of Data for Machine Learning](http://proceedings.mlr.press/v97/ghorbani19c.html).
+  In International Conference on Machine Learning, 2242–51. PMLR, 2019.
+- Wang, Tianhao, Yu Yang, and Ruoxi Jia. 
+  [Improving Cooperative Game Theory-Based Data Valuation via Data Utility Learning](https://doi.org/10.48550/arXiv.2107.06336).
+  arXiv, 2022.
 - Jia, Ruoxi, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li,
-  Ce Zhang, Costas Spanos, and Dawn Song. ‘Efficient Task-Specific Data Valuation
-  for Nearest Neighbor Algorithms’. Proceedings of the VLDB Endowment 12, no. 11 (1
-  July 2019): 1610–23. https://doi.org/10.14778/3342263.3342637.
+  Ce Zhang, Costas Spanos, and Dawn Song.
+  [Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms](https://doi.org/10.14778/3342263.3342637).
+  Proceedings of the VLDB Endowment 12, no. 11 (1 July 2019): 1610–23.
+- Okhrati, Ramin, and Aldo Lipani.
+  [A Multilinear Sampling Algorithm to Estimate Shapley Values](https://doi.org/10.1109/ICPR48806.2021.9412511).
+  In 2020 25th International Conference on Pattern Recognition (ICPR), 7992–99.
+  IEEE, 2021.
 
 Influence Functions compute the effect that single points have on an estimator /
 model. We implement methods from the following papers:
 
-- Koh, Pang Wei, and Percy Liang. ‘Understanding Black-Box Predictions via
-  Influence Functions’. In Proceedings of the 34th International Conference on
-  Machine Learning, 70:1885–94. Sydney, Australia: PMLR, 2017.
-  http://proceedings.mlr.press/v70/koh17a.html.
+- Koh, Pang Wei, and Percy Liang.
+  [Understanding Black-Box Predictions via Influence Functions](http://proceedings.mlr.press/v70/koh17a.html).
+  In Proceedings of the 34th International Conference on Machine Learning,
+  70:1885–94. Sydney, Australia: PMLR, 2017.
 
 # Installation
 
@@ -98,18 +102,20 @@ Data Shapley values:
 ```python
 import numpy as np
 from pydvl.utils import Dataset, Utility
-from pydvl.shapley import compute_shapley_values
+from pydvl.value.shapley import compute_shapley_values
 from sklearn.linear_model import LinearRegression
 from sklearn.model_selection import train_test_split
 
 X, y = np.arange(100).reshape((50, 2)), np.arange(50)
 X_train, X_test, y_train, y_test = train_test_split(
-    X, y, test_size=0.5, random_state=16
-)
+        X, y, test_size=0.5, random_state=16
+        )
 dataset = Dataset(X_train, y_train, X_test, y_test)
 model = LinearRegression()
 utility = Utility(model, dataset)
-values, errors = compute_shapley_values(u=utility, max_iterations=100)
+values = compute_shapley_values(
+    u=utility, max_iterations=100, mode="truncated_montecarlo"
+)
 ```
 
 For more instructions and information refer to [Getting
 
@@ -4,13 +4,22 @@
 Getting started
 ===============
 
-Make sure you have :ref:`installed pyDVL <pyDVL Installation>` before proceeding
-further.
+.. warning::
+   Make sure you have read :ref:`the installation instructions
+   <pyDVL Installation>` before using the library. In particular, read about
+   how caching and parallelization work, since they require additional setup.
 
-.. note::
-   We provide minimal overviews of key concepts in :ref:`data valuation` and
-   :ref:`influence`. For an in-depth survey of the field, we refer to the review on
-   the topic at the :tfl:`TransferLab website <>`.
+pyDVL aims to be a repository of production-ready, reference implementations of
+algorithms for data valuation and influence functions. You can read:
+
+* :ref:`data valuation` for key objects and usage patterns for Shapley value
+  computation and related methods.
+* :ref:`influence` for instructions on how to compute influence functions
+  (still in a pre-alpha state).
+
+We only briefly introduce key concepts in the documentation. For a thorough
+introduction and survey of the field, we refer to **the upcoming review** at the
+:tfl:`TransferLab website <>`.
 
 Running the examples
 ====================
@@ -24,12 +33,3 @@ by browsing our worked-out examples illustrating pyDVL's capabilities either:
 - Locally, by starting a jupyter server at the root of the project. You will
   have to install jupyter first manually since it's not a dependency of the
   library.
-
-Methods covered
-===============
-
-pyDVL offers algorithms for data valuation and computation of influence
-functions. You can read more about each family of methods here:
-
-- :ref:`data valuation`.
-- :ref:`influence`.
@@ -45,7 +45,7 @@ the instructions in their documentation for installation.
 .. _caching setup:
 
 Setting up the cache
---------------------
+====================
 
 memcached is an in-memory key-value store accessible over the network. pyDVL
 uses it to cache certain results and speed up the computations. You can either