@@ -32,24 +32,28 @@ Data Valuation is the task of estimating the intrinsic value of a data point
 wrt. the training set, the model and a scoring function. We currently implement
 methods from the following papers:
 
-- Ghorbani, Amirata, and James Zou. ‘Data Shapley: Equitable Valuation of Data for
-  Machine Learning’. In International Conference on Machine Learning, 2242–51.
-  PMLR, 2019. http://proceedings.mlr.press/v97/ghorbani19c.html.
-- Wang, Tianhao, Yu Yang, and Ruoxi Jia. ‘Improving Cooperative Game Theory-Based
-  Data Valuation via Data Utility Learning’. arXiv, 2022.
-  https://doi.org/10.48550/arXiv.2107.06336.
+- Ghorbani, Amirata, and James Zou.
+  [Data Shapley: Equitable Valuation of Data for Machine Learning](http://proceedings.mlr.press/v97/ghorbani19c.html).
+  In International Conference on Machine Learning, 2242–51. PMLR, 2019.
+- Wang, Tianhao, Yu Yang, and Ruoxi Jia.
+  [Improving Cooperative Game Theory-Based Data Valuation via Data Utility Learning](https://doi.org/10.48550/arXiv.2107.06336).
+  arXiv, 2022.
 - Jia, Ruoxi, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li,
-  Ce Zhang, Costas Spanos, and Dawn Song. ‘Efficient Task-Specific Data Valuation
-  for Nearest Neighbor Algorithms’. Proceedings of the VLDB Endowment 12, no. 11 (1
-  July 2019): 1610–23. https://doi.org/10.14778/3342263.3342637.
+  Ce Zhang, Costas Spanos, and Dawn Song.
+  [Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms](https://doi.org/10.14778/3342263.3342637).
+  Proceedings of the VLDB Endowment 12, no. 11 (1 July 2019): 1610–23.
+- Okhrati, Ramin, and Aldo Lipani.
+  [A Multilinear Sampling Algorithm to Estimate Shapley Values](https://doi.org/10.1109/ICPR48806.2021.9412511).
+  In 2020 25th International Conference on Pattern Recognition (ICPR), 7992–99.
+  IEEE, 2021.
 
 Influence Functions compute the effect that single points have on an estimator /
 model. We implement methods from the following papers:
 
-- Koh, Pang Wei, and Percy Liang. ‘Understanding Black-Box Predictions via
-  Influence Functions’. In Proceedings of the 34th International Conference on
-  Machine Learning, 70:1885–94. Sydney, Australia: PMLR, 2017.
-  http://proceedings.mlr.press/v70/koh17a.html.
+- Koh, Pang Wei, and Percy Liang.
+  [Understanding Black-Box Predictions via Influence Functions](http://proceedings.mlr.press/v70/koh17a.html).
+  In Proceedings of the 34th International Conference on Machine Learning,
+  70:1885–94. Sydney, Australia: PMLR, 2017.
 
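For simple models, the influence formula from the Koh and Liang paper above has a closed form, which makes the idea easy to see outside the library. The snippet below is a minimal numpy-only sketch for ordinary least squares; the toy data, the damping term, and all names are assumptions made for illustration, not pyDVL code:

```python
# Minimal sketch of the Koh & Liang (2017) influence formula for ordinary
# least squares, where gradients and the Hessian are available in closed
# form. Everything here is illustrative, not pyDVL's API.
import numpy as np

rng = np.random.default_rng(16)
X = rng.normal(size=(50, 2))
theta_true = np.array([1.0, -2.0])
y = X @ theta_true + 0.1 * rng.normal(size=50)

# Fit by least squares; the per-point loss is 0.5 * (x @ theta - y)**2.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Hessian of the empirical risk, with a small damping term for stability.
hessian = X.T @ X / len(X) + 1e-6 * np.eye(2)

# Gradient of the loss at a held-out test point.
x_test, y_test = rng.normal(size=2), 0.5
grad_test = (x_test @ theta - y_test) * x_test

# I(z, z_test) = -grad_test^T H^{-1} grad_z for every training point z:
# a positive value means upweighting z would increase the test loss.
grads_train = (X @ theta - y)[:, None] * X
influences = -grads_train @ np.linalg.solve(hessian, grad_test)
print(influences[:5])
```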
 # Installation
 
@@ -98,18 +102,20 @@ Data Shapley values:
 ```python
 import numpy as np
 from pydvl.utils import Dataset, Utility
-from pydvl.shapley import compute_shapley_values
+from pydvl.value.shapley import compute_shapley_values
 from sklearn.linear_model import LinearRegression
 from sklearn.model_selection import train_test_split
 
 X, y = np.arange(100).reshape((50, 2)), np.arange(50)
 X_train, X_test, y_train, y_test = train_test_split(
-  X, y, test_size=0.5, random_state=16
-)
+    X, y, test_size=0.5, random_state=16
+)
 dataset = Dataset(X_train, y_train, X_test, y_test)
 model = LinearRegression()
 utility = Utility(model, dataset)
-values, errors = compute_shapley_values(u=utility, max_iterations=100)
+values = compute_shapley_values(
+    u=utility, max_iterations=100, mode="truncated_montecarlo"
+)
 ```
 
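The `mode="truncated_montecarlo"` argument introduced in this change selects the truncated Monte Carlo estimator from the Ghorbani and Zou paper listed above. The following is a from-scratch sketch of what that estimator does, under a toy additive utility; the function name, signature, and utility are hypothetical, not pyDVL internals:

```python
# Sketch of truncated Monte Carlo Shapley (Ghorbani & Zou, 2019): average
# marginal contributions over random permutations, and stop scanning a
# permutation once the running utility is close to that of the full set.
import numpy as np

def tmc_shapley(utility, n_points, n_permutations=100, tolerance=1e-3):
    rng = np.random.default_rng(16)
    full_utility = utility([])  # placeholder, replaced just below
    full_utility = utility(list(range(n_points)))
    values = np.zeros(n_points)
    for _ in range(n_permutations):
        permutation = rng.permutation(n_points)
        subset, previous = [], utility([])
        for i in permutation:
            # Truncation: remaining marginal contributions are ~0, skip them.
            if abs(full_utility - previous) < tolerance:
                break
            subset.append(i)
            current = utility(subset)
            values[i] += current - previous
            previous = current
    return values / n_permutations

# Toy utility: a coalition's score is the sum of fixed per-point scores,
# so each point's Shapley value is exactly its own score.
scores = np.linspace(0.0, 0.1, 10)
print(tmc_shapley(lambda s: float(scores[s].sum()), n_points=10))
```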
 For more instructions and information refer to [Getting