Commit 66f0e44

Merge branch 'release/v0.3.0'
2 parents 3f64251 + 554ddfc commit 66f0e44

52 files changed, with 1820 additions and 1004 deletions.

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.2.0
+current_version = 0.3.0
 commit = False
 tag = False
 allow_dirty = False
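
The hunk above moves `current_version` from 0.2.0 to 0.3.0, which under a `major.minor.patch` reading is a bump of the minor part. As a rough illustration of what bumping one part means, here is a minimal sketch. `bump_version` is a hypothetical helper written for this note; it is not bump2version's implementation and ignores dev or pre-release suffixes.

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Increment one part of a major.minor.patch string and reset the lower parts."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


# The change in this commit corresponds to a minor bump.
print(bump_version("0.2.0", "minor"))
# Bumping the patch part instead would give a different result.
print(bump_version("0.2.0"))
```
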

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+<!--
+Thanks for making a contribution!
+Please make sure you have read the contributing guide:
+https://github.com/appliedAI-Initiative/pyDVL/blob/develop/CONTRIBUTING.md
+-->
+
+### Description
+
+This PR closes #XXX
+
+### Changes
+
+-
+
+### Checklist
+
+- [ ] Wrote Unit tests (if necessary)
+- [ ] Updated Documentation (if necessary)
+- [ ] Updated Changelog
+- [ ] If notebooks were added/changed, added boilerplate cells are tagged with `"nbsphinx":"hidden"`

.github/workflows/publish.yaml

Lines changed: 6 additions & 2 deletions
@@ -1,8 +1,9 @@
 name: Upload Python Package to PyPI

 on:
-  release:
-    types: [created]
+  push:
+    tags:
+      - "v*"

 env:
   PY_COLORS: 1
@@ -16,6 +17,9 @@ jobs:
       - uses: actions/checkout@v3
         with:
           fetch-depth: 0
+      - name: Fail if not on 'master' branch
+        if: github.ref != 'refs/heads/master'
+        run: exit -1
       - name: Set up Python 3.8
         uses: actions/setup-python@v4
         with:
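
One point in the workflow change above may deserve a second look. The workflow now runs on pushes of tags matching `v*`, and the new step fails unless `github.ref` equals `refs/heads/master`. As far as I know, for a tag push GitHub sets `github.ref` to `refs/tags/<tag name>`, in which case the guard would fail on every run this trigger starts. The sketch below mimics both conditions in plain Python under that assumption. All function names are hypothetical, and `fnmatchcase` only approximates GitHub's filter patterns.

```python
from fnmatch import fnmatchcase


def ref_for_tag_push(tag: str) -> str:
    """Assumed value of github.ref for a tag push event."""
    return f"refs/tags/{tag}"


def workflow_triggers(ref: str, pattern: str = "v*") -> bool:
    """Approximate the `on.push.tags` filter from the workflow."""
    prefix = "refs/tags/"
    return ref.startswith(prefix) and fnmatchcase(ref[len(prefix):], pattern)


def guard_step_fails(ref: str) -> bool:
    """Mirror the step condition `github.ref != 'refs/heads/master'`."""
    return ref != "refs/heads/master"


ref = ref_for_tag_push("v0.3.0")
print("workflow triggers:", workflow_triggers(ref))
print("guard step fails:", guard_step_fails(ref))
```

If this reading is right, checking whether the tagged commit is on `master` would need a different test than a comparison against `github.ref`; the diff itself does not show one.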

.github/workflows/tox.yaml

Lines changed: 0 additions & 1 deletion
@@ -39,7 +39,6 @@ jobs:
           path: .tox
       - name: Lint Code
         run: tox -e linting
-        continue-on-error: true
       - name: Check Type Hints
         run: tox -e type-checking
   docs:

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -1,5 +1,25 @@
 # Changelog

+## 0.3.0 - 💥 Breaking changes
+
+- Simplified and fixed powerset sampling and testing
+  [PR #181](https://github.com/appliedAI-Initiative/pyDVL/pull/181)
+- Simplified and fixed publishing to PyPI from CI
+  [PR #183](https://github.com/appliedAI-Initiative/pyDVL/pull/183)
+- Fixed bug in release script and updated contributing docs.
+  [PR #184](https://github.com/appliedAI-Initiative/pyDVL/pull/184)
+- Added Pull Request template
+  [PR #185](https://github.com/appliedAI-Initiative/pyDVL/pull/185)
+- Modified Pull Request template to automatically link PR to issue
+  [PR ##186](https://github.com/appliedAI-Initiative/pyDVL/pull/186)
+- First implementation of Owen Sampling, squashed scores, better testing
+  [PR #194](https://github.com/appliedAI-Initiative/pyDVL/pull/194)
+- Improved documentation on caching, Shapley, caveats of values, bibtex
+  [PR #194](https://github.com/appliedAI-Initiative/pyDVL/pull/194)
+- **Breaking change:** Rearranging of modules to accommodate for new methods
+  [PR #194](https://github.com/appliedAI-Initiative/pyDVL/pull/194)
+
+
 ## 0.2.0 - 📚 Better docs

 Mostly API documentation and notebooks, plus some bugfixes.

CONTRIBUTING.md

Lines changed: 38 additions & 8 deletions
@@ -173,6 +173,20 @@ any rst files which are not manually created), you can use a file watcher.
 This is not part of the development setup of pyDVL (yet! PRs welcome), but
 modern IDEs provide functionality for this.

+Use the **docs** tox environment to build the documentation the same way it is done in CI:
+
+```bash
+tox -e docs
+```
+
+Locally, you can use the **docs-dev** tox environment to continuously rebuild docs on changes:
+
+```bash
+tox -e docs-dev
+```
+
+**NOTE:** This currently only rebuilds on changes to `.rst` files and notebooks.
+
 ### Writing mathematics

 In sphinx one can write mathematics with the directives `:math:` (inline) or
@@ -201,7 +215,19 @@ def f(x: float) -> float:

 ## CI and release processes

-#### Automatic release process
+### Skipping CI run
+
+You sometimes would like to skip CI for certain commits (e.g. updating the readme).
+In order to do that you can simply prefix the commit message with `[skip ci]`.
+
+- Other strings, like `[ci skip]` are allowed, but we prefer `[skip ci]`.
+- The string doesn't have to be at the beginning of the commit message, but we prefer doing it
+  that way because it makes it immediately apparent when looking at commits in a PR.
+
+Refer to the official [Github documentation](https://docs.github.com/en/actions/managing-workflow-runs/skipping-workflow-runs)
+for more information.
+
+### Automatic release process

 In order to create an automatic release, a few prerequisites need to be
 satisfied:
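
The skip rule introduced in the hunk above amounts to a substring check on the commit message. The sketch below illustrates it. The token list reflects GitHub's documented skip strings as I understand them, the exact case-sensitive matching is an assumption, and both function names are hypothetical.

```python
# Skip strings assumed from GitHub's "skipping workflow runs" documentation.
SKIP_TOKENS = (
    "[skip ci]",
    "[ci skip]",
    "[no ci]",
    "[skip actions]",
    "[actions skip]",
)


def should_skip_ci(commit_message: str) -> bool:
    """Return True if any skip token appears anywhere in the message."""
    return any(token in commit_message for token in SKIP_TOKENS)


def follows_project_convention(commit_message: str) -> bool:
    """The contributing guide prefers `[skip ci]` as a prefix."""
    return commit_message.startswith("[skip ci]")


message = "[skip ci] Update readme"
print(should_skip_ci(message), follows_project_convention(message))
```

A message such as `Update readme [ci skip]` would still skip CI under this reading, but it would not follow the preferred prefix convention.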
@@ -212,7 +238,7 @@ satisfied:

 Then, a new release can be created using the script
 `build_scripts/release-version.sh` (leave out the version parameter to have
-`bumpversion` automatically derive the next release version):
+`bumpversion` automatically derive the next release version by bumping the patch part):

 ```shell script
 ./scripts/release-version.sh 0.1.6
@@ -228,14 +254,17 @@ If running in interactive mode (without `-y|--yes`), the script will output a
 summary of pending changes and ask for confirmation before executing the
 actions.

-#### Manual release process
+Once this is done, a package will be automatically created and published from CI to PyPI.
+
+### Manual release process
+
 If the automatic release process doesn't cover your use case, you can also
 create a new release manually by following these steps:

 1. (Repeat as needed) implement features on feature branches merged into
-   `develop`. Each merge into develop will advance the `.devNNN` version suffix
-   and publish the pre-release version into the package registry. These versions
-   can be installed using `pip install --pre`.
+   `develop`. Each merge into develop will publish a new pre-release version
+   to TestPyPI. These versions can be installed using `pip install --pre
+   --index-url https://test.pypi.org/simple/`.
 2. When ready to release: From the develop branch create the release branch and
    perform release activities (update changelog, news, ...). For your own
    convenience, define an env variable for the release version
@@ -269,6 +298,7 @@ create a new release manually by following these steps:
 7. Delete the release branch if necessary:
    `git branch -d release/${RELEASE_VERSION}`
 8. Pour yourself a cup of coffee, you earned it! :coffee: :sparkles:
+9. A package will be automatically created and published from CI to PyPI.

 ### CI and requirements for releases

@@ -296,9 +326,9 @@ part of the version number, create a tag and push it from CI.

 To do that, we use 2 different tox environments:

+- **bump-dev-version**: Uses bump2version to bump the dev version,
+  without committing the new version or creating a corresponding git tag.
 - **publish-test-package**: Builds and publishes a package to TestPyPI
-- **bump-dev-version-and-create-tag**: Uses bump2version to bump the dev version,
-  commit the new version and create a corresponding git tag.


 ## Other useful information

README.md

Lines changed: 23 additions & 17 deletions
@@ -32,24 +32,28 @@ Data Valuation is the task of estimating the intrinsic value of a data point
 wrt. the training set, the model and a scoring function. We currently implement
 methods from the following papers:

-- Ghorbani, Amirata, and James Zou. ‘Data Shapley: Equitable Valuation of Data for
-  Machine Learning’. In International Conference on Machine Learning, 2242–51.
-  PMLR, 2019. http://proceedings.mlr.press/v97/ghorbani19c.html.
-- Wang, Tianhao, Yu Yang, and Ruoxi Jia. ‘Improving Cooperative Game Theory-Based
-  Data Valuation via Data Utility Learning’. arXiv, 2022.
-  https://doi.org/10.48550/arXiv.2107.06336.
+- Ghorbani, Amirata, and James Zou.
+  [Data Shapley: Equitable Valuation of Data for Machine Learning](http://proceedings.mlr.press/v97/ghorbani19c.html).
+  In International Conference on Machine Learning, 2242–51. PMLR, 2019.
+- Wang, Tianhao, Yu Yang, and Ruoxi Jia.
+  [Improving Cooperative Game Theory-Based Data Valuation via Data Utility Learning](https://doi.org/10.48550/arXiv.2107.06336).
+  arXiv, 2022.
 - Jia, Ruoxi, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li,
-  Ce Zhang, Costas Spanos, and Dawn Song. ‘Efficient Task-Specific Data Valuation
-  for Nearest Neighbor Algorithms’. Proceedings of the VLDB Endowment 12, no. 11 (1
-  July 2019): 1610–23. https://doi.org/10.14778/3342263.3342637.
+  Ce Zhang, Costas Spanos, and Dawn Song.
+  [Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms](https://doi.org/10.14778/3342263.3342637).
+  Proceedings of the VLDB Endowment 12, no. 11 (1 July 2019): 1610–23.
+- Okhrati, Ramin, and Aldo Lipani.
+  [A Multilinear Sampling Algorithm to Estimate Shapley Values](https://doi.org/10.1109/ICPR48806.2021.9412511).
+  In 2020 25th International Conference on Pattern Recognition (ICPR), 7992–99.
+  IEEE, 2021.

 Influence Functions compute the effect that single points have on an estimator /
 model. We implement methods from the following papers:

-- Koh, Pang Wei, and Percy Liang. ‘Understanding Black-Box Predictions via
-  Influence Functions’. In Proceedings of the 34th International Conference on
-  Machine Learning, 70:1885–94. Sydney, Australia: PMLR, 2017.
-  http://proceedings.mlr.press/v70/koh17a.html.
+- Koh, Pang Wei, and Percy Liang.
+  [Understanding Black-Box Predictions via Influence Functions](http://proceedings.mlr.press/v70/koh17a.html).
+  In Proceedings of the 34th International Conference on Machine Learning,
+  70:1885–94. Sydney, Australia: PMLR, 2017.

 # Installation

@@ -98,18 +102,20 @@ Data Shapley values:
 ```python
 import numpy as np
 from pydvl.utils import Dataset, Utility
-from pydvl.shapley import compute_shapley_values
+from pydvl.value.shapley import compute_shapley_values
 from sklearn.linear_model import LinearRegression
 from sklearn.model_selection import train_test_split

 X, y = np.arange(100).reshape((50, 2)), np.arange(50)
 X_train, X_test, y_train, y_test = train_test_split(
-X, y, test_size=0.5, random_state=16
-)
+    X, y, test_size=0.5, random_state=16
+)
 dataset = Dataset(X_train, y_train, X_test, y_test)
 model = LinearRegression()
 utility = Utility(model, dataset)
-values, errors = compute_shapley_values(u=utility, max_iterations=100)
+values = compute_shapley_values(
+    u=utility, max_iterations=100, mode="truncated_montecarlo"
+)
 ```

 For more instructions and information refer to [Getting
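
The updated README snippet requests `mode="truncated_montecarlo"`. As background only, the sketch below shows the plain permutation Monte Carlo idea that such estimators build on, without any truncation and using only the standard library. It is not pyDVL's implementation: `permutation_shapley`, `additive_utility` and the toy weights are hypothetical names and data chosen for illustration.

```python
import random
from typing import Callable, Dict, FrozenSet, Sequence


def permutation_shapley(
    players: Sequence[str],
    utility: Callable[[FrozenSet[str]], float],
    n_permutations: int = 100,
    seed: int = 16,
) -> Dict[str, float]:
    """Average each player's marginal contribution over random orderings."""
    rng = random.Random(seed)
    totals = {player: 0.0 for player in players}
    for _ in range(n_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        previous = utility(coalition)
        for player in order:
            # Add the player and record how much the utility changed.
            coalition = coalition | {player}
            current = utility(coalition)
            totals[player] += current - previous
            previous = current
    return {player: total / n_permutations for player, total in totals.items()}


# Toy additive game: the utility of a coalition is the sum of its weights,
# so the exact Shapley value of each player is its own weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}


def additive_utility(coalition: FrozenSet[str]) -> float:
    return sum(weights[player] for player in coalition)


values = permutation_shapley(list(weights), additive_utility)
print(values)
```

In an additive game every marginal contribution equals the player's weight regardless of ordering, which makes the estimate easy to check by hand. With a model-based utility like the one in the README, the marginal contributions vary between permutations and the estimate only converges as more permutations are sampled.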

build_scripts/release-version.sh

Lines changed: 4 additions & 1 deletion
@@ -52,6 +52,7 @@ function _parse_opts() {

 DEBUG=
 EDIT_CHANGELOG=
+DELETE_BRANCH=1
 FORCE_YES=
 HELP=
 REMOTE="origin"
@@ -99,6 +100,7 @@ function _parse_opts() {
 fi

 export DEBUG
+export DELETE_BRANCH
 export EDIT_CHANGELOG
 export FORCE_YES
 export HELP
@@ -156,7 +158,7 @@ function _confirm() {
 🔍 Summary of changes:
 - Pull latest remote version of ${bold}develop${normal} (fast-forward only) from $REMOTE
 - Create branch ${bold}$RELEASE_BRANCH${normal}
-- Bump version number: ${bold}$CURRENT_VERSION$RELEASE_VERSION${normal}
+- Bump version number: ${bold}$CURRENT_VERSION $RELEASE_VERSION${normal}
 EOF

 if [[ -n "$EDIT_CHANGELOG" ]]; then
@@ -192,6 +194,7 @@ if [[ -n "$DEBUG" ]]; then
 echo "DEBUG: ${DEBUG}"
 echo "EDIT_CHANGELOG: ${EDIT_CHANGELOG}"
 echo "FORCE_YES: ${FORCE_YES}"
+echo "DELETE_BRANCH: ${DELETE_BRANCH}"
 echo "RELEASE_BRANCH: ${RELEASE_BRANCH}"
 echo "RELEASE_TAG: ${RELEASE_TAG}"
 echo "CURRENT_VERSION: ${CURRENT_VERSION}"

docs/10-getting-started.rst

Lines changed: 15 additions & 15 deletions
@@ -4,13 +4,22 @@
 Getting started
 ===============

-Make sure you have :ref:`installed pyDVL <pyDVL Installation>` before proceeding
-further.
+.. warning::
+   Make sure you have read :ref:`the installation instructions
+   <pyDVL Installation>` before using the library. In particular read about how
+   caching and parallelization work, since they require additional setup.

-.. note::
-   We provide minimal overviews of key concepts in :ref:`data valuation` and
-   :ref:`influence`. For an in-depth survey of the field, we refer to the review on
-   the topic at the :tfl:`TransferLab website <>`.
+pyDVL aims to be a repository of production-ready, reference implementations of
+algorithms for data valuation and influence functions. You can read:
+
+* :ref:`data valuation` for key objects and usage patterns for Shapley value
+  computation and related methods.
+* :ref:`influence` for instruction on how to compute influence functions (still
+  in a pre-alpha state)
+
+We only briefly introduce key concepts in the documentation. For a thorough
+introduction and survey of the field, we refer to **the upcoming review** at the
+:tfl:`TransferLab website <>`.

 Running the examples
 ====================
@@ -24,12 +33,3 @@ by browsing our worked-out examples illustrating pyDVL's capabilities either:
 - Locally, by starting a jupyter server at the root of the project. You will
   have to install jupyter first manually since it's not a dependency of the
   library.
-
-Methods covered
-===============
-
-pyDVL offers algorithms for data valuation and computation of influence
-functions. You can read more about each family of methods here:
-
-- :ref:`data valuation`.
-- :ref:`influence`.

docs/20-install.rst

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ the instructions in their documentation for installation.
 .. _caching setup:

 Setting up the cache
---------------------
+====================

 memcached is an in-memory key-value store accessible over the network. pyDVL
 uses it to cache certain results and speed-up the computations. You can either
