Commit 65313e7

Document mapreducejob and futures executor interfaces
1 parent 55e06af commit 65313e7

2 files changed

+64
-1
lines changed

docs/getting-started/advanced-usage.md

Lines changed: 63 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -16,7 +16,7 @@ keep in mind when using pyDVL namely Parallelization and Caching.
1616
pyDVL uses parallelization to scale and speed up computations. It does so
1717
using one of Dask, Ray or Joblib. The first is used in
1818
the [influence][pydvl.influence] package whereas the other two
19-
are used in the [value][pydvl.value] package.
19+
are used in the [value][pydvl.value] package.
2020

2121
### Data valuation
2222

@@ -37,6 +37,24 @@ and to provide a running cluster (or run ray in local mode).
3737
if the re-training only happens on a subset of the data. This means that you
3838
should make sure that each worker has enough memory to handle the whole dataset.
3939

40+
We use backend classes for both joblib and ray as well as two types
41+
of executors for the different algorithms: the first uses a map reduce pattern as seen in
42+
the [MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] class
43+
and the second implements the futures executor interface from [concurrent.futures][].
44+
45+
!!! info
46+
47+
The executor classes are not meant to be instantiated and used by users
48+
of pyDVL. They are used internally as part of the computations of the
49+
different methods.
50+
51+
!!! info
52+
53+
We are currently planning to deprecate
54+
[MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] in favour of the
55+
futures executor interface because it allows for more diverse computation
56+
patterns with interruptions.
57+
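The kind of interruption that a futures-style interface makes possible can be sketched with the standard library alone. The example below is only an illustration and not pyDVL's implementation: it uses `ThreadPoolExecutor` from `concurrent.futures`, and the helper `collect_until` and its `enough` callback are hypothetical names introduced here. Tasks are submitted individually, results are consumed as they complete, and any task that has not started yet is cancelled once a stopping condition is met. A plain map-reduce call cannot easily stop early like this.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_until(func, inputs, enough, max_workers=2):
    """Run func over inputs, stopping early once enough(results) is true.

    Hypothetical helper for illustration; not part of pyDVL.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(func, x) for x in inputs]
        for future in as_completed(futures):
            results.append(future.result())
            if enough(results):
                # Interrupt: cancel every task that has not started yet.
                # Tasks that are already running are allowed to finish.
                for pending in futures:
                    pending.cancel()
                break
    return results


# Stop after three results even though ten tasks were submitted.
partial = collect_until(
    lambda x: x + 1,
    range(10),
    enough=lambda collected: len(collected) >= 3,
)
print(len(partial))  # 3
```

Which three values are returned depends on scheduling, so only their number is fixed.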
4058
#### Joblib
4159

4260
Please follow the instructions in Joblib's documentation
@@ -105,6 +123,50 @@ u = Utility(...)
105123
values = combinatorial_exact_shapley(u, parallel_backend=parallel_backend)
106124
```
107125

126+
#### Futures executor
127+
128+
For the futures executor interface, we have implemented an executor
129+
class for ray in [RayExecutor][pydvl.parallel.futures.ray.RayExecutor]
130+
and rely on joblib's loky [get_reusable_executor][loky.get_reusable_executor]
131+
function to instantiate an executor for local parallelization.
132+
133+
Both are compatible with the built-in
134+
[ThreadPoolExecutor][concurrent.futures.ThreadPoolExecutor]
135+
and [ProcessPoolExecutor][concurrent.futures.ProcessPoolExecutor]
136+
classes.
137+
138+
```pycon
139+
>>> from joblib.externals.loky import _ReusablePoolExecutor
140+
>>> from pydvl.parallel import JoblibParallelBackend
141+
>>> parallel_backend = JoblibParallelBackend()
142+
>>> with parallel_backend.executor() as executor:
143+
... results = list(executor.map(lambda x: x + 1, range(3)))
144+
...
145+
>>> results
146+
[1, 2, 3]
147+
```
148+
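One practical consequence of this compatibility is that code written against the abstract `concurrent.futures.Executor` interface does not need to know which executor it receives. The sketch below uses only the standard library; `increment_all` and `_increment` are hypothetical names introduced for illustration. Under the assumption that the executors described above follow the same interface, one of them could be passed in place of the `ThreadPoolExecutor`.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def _increment(x):
    # Module-level function, so that it could also be pickled and sent
    # to a process-based executor.
    return x + 1


def increment_all(executor: Executor, values):
    """Apply _increment to every value using whichever executor is given."""
    # Executor.map returns results in the order of the inputs.
    return list(executor.map(_increment, values))


with ThreadPoolExecutor(max_workers=2) as executor:
    results = increment_all(executor, range(3))

print(results)  # [1, 2, 3]
```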
149+
#### Map reduce
150+
151+
The map reduce interface is older and more limited in the patterns
152+
it allows us to use.
153+
154+
To reproduce the previous example using
155+
[MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob], we would use:
156+
157+
```pycon
158+
>>> from pydvl.parallel import JoblibParallelBackend, MapReduceJob
159+
>>> parallel_backend = JoblibParallelBackend()
160+
>>> map_reduce_job = MapReduceJob(
161+
... list(range(3)),
162+
... map_func=lambda x: x[0] + 1,
163+
... parallel_backend=parallel_backend,
164+
... )
165+
>>> results = map_reduce_job()
166+
>>> results
167+
[1, 2, 3]
168+
```
169+
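The map-reduce pattern itself can be sketched independently of pyDVL. The code below is a simplified illustration, not the library's implementation: `split_into_chunks` and `map_reduce` are hypothetical helpers, and the assumption that each worker receives a chunk (a list) of the inputs is what would explain why the map function in the example above indexes `x[0]`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    items = list(items)
    size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks get one extra element.
        end = start + size + (1 if i < remainder else 0)
        if end > start:  # skip empty chunks
            chunks.append(items[start:end])
        start = end
    return chunks


def map_reduce(inputs, map_func, reduce_func, n_jobs):
    """Apply map_func to each chunk in parallel, then reduce the results."""
    chunks = split_into_chunks(inputs, n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        mapped = list(executor.map(map_func, chunks))
    return reduce_func(mapped)


# With one element per chunk and an identity reduction, this mirrors the
# shape of the example above.
print(map_reduce(range(3), lambda chunk: chunk[0] + 1, lambda r: r, n_jobs=3))
# [1, 2, 3]

# A genuine reduction: sum each chunk, then sum the partial sums.
print(map_reduce(range(5), sum, sum, n_jobs=2))  # 10
```

Every chunk must be mapped before the reduction can run, which is the limitation that the futures interface relaxes.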
108170
### Influence functions
109171

110172
Refer to [Scaling influence computation][scaling-influence-computation] for

mkdocs.yml

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -108,6 +108,7 @@ plugins:
108108
- https://pytorch.org/docs/stable/objects.inv
109109
- https://pymemcache.readthedocs.io/en/latest/objects.inv
110110
- https://joblib.readthedocs.io/en/stable/objects.inv
111+
- https://loky.readthedocs.io/en/stable/objects.inv
111112
- https://docs.dask.org/en/latest/objects.inv
112113
- https://distributed.dask.org/en/latest/objects.inv
113114
- https://docs.ray.io/en/latest/objects.inv
