@@ -16,7 +16,7 @@ keep in mind when using pyDVL namely Parallelization and Caching.
pyDVL uses parallelization to scale and speed up computations. It does so
using one of Dask, Ray or Joblib. The first is used in
the [influence][pydvl.influence] package whereas the other two
are used in the [value][pydvl.value] package.

### Data valuation

@@ -37,6 +37,24 @@ and to provide a running cluster (or run ray in local mode).
if the re-training only happens on a subset of the data. This means that you
should make sure that each worker has enough memory to handle the whole
dataset.

We use backend classes for both joblib and ray, as well as two types of
executors for the different algorithms: the first uses a map-reduce pattern,
as seen in the [MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] class,
and the second implements the futures executor interface from
[concurrent.futures][].
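
As a minimal sketch of how these pieces fit together (we assume here that
`executor()` accepts an optional `max_workers` argument), a backend object is
created once and then used to obtain an executor implementing the standard
[concurrent.futures][] interface:

```pycon
>>> from pydvl.parallel import JoblibParallelBackend
>>> parallel_backend = JoblibParallelBackend()
>>> with parallel_backend.executor(max_workers=2) as executor:
...     futures = [executor.submit(abs, -n) for n in range(3)]
...     results = [future.result() for future in futures]
...
>>> results
[0, 1, 2]
```
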
!!! info

    The executor classes are not meant to be instantiated and used by
    users of pyDVL. They are used internally as part of the computations
    of the different methods.

!!! info

    We are currently planning to deprecate
    [MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] in favour of the
    futures executor interface, because it allows for more diverse
    computation patterns, including interrupting running computations
    (see the sketch below).
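
To illustrate the kind of pattern that the futures interface enables, here is
a self-contained sketch using only the standard library (the stopping
criterion is hypothetical, and none of this is pyDVL's internal code):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

results = []
with ThreadPoolExecutor(max_workers=4) as executor:
    # Schedule all the work up front as individual futures.
    pending = {executor.submit(pow, n, 2) for n in range(100)}
    # Consume results as they complete instead of waiting for all of them.
    while pending and len(results) < 10:  # hypothetical stopping criterion
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        results.extend(future.result() for future in done)
    # Interrupt the computation by discarding work that hasn't started yet.
    for future in pending:
        future.cancel()
```
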
#### Joblib

Please follow the instructions in Joblib's documentation
@@ -105,6 +123,50 @@ u = Utility(...)
values = combinatorial_exact_shapley(u, parallel_backend=parallel_backend)
```

#### Futures executor

For the futures executor interface, we have implemented an executor class
for ray in [RayExecutor][pydvl.parallel.futures.ray.RayExecutor] and rely on
joblib's loky [get_reusable_executor][loky.get_reusable_executor] function to
instantiate an executor for local parallelization.

They are both compatible with the built-in
[ThreadPoolExecutor][concurrent.futures.ThreadPoolExecutor] and
[ProcessPoolExecutor][concurrent.futures.ProcessPoolExecutor] classes.

```pycon
>>> from pydvl.parallel import JoblibParallelBackend
>>> parallel_backend = JoblibParallelBackend()
>>> with parallel_backend.executor() as executor:
...     results = list(executor.map(lambda x: x + 1, range(3)))
...
>>> results
[1, 2, 3]
```
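
Since they follow the standard interface, the same computation can be written
against the built-in
[ThreadPoolExecutor][concurrent.futures.ThreadPoolExecutor] without any other
change:

```pycon
>>> from concurrent.futures import ThreadPoolExecutor
>>> with ThreadPoolExecutor(max_workers=2) as executor:
...     results = list(executor.map(lambda x: x + 1, range(3)))
...
>>> results
[1, 2, 3]
```
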

#### Map reduce

The map-reduce interface is older and more limited in the computation
patterns it allows.

To reproduce the previous example using
[MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob], we would write the
following (note that `map_func` receives a chunk of the inputs rather than a
single element, hence the indexing with `x[0]`):

```pycon
>>> from pydvl.parallel import JoblibParallelBackend, MapReduceJob
>>> parallel_backend = JoblibParallelBackend()
>>> map_reduce_job = MapReduceJob(
...     list(range(3)),
...     map_func=lambda x: x[0] + 1,
...     parallel_backend=parallel_backend,
... )
>>> results = map_reduce_job()
>>> results
[1, 2, 3]
```
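
Compared with the futures example above, the map-reduce version has to wrap
inputs into chunks and cannot react to intermediate results, which is part of
the motivation for the planned deprecation.
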

### Influence functions

Refer to [Scaling influence computation][scaling-influence-computation] for