pyDVL uses parallelization to scale and speed up computations. It does so
using one of Dask, Ray or Joblib. The first is used in
the [influence][pydvl.influence] package whereas the other two
are used in the [value][pydvl.value] package.

### Data valuation

if the re-training only happens on a subset of the data. This means that you
should make sure that each worker has enough memory to handle the whole dataset.

We use backend classes for both joblib and ray, as well as two types of
executors for the different algorithms: the first uses a map-reduce pattern,
as seen in the [MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] class,
and the second implements the futures executor interface from
[concurrent.futures][].

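In the map-reduce pattern, the inputs are split into chunks, each worker
processes one chunk, and the partial results are then combined. A minimal
standard-library sketch of the idea (not pyDVL's actual implementation)
looks like this:

``` python
from concurrent.futures import ThreadPoolExecutor

def map_func(chunk):
    # Each worker processes one chunk of the inputs independently.
    return sum(x + 1 for x in chunk)

def reduce_func(partials):
    # Combine the partial results into the final value.
    return sum(partials)

inputs = list(range(10))
# Split the inputs into one chunk per worker.
chunks = [inputs[i::4] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as executor:
    partials = list(executor.map(map_func, chunks))

result = reduce_func(partials)
print(result)  # 55
```

The futures interface, in contrast, exposes each task as a
[Future][concurrent.futures.Future] object, which enables more flexible
scheduling than a single map-then-reduce pass.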
As a convenience, you can also instantiate a parallel backend class
by using the [init_parallel_backend][pydvl.parallel.init_parallel_backend]
function:

``` python
from pydvl.parallel import init_parallel_backend

parallel_backend = init_parallel_backend(backend_name="joblib")
```

!!! info

    The executor classes are not meant to be instantiated and used by users
    of pyDVL. They are used internally as part of the computations of the
    different methods.

!!! danger "Deprecation notice"

    We are currently planning to deprecate
    [MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] in favour of the
    futures executor interface because it allows for more diverse computation
    patterns with interruptions.

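The gain in flexibility comes from futures being first-class objects: results
can be consumed as they complete and pending work can be cancelled. As a rough
illustration using only the standard library (not pyDVL code), a computation
can be interrupted as soon as some condition is met:

``` python
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(n):
    # Stand-in for an expensive per-sample computation.
    return n * n

found = None
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(task, n) for n in range(10)]
    for future in as_completed(futures):
        if future.result() > 20:
            found = future.result()
            # Interruption: cancel any tasks that have not started yet.
            for f in futures:
                f.cancel()
            break
print(found)
```

A plain map-reduce call cannot stop mid-way like this, since it only returns
once all chunks have been processed.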
4067#### Joblib
4168
Please follow the instructions in Joblib's documentation
To use the joblib parallel backend to compute exact Shapley values you would use:

``` python
import joblib
from pydvl.parallel import JoblibParallelBackend
from pydvl.value.shapley import combinatorial_exact_shapley
from pydvl.utils.utility import Utility

parallel_backend = JoblibParallelBackend()
u = Utility(...)

with joblib.parallel_config(backend="loky", verbose=100):
    values = combinatorial_exact_shapley(u, parallel_backend=parallel_backend)
```

#### Ray

!!! warning "Additional dependencies"

    The Ray parallel backend requires optional dependencies.
    See [Extras][installation-extras] for more information.

Please follow the instructions in Ray's documentation to
[set up a remote cluster](https://docs.ray.io/en/latest/cluster/key-concepts.html).
You could alternatively use a local cluster and in that case you don't have to set
To use the ray parallel backend to compute exact Shapley values you would use:

``` python
import ray
from pydvl.parallel import RayParallelBackend
from pydvl.value.shapley import combinatorial_exact_shapley
from pydvl.utils.utility import Utility

ray.init()
parallel_backend = RayParallelBackend()
u = Utility(...)
values = combinatorial_exact_shapley(u, parallel_backend=parallel_backend)
```

#### Futures executor

For the futures executor interface, we have implemented an executor
class for Ray in [RayExecutor][pydvl.parallel.futures.ray.RayExecutor]
and rely on joblib's loky [get_reusable_executor][loky.get_reusable_executor]
function to instantiate an executor for local parallelization.

They are both compatible with the builtin
[ThreadPoolExecutor][concurrent.futures.ThreadPoolExecutor]
and [ProcessPoolExecutor][concurrent.futures.ProcessPoolExecutor]
classes.

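Since they share the [concurrent.futures][] interface, code written against
one executor runs unchanged with another. The snippet below, for instance,
uses only the builtin thread pool:

``` python
from concurrent.futures import ThreadPoolExecutor

def increment(x):
    # Trivial task used to exercise the executor interface.
    return x + 1

# Any executor following the concurrent.futures interface could be
# dropped in here, e.g. one obtained from a pyDVL parallel backend.
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(increment, range(3)))

print(results)  # [1, 2, 3]
```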
``` pycon
>>> from pydvl.parallel import JoblibParallelBackend
>>> parallel_backend = JoblibParallelBackend()
>>> with parallel_backend.executor() as executor:
...     results = list(executor.map(lambda x: x + 1, range(3)))
...
>>> results
[1, 2, 3]
```

#### Map-reduce

The map-reduce interface is older and more limited in the patterns
it allows us to use.

To reproduce the previous example using
[MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob], we would use:

``` pycon
>>> from pydvl.parallel import JoblibParallelBackend, MapReduceJob
>>> parallel_backend = JoblibParallelBackend()
>>> map_reduce_job = MapReduceJob(
...     list(range(3)),
...     map_func=lambda x: x[0] + 1,
...     parallel_backend=parallel_backend,
... )
>>> results = map_reduce_job()
>>> results
[1, 2, 3]
```

### Influence functions