Commit d40e8e2
Merge pull request #549 from aai-institute/feature/simplify-parallel-backend-config
Simplify parallel backend config
2 parents 03fd179 + 61f69c1

File tree

30 files changed: +789 −336 lines
CHANGELOG.md

Lines changed: 2 additions & 0 deletions

@@ -36,6 +36,8 @@
 - Documentation improvements and cleanup
   [PR #521](https://github.com/aai-institute/pyDVL/pull/521),
   [PR #522](https://github.com/aai-institute/pyDVL/pull/522)
+- Simplified parallel backend configuration
+  [PR #549](https://github.com/aai-institute/pyDVL/pull/549)
 
 ## 0.8.1 - 🆕 🏗 New method and notebook, Games with exact shapley values, bug fixes and cleanup

docs/getting-started/advanced-usage.md

Lines changed: 83 additions & 7 deletions
@@ -16,7 +16,7 @@ keep in mind when using pyDVL namely Parallelization and Caching.
 pyDVL uses parallelization to scale and speed up computations. It does so
 using one of Dask, Ray or Joblib. The first is used in
 the [influence][pydvl.influence] package whereas the other two
-are used in the [value][pydvl.value] package. 
+are used in the [value][pydvl.value] package.
 
 ### Data valuation
 
@@ -37,6 +37,33 @@ and to provide a running cluster (or run ray in local mode).
 if the re-training only happens on a subset of the data. This means that you
 should make sure that each worker has enough memory to handle the whole dataset.
 
+We use backend classes for both joblib and ray as well as two types
+of executors for the different algorithms: the first uses a map-reduce pattern as seen in
+the [MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] class
+and the second implements the futures executor interface from [concurrent.futures][].
+
+As a convenience, you can also instantiate a parallel backend class
+by using the [init_parallel_backend][pydvl.parallel.init_parallel_backend]
+function:
+
+```python
+from pydvl.parallel import init_parallel_backend
+parallel_backend = init_parallel_backend(backend_name="joblib")
+```
+
+!!! info
+
+    The executor classes are not meant to be instantiated and used by users
+    of pyDVL. They are used internally as part of the computations of the
+    different methods.
+
+!!! danger "Deprecation notice"
+
+    We are currently planning to deprecate
+    [MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] in favour of the
+    futures executor interface because it allows for more diverse computation
+    patterns with interruptions.
+
 #### Joblib
 
 Please follow the instructions in Joblib's documentation
@@ -48,19 +75,24 @@ to compute exact shapley values you would use:
 
 ```python
 import joblib
-from pydvl.parallel import ParallelConfig
+from pydvl.parallel import JoblibParallelBackend
 from pydvl.value.shapley import combinatorial_exact_shapley
 from pydvl.utils.utility import Utility
 
-config = ParallelConfig(backend="joblib")
+parallel_backend = JoblibParallelBackend()
 u = Utility(...)
 
 with joblib.parallel_config(backend="loky", verbose=100):
-    combinatorial_exact_shapley(u, config=config)
+    values = combinatorial_exact_shapley(u, parallel_backend=parallel_backend)
 ```
 
 #### Ray
 
+!!! warning "Additional dependencies"
+
+    The Ray parallel backend requires optional dependencies.
+    See [Extras][installation-extras] for more information.
+
 Please follow the instructions in Ray's documentation to
 [set up a remote cluster](https://docs.ray.io/en/latest/cluster/key-concepts.html).
 You could alternatively use a local cluster and in that case you don't have to set
@@ -90,14 +122,58 @@ To use the ray parallel backend to compute exact shapley values you would use:
 
 ```python
 import ray
-from pydvl.parallel import ParallelConfig
+from pydvl.parallel import RayParallelBackend
 from pydvl.value.shapley import combinatorial_exact_shapley
 from pydvl.utils.utility import Utility
 
 ray.init()
-config = ParallelConfig(backend="ray")
+parallel_backend = RayParallelBackend()
 u = Utility(...)
-combinatorial_exact_shapley(u, config=config)
+values = combinatorial_exact_shapley(u, parallel_backend=parallel_backend)
+```
+
+#### Futures executor
+
+For the futures executor interface, we have implemented an executor
+class for ray in [RayExecutor][pydvl.parallel.futures.ray.RayExecutor]
+and rely on joblib's loky [get_reusable_executor][loky.get_reusable_executor]
+function to instantiate an executor for local parallelization.
+
+They are both compatible with the builtin
+[ThreadPoolExecutor][concurrent.futures.ThreadPoolExecutor]
+and [ProcessPoolExecutor][concurrent.futures.ProcessPoolExecutor]
+classes.
+
+```pycon
+>>> from joblib.externals.loky import _ReusablePoolExecutor
+>>> from pydvl.parallel import JoblibParallelBackend
+>>> parallel_backend = JoblibParallelBackend()
+>>> with parallel_backend.executor() as executor:
+...     results = list(executor.map(lambda x: x + 1, range(3)))
+...
+>>> results
+[1, 2, 3]
+```
+
+#### Map-reduce
+
+The map-reduce interface is older and more limited in the patterns
+it allows us to use.
+
+To reproduce the previous example using
+[MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob], we would use:
+
+```pycon
+>>> from pydvl.parallel import JoblibParallelBackend, MapReduceJob
+>>> parallel_backend = JoblibParallelBackend()
+>>> map_reduce_job = MapReduceJob(
+...     list(range(3)),
+...     map_func=lambda x: x[0] + 1,
+...     parallel_backend=parallel_backend,
+... )
+>>> results = map_reduce_job()
+>>> results
+[1, 2, 3]
 ```
 
 ### Influence functions
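Since pyDVL's backend executors follow the standard [concurrent.futures][] interface, the submit/map pattern shown in the diff above can be tried without pyDVL installed. A minimal stdlib-only sketch using `ThreadPoolExecutor` (any compliant executor behaves the same way):

```python
from concurrent.futures import ThreadPoolExecutor

# Any Executor implementation supports the same submit()/map() pattern
# used by the backend executors described in the docs above.
with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns a Future whose result is retrieved lazily
    future = executor.submit(lambda x: x + 1, 1)
    assert future.result() == 2
    # map() applies the callable to each item, preserving input order
    results = list(executor.map(lambda x: x + 1, range(3)))

print(results)  # [1, 2, 3]
```

Swapping in a pyDVL backend executor (or `ProcessPoolExecutor`) requires no change to the calling code, which is the point of targeting the `concurrent.futures` interface.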

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ plugins:
 - https://pytorch.org/docs/stable/objects.inv
 - https://pymemcache.readthedocs.io/en/latest/objects.inv
 - https://joblib.readthedocs.io/en/stable/objects.inv
+- https://loky.readthedocs.io/en/stable/objects.inv
 - https://docs.dask.org/en/latest/objects.inv
 - https://distributed.dask.org/en/latest/objects.inv
 - https://docs.ray.io/en/latest/objects.inv

src/pydvl/parallel/__init__.py

Lines changed: 27 additions & 16 deletions
@@ -1,37 +1,48 @@
 """
 This module provides a common interface to parallelization backends. The list of
-supported backends is [here][pydvl.parallel.backends]. Backends can be
-selected with the `backend` argument of an instance of
-[ParallelConfig][pydvl.utils.config.ParallelConfig], as seen in the examples
-below.
+supported backends is [here][pydvl.parallel.backends]. Backends should be
+instantiated directly and passed to the respective valuation method.
 
-We use [executors][concurrent.futures.Executor] to submit tasks in parallel. The
-basic high-level pattern is
+We use executors that implement the [Executor][concurrent.futures.Executor]
+interface to submit tasks in parallel.
+The basic high-level pattern is:
 
 ```python
-from pydvl.parallel import init_executor, ParallelConfig
+from pydvl.parallel import JoblibParallelBackend
 
-config = ParallelConfig(backend="ray")
-with init_executor(max_workers=1, config=config) as executor:
+parallel_backend = JoblibParallelBackend()
+with parallel_backend.executor(max_workers=2) as executor:
     future = executor.submit(lambda x: x + 1, 1)
     result = future.result()
     assert result == 2
 ```
 
-Running a map-reduce job is also easy:
+Running a map-style job is also easy:
 
 ```python
-from pydvl.parallel import init_executor, ParallelConfig
+from pydvl.parallel import JoblibParallelBackend
 
-config = ParallelConfig(backend="joblib")
-with init_executor(config=config) as executor:
+parallel_backend = JoblibParallelBackend()
+with parallel_backend.executor(max_workers=2) as executor:
     results = list(executor.map(lambda x: x + 1, range(5)))
     assert results == [1, 2, 3, 4, 5]
 ```
-
+!!! tip "Passing large objects"
+    When running tasks which accept heavy inputs, it is important
+    to first use `put()` on the object and use the returned reference
+    as argument to the callable within `submit()`. For example:
+    ```python
+    u_ref = parallel_backend.put(u)
+    ...
+    executor.submit(task, utility=u_ref)
+    ```
+    Note that `task()` does not need to be changed in any way:
+    the backend will `get()` the object and pass it to the function
+    upon invocation.
 There is an alternative map-reduce implementation
 [MapReduceJob][pydvl.parallel.map_reduce.MapReduceJob] which internally
-uses joblib's higher level API with `Parallel()`
+uses joblib's higher level API with `Parallel()` which then indirectly also
+supports the use of Dask and Ray.
 """
 # HACK to avoid circular imports
 from ..utils.types import *  # pylint: disable=wrong-import-order

@@ -41,5 +52,5 @@
 from .futures import *
 from .map_reduce import *
 
-if len(BaseParallelBackend.BACKENDS) == 0:
+if len(ParallelBackend.BACKENDS) == 0:
     raise ImportError("No parallel backend found. Please install ray or joblib.")
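The `put()` tip in the docstring above can be illustrated without pyDVL. The following toy sketch shows the idea of handing a task a reference that the backend resolves before invocation; `SimpleBackend` and `Ref` are hypothetical stand-ins written for this example, not pyDVL classes:

```python
from concurrent.futures import ThreadPoolExecutor


class Ref:
    """Opaque handle returned by put(), standing in for a shared-storage reference."""
    def __init__(self, obj):
        self._obj = obj


class SimpleBackend:
    def put(self, obj):
        # A real backend would place obj in shared storage once,
        # instead of serializing it for every submitted task.
        return Ref(obj)

    @staticmethod
    def get(maybe_ref):
        # Resolve a reference back to the object; pass plain values through.
        return maybe_ref._obj if isinstance(maybe_ref, Ref) else maybe_ref


backend = SimpleBackend()


def task(utility):
    # The task body is unchanged: it receives the already-resolved object.
    return len(utility)


data_ref = backend.put([1, 2, 3, 4])
with ThreadPoolExecutor(max_workers=2) as executor:
    # The wrapper resolves the reference before calling the task.
    future = executor.submit(lambda r: task(backend.get(r)), data_ref)
    print(future.result())  # 4
```

The saving comes from `put()` happening once, while each `submit()` ships only the cheap reference.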
