Commit 43a3de5

Enables threadpoolctl to control number of numpy threads (#1161)
* Enabled threadpoolctl
* doc thread variables
* Fix documentation
1 parent 752dba8 commit 43a3de5

3 files changed: +17 -7 lines changed


autosklearn/evaluation/abstract_evaluator.py

Lines changed: 5 additions & 0 deletions
@@ -12,6 +12,8 @@
 
 from smac.tae import StatusType
 
+from threadpoolctl import threadpool_limits
+
 import autosklearn.pipeline.classification
 import autosklearn.pipeline.regression
 from autosklearn.constants import (
@@ -193,6 +195,9 @@ def __init__(
         budget_type: Optional[str] = None,
     ):
 
+        # Limit the number of threads that numpy uses
+        threadpool_limits(limits=1)
+
         self.starttime = time.time()
 
         self.configuration = configuration
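
For readers unfamiliar with threadpoolctl, here is a minimal sketch of the call this hunk adds. The commit invokes `threadpool_limits(limits=1)` as a plain call, so the limit stays in place for the rest of the process. The same object can also be used as a context manager, which restores the previous limits on exit. The helper names `max_threads_per_pool` and `run_single_threaded` are hypothetical and not part of auto-sklearn.

```python
from threadpoolctl import threadpool_info, threadpool_limits


def max_threads_per_pool():
    """Return the thread count of every native thread pool loaded so far."""
    # threadpool_info() yields one dict per detected library (BLAS, OpenMP, ...).
    # The list is empty if no supported library has been loaded yet.
    return [pool["num_threads"] for pool in threadpool_info()]


def run_single_threaded(func, *args, **kwargs):
    """Call func with all supported native pools capped at one thread.

    Hypothetical helper for illustration only.
    """
    # Context-manager form: the previous limits are restored on exit,
    # unlike the bare call used in the commit.
    with threadpool_limits(limits=1):
        result = func(*args, **kwargs)
        observed = max_threads_per_pool()
    return result, observed


if __name__ == "__main__":
    result, observed = run_single_threaded(sum, [1, 2, 3])
    print(result, observed)
```

The bare call suits an evaluator that should stay single-threaded for its whole lifetime. The context-manager form is the better fit when only one code region needs the cap.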

doc/manual.rst

Lines changed: 11 additions & 7 deletions
@@ -172,13 +172,17 @@ is exhausted.
 
 **Note:** *auto-sklearn* requires all workers to have access to a shared file system for storing training data and models.
 
-Furthermore, depending on the installation of scikit-learn and numpy,
-the model building procedure may use up to all cores. Such behaviour is
-unintended by *auto-sklearn* and is most likely due to numpy being installed
-from `pypi` as a binary wheel (`see here <https://scikit-learn-general.narkive
-.com/44ywvAHA/binary-wheel-packages-for-linux-are-coming>`_). Executing
-``export OPENBLAS_NUM_THREADS=1`` should disable such behaviours and make numpy
-only use a single core at a time.
+*auto-sklearn* employs `threadpoolctl <https://github.com/joblib/threadpoolctl/>`_ to control the number of threads employed by scientific libraries like numpy or scikit-learn. This is done exclusively during the building procedure of models, not during inference. In particular, *auto-sklearn* allows each pipeline to use at most 1 thread during training. At predicting and scoring time this limitation is not enforced by *auto-sklearn*. You can control the number of resources
+employed by the pipelines by setting the following variables in your environment, prior to running *auto-sklearn*:
+
+.. code-block:: shell-session
+
+    $ export OPENBLAS_NUM_THREADS=1
+    $ export MKL_NUM_THREADS=1
+    $ export OMP_NUM_THREADS=1
+
+
+For further information about how scikit-learn handles multiprocessing, please check the `Parallelism, resource management, and configuration <https://scikit-learn.org/stable/computing/parallelism.html>`_ documentation from the library.
 
 Model persistence
 =================
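
The environment variables documented in this hunk can also be set from Python, as in the sketch below. The function name `limit_native_threads` is hypothetical. The sketch assumes it runs before numpy or scikit-learn are imported, since these libraries typically read the variables when their thread pools are first initialised.

```python
import os

# The three variables named in the updated manual.
THREAD_ENV_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS", "OMP_NUM_THREADS")


def limit_native_threads(n_threads=1, environ=os.environ):
    """Set the thread-count variables and return the values that were written.

    Hypothetical helper; `environ` is injectable so the function can be
    exercised without touching the real process environment.
    """
    for name in THREAD_ENV_VARS:
        # Environment values must be strings.
        environ[name] = str(n_threads)
    return {name: environ[name] for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    # Call this before importing numpy / scikit-learn / auto-sklearn.
    print(limit_native_threads(1))
```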

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ distributed>=2.2.0
 pyyaml
 pandas>=1.0
 liac-arff
+threadpoolctl
 
 ConfigSpace>=0.4.14,<0.5
 pynisher>=0.6.3
