2 changes: 1 addition & 1 deletion doc/sources/algorithms.rst
@@ -21,7 +21,7 @@ Supported Algorithms

.. note::
To verify that oneDAL is being used for these algorithms, you can enable verbose mode.
See :ref:`verbose mode documentation <verbose>` for details.
See :ref:`verbose` for details.
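
As a quick sketch of the above (an assumption based on the verbose-mode page: verbose output is
driven by Python's standard ``logging`` module through the ``sklearnex`` logger):

.. code-block:: python

    import logging

    # Raising the 'sklearnex' logger to INFO makes patched estimators report
    # whether the accelerated oneDAL code path was used for each call
    logging.getLogger("sklearnex").setLevel(logging.INFO)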

Applying |sklearnex| impacts the following |sklearn| estimators:

163 changes: 106 additions & 57 deletions doc/sources/array_api.rst
@@ -23,17 +23,16 @@ Overview

Many estimators from the |sklearnex| support passing data classes that conform to the
`Array API <https://data-apis.org/array-api/>`_ specification as inputs to methods like ``.fit()``
and ``.predict()``, such as :external+dpnp:doc:`dpnp.ndarray <reference/ndarray>` or
`torch.tensor <https://docs.pytorch.org/docs/stable/tensors.html>`__. This is particularly
useful for GPU computations, as it allows performing operations on inputs that are already
and ``.predict()``, such as |dpnp_array| or `torch.tensor <https://docs.pytorch.org/docs/stable/tensors.html>`__.
This is particularly useful for GPU computations, as it allows performing operations on inputs that are already
on GPU without moving the data from host to device.

.. important::
Array API is disabled by default in |sklearn|. In order to get array API support in the |sklearnex|, it must
be :external+sklearn:doc:`enabled in scikit-learn <modules/array_api>`, which requires either changing
global settings or using a ``config_context``, plus installing additional dependencies such as ``array-api-compat``.
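
As a minimal sketch of what enabling it looks like (mirroring the fuller examples further below;
the ``SCIPY_ARRAY_API`` variable and the ``array-api-compat`` requirement come from the |sklearn|
array API documentation):

.. code-block:: python

    # Array API support from sklearn requires enabling it on SciPy too
    import os
    os.environ["SCIPY_ARRAY_API"] = "1"

    from sklearnex import config_context, set_config

    # Either globally, for the whole process ...
    set_config(array_api_dispatch=True)

    # ... or locally, only inside a configuration context
    with config_context(array_api_dispatch=True):
        ...  # calls to .fit() / .predict() with array API inputs go here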

When passing array API inputs whose data is on a SyCL-enabled device (e.g. an Intel GPU), as
When passing array API inputs whose data is on a SYCL-enabled device (e.g. an Intel GPU), as
supported for example by `PyTorch <https://docs.pytorch.org/docs/stable/notes/get_start_xpu.html>`__
and |dpnp|, if array API support is enabled and the requested operation (e.g. call to ``.fit()`` / ``.predict()``
on the estimator class being used) is :ref:`supported on device/GPU <sklearn_algorithms_gpu>`, computations
@@ -51,10 +50,10 @@ through options ``allow_sklearn_after_onedal`` (default is ``True``) and ``allow

If array API is enabled for |sklearn| and the estimator being used has array API support on |sklearn| (which can be
verified by attribute ``array_api_support`` from :obj:`sklearn.utils.get_tags`), then array API inputs whose data
is allocated neither on CPU nor on a SyCL device will be forwarded directly to the unpatched methods from |sklearn|,
is allocated neither on CPU nor on a SYCL device will be forwarded directly to the unpatched methods from |sklearn|,
without using the accelerated versions from this library, regardless of option ``allow_sklearn_after_onedal``.
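
As a reference, a short sketch of how that attribute can be checked (assumes a |sklearn| version
in which :obj:`sklearn.utils.get_tags` is available):

.. code-block:: python

    from sklearn.linear_model import LinearRegression
    from sklearn.utils import get_tags

    # True when the stock sklearn estimator can consume array API inputs directly
    print(get_tags(LinearRegression()).array_api_support)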

While other array API inputs (e.g. torch arrays with data allocated on a non-SyCL device) might be supported
While other array API inputs (e.g. torch arrays with data allocated on a non-SYCL device) might be supported
by the |sklearnex| in cases where the same class from |sklearn| doesn't support array API, note that the data will
be transferred to host if it isn't already, and the computations will happen on CPU.

@@ -80,6 +79,7 @@ in many cases they are.
classes that have :external+dpctl:doc:`USM data <api_reference/dpctl/memory>`. In order to ensure that computations
happen on the intended device under array API, make sure that the data is already on the desired device.

.. _array_api_estimators:

Supported classes
=================
@@ -98,11 +98,10 @@ The following patched classes have support for array API inputs:
- :obj:`sklearnex.linear_model.IncrementalRidge`

.. note::
While full array API support is currently not implemented for all classes, :external+dpnp:doc:`dpnp.ndarray <reference/ndarray>`
and :external+dpctl:doc:`dpctl.tensor <api_reference/dpctl/tensor>` inputs are supported by all the classes
that have :ref:`GPU support <oneapi_gpu>`. Note however that if array API support is not enabled in |sklearn|,
when passing these classes as inputs, data will be transferred to host and then back to device instead of being
used directly.
While full array API support is currently not implemented for all classes, |dpnp_array| inputs are supported
by all the classes that have :ref:`GPU support <oneapi_gpu>`. Note however that if array API support is not
enabled in |sklearn|, when passing these classes as inputs, data will be transferred to host and then back to
device instead of being used directly.
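
To illustrate the note above, a minimal sketch of passing |dpnp_array| inputs without enabling
array API (an assumption consistent with the GPU-support documentation: computations are still
dispatched to the device holding the data, at the cost of a host round-trip):

.. code-block:: python

    import numpy as np
    import dpnp
    from sklearnex.linear_model import LinearRegression

    rng = np.random.default_rng(seed=123)
    X = dpnp.array(rng.standard_normal(size=(100, 10), dtype=np.float32), device="gpu")
    y = dpnp.array(rng.standard_normal(size=100, dtype=np.float32), device="gpu")

    # No array API configuration here: the inputs are accepted anyway, but their
    # data is copied to host and back instead of being used directly
    model = LinearRegression().fit(X, y)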


Example usage
@@ -111,52 +110,102 @@ Example usage
GPU operations on GPU arrays
----------------------------

.. code-block:: python

# Array API support from sklearn requires enabling it on SciPy too
import os
os.environ["SCIPY_ARRAY_API"] = "1"

import numpy as np
import dpnp
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression

# Random data for a regression problem
rng = np.random.default_rng(seed=123)
X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
y_np = rng.standard_normal(size=100, dtype=np.float32)

# DPNP offers an array-API-compliant class where data can be on GPU
X = dpnp.array(X_np, device="gpu")
y = dpnp.array(y_np, device="gpu")

# Important to note again that array API must be enabled on scikit-learn
model = LinearRegression()
with config_context(array_api_dispatch=True):
model.fit(X, y)

# Fitted attributes are now of the same class as inputs
assert isinstance(model.coef_, X.__class__)

# Predictions are also of the same class
with config_context(array_api_dispatch=True):
pred = model.predict(X[:5])
assert isinstance(pred, X.__class__)

# Fitted models can be passed array API inputs of a different class
# than the training data, as long as their data resides in the same
# device. This now fits a model using a non-NumPy class whose data is on CPU.
X_cpu = dpnp.array(X_np, device="cpu")
y_cpu = dpnp.array(y_np, device="cpu")
model_cpu = LinearRegression()
with config_context(array_api_dispatch=True):
model_cpu.fit(X_cpu, y_cpu)
pred_dpnp = model_cpu.predict(X_cpu[:5])
pred_np = model_cpu.predict(X_cpu[:5].asnumpy())
assert isinstance(pred_dpnp, X_cpu.__class__)
assert isinstance(pred_np, np.ndarray)
assert pred_dpnp.__class__ != pred_np.__class__
.. tabs::
.. tab:: With Torch tensors
.. code-block:: python

# Array API support from sklearn requires enabling it on SciPy too
import os
os.environ["SCIPY_ARRAY_API"] = "1"

import numpy as np
import torch
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression

# Random data for a regression problem
rng = np.random.default_rng(seed=123)
X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
y_np = rng.standard_normal(size=100, dtype=np.float32)

# Torch offers an array-API-compliant class where data can be on GPU (referred to as 'xpu')
X = torch.tensor(X_np, device="xpu")
y = torch.tensor(y_np, device="xpu")

# Important to note again that array API must be enabled on scikit-learn
model = LinearRegression()
with config_context(array_api_dispatch=True):
model.fit(X, y)

# Fitted attributes are now of the same class as inputs
assert isinstance(model.coef_, torch.Tensor)

# Predictions are also of the same class
with config_context(array_api_dispatch=True):
pred = model.predict(X[:5])
assert isinstance(pred, torch.Tensor)

# Fitted models can be passed array API inputs of a different class
# than the training data, as long as their data resides in the same
# device. This now fits a model using a non-NumPy class whose data is on CPU.
X_cpu = torch.tensor(X_np, device="cpu")
y_cpu = torch.tensor(y_np, device="cpu")
model_cpu = LinearRegression()
with config_context(array_api_dispatch=True):
model_cpu.fit(X_cpu, y_cpu)
pred_torch = model_cpu.predict(X_cpu[:5])
pred_np = model_cpu.predict(X_cpu[:5].numpy())
assert isinstance(pred_torch, X_cpu.__class__)
assert isinstance(pred_np, np.ndarray)
assert pred_torch.__class__ != pred_np.__class__

.. tab:: With DPNP arrays
.. code-block:: python

# Array API support from sklearn requires enabling it on SciPy too
import os
os.environ["SCIPY_ARRAY_API"] = "1"

import numpy as np
import dpnp
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression

# Random data for a regression problem
rng = np.random.default_rng(seed=123)
X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
y_np = rng.standard_normal(size=100, dtype=np.float32)

# DPNP offers an array-API-compliant class where data can be on GPU
X = dpnp.array(X_np, device="gpu")
y = dpnp.array(y_np, device="gpu")

# Important to note again that array API must be enabled on scikit-learn
model = LinearRegression()
with config_context(array_api_dispatch=True):
model.fit(X, y)

# Fitted attributes are now of the same class as inputs
assert isinstance(model.coef_, X.__class__)

# Predictions are also of the same class
with config_context(array_api_dispatch=True):
pred = model.predict(X[:5])
assert isinstance(pred, X.__class__)

# Fitted models can be passed array API inputs of a different class
# than the training data, as long as their data resides in the same
# device. This now fits a model using a non-NumPy class whose data is on CPU.
X_cpu = dpnp.array(X_np, device="cpu")
y_cpu = dpnp.array(y_np, device="cpu")
model_cpu = LinearRegression()
with config_context(array_api_dispatch=True):
model_cpu.fit(X_cpu, y_cpu)
pred_dpnp = model_cpu.predict(X_cpu[:5])
pred_np = model_cpu.predict(X_cpu[:5].asnumpy())
assert isinstance(pred_dpnp, X_cpu.__class__)
assert isinstance(pred_np, np.ndarray)
assert pred_dpnp.__class__ != pred_np.__class__


``array-api-strict``
91 changes: 91 additions & 0 deletions doc/sources/config-contexts.rst
@@ -0,0 +1,91 @@
.. Copyright contributors to the oneDAL project
..
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
.. include:: substitutions.rst
.. _config_contexts:

=========================================
Configuration Contexts and Global Options
=========================================

Overview
========

Just like |sklearn|, the |sklearnex| offers configurable options which can be managed
locally through a configuration context, or globally through process-wide settings,
by extending the configuration-related functions from |sklearn| (see :obj:`sklearn.config_context`
for details).

Configurations in the |sklearnex| are particularly useful for :ref:`GPU functionalities <oneapi_gpu>`
and :ref:`SPMD mode <distributed>`, and must be modified in order to enable :ref:`array API <array_api>`.

The configuration context and global options manager for the |sklearnex| can be imported either
directly from the ``sklearnex`` module, or from the ``sklearn`` module after applying patching.
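
A short sketch of the two import routes:

.. code:: python

    from sklearnex import config_context, get_config, set_config

    # ... or, equivalently, from sklearn after applying patching:
    # from sklearnex import patch_sklearn
    # patch_sklearn()
    # from sklearn import config_context, get_config, set_config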

Note that options in the |sklearnex| are a superset of options from |sklearn|, and options passed to
the configuration contexts and global settings of the |sklearnex| will also affect |sklearn| if the
option is supported by it; that is, the same context manager or global option setter is used for
both libraries.
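
For instance, a sketch of mixing a |sklearn| option with a |sklearnex|-specific one in the same
context (``assume_finite`` comes from |sklearn|, while ``target_offload`` belongs to the |sklearnex|):

.. code:: python

    from sklearnex import config_context

    # One context manager handles both: 'assume_finite' is a stock sklearn option,
    # while 'target_offload' is specific to the extension
    with config_context(assume_finite=True, target_offload="gpu"):
        ...  # estimator calls affected by both options go here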

Example usage
=============

Example using the ``target_offload`` option to make computations run on a GPU:

With a local context
--------------------

Here, only the operations from |sklearn| and from the |sklearnex| that happen within the ``with``
block will be affected by the options:

.. code:: python

import numpy as np
from sklearnex import config_context
from sklearnex.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with config_context(target_offload="gpu"):
clustering = DBSCAN(eps=3, min_samples=2).fit(X)

As a global option
------------------

Here, all computations from |sklearn| and from the |sklearnex| that happen after the option
is modified are affected:

.. code:: python

import numpy as np
from sklearnex import set_config
from sklearnex.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)

set_config(target_offload="gpu") # set it globally
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
set_config(target_offload="auto") # restore it back
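
When restoring a global option, a sketch of reading the previous value back through ``get_config``
instead of hard-coding the default (this assumes ``target_offload`` appears among the returned keys,
as it is one of the extension's own options):

.. code:: python

    from sklearnex import get_config, set_config

    previous = get_config()["target_offload"]  # save the current value
    set_config(target_offload="gpu")
    # ... computations ...
    set_config(target_offload=previous)  # restore it afterwards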

API Reference
=============

Note that all of the options accepted by these functions in |sklearn| are also accepted
here; the entries below only list the additional options offered by the |sklearnex|.

.. autofunction:: sklearnex.config_context

.. autofunction:: sklearnex.get_config

.. autofunction:: sklearnex.set_config
2 changes: 1 addition & 1 deletion doc/sources/distributed-mode.rst
@@ -85,7 +85,7 @@ data on device without this may lead to a runtime error): ::
export I_MPI_OFFLOAD=1

SPMD-aware versions of estimators can be imported from the ``sklearnex.spmd`` module. Data should be distributed across multiple nodes as
desired, and should be transferred to a |dpctl| or `dpnp <https://github.com/IntelPython/dpnp>`__ array before being passed to the estimator.
desired, and should be transferred to a |dpnp_array| before being passed to the estimator.

Note that SPMD estimators allow an additional argument ``queue`` in their ``.fit`` / ``.predict`` methods, which accepts :obj:`dpctl.SyclQueue` objects. For example, while the signature for :obj:`sklearn.linear_model.LinearRegression.predict` would be
