diff --git a/doc/sources/algorithms.rst b/doc/sources/algorithms.rst
index a5428f78a0..4c7509d384 100755
--- a/doc/sources/algorithms.rst
+++ b/doc/sources/algorithms.rst
@@ -21,7 +21,7 @@ Supported Algorithms
 
 .. note::
    To verify that oneDAL is being used for these algorithms, you can enable verbose mode. 
-   See :ref:`verbose mode documentation <verbose>` for details.
+   See :doc:`verbose` for details.
 
 Applying |sklearnex| impacts the following |sklearn| estimators:
 
diff --git a/doc/sources/array_api.rst b/doc/sources/array_api.rst
index b2eb7a8bee..f493878b36 100644
--- a/doc/sources/array_api.rst
+++ b/doc/sources/array_api.rst
@@ -23,9 +23,8 @@ Overview
 
 Many estimators from the |sklearnex| support passing data classes that conform to the
 `Array API <https://data-apis.org/array-api/>`_ specification as inputs to methods like ``.fit()``
-and ``.predict()``, such as :external+dpnp:doc:`dpnp.ndarray <reference/ndarray>` or
-`torch.tensor <https://docs.pytorch.org/docs/stable/tensors.html>`__. This is particularly
-useful for GPU computations, as it allows performing operations on inputs that are already
+and ``.predict()``, such as |dpnp_array| or `torch.tensor <https://docs.pytorch.org/docs/stable/tensors.html>`__.
+This is particularly useful for GPU computations, as it allows performing operations on inputs that are already
 on GPU without moving the data from host to device.
 
 .. important::
@@ -33,7 +32,7 @@ on GPU without moving the data from host to device.
     be :external+sklearn:doc:`enabled in scikit-learn <modules/array_api>`, which requires either changing
     global settings or using a ``config_context``, plus installing additional dependencies such as ``array-api-compat``.
 
-When passing array API inputs whose data is on a SyCL-enabled device (e.g. an Intel GPU), as
+When passing array API inputs whose data is on a SYCL-enabled device (e.g. an Intel GPU), as
 supported for example by `PyTorch <https://docs.pytorch.org/docs/stable/notes/get_start_xpu.html>`__
 and |dpnp|, if array API support is enabled and the requested operation (e.g. call to ``.fit()`` / ``.predict()``
 on the estimator class being used) is :ref:`supported on device/GPU <sklearn_algorithms_gpu>`, computations
@@ -51,10 +50,10 @@ through options ``allow_sklearn_after_onedal`` (default is ``True``) and ``allow
 
 If array API is enabled for |sklearn| and the estimator being used has array API support on |sklearn| (which can be
 verified by attribute ``array_api_support`` from :obj:`sklearn.utils.get_tags`), then array API inputs whose data
-is allocated neither on CPU nor on a SyCL device will be forwarded directly to the unpatched methods from |sklearn|,
+is allocated neither on CPU nor on a SYCL device will be forwarded directly to the unpatched methods from |sklearn|,
 without using the accelerated versions from this library, regardless of option ``allow_sklearn_after_onedal``.
 
-While other array API inputs (e.g. torch arrays with data allocated on a non-SyCL device) might be supported
+While other array API inputs (e.g. torch arrays with data allocated on a non-SYCL device) might be supported
 by the |sklearnex| in cases where the same class from |sklearn| doesn't support array API, note that the data will
 be transferred to host if it isn't already, and the computations will happen on CPU.
 
@@ -80,6 +79,7 @@ in many cases they are.
     classes that have :external+dpctl:doc:`USM data <api_reference/dpctl/memory>`. In order to ensure that computations
     happen on the intended device under array API, make sure that the data is already on the desired device.
 
+.. _array_api_estimators:
 
 Supported classes
 =================
@@ -98,11 +98,10 @@ The following patched classes have support for array API inputs:
 - :obj:`sklearnex.linear_model.IncrementalRidge`
 
 .. note::
-    While full array API support is currently not implemented for all classes, :external+dpnp:doc:`dpnp.ndarray <reference/ndarray>`
-    and :external+dpctl:doc:`dpctl.tensor <api_reference/dpctl/tensor>` inputs are supported by all the classes
-    that have :ref:`GPU support <oneapi_gpu>`. Note however that if array API support is not enabled in |sklearn|,
-    when passing these classes as inputs, data will be transferred to host and then back to device instead of being
-    used directly.
+    While full array API support is currently not implemented for all classes, |dpnp_array| inputs are supported
+    by all the classes that have :ref:`GPU support <oneapi_gpu>`. Note however that if array API support is not
+    enabled in |sklearn|, when passing these classes as inputs, data will be transferred to host and then back to
+    device instead of being used directly.
 
 
 Example usage
@@ -111,52 +110,102 @@ Example usage
 GPU operations on GPU arrays
 ----------------------------
 
-.. code-block:: python
-
-    # Array API support from sklearn requires enabling it on SciPy too
-    import os
-    os.environ["SCIPY_ARRAY_API"] = "1"
-
-    import numpy as np
-    import dpnp
-    from sklearnex import config_context
-    from sklearnex.linear_model import LinearRegression
-
-    # Random data for a regression problem
-    rng = np.random.default_rng(seed=123)
-    X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
-    y_np = rng.standard_normal(size=100, dtype=np.float32)
-
-    # DPNP offers an array-API-compliant class where data can be on GPU
-    X = dpnp.array(X_np, device="gpu")
-    y = dpnp.array(y_np, device="gpu")
-
-    # Important to note again that array API must be enabled on scikit-learn
-    model = LinearRegression()
-    with config_context(array_api_dispatch=True):
-        model.fit(X, y)
-
-    # Fitted attributes are now of the same class as inputs
-    assert isinstance(model.coef_, X.__class__)
-
-    # Predictions are also of the same class
-    with config_context(array_api_dispatch=True):
-        pred = model.predict(X[:5])
-    assert isinstance(pred, X.__class__)
-
-    # Fitted models can be passed array API inputs of a different class
-    # than the training data, as long as their data resides in the same
-    # device. This now fits a model using a non-NumPy class whose data is on CPU.
-    X_cpu = dpnp.array(X_np, device="cpu")
-    y_cpu = dpnp.array(y_np, device="cpu")
-    model_cpu = LinearRegression()
-    with config_context(array_api_dispatch=True):
-        model_cpu.fit(X_cpu, y_cpu)
-        pred_dpnp = model_cpu.predict(X_cpu[:5])
-        pred_np = model_cpu.predict(X_cpu[:5].asnumpy())
-    assert isinstance(pred_dpnp, X_cpu.__class__)
-    assert isinstance(pred_np, np.ndarray)
-    assert pred_dpnp.__class__ != pred_np.__class__
+.. tabs::
+    .. tab:: With Torch tensors
+       .. code-block:: python
+
+           # Array API support from sklearn requires enabling it on SciPy too
+           import os
+           os.environ["SCIPY_ARRAY_API"] = "1"
+
+           import numpy as np
+           import torch
+           from sklearnex import config_context
+           from sklearnex.linear_model import LinearRegression
+
+           # Random data for a regression problem
+           rng = np.random.default_rng(seed=123)
+           X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
+           y_np = rng.standard_normal(size=100, dtype=np.float32)
+
+           # Torch offers an array-API-compliant class where data can be on GPU (referred to as 'xpu')
+           X = torch.tensor(X_np, device="xpu")
+           y = torch.tensor(y_np, device="xpu")
+
+           # Important to note again that array API must be enabled on scikit-learn
+           model = LinearRegression()
+           with config_context(array_api_dispatch=True):
+               model.fit(X, y)
+
+           # Fitted attributes are now of the same class as inputs
+           assert isinstance(model.coef_, torch.Tensor)
+
+           # Predictions are also of the same class
+           with config_context(array_api_dispatch=True):
+               pred = model.predict(X[:5])
+           assert isinstance(pred, torch.Tensor)
+
+           # Fitted models can be passed array API inputs of a different class
+           # than the training data, as long as their data resides in the same
+           # device. This now fits a model using a non-NumPy class whose data is on CPU.
+           X_cpu = torch.tensor(X_np, device="cpu")
+           y_cpu = torch.tensor(y_np, device="cpu")
+           model_cpu = LinearRegression()
+           with config_context(array_api_dispatch=True):
+               model_cpu.fit(X_cpu, y_cpu)
+               pred_torch = model_cpu.predict(X_cpu[:5])
+               pred_np = model_cpu.predict(X_cpu[:5].numpy())
+           assert isinstance(pred_torch, X_cpu.__class__)
+           assert isinstance(pred_np, np.ndarray)
+           assert pred_torch.__class__ != pred_np.__class__
+
+    .. tab:: With DPNP arrays
+       .. code-block:: python
+
+           # Array API support from sklearn requires enabling it on SciPy too
+           import os
+           os.environ["SCIPY_ARRAY_API"] = "1"
+
+           import numpy as np
+           import dpnp
+           from sklearnex import config_context
+           from sklearnex.linear_model import LinearRegression
+
+           # Random data for a regression problem
+           rng = np.random.default_rng(seed=123)
+           X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
+           y_np = rng.standard_normal(size=100, dtype=np.float32)
+
+           # DPNP offers an array-API-compliant class where data can be on GPU
+           X = dpnp.array(X_np, device="gpu")
+           y = dpnp.array(y_np, device="gpu")
+
+           # Important to note again that array API must be enabled on scikit-learn
+           model = LinearRegression()
+           with config_context(array_api_dispatch=True):
+               model.fit(X, y)
+
+           # Fitted attributes are now of the same class as inputs
+           assert isinstance(model.coef_, X.__class__)
+
+           # Predictions are also of the same class
+           with config_context(array_api_dispatch=True):
+               pred = model.predict(X[:5])
+           assert isinstance(pred, X.__class__)
+
+           # Fitted models can be passed array API inputs of a different class
+           # than the training data, as long as their data resides in the same
+           # device. This now fits a model using a non-NumPy class whose data is on CPU.
+           X_cpu = dpnp.array(X_np, device="cpu")
+           y_cpu = dpnp.array(y_np, device="cpu")
+           model_cpu = LinearRegression()
+           with config_context(array_api_dispatch=True):
+               model_cpu.fit(X_cpu, y_cpu)
+               pred_dpnp = model_cpu.predict(X_cpu[:5])
+               pred_np = model_cpu.predict(X_cpu[:5].asnumpy())
+           assert isinstance(pred_dpnp, X_cpu.__class__)
+           assert isinstance(pred_np, np.ndarray)
+           assert pred_dpnp.__class__ != pred_np.__class__
 
 
 ``array-api-strict``
diff --git a/doc/sources/config-contexts.rst b/doc/sources/config-contexts.rst
new file mode 100644
index 0000000000..3719c5060d
--- /dev/null
+++ b/doc/sources/config-contexts.rst
@@ -0,0 +1,91 @@
+.. Copyright contributors to the oneDAL project
+..
+.. Licensed under the Apache License, Version 2.0 (the "License");
+.. you may not use this file except in compliance with the License.
+.. You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+.. include:: substitutions.rst
+.. _config_contexts:
+
+=========================================
+Configuration Contexts and Global Options
+=========================================
+
+Overview
+========
+
+Just like |sklearn|, the |sklearnex| offers configurable options which can be managed
+locally through a configuration context, or globally through process-wide settings,
+by extending the configuration-related functions from |sklearn| (see :obj:`sklearn.config_context`
+for details).
+
+Configurations in the |sklearnex| are particularly useful for :doc:`GPU functionalities <oneapi-gpu>`
+and :doc:`SMPD mode <distributed-mode>`, and are necessary to modify for enabling :doc:`array API <array_api>`.
+
+Configuration context and global options manager for the |sklearnex| can either be imported directly
+from the module ``sklearnex``, or can be imported from the ``sklearn`` module after applying patching.
+
+Note that options in the |sklearnex| are a superset of options from |sklearn|, and options passed to
+the configuration contexts and global settings of the |sklearnex| will also affect |sklearn| if the
+option is supported by it - meaning: the same context manager  or global option setter is used for
+both libraries.
+
+Example usage
+=============
+
+Example using the ``target_offload`` option to make computations run on a GPU:
+
+With a local context
+--------------------
+
+Here, only the operations from |sklearn| and from the |sklearnex| that happen within the 'with'
+block will be affected by the options:
+
+.. code:: python
+
+    import numpy as np
+    from sklearnex import config_context
+    from sklearnex.cluster import DBSCAN
+
+    X = np.array([[1., 2.], [2., 2.], [2., 3.],
+                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
+    with config_context(target_offload="gpu"):
+        clustering = DBSCAN(eps=3, min_samples=2).fit(X)
+
+As a global option
+------------------
+
+Here, all computations from |sklearn| and from the |sklearnex| that happen after the option
+is modified are affected:
+
+.. code:: python
+
+    import numpy as np
+    from sklearnex import set_config
+    from sklearnex.cluster import DBSCAN
+
+    X = np.array([[1., 2.], [2., 2.], [2., 3.],
+                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
+    
+    set_config(target_offload="gpu") # set it globally
+    clustering = DBSCAN(eps=3, min_samples=2).fit(X)
+    set_config(target_offload="auto") # restore it back
+
+API Reference
+=============
+
+Note that all of the options accepted by these functions in |sklearn| are also accepted
+here - these just list the additional options offered by the |sklearnex|.
+
+.. autofunction:: sklearnex.config_context
+
+.. autofunction:: sklearnex.get_config
+
+.. autofunction:: sklearnex.set_config
diff --git a/doc/sources/distributed-mode.rst b/doc/sources/distributed-mode.rst
index 73e55839c9..924500a340 100644
--- a/doc/sources/distributed-mode.rst
+++ b/doc/sources/distributed-mode.rst
@@ -85,7 +85,7 @@ data on device without this may lead to a runtime error): ::
     export I_MPI_OFFLOAD=1
 
 SMPD-aware versions of estimators can be imported from the ``sklearnex.spmd`` module. Data should be distributed across multiple nodes as
-desired, and should be transferred to a |dpctl| or `dpnp <https://github.com/IntelPython/dpnp>`__ array before being passed to the estimator.
+desired, and should be transferred to a |dpnp_array| before being passed to the estimator.
 
 Note that SPMD estimators allow an additional argument ``queue`` in their ``.fit`` / ``.predict`` methods, which accept :obj:`dpctl.SyclQueue` objects. For example, while the signature for :obj:`sklearn.linear_model.LinearRegression.predict` would be
 
diff --git a/doc/sources/index.rst b/doc/sources/index.rst
index 2b405c8d10..7d13456527 100755
--- a/doc/sources/index.rst
+++ b/doc/sources/index.rst
@@ -41,16 +41,16 @@ These performance charts use benchmarks that you can find in the `scikit-learn b
 
 
 Supported Algorithms
----------------------
+--------------------
 
 See all of the :ref:`sklearn_algorithms`.
 
 
 Optimizations
-----------------------------------
+-------------
 
 Enable CPU Optimizations
-*********************************
+************************
 
 .. tabs::
    .. tab:: By patching
@@ -78,7 +78,7 @@ Enable CPU Optimizations
 
 
 Enable GPU optimizations
-*********************************
+************************
 
 Note: executing on GPU has `additional system software requirements <https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-dpcpp-system-requirements.html>`__ - see :doc:`oneapi-gpu`.
 
@@ -105,7 +105,7 @@ Note: executing on GPU has `additional system software requirements <https://www
                import os
                os.environ["SCIPY_ARRAY_API"] = "1"
                import numpy as np
-               import dpnp
+               import torch
                from sklearnex import patch_sklearn
                patch_sklearn()
                from sklearn import config_context
@@ -114,8 +114,8 @@ Note: executing on GPU has `additional system software requirements <https://www
 
                X = np.array([[1., 2.], [2., 2.], [2., 3.],
                              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
-               X = dpnp.array(X, device="gpu")
-               with config_context(array_api_dispatch=True)
+               X = torch.tensor(X, device="xpu")
+               with config_context(array_api_dispatch=True):
                    clustering = DBSCAN(eps=3, min_samples=2).fit(X)
 
    .. tab:: Without patching
@@ -138,14 +138,14 @@ Note: executing on GPU has `additional system software requirements <https://www
                import os
                os.environ["SCIPY_ARRAY_API"] = "1"
                import numpy as np
-               import dpnp
+               import torch
                from sklearnex import config_context
                from sklearnex.cluster import DBSCAN
 
                X = np.array([[1., 2.], [2., 2.], [2., 3.],
                              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
-               X = dpnp.array(X, device="gpu")
-               with config_context(array_api_dispatch=True)
+               X = torch.tensor(X, device="xpu")
+               with config_context(array_api_dispatch=True):
                    clustering = DBSCAN(eps=3, min_samples=2).fit(X)
 
 
@@ -168,6 +168,8 @@ See :ref:`oneapi_gpu` for other ways of executing on GPU.
 
    algorithms.rst
    oneapi-gpu.rst
+   config-contexts.rst
+   array_api.rst
    distributed-mode.rst
    distributed_daal4py.rst
    non-scikit-algorithms.rst
@@ -175,7 +177,6 @@ See :ref:`oneapi_gpu` for other ways of executing on GPU.
    model_builders.rst
    logistic_model_builder.rst
    input-types.rst
-   array_api.rst
    verbose.rst
    preview.rst
    deprecation.rst
diff --git a/doc/sources/input-types.rst b/doc/sources/input-types.rst
index 790080e6bf..ceb61c5f80 100644
--- a/doc/sources/input-types.rst
+++ b/doc/sources/input-types.rst
@@ -29,10 +29,7 @@ and work with different classes of input data, including:
 - SciPy :external+scipy:doc:`sparse arrays and sparse matrices <tutorial/sparse>` (depending on the estimator).
 - Pandas :external+pandas:doc:`DataFrame and Series <user_guide/dsintro>` classes.
 
-In addition, |sklearnex| also supports:
-
-- :external+dpnp:doc:`dpnp.ndarray <reference/ndarray>`.
-- :external+dpctl:doc:`dpctl.tensor <api_reference/dpctl/tensor>`.
+In addition, |sklearnex| also supports |dpnp_array| arrays, which are particularly useful for GPU computations.
 
 Stock Scikit-Learn estimators, depending on the version, might offer support for additional
 input types beyond this list, such as ``DataFrame`` and ``Series`` classes from other libraries
@@ -50,7 +47,8 @@ enabled the input is unsupported).
   The affected cases are listed below.
 
   - Non-contiguous NumPy array - i.e. where strides are wider than one element across both rows and columns
-  - For SciPy CSR matrix / array, index arrays are always copied.
+  - For SciPy CSR matrix / array, index arrays are always copied. Note that sparse matrices in formats other than CSR
+    will be converted to CSR, which implies more than just data copying.
   - Heterogeneous NumPy array
   - If SYCL queue is provided for device without ``float64`` support but data are ``float64``, data are copied with reduced precision.
   - If :ref:`Array API <array_api>` is not enabled then data from GPU devices are always copied to the host device and then result table 
diff --git a/doc/sources/oneapi-gpu.rst b/doc/sources/oneapi-gpu.rst
index fa54437a77..52d9d30571 100644
--- a/doc/sources/oneapi-gpu.rst
+++ b/doc/sources/oneapi-gpu.rst
@@ -15,20 +15,25 @@
 .. include:: substitutions.rst
 .. _oneapi_gpu:
 
-##############################################################
-oneAPI and GPU support in |sklearnex|
-##############################################################
+###########
+GPU support
+###########
 
-|sklearnex| can execute computations on different devices (CPUs and GPUs, including integrated GPUs from laptops and desktops) through the SYCL framework in oneAPI.
+Overview
+--------
 
-The device used for computations can be easily controlled through the target offloading functionality (e.g. through ``sklearnex.config_context(target_offload="gpu")``, which moves data to GPU if it's not already there - see rest of this page for more details), but for finer-grained controlled (e.g. operating on arrays that are already in a given device's memory), it can also interact with objects from package |dpctl|, which offers a Python interface over SYCL concepts such as devices, queues, and USM (unified shared memory) arrays.
+|sklearnex| can execute computations on different devices (CPUs and GPUs, including integrated GPUs from laptops and desktops) supported by the SYCL framework.
 
-While not strictly required, package |dpctl| is recommended for a better experience on GPUs - for example, it can provide GPU-allocated arrays that enable compute-follows-data execution models (i.e. so that ``target_offload`` wouldn't need to move the data from CPU to GPU).
+The device used for computations can be easily controlled through the ``target_offload`` option in config contexts, which moves data to GPU if it's not already there - see :doc:`config-contexts` and rest of this page for more details).
+
+For finer-grained controlled (e.g. operating on arrays that are already in a given device's memory), it can also interact with on-device :ref:`array API classes <array_api>` like |dpnp_array|, and with SYCL-related objects from package |dpctl| such as :obj:`dpctl.SyclQueue`.
+
+.. Note:: Note that not every operation from every estimator is supported on GPU - see the :ref:`GPU support table <sklearn_algorithms_gpu>` for more information. See also :doc:`verbose` to verify where computations are performed.
 
 .. important:: Be aware that GPU usage requires non-Python dependencies on your system, such as the `Intel(R) Compute Runtime <https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-dpcpp-system-requirements.html>`_ (see below).
 
-Prerequisites
--------------
+Software Requirements
+---------------------
 
 For execution on GPUs, DPC++ runtime and Intel Compute Runtime (also referred to elsewhere as 'GPGPU drivers') are required.
 
@@ -76,93 +81,146 @@ Be aware that datacenter-grade devices, such as 'Flex' and 'Max', require differ
 
 For more details, see the `DPC++ requirements page <https://www.intel.com/content/www/us/en/developer/articles/system-requirements/oneapi-dpcpp/2025.html>`__.
 
-Device offloading
------------------
+Running operations on GPU
+-------------------------
 
-|sklearnex| offers two options for running an algorithm on a specified device:
+|sklearnex| offers different options for running an algorithm on a specified device (e.g. a GPU):
 
-- Use global configurations of |sklearnex|:
+Target offload option
+~~~~~~~~~~~~~~~~~~~~~
 
-  1. The :code:`target_offload` argument (in ``config_context`` and in ``set_config`` / ``get_config``)
-     can be used to set the device primarily used to perform computations. Accepted data types are
-     :code:`str` and :obj:`dpctl.SyclQueue`. Strings must match to device names recognized by
-     the SYCL* device filter selector - for example, ``"gpu"``. If passing ``"auto"``,
-     the device will be deduced from the location of the input data. Examples:
+Just like |sklearn|, the |sklearnex| can use configuration contexts and global options to modify how it interacts with different inputs - see :doc:`config-contexts` for details.
 
-     .. code-block:: python
-        
-        from sklearnex import config_context
-        from sklearnex.linear_model import LinearRegression
-        
-        with config_context(target_offload="gpu"):
-            model = LinearRegression().fit(X, y)
+In particular, the |sklearnex| allows an option ``target_offload`` which can be passed a SYCL device name like ``"gpu"`` indicating where the operations should be performed, moving the data to that device in the process if it's not already there; or a :obj:`dpctl.SyclQueue` object from an already-existing queue on a device.
 
-     .. code-block:: python
-        
-        from sklearnex import set_config
-        from sklearnex.linear_model import LinearRegression
-        
-        set_config(target_offload="gpu")
-        model = LinearRegression().fit(X, y)
+.. hint:: If repeated operations are going to be performed on the same data (e.g. cross-validators, resamplers, missing data imputers, etc.), it's recommended to use the array API option instead - see the next section for details.
 
+Example:
 
-     If passing a string different than ``"auto"``,
-     it must be a device 
+.. tabs::
+    .. tab:: Passing a device name
+       .. code-block:: python
 
-  2. The :code:`allow_fallback_to_host` argument in those same configuration functions
-     is a Boolean flag. If set to :code:`True`, the computation is allowed
-     to fallback to the host device when a particular estimator does not support
-     the selected device. The default value is :code:`False`.
+           from sklearnex import config_context
+           from sklearnex.linear_model import LinearRegression
+           from sklearn.datasets import make_regression
+           X, y = make_regression()
+           model = LinearRegression()
 
-These options can be set using :code:`sklearnex.set_config()` function or
-:code:`sklearnex.config_context`. To obtain the current values of these options,
-call :code:`sklearnex.get_config()`.
+           with config_context(target_offload="gpu"):
+               model.fit(X, y)
+               pred = model.predict(X)
 
-.. note::
-     Functions :code:`set_config`, :code:`get_config` and :code:`config_context`
-     are always patched after the :code:`sklearnex.patch_sklearn()` call.
+    .. tab:: Passing a SYCL queue
+       .. code-block:: python
 
-- Pass input data as :obj:`dpctl.tensor.usm_ndarray` to the algorithm.
+           import dpctl
+           from sklearnex import config_context
+           from sklearnex.linear_model import LinearRegression
+           from sklearn.datasets import make_regression
+           X, y = make_regression()
+           model = LinearRegression()
 
-  The computation will run on the device where the input data is
-  located, and the result will be returned as :code:`usm_ndarray` to the same
-  device.
+           queue = dpctl.SyclQueue("gpu")
+           with config_context(target_offload=queue):
+               model.fit(X, y)
+               pred = model.predict(X)
 
-  .. important::
-    In order to enable zero-copy operations on GPU arrays, it's necessary to enable
-    :ref:`array API support <array_api>` for scikit-learn. Otherwise, if passing a GPU
-    array and array API support is not enabled, GPU arrays will first be transferred to
-    host and then back to GPU.
 
-  .. note::
-    All the input data for an algorithm must reside on the same device.
+.. warning::
+    When using ``target_offload``, operations on a fitted model must be executed under a context or global option with the same device or queue where the model was fitted - meaning: a model fitted on GPU cannot make predictions on CPU, and vice-versa. Note that upon serialization and subsequent deserialization of models, data is moved to the CPU.
+
+GPU arrays through array API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As another option, computations can also be performed on data that is already on a SYCL device without moving it there if it belongs to an array API-compatible class, such as |dpnp_array| or `torch.tensor <https://docs.pytorch.org/docs/stable/tensors.html>`__ (see also the `PyTorch Intel GPU docs <https://docs.pytorch.org/docs/stable/notes/get_start_xpu.html>`__).
+
+This is particularly useful when multiple operations are performed on the same data (e.g. cross validators, stacked ensembles, etc.), or when the data is meant to interact with other libraries besides the |sklearnex|. Be aware that it requires enabling array API support in |sklearn|, which comes with additional dependencies.
+
+See :doc:`array_api` for details, instructions, and limitations. Example:
+
+.. tabs::
+    .. tab:: With Torch tensors
+       .. code-block:: python
+
+           # Array API support from sklearn requires enabling it on SciPy too
+           import os
+           os.environ["SCIPY_ARRAY_API"] = "1"
+
+           import numpy as np
+           import torch
+           from sklearnex import config_context
+           from sklearnex.linear_model import LinearRegression
+
+           # Random data for a regression problem
+           rng = np.random.default_rng(seed=123)
+           X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
+           y_np = rng.standard_normal(size=100, dtype=np.float32)
 
-  .. warning::
-    The :code:`usm_ndarray` can only be consumed by the base methods
-    like :code:`fit`, :code:`predict`, and :code:`transform`.
-    Note that only the algorithms in |sklearnex| support
-    :code:`usm_ndarray`. The algorithms from the stock version of |sklearn|
-    do not support this feature.
+           # Torch offers an array-API-compliant class where data can be on GPU (referred to as 'xpu')
+           X = torch.tensor(X_np, device="xpu")
+           y = torch.tensor(y_np, device="xpu")
 
+           # Important to note again that array API must be enabled on scikit-learn
+           model = LinearRegression()
+           with config_context(array_api_dispatch=True):
+               model.fit(X, y)
 
-Example
--------
+    .. tab:: With DPNP arrays
+       .. code-block:: python
 
-A full example of how to patch your code with Intel CPU/GPU optimizations:
+           # Array API support from sklearn requires enabling it on SciPy too
+           import os
+           os.environ["SCIPY_ARRAY_API"] = "1"
+
+           import numpy as np
+           import dpnp
+           from sklearnex import config_context
+           from sklearnex.linear_model import LinearRegression
+
+           # Random data for a regression problem
+           rng = np.random.default_rng(seed=123)
+           X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
+           y_np = rng.standard_normal(size=100, dtype=np.float32)
+
+           # DPNP offers an array-API-compliant class where data can be on GPU
+           X = dpnp.array(X_np, device="gpu")
+           y = dpnp.array(y_np, device="gpu")
+
+           # Important to note again that array API must be enabled on scikit-learn
+           model = LinearRegression()
+           with config_context(array_api_dispatch=True):
+               model.fit(X, y)
+
+.. note::
+    Not all estimator classes in the |sklearnex| support array API objects - see the list of :ref:`estimators with array API support <array_api_estimators>` for details.
+
+DPNP Arrays
+~~~~~~~~~~~
+
+As a special case, GPU arrays from |dpnp| can be used without enabling array API, even for estimators in the |sklearnex| that do not currently support array API, but note that it involves data movement to host and back and is thus not the most efficient route in computational terms.
+
+Example:
 
 .. code-block:: python
 
-   from sklearnex import patch_sklearn, config_context
-   patch_sklearn()
+    import numpy as np
+    import dpnp
+    from sklearnex import config_context
+    from sklearnex.linear_model import LinearRegression
+
+    rng = np.random.default_rng(seed=123)
+    X_np = rng.standard_normal(size=(100, 10), dtype=np.float32)
+    y_np = rng.standard_normal(size=100, dtype=np.float32)
 
-   from sklearn.cluster import DBSCAN
+    X = dpnp.array(X_np, device="gpu")
+    y = dpnp.array(y_np, device="gpu")
 
-   X = np.array([[1., 2.], [2., 2.], [2., 3.],
-                 [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
-   with config_context(target_offload="gpu:0"):
-       clustering = DBSCAN(eps=3, min_samples=2).fit(X)
+    model = LinearRegression()
+    model.fit(X, y)
 
 
-.. note:: Current offloading behavior restricts fitting and predictions (a.k.a. inference) of any models to be
-     in the same context or absence of context. For example, a model whose ``.fit()`` method was called in a GPU context with
-     ``target_offload="gpu:0"`` will throw an error if a ``.predict()`` call is then made outside the same GPU context.
+Note that, if array API had been enabled, the snippet above would use the data as-is on the device where it resides, but without array API, it implies data movements using the SYCL queue contained by those objects.
+
+.. note::
+    All the input data for an algorithm must reside on the same device.
diff --git a/doc/sources/substitutions.rst b/doc/sources/substitutions.rst
index ded175b164..205484f848 100644
--- a/doc/sources/substitutions.rst
+++ b/doc/sources/substitutions.rst
@@ -14,6 +14,7 @@
 
 .. |dpctl| replace:: :external+dpctl:doc:`dpctl <index>`
 .. |dpnp| replace:: :external+dpnp:doc:`dpnp <index>`
+.. |dpnp_array| replace:: :external+dpnp:doc:`dpnp.ndarray <reference/ndarray>`
 .. |sklearn| replace:: :external+sklearn:doc:`scikit-learn <index>`
 .. |intelex_repo| replace:: |sklearnex| repository
 .. _intelex_repo: https://github.com/uxlfoundation/scikit-learn-intelex
diff --git a/sklearnex/_config.py b/sklearnex/_config.py
index 793fc295c9..46e3bf5975 100644
--- a/sklearnex/_config.py
+++ b/sklearnex/_config.py
@@ -14,6 +14,7 @@
 # limitations under the License.
 # ==============================================================================
 
+import sys
 from contextlib import contextmanager
 
 from sklearn import get_config as skl_get_config
@@ -22,6 +23,61 @@
 from daal4py.sklearn._utils import sklearn_check_version
 from onedal._config import _get_config as onedal_get_config
 
+__all__ = ["get_config", "set_config", "config_context"]
+
+tab = "    " if (sys.version_info.major == 3 and sys.version_info.minor < 13) else ""
+_options_docstring = f"""Parameters
+{tab}----------
+{tab}target_offload : str or dpctl.SyclQueue or None
+{tab}    The device used to perform computations, either as a string indicating a name
+{tab}    recognized by the SyCL runtime, such as ``"gpu"``, ``"gpu:0"``, or as a
+{tab}    :obj:`dpctl.SyclQueue` object indicating where to move the data.
+{tab}
+{tab}    Assuming SyCL-related dependencies are installed, the list of devices recognized
+{tab}    by SyCL can be retrieved through the CLI tool ``sycl-ls`` in a shell, or through
+{tab}    :obj:`dpctl.get_devices` in a Python process.
+{tab}
+{tab}    String ``"auto"`` is also accepted.
+{tab}
+{tab}    Global default: ``"auto"``.
+{tab}
+{tab}allow_fallback_to_host : bool or None
+{tab}    If ``True``, allows computations to fall back to host device (CPU) when an unsupported
+{tab}    operation is attempted on GPU through ``target_offload``.
+{tab}
+{tab}    Global default: ``False``.
+{tab}
+{tab}allow_sklearn_after_onedal : bool or None, default=None
+{tab}    If ``True``, allows computations to fall back to stock scikit-learn when no
+{tab}    accelered version of the operation is available (see :ref:`algorithms`).
+{tab}
+{tab}    Global default: ``True.``
+{tab}
+{tab}use_raw_input : bool or None
+{tab}    If ``True``, uses the raw input data in some SPMD onedal backend computations
+{tab}    without any checks on data consistency or validity. Note that this can be
+{tab}    better achieved through usage of :ref:`array API classes <array_api>` without
+{tab}    ``target_offload``. Not recommended for general use.
+{tab}
+{tab}    Global default: ``False``.
+{tab}
+{tab}    .. deprecated:: 2026.0
+{tab}
+{tab}sklearn_configs : kwargs
+{tab}    Other settings accepted by scikit-learn. See :obj:`sklearn.set_config` for
+{tab}    details.
+{tab}
+{tab}Warnings
+{tab}--------
+{tab}Using ``use_raw_input=True`` is not recommended for general use as it
+{tab}bypasses data consistency checks, which may lead to unexpected behavior. It is
+{tab}recommended to use the newer :ref:`array API <array_api>` instead.
+{tab}
+{tab}Note
+{tab}----
+{tab}Usage of ``target_offload`` requires additional dependencies - see
+{tab}:ref:`GPU support <oneapi_gpu>` for more information."""
+
 
 def get_config():
     """Retrieve current values for configuration set by :func:`set_config`.
@@ -47,52 +103,15 @@ def set_config(
     allow_sklearn_after_onedal=None,
     use_raw_input=None,
     **sklearn_configs,
-):
+):  # numpydoc ignore=PR01,PR07
     """Set global configuration.
 
-    Parameters
-    ----------
-    target_offload : str or SyclQueue or None, default=None
-        The device primarily used to perform computations.
-        If string, expected to be "auto" (the execution context
-        is deduced from input data location),
-        or SYCL* filter selector string. Global default: "auto".
-
-    allow_fallback_to_host : bool or None, default=None
-        If True, allows to fallback computation to host device
-        in case particular estimator does not support the selected one.
-        Global default: False.
-
-    allow_sklearn_after_onedal : bool or None, default=None
-        If True, allows to fallback computation to sklearn after onedal
-        backend in case of runtime error on onedal backend computations.
-        Global default: True.
-
-    use_raw_input : bool or None, default=None
-        If True, uses the raw input data in some SPMD onedal backend computations
-        without any checks on data consistency or validity.
-        Not recommended for general use.
-        Global default: False.
-
-        .. deprecated:: 2026.0
-
-    **sklearn_configs : kwargs
-        Scikit-learn configuration settings dependent on the installed version
-        of scikit-learn.
+    %_options_docstring%
 
     See Also
     --------
     config_context : Context manager for global configuration.
     get_config : Retrieve current values of the global configuration.
-
-    Warnings
-    --------
-    Using ``use_raw_input=True`` is not recommended for general use as it
-    bypasses data consistency checks, which may lead to unexpected behavior.
-
-    Use of ``target_offload`` requires the DPC++ backend. Setting a
-    non-default value (e.g ``cpu`` or ``gpu``) without this backend active
-    will raise an error.
     """
 
     skl_set_config(**sklearn_configs)
@@ -109,37 +128,19 @@ def set_config(
         local_config["use_raw_input"] = use_raw_input
 
 
+set_config.__doc__ = set_config.__doc__.replace(
+    "%_options_docstring%", _options_docstring
+)
+
+
 @contextmanager
 def config_context(**new_config):  # numpydoc ignore=PR01,PR07
-    """Context manager for global scikit-learn configuration.
-
-    Parameters
-    ----------
-    target_offload : str or SyclQueue or None, default=None
-        The device primarily used to perform computations.
-        If string, expected to be "auto" (the execution context
-        is deduced from input data location),
-        or SYCL* filter selector string. Global default: "auto".
-
-    allow_fallback_to_host : bool or None, default=None
-        If True, allows to fallback computation to host device
-        in case particular estimator does not support the selected one.
-        Global default: False.
-
-    allow_sklearn_after_onedal : bool or None, default=None
-        If True, allows to fallback computation to sklearn after onedal
-        backend in case of runtime error on onedal backend computations.
-        Global default: True.
-
-    use_raw_input : bool or None, default=None
-        .. deprecated:: 2026.0
-        If True, uses the raw input data in some SPMD onedal backend computations
-        without any checks on data consistency or validity.
-        Not recommended for general use.
-        Global default: False.
-
-    Notes
-    -----
+    """Context manager for local scikit-learn-intelex configurations.
+
+    %_options_docstring%
+
+    Note
+    ----
     All settings, not just those presently modified, will be returned to
     their previous values when the context manager is exited.
 
@@ -147,11 +148,6 @@ def config_context(**new_config):  # numpydoc ignore=PR01,PR07
     --------
     set_config : Set global scikit-learn configuration.
     get_config : Retrieve current values of the global configuration.
-
-    Warnings
-    --------
-    Using ``use_raw_input=True`` is not recommended for general use as it
-    bypasses data consistency checks, which may lead to unexpected behavior.
     """
     old_config = get_config()
     set_config(**new_config)
@@ -160,3 +156,8 @@ def config_context(**new_config):  # numpydoc ignore=PR01,PR07
         yield
     finally:
         set_config(**old_config)
+
+
+config_context.__doc__ = config_context.__doc__.replace(
+    "%_options_docstring%", _options_docstring
+)