Commit 72a3259

DOC: Clarify requirements for running on GPU and supported configurations (#2286)
* update conda install instructions
* simplify tables
* further simplify
* clarify requirements for running on GPU
* typos
* add note about support for FPGAs and SPMD
* clarify optional aspect of dpctl
* clarify requirements for SPMD, clarify conda-forge status
* more clarifications about requirements
* small details
* wording
* order
* link to dpctl docs programmatically
* don't use automated section naming
* use unnamed reference for mpi4py
* more corrections
* clarify sycl device support
* clarify dpctl not required for spmd
* clarify integrated chipsets not supported
* typo
* single link occurrence for dpctl
* clarify mpi4py requirements
* more spmd details
* wording
* mention detail about queues
* correction oneMath->oneMKL
* more corrections
* copyright header
* update main page too
* more improvements
* spacing
* mention multi-gpu setups in readme
1 parent 19fd721 commit 72a3259

8 files changed: +236, -184 lines


README.md

Lines changed: 6 additions & 4 deletions
@@ -45,8 +45,8 @@ The software acceleration is achieved with vector instructions, AI hardware-spec
 
 With Intel(R) Extension for Scikit-learn, you can:
 
-* Speed up training and inference by up to 100x with the equivalent mathematical accuracy
-* Benefit from performance improvements across different Intel(R) hardware configurations
+* Speed up training and inference by up to 100x with equivalent mathematical accuracy
+* Benefit from performance improvements across different Intel(R) hardware configurations, including GPUs and multi-GPU configurations
 * Integrate the extension into your existing Scikit-learn applications without code modifications
 * Continue to use the open-source scikit-learn API
 * Enable and disable the extension with a couple of lines of code or at the command line
@@ -71,12 +71,14 @@ Intel(R) Extension for Scikit-learn is also a part of [Intel(R) AI Tools](https:
 from sklearn.cluster import DBSCAN
 
 X = np.array([[1., 2.], [2., 2.], [2., 3.],
-    [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
+              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
 clustering = DBSCAN(eps=3, min_samples=2).fit(X)
 ```
 
 - **Enable Intel(R) GPU optimizations**
 
+_Note: executing on GPU has [additional system software requirements](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-dpcpp-system-requirements.html) - see [details](https://uxlfoundation.github.io/scikit-learn-intelex/latest/oneapi-gpu.html)._
+
 ```py
 import numpy as np
 import dpctl
@@ -86,7 +88,7 @@ Intel(R) Extension for Scikit-learn is also a part of [Intel(R) AI Tools](https:
 from sklearn.cluster import DBSCAN
 
 X = np.array([[1., 2.], [2., 2.], [2., 3.],
-    [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
+              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
 with config_context(target_offload="gpu:0"):
     clustering = DBSCAN(eps=3, min_samples=2).fit(X)
 ```
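
For reference, the GPU example above is split across two hunks. Assembled, it runs as the script below; the three lines hidden by the hunk boundary (new lines 85-87) are filled in as an assumption, following the standard sklearnex patching pattern used elsewhere in the README:

```py
import numpy as np
import dpctl

# Assumed hidden lines: apply the sklearnex patch before importing the
# estimator from sklearn, so DBSCAN resolves to the accelerated version.
from sklearnex import patch_sklearn, config_context
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)

# Offload the computation to the first SYCL-visible GPU device.
with config_context(target_offload="gpu:0"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)
```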

doc/sources/algorithms.rst

Lines changed: 4 additions & 1 deletion
@@ -12,13 +12,14 @@
 .. See the License for the specific language governing permissions and
 .. limitations under the License.
 
+.. include:: substitutions.rst
 .. _sklearn_algorithms:
 
 ####################
 Supported Algorithms
 ####################
 
-Applying |intelex| impacts the following scikit-learn algorithms:
+Applying |intelex| impacts the following |sklearn| estimators:
 
 on CPU
 ------
@@ -380,6 +381,8 @@ Other Tasks
 - All parameters are supported
 - Only dense data is supported
 
+.. _spmd-support:
+
 SPMD Support
 ------------
 
doc/sources/conf.py

Lines changed: 1 addition & 0 deletions
@@ -75,6 +75,7 @@
 
 intersphinx_mapping = {
     "sklearn": ("https://scikit-learn.org/stable/", None),
+    "dpctl": ("https://intelpython.github.io/dpctl/latest", None),
     # from scikit-learn, in case some object in sklearnex points to them:
     # https://github.com/scikit-learn/scikit-learn/blob/main/doc/conf.py
     "python": ("https://docs.python.org/{.major}".format(sys.version_info), None),

doc/sources/distributed-mode.rst

Lines changed: 64 additions & 18 deletions
@@ -12,39 +12,85 @@
 .. See the License for the specific language governing permissions and
 .. limitations under the License.
 
+.. include:: substitutions.rst
+
 .. _distributed:
 
-Distributed Mode
-================
+Distributed Mode (SPMD)
+=======================
 
 |intelex| offers Single Program, Multiple Data (SPMD) supported interfaces for distributed computing.
-Several `GPU-supported algorithms <https://uxlfoundation.github.io/scikit-learn-intelex/latest/oneapi-gpu.html#>`_
-also provide distributed, multi-GPU computing capabilities via integration with ``mpi4py``. The prerequisites
+Several :doc:`GPU-supported algorithms <oneapi-gpu>`
+also provide distributed, multi-GPU computing capabilities via integration with |mpi4py|. The prerequisites
 match those of GPU computing, along with an MPI backend of your choice (`Intel MPI recommended
 <https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.dcan6r>`_, available
-via ``impi-devel`` python package) and the ``mpi4py`` python package. If using |intelex|
+via the ``impi_rt`` Python package) and the |mpi4py| Python package. If using |intelex|
 `installed from sources <https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/INSTALL.md#build-from-sources>`_,
 ensure that the spmd_backend is built.
 
-Note that |intelex| now supports GPU offloading to speed up MPI operations. This is supported automatically with
-some MPI backends, but in order to use GPU offloading with Intel MPI, set the following environment variable (providing
+.. important::
+  SPMD mode requires the |mpi4py| package used at runtime to be compiled with the same MPI backend as |intelex|. The PyPI and Conda distributions of |intelex| both use Intel's MPI as backend, and hence require an |mpi4py| also built with Intel's MPI - it can be easily installed from Intel's conda channel as follows::
+
+    conda install -c https://software.repos.intel.com/python/conda/ mpi4py
+
+  It also requires the MPI runtime executable (``mpiexec`` / ``mpirun``) to be from the same library that was used to compile |intelex| - Intel's MPI runtime library is offered as a Python package ``impi_rt`` and will be installed together with the ``mpi4py`` package if executing the command above, but otherwise, it can be installed separately from different distribution channels:
+
+  - Intel's conda channel (recommended)::
+
+      conda install -c https://software.repos.intel.com/python/conda/ impi_rt
+
+  - Conda-Forge::
+
+      conda install -c conda-forge impi_rt
+
+  - PyPI (not recommended, might require setting additional environment variables)::
+
+      pip install impi_rt
+
+  Using other MPI backends (e.g. OpenMPI) requires building |intelex| from source with that backend.
+
+Note that |intelex| supports GPU offloading to speed up MPI operations. This is supported automatically with
+some MPI backends, but in order to use GPU offloading with Intel MPI, it is required to set the environment variable ``I_MPI_OFFLOAD`` to ``1`` (providing
 data on device without this may lead to a runtime error):
 
-::
+- On Linux*::
+
+    export I_MPI_OFFLOAD=1
+
+- On Windows*::
+
+    set I_MPI_OFFLOAD=1
+
+SPMD-aware versions of estimators can be imported from the ``sklearnex.spmd`` module. Data should be distributed across multiple nodes as
+desired, and should be transferred to a |dpctl| or `dpnp <https://github.com/IntelPython/dpnp>`__ array before being passed to the estimator.
+
+Note that SPMD estimators allow an additional argument ``queue`` in their ``.fit`` / ``.predict`` methods, which accepts :obj:`dpctl.SyclQueue` objects. For example, while the signature for :obj:`sklearn.linear_model.LinearRegression.predict` would be
+
+.. code-block:: python
+
+    def predict(self, X): ...
+
+the signature for the corresponding predict method in ``sklearnex.spmd.linear_model.LinearRegression.predict`` is:
+
+.. code-block:: python
+
+    def predict(self, X, queue=None): ...
+
+Examples of SPMD usage can be found in the GitHub repository for |intelex| under `examples/sklearnex <https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/sklearnex>`__.
 
-    export I_MPI_OFFLOAD=1
+To run in SPMD mode, first create a Python file using SPMD estimators from ``sklearnex.spmd``, such as `linear_regression_spmd.py <https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/sklearnex/linear_regression_spmd.py>`__.
 
-Estimators can be imported from the ``sklearnex.spmd`` module. Data should be distributed across multiple nodes as
-desired, and should be transfered to a dpctl or dpnp array before being passed to the estimator. View a full
-example of this process in the |intelex| repository, where many examples of our SPMD-supported estimators are
-available: https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/sklearnex/. To run:
+Then, execute the file through MPI under multiple ranks - for example:
 
-::
+- On Linux*::
+
+    mpirun -n 4 python linear_regression_spmd.py
 
-    mpirun -n 4 python linear_regression_spmd.py
+- On Windows*::
+
+    mpiexec -n 4 python linear_regression_spmd.py
 
-Note that additional mpirun arguments can be added as desired. SPMD-supported estimators are listed in the
-`algorithms support documentation <https://uxlfoundation.github.io/scikit-learn-intelex/latest/algorithms.html#spmd-support>`_.
+Note that additional ``mpirun`` arguments can be added as desired. SPMD-supported estimators are listed in the :ref:`spmd-support` section.
 
-Additionally, daal4py offers some distributed functionality, see
+Additionally, ``daal4py`` (previously a separate package, now an importable module within ``scikit-learn-intelex``) offers some distributed functionality, see
 `documentation <https://intelpython.github.io/daal4py/scaling.html>`_ for further details.
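
Tying the new documentation together, here is a minimal, hedged sketch of what an SPMD script might look like (hypothetical file name and synthetic data; it is not the repository's `linear_regression_spmd.py` example, and it assumes the Intel MPI runtime, an Intel-MPI-built mpi4py, and one SYCL-visible GPU per rank as described above):

```py
# linear_regression_spmd_sketch.py - illustrative sketch only.
import dpctl
import dpctl.tensor as dpt
import numpy as np
from mpi4py import MPI

from sklearnex.spmd.linear_model import LinearRegression

rank = MPI.COMM_WORLD.Get_rank()
queue = dpctl.SyclQueue("gpu")  # one SYCL queue per MPI rank

# Each rank holds its own shard of the data (synthetic here).
rng = np.random.default_rng(seed=rank)
X = rng.standard_normal((1000, 5)).astype(np.float32)
y = X @ np.arange(1, 6, dtype=np.float32)

# Transfer the shards to device arrays before calling the estimator.
X_d = dpt.asarray(X, sycl_queue=queue)
y_d = dpt.asarray(y, sycl_queue=queue)

model = LinearRegression().fit(X_d, y_d)
pred = model.predict(X_d, queue=queue)  # the extra SPMD-only argument
```

Launched under MPI as described above, e.g. ``mpirun -n 4 python linear_regression_spmd_sketch.py`` on Linux*.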

doc/sources/index.rst

Lines changed: 13 additions & 11 deletions
@@ -12,29 +12,28 @@
 .. See the License for the specific language governing permissions and
 .. limitations under the License.
 
-.. |intelex_repo| replace:: |intelex| repository
-.. _intelex_repo: https://github.com/uxlfoundation/scikit-learn-intelex
+.. include:: substitutions.rst
 
 .. _index:
 
 #########
 |intelex|
 #########
 
-Intel(R) Extension for Scikit-learn is a **free software AI accelerator** designed to deliver up to **100X** faster performance for your existing scikit-learn code.
+|intelex| is a **free software AI accelerator** designed to deliver up to **100X** faster performance for your existing |sklearn| code.
 The software acceleration is achieved with vector instructions, AI hardware-specific memory optimizations, threading, and optimizations for all upcoming Intel(R) platforms at launch time.
 
 .. rubric:: Designed for Data Scientists and Framework Designers
 
 
-Use Intel(R) Extension for Scikit-learn, to:
+Use |intelex| to:
 
-* Speed up training and inference by up to 100x with the equivalent mathematical accuracy
-* Benefit from performance improvements across different x86-compatible CPUs or Intel(R) GPUs
-* Integrate the extension into your existing Scikit-learn applications without code modifications
+* Speed up training and inference by up to 100x with equivalent mathematical accuracy
+* Benefit from performance improvements across different x86-64 CPUs and Intel(R) GPUs
+* Integrate the extension into your existing |sklearn| applications without code modifications
 * Enable and disable the extension with a couple of lines of code or at the command line
 
-Intel(R) Extension for Scikit-learn is also a part of `Intel(R) AI Tools <https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html>`_.
+|intelex| is also a part of `Intel(R) AI Tools <https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html>`_.
 
 
 .. image:: _static/scikit-learn-acceleration.PNG
@@ -65,11 +64,14 @@ Enable Intel(R) CPU Optimizations
     from sklearn.cluster import DBSCAN
 
     X = np.array([[1., 2.], [2., 2.], [2., 3.],
-        [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
+                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
     clustering = DBSCAN(eps=3, min_samples=2).fit(X)
 
 Enable Intel(R) GPU optimizations
 *********************************
+
+Note: executing on GPU has `additional system software requirements <https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-dpcpp-system-requirements.html>`__ - see :doc:`oneapi-gpu`.
+
 ::
 
     import numpy as np
@@ -80,7 +82,7 @@ Enable Intel(R) GPU optimizations
     from sklearn.cluster import DBSCAN
 
     X = np.array([[1., 2.], [2., 2.], [2., 3.],
-        [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
+                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
     with config_context(target_offload="gpu:0"):
         clustering = DBSCAN(eps=3, min_samples=2).fit(X)
 
@@ -101,7 +103,7 @@ Enable Intel(R) GPU optimizations
    :maxdepth: 2
 
    algorithms.rst
-   oneAPI and GPU support <oneapi-gpu.rst>
+   oneapi-gpu.rst
   distributed-mode.rst
   non-scikit-algorithms.rst
   input-types.rst
