
Commit 43e99eb

david-cortes-intel, icfaust, and Vika-F authored
DOC: Document selected daal4py sections in sklearnex, add full API reference section (#2395)
* add sections about d4p plus full api docs as section
* remove reference to daal4py from readme
* typo
* Update doc/sources/about_daal4py.rst Co-authored-by: Ian Faust <[email protected]>
* Update doc/sources/distributed_daal4py.rst Co-authored-by: Ian Faust <[email protected]>
* mention incremental sklearn algorithms in streaming d4p section
* break down explanation for using d4p algorithms
* Update doc/sources/distributed_daal4py.rst Co-authored-by: Ian Faust <[email protected]>
* add new RNGs
* more missing entries
* reword model builders
* mention logreg in model builders title
* more details about daal4py
* fix broken links
* add better article from intel
* Update doc/sources/distributed_daal4py.rst Co-authored-by: Victoriya Fedotova <[email protected]>
* Update doc/sources/daal4py.rst Co-authored-by: Victoriya Fedotova <[email protected]>
* Update doc/daal4py/scaling.rst Co-authored-by: Victoriya Fedotova <[email protected]>

---------

Co-authored-by: Ian Faust <[email protected]>
Co-authored-by: Victoriya Fedotova <[email protected]>
1 parent cc12226 commit 43e99eb

16 files changed, +1784 -12 lines changed

README.md

Lines changed: 3 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -131,11 +131,10 @@ To patch scikit-learn, you can:
131131
* [Medium Blogs](https://uxlfoundation.github.io/scikit-learn-intelex/latest/blogs.html)
132132
* [Code of Conduct](https://github.com/uxlfoundation/scikit-learn-intelex/blob/master/CODE_OF_CONDUCT.md)
133133
134-
### daal4py and oneDAL
134+
### Extension and oneDAL
135135
136-
The acceleration is achieved through the use of the oneAPI Data Analytics Library (oneDAL). Learn more:
137-
- [About oneAPI Data Analytics Library](https://github.com/uxlfoundation/oneDAL)
138-
- [About daal4py](https://github.com/uxlfoundation/scikit-learn-intelex/tree/main/daal4py)
136+
Acceleration in patched scikit-learn classes is achieved by replacing calls to scikit-learn with calls to oneDAL (oneAPI Data Analytics Library) behind the scenes:
137+
- [oneAPI Data Analytics Library](https://github.com/uxlfoundation/oneDAL)
139138
140139
## Samples & Examples
141140

daal4py/mb/logistic_regression_builders.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ class LogisticDAALModel:
4040
which can calculate fast predictions of different types (classes, probabilities,
4141
logarithms of probabilities), from fitted coefficients and intercepts obtained
4242
elsewhere (such as from :obj:`sklearn.linear_model.LogisticRegression`), making
43-
the predictions either in double (:obj:`np.float64`) or single (:obj:`np.float32`)
43+
the predictions either in double (``np.float64``) or single (``np.float32``)
4444
precision.
4545
4646
See Also

doc/daal4py/scaling.rst

Lines changed: 3 additions & 0 deletions
@@ -90,6 +90,9 @@ The following algorithms support distributed mode:
9090

9191
- `K-Means <https://github.com/uxlfoundation/scikit-learn-intelex/tree/main/examples/daal4py/kmeans_spmd.py>`_
9292

93+
- DBSCAN
94+
95+
- `DBSCAN <https://github.com/uxlfoundation/scikit-learn-intelex/tree/main/examples/daal4py/dbscan_spmd.py>`_
9396
- Correlation and Variance-Covariance Matrices (covariance)
9497

9598
- `Covariance <https://github.com/uxlfoundation/scikit-learn-intelex/tree/main/examples/daal4py/covariance_spmd.py>`_

doc/sources/about_daal4py.rst

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
1+
.. Copyright contributors to the oneDAL project
2+
..
3+
.. Licensed under the Apache License, Version 2.0 (the "License");
4+
.. you may not use this file except in compliance with the License.
5+
.. You may obtain a copy of the License at
6+
..
7+
.. http://www.apache.org/licenses/LICENSE-2.0
8+
..
9+
.. Unless required by applicable law or agreed to in writing, software
10+
.. distributed under the License is distributed on an "AS IS" BASIS,
11+
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
.. See the License for the specific language governing permissions and
13+
.. limitations under the License.
14+
.. include:: substitutions.rst
15+
16+
.. _about_daal4py:
17+
18+
About daal4py
19+
=============
20+
21+
Introduction
22+
------------
23+
24+
``daal4py`` is a low-level module within the |sklearnex| package providing Python bindings
25+
over the |onedal|. It has been deprecated in favor of the newer ``sklearnex`` module in the
26+
same package, which offers a more idiomatic and higher-level interface for calling accelerated
27+
routines from the |onedal| in Python.
28+
29+
Internally, ``daal4py`` is a Python wrapper over the `now-deprecated "DAAL" interface <https://uxlfoundation.github.io/oneDAL/index.html#oneapi-vs-daal>`__
30+
of the |onedal|, while ``sklearnex`` is a module built atop the "oneAPI" interface, offering
31+
DPC-based features such as :ref:`GPU support <oneapi_gpu>`.
32+
33+
There is a large degree of overlap in the functionalities offered between the two modules
34+
``daal4py`` and ``sklearnex`` - module ``sklearnex`` should be preferred whenever possible,
35+
either by using it directly or through the :ref:`patching mechanism <patching>` - but ``daal4py``
36+
exposes some additional functionalities from the |onedal| that ``sklearnex`` doesn't:
37+
38+
- :ref:`Algorithms that are outside the scope of scikit-learn <non_sklearn_d4p>`.
39+
- :ref:`Distributed mode on CPU <distributed_daal4py>`.
40+
- Fast serving of gradient boosted decision trees from other libraries such as XGBoost
41+
(:ref:`model builders <model_builders>`).
42+
43+
Previously ``daal4py`` was distributed as a separate package, but it is now an importable module
44+
within the ``scikit-learn-intelex`` package - meaning, after installing ``scikit-learn-intelex``,
45+
it can be imported as follows:
46+
47+
.. code::
48+
49+
import daal4py
50+
51+
For documentation about specific functions, see the :ref:`daal4py API reference <daal4py_ref>`.
52+
53+
54+
Using daal4py
55+
-------------
56+
57+
Unlike ``sklearnex``, ``daal4py``, being a lower-level interface, does not follow scikit-learn
58+
idioms - instead, the process for calling procedures from the ``daal4py`` interface is as follows:
59+
60+
- Instantiate an 'algorithm' class by calling its constructor, without any data - for example:
61+
``qr_algo = daal4py.qr()``.
62+
- Call the 'compute' method of that instantiated algorithm in order to obtain a 'result' object,
63+
passing it the data on which it will operate - for example: ``qr_result = qr_algo.compute(X)``.
64+
- Access the relevant results in the 'result' object - for example: ``R = qr_result.matrixR``.
65+
66+
67+
Full example calling the QR algorithm:
68+
69+
.. code::
70+
71+
import daal4py
72+
import numpy as np
73+
74+
rng = np.random.default_rng(seed=123)
75+
X = rng.standard_normal(size=(100,5))
76+
77+
qr_algo = daal4py.qr()
78+
qr_result = qr_algo.compute(X)
79+
80+
np.testing.assert_almost_equal(
81+
np.abs( qr_result.matrixR ),
82+
np.abs( np.linalg.qr(X).R ),
83+
)
84+
85+
.. note::
86+
QR factorization, unlike other linear algebra procedures, does not have a strictly unique
87+
solution - if the signs (+/-) are flipped for a particular column of the Q matrix and for the
88+
corresponding row of the R matrix, the result is still a valid QR factorization of the same
89+
original matrix 'X'.
90+
91+
Procedures like Cholesky decomposition are typically constrained to have only positive signs
92+
in the main diagonal in order to make the results deterministic, but software implementing QR
93+
often does not enforce such a constraint, hence the example above takes the absolute values when comparing
94+
results from different libraries.
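
The sign ambiguity described in this note can be checked numerically with NumPy alone. The following is a minimal sketch, not part of the daal4py documentation; the names ``S``, ``Q_flipped`` and ``R_flipped`` are illustrative. It flips the sign of one column of Q together with the matching row of R and confirms that the product still reproduces the original matrix.

```python
import numpy as np

rng = np.random.default_rng(seed=123)
X = rng.standard_normal(size=(100, 5))

# Reference factorization from NumPy (reduced mode: Q is 100x5, R is 5x5).
Q, R = np.linalg.qr(X)

# Diagonal sign matrix that flips only the first component (illustrative choice).
S = np.diag([-1.0, 1.0, 1.0, 1.0, 1.0])
Q_flipped = Q @ S  # negates column 0 of Q
R_flipped = S @ R  # negates row 0 of R

# The flipped pair still reproduces X, Q stays orthonormal, R stays upper triangular.
np.testing.assert_allclose(Q_flipped @ R_flipped, X, atol=1e-10)
np.testing.assert_allclose(Q_flipped.T @ Q_flipped, np.eye(5), atol=1e-10)
np.testing.assert_allclose(R_flipped, np.triu(R_flipped))

# Only the signs differ, which is why comparing absolute values is sufficient.
np.testing.assert_allclose(np.abs(R_flipped), np.abs(R))
```

Because ``S @ S`` is the identity matrix, ``(Q @ S) @ (S @ R)`` equals ``Q @ R``, so any combination of such sign flips gives an equally valid factorization.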
95+
96+
97+
Streaming mode
98+
**************
99+
100+
Many algorithms in ``daal4py`` accept an argument ``streaming=True``, which allows executing the
101+
computations in a 'streaming' or 'online' fashion, by supplying it different subsets of the data,
102+
one at a time (batches), instead of passing the whole data upfront, while still arriving at the
103+
same final result as if all the data had been passed at once.
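
To illustrate why batch-wise processing can reach the same result as a single pass, here is a minimal pure-Python sketch. It does not use daal4py: the class ``StreamingMoments`` is hypothetical and only mimics the compute-then-finalize calling pattern, accumulating partial sums per batch and combining them at the end.

```python
import math
import random


class StreamingMoments:
    """Hypothetical helper (not a daal4py class): mean/variance from partial sums."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def compute(self, batch):
        # Fold one batch of values into the running partial results.
        for value in batch:
            self.count += 1
            self.total += value
            self.total_sq += value * value

    def finalize(self):
        # Turn the accumulated partial results into mean and population variance.
        mean = self.total / self.count
        variance = self.total_sq / self.count - mean * mean
        return mean, variance


random.seed(123)
data = [random.gauss(0.0, 1.0) for _ in range(100)]

# Streaming: feed five batches of 20 values, one at a time.
algo = StreamingMoments()
for start in range(0, len(data), 20):
    algo.compute(data[start:start + 20])
stream_mean, stream_var = algo.finalize()

# Batch: process all the data at once.
full_mean = sum(data) / len(data)
full_var = sum((v - full_mean) ** 2 for v in data) / len(data)

assert math.isclose(stream_mean, full_mean, rel_tol=1e-9, abs_tol=1e-12)
assert math.isclose(stream_var, full_var, rel_tol=1e-9, abs_tol=1e-12)
```

The sketch keeps only three running numbers regardless of how much data has been seen, which is the property that makes this style of computation suitable for data that does not fit in memory.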
104+
105+
.. note::
106+
The ``sklearnex`` module also offers incremental versions of some algorithms - see the docs
107+
on :ref:`extension_estimators` for more details.
108+
109+
This can be useful for executing algorithms on large datasets that don't fit in memory but which
110+
can still be loaded in smaller chunks, or for machine learning models that are constantly being
111+
updated as new data is collected, for example.
112+
113+
In order to use streaming mode, the algorithm constructor needs to be passed argument ``streaming=True``,
114+
method ``.compute()`` needs to be called multiple times with different data, and the 'result'
115+
object should be obtained by calling method ``.finalize()`` after all the data has been passed.
116+
117+
Example: ::
118+
119+
import daal4py
120+
import numpy as np
121+
122+
rng = np.random.default_rng(seed=123)
123+
X_full = rng.standard_normal(size=(100,5))
124+
batches = np.split(np.arange(100), 5)
125+
126+
qr_algo = daal4py.qr(streaming=True)
127+
for batch in batches:
128+
X_batch = X_full[batch]
129+
qr_algo.compute(X_batch)
130+
131+
qr_result = qr_algo.finalize()
132+
133+
np.testing.assert_almost_equal(
134+
np.abs( qr_result.matrixR ),
135+
np.abs( np.linalg.qr(X_full).R ),
136+
)
137+
138+
List of algorithms in ``daal4py`` supporting streaming mode:
139+
140+
- :obj:`SVD <daal4py.svd>`
141+
- :obj:`Linear Regression <daal4py.linear_regression_training>`
142+
- :obj:`Ridge Regression <daal4py.ridge_regression_training>`
143+
- :obj:`Multinomial Naive Bayes <daal4py.multinomial_naive_bayes_training>`
144+
- :obj:`Moments of Low Order <daal4py.low_order_moments>`
145+
- :obj:`Covariance <daal4py.covariance>`
146+
- :obj:`QR decomposition <daal4py.qr>`
147+
148+
Distributed mode
149+
****************
150+
151+
Many algorithms in ``daal4py`` accept an argument ``distributed=True``, which allows
152+
running computations across distributed compute nodes using the MPI framework.
153+
154+
See the section :ref:`distributed_daal4py` for more details.
155+
156+
Documentation
157+
*************
158+
159+
See :ref:`daal4py_ref` for the full documentation of functions and classes.

doc/sources/conf.py

Lines changed: 2 additions & 0 deletions
@@ -74,6 +74,8 @@
7474
intersphinx_mapping = {
7575
"sklearn": ("https://scikit-learn.org/stable/", None),
7676
"dpctl": ("https://intelpython.github.io/dpctl/latest", None),
77+
"mpi4py": ("https://mpi4py.readthedocs.io/en/stable/", None),
78+
"xgboost": ("https://xgboost.readthedocs.io/en/stable/", None),
7779
# from scikit-learn, in case some object in sklearnex points to them:
7880
# https://github.com/scikit-learn/scikit-learn/blob/main/doc/conf.py
7981
"python": ("https://docs.python.org/{.major}".format(sys.version_info), None),

doc/sources/d4p-kmeans-scale.jpg

82.5 KB