|
| 1 | +.. Copyright contributors to the oneDAL project |
| 2 | +.. |
| 3 | +.. Licensed under the Apache License, Version 2.0 (the "License"); |
| 4 | +.. you may not use this file except in compliance with the License. |
| 5 | +.. You may obtain a copy of the License at |
| 6 | +.. |
| 7 | +.. http://www.apache.org/licenses/LICENSE-2.0 |
| 8 | +.. |
| 9 | +.. Unless required by applicable law or agreed to in writing, software |
| 10 | +.. distributed under the License is distributed on an "AS IS" BASIS, |
| 11 | +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 12 | +.. See the License for the specific language governing permissions and |
| 13 | +.. limitations under the License. |
| 14 | +.. include:: substitutions.rst |
| 15 | + |
| 16 | +.. _about_daal4py: |
| 17 | + |
| 18 | +About daal4py |
| 19 | +============= |
| 20 | + |
| 21 | +Introduction |
| 22 | +------------ |
| 23 | + |
| 24 | +``daal4py`` is a low-level module within the |sklearnex| package providing Python bindings |
| 25 | +over the |onedal|. It has been deprecated in favor of the newer ``sklearnex`` module in the |
| 26 | +same package, which offers a more idiomatic and higher-level interface for calling accelerated |
| 27 | +routines from the |onedal| in Python. |
| 28 | + |
| 29 | +Internally, ``daal4py`` is a Python wrapper over the `now-deprecated "DAAL" interface <https://uxlfoundation.github.io/oneDAL/index.html#oneapi-vs-daal>`__ |
| 30 | +of the |onedal|, while ``sklearnex`` is a module built on top of the "oneAPI" interface, offering |
| 31 | +DPC-based features such as :ref:`GPU support <oneapi_gpu>`. |
| 32 | + |
| 33 | +The two modules ``daal4py`` and ``sklearnex`` overlap to a large degree in functionality, |
| 34 | +and ``sklearnex`` should be preferred whenever possible, either by using it directly or |
| 35 | +through the :ref:`patching mechanism <patching>`. However, ``daal4py`` exposes some |
| 36 | +additional functionalities from the |onedal| that ``sklearnex`` doesn't: |
| 37 | + |
| 38 | +- :ref:`Algorithms that are outside the scope of scikit-learn <non_sklearn_d4p>`. |
| 39 | +- :ref:`Distributed mode on CPU <distributed_daal4py>`. |
| 40 | +- Fast serving of gradient boosted decision trees from other libraries such as XGBoost |
| 41 | + (:ref:`model builders <model_builders>`). |
| 42 | + |
| 43 | +Previously ``daal4py`` was distributed as a separate package, but it is now an importable module |
| 44 | +within the ``scikit-learn-intelex`` package - meaning, after installing ``scikit-learn-intelex``, |
| 45 | +it can be imported as follows: |
| 46 | + |
| 47 | +.. code:: |
| 48 | +
|
| 49 | + import daal4py |
| 50 | +
|
| 51 | +For documentation about specific functions, see the :ref:`daal4py API reference <daal4py_ref>`. |
| 52 | + |
| 53 | + |
| 54 | +Using daal4py |
| 55 | +------------- |
| 56 | + |
| 57 | +Unlike ``sklearnex``, ``daal4py``, being a lower-level interface, does not follow scikit-learn |
| 58 | +idioms - instead, the process for calling procedures from the ``daal4py`` interface is as follows: |
| 59 | + |
| 60 | +- Instantiate an 'algorithm' class by calling its constructor, without any data - for example: |
| 61 | + ``qr_algo = daal4py.qr()``. |
| 62 | +- Call the 'compute' method of that instantiated algorithm in order to obtain a 'result' object, |
| 63 | + passing it the data on which it will operate - for example: ``qr_result = qr_algo.compute(X)``. |
| 64 | +- Access the relevant results in the 'result' object - for example: ``R = qr_result.matrixR``. |
| 65 | + |
| 66 | + |
| 67 | +Full example calling the QR algorithm: |
| 68 | + |
| 69 | +.. code:: |
| 70 | +
|
| 71 | + import daal4py |
| 72 | + import numpy as np |
| 73 | +
|
| 74 | + rng = np.random.default_rng(seed=123) |
| 75 | + X = rng.standard_normal(size=(100,5)) |
| 76 | +
|
| 77 | + qr_algo = daal4py.qr() |
| 78 | + qr_result = qr_algo.compute(X) |
| 79 | +
|
| 80 | + np.testing.assert_almost_equal( |
| 81 | +     np.abs(qr_result.matrixR), |
| 82 | +     np.abs(np.linalg.qr(X).R), |
| 83 | + ) |
| 84 | +
|
| 85 | +.. note:: |
| 86 | +    QR factorization, unlike some other linear algebra procedures, does not have a strictly unique |
| 87 | +    solution - if the signs (+/-) are flipped for a particular column of the Q matrix and for the |
| 88 | +    corresponding row of the R matrix, the result is still a valid QR factorization of the same |
| 89 | +    original matrix 'X'. |
| 90 | + |
| 91 | +    Procedures like Cholesky decomposition are typically constrained to have only positive entries |
| 92 | +    in the main diagonal in order to make the results unique, but QR implementations often do not |
| 93 | +    enforce such a constraint, hence the example above takes the absolute values when comparing |
| 94 | +    results from different libraries. |
| 95 | + |
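The sign ambiguity described in the note above can be checked directly with NumPy. The following is a minimal sketch that does not use ``daal4py``; the flipped index (2) is an arbitrary choice made for illustration, as are the tolerances:

```python
import numpy as np

rng = np.random.default_rng(seed=123)
X = rng.standard_normal(size=(100, 5))

# Reduced QR factorization: Q is 100x5, R is 5x5
Q, R = np.linalg.qr(X)

# Diagonal matrix of +/-1 that flips only index 2 (arbitrary choice)
D = np.diag([1.0, 1.0, -1.0, 1.0, 1.0])
Q_flipped = Q @ D  # flips the sign of column 2 of Q
R_flipped = D @ R  # flips the sign of row 2 of R

# Both pairs reconstruct X, and the flipped Q still has orthonormal columns
np.testing.assert_allclose(Q @ R, X, atol=1e-10)
np.testing.assert_allclose(Q_flipped @ R_flipped, X, atol=1e-10)
np.testing.assert_allclose(Q_flipped.T @ Q_flipped, np.eye(5), atol=1e-10)
```

Since the two factorizations differ only in sign, comparing the absolute values of the R matrices is enough to check agreement between libraries.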
| 96 | + |
| 97 | +Streaming mode |
| 98 | +************** |
| 99 | + |
| 100 | +Many algorithms in ``daal4py`` accept an argument ``streaming=True``, which allows executing the |
| 101 | +computations in a 'streaming' or 'online' fashion, by supplying it different subsets of the data, |
| 102 | +one at a time (batches), instead of passing the whole data upfront, while still arriving at the |
| 103 | +same final result as if all the data had been passed at once. |
| 104 | + |
| 105 | +.. note:: |
| 106 | + The ``sklearnex`` module also offers incremental versions of some algorithms - see the docs |
| 107 | + on :ref:`extension_estimators` for more details. |
| 108 | + |
| 109 | +This can be useful for executing algorithms on large datasets that don't fit in memory but which |
| 110 | +can still be loaded in smaller chunks, or for machine learning models that are constantly being |
| 111 | +updated as new data is collected, for example. |
| 112 | + |
| 113 | +In order to use streaming mode, the algorithm constructor needs to be passed argument ``streaming=True``, |
| 114 | +method ``.compute()`` needs to be called multiple times with different data, and the 'result' |
| 115 | +object should be obtained by calling method ``.finalize()`` after all the data has been passed. |
| 116 | + |
| 117 | +Example: :: |
| 118 | + |
| 119 | +    import daal4py |
| 120 | +    import numpy as np |
| 121 | + |
| 122 | +    rng = np.random.default_rng(seed=123) |
| 123 | +    X_full = rng.standard_normal(size=(100, 5)) |
| 124 | +    batches = np.split(np.arange(100), 5)  # row indices for 5 equal batches |
| 125 | + |
| 126 | +    qr_algo = daal4py.qr(streaming=True) |
| 127 | +    for batch in batches: |
| 128 | +        X_batch = X_full[batch] |
| 129 | +        qr_algo.compute(X_batch) |
| 130 | + |
| 131 | +    qr_result = qr_algo.finalize() |
| 132 | + |
| 133 | +    np.testing.assert_almost_equal( |
| 134 | +        np.abs(qr_result.matrixR), |
| 135 | +        np.abs(np.linalg.qr(X_full).R), |
| 136 | +    ) |
| 137 | + |
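One way to see why batch-wise processing can reproduce the full-data 'R' factor is to carry forward only a running triangular factor and re-factorize it together with each new batch. The sketch below uses NumPy only and is a conceptual illustration under that assumption; it is not meant to describe how the |onedal| implements streaming QR internally, and the name ``R_running`` is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(seed=123)
X_full = rng.standard_normal(size=(100, 5))

# Start with an empty (0 x 5) factor and fold in one batch at a time
R_running = np.empty((0, 5))
for X_batch in np.split(X_full, 5):
    # Stack the current triangular factor on top of the new rows,
    # then keep only the R factor of the stacked matrix
    stacked = np.vstack([R_running, X_batch])
    R_running = np.linalg.qr(stacked, mode="r")

# R factor computed from all the data at once
R_full = np.linalg.qr(X_full, mode="r")

# The two agree up to the sign of each row
np.testing.assert_allclose(np.abs(R_running), np.abs(R_full), atol=1e-10)
```

Only a 5x5 matrix is kept between batches, so memory use does not grow with the number of rows processed.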
| 138 | +List of algorithms in ``daal4py`` supporting streaming mode: |
| 139 | + |
| 140 | +- :obj:`SVD <daal4py.svd>` |
| 141 | +- :obj:`Linear Regression <daal4py.linear_regression_training>` |
| 142 | +- :obj:`Ridge Regression <daal4py.ridge_regression_training>` |
| 143 | +- :obj:`Multinomial Naive Bayes <daal4py.multinomial_naive_bayes_training>` |
| 144 | +- :obj:`Moments of Low Order <daal4py.low_order_moments>` |
| 145 | +- :obj:`Covariance <daal4py.covariance>` |
| 146 | +- :obj:`QR decomposition <daal4py.qr>` |
| 147 | + |
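For statistics such as low order moments, the equivalence between streaming and full-data results follows from the fact that they can be computed from a few running totals. The following NumPy-only sketch accumulates per-column sums and sums of squares over batches. It is an illustration of the idea, not the |onedal| implementation, and all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(seed=123)
X_full = rng.standard_normal(size=(100, 5))

# Running totals that are updated one batch at a time
n_rows = 0
col_sums = np.zeros(5)
col_sums_sq = np.zeros(5)

for X_batch in np.split(X_full, 5):
    n_rows += X_batch.shape[0]
    col_sums += X_batch.sum(axis=0)
    col_sums_sq += (X_batch ** 2).sum(axis=0)

# "Finalize" step: turn the totals into mean and sample variance
mean_streamed = col_sums / n_rows
var_streamed = (col_sums_sq - n_rows * mean_streamed ** 2) / (n_rows - 1)

# Same results as computing on the full data at once
np.testing.assert_allclose(mean_streamed, X_full.mean(axis=0))
np.testing.assert_allclose(var_streamed, X_full.var(axis=0, ddof=1))
```

The sum-of-squares formula used here is the simplest one to read but can lose precision on data with a large mean relative to its spread; it is chosen only to keep the sketch short.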
| 148 | +Distributed mode |
| 149 | +**************** |
| 150 | + |
| 151 | +Many algorithms in ``daal4py`` accept an argument ``distributed=True``, which allows |
| 152 | +running computations across distributed compute nodes using the MPI framework. |
| 153 | + |
| 154 | +See the section :ref:`distributed_daal4py` for more details. |
| 155 | + |
| 156 | +Documentation |
| 157 | +************* |
| 158 | + |
| 159 | +See :ref:`daal4py_ref` for the full documentation of functions and classes. |