diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md deleted file mode 100644 index 0a65185a0e..0000000000 --- a/CONTRIBUTING.md +++ /dev/null @@ -1,124 +0,0 @@ - - -# How to Contribute - -As an open source project, we welcome community contributions to Extension for Scikit-learn. -This document explains how to participate in project conversations, log bugs and enhancement requests, and submit code patches. - -## Licensing - -Extension for Scikit-learn uses the [Apache 2.0 License](https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/LICENSE). -By contributing to the project, you agree to the license and copyright terms and release your own contributions under these terms. - -### Copyright Guidelines for Contributions - -Each new file added to the project must include the following copyright notice - note that this project is closely tied -to [oneDAL](https://github.com/uxlfoundation/oneDAL) and hence shares the same copyright header: - -* For Python files: -```python -# ============================================================================== -# Copyright contributors to the oneDAL project -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# ==============================================================================
-```
-
-* For markdown files:
-````

-````
-
-* For JavaScript files:
-```javascript
-// Copyright contributors to the oneDAL project
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-```
-
-## Pull Requests
-
-No anonymous contributions are accepted. The name and email in the commit message's Signed-off-by line must match the authorship information of the change.
-
-Make sure your ``.gitconfig`` is set up correctly so that you can use `git commit -s` to sign your patches:
-
-`git config --global user.name "Kate Developer"`
-
-`git config --global user.email kate.developer@company.com`
-
-### Before Contributing Changes
-
-* Make sure you can build the product and run all the tests with your patch.
-* For a larger feature, provide a relevant test.
-* Document your code. Our project uses reStructuredText for documentation.
-* For new file(s), specify the appropriate copyright year in the first line.
-* Submit a pull request into the main branch.
-
-Continuous Integration (CI) testing is enabled for the repository. Your pull request must pass all checks before it can be merged. We will review your contribution and may provide feedback to guide you if any additional fixes or modifications are necessary. Once reviewed and accepted, your pull request will be merged into our GitHub repository.
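As a sketch of the sign-off flow in a throwaway repository (the identity below is the hypothetical one from the example above - use your own real name and email):

```shell
# Demonstrate that `git commit -s` appends a Signed-off-by trailer.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" config user.name "Kate Developer"
git -C "$repo" config user.email kate.developer@company.com
echo demo > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -s -m "Add demo file"
# The last line of the commit message is now:
# Signed-off-by: Kate Developer <kate.developer@company.com>
git -C "$repo" log -1 --format=%B | grep "Signed-off-by:"
```

The trailer is generated from the committer identity, which is why the `.gitconfig` setup above must be correct before signing.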
-
-## Code Style
-
-We use the [black](https://black.readthedocs.io/en/stable/) version 24.1.1 and [isort](https://pycqa.github.io/isort/) version 5.13.2 formatters for Python* code. The line length is 90 characters; use default options otherwise. You can find the linter configuration in [pyproject.toml](https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/pyproject.toml).
-
-A GitHub* Action verifies that your changes comply with the output of the auto-formatting tools.
-
-Optionally, you can install pre-commit hooks that do the formatting for you. For this, run from the top level of the repository:
-
-```bash
-pip install pre-commit
-pre-commit install
-```
-
-## Ideas
-
-If you want to contribute but do not know where to start, we maintain a [public list](https://uxlfoundation.github.io/scikit-learn-intelex/latest/ideas.html) of projects in our documentation, which includes difficulty and effort estimates. These ideas have linked issues on GitHub where you can message us about next steps.
diff --git a/INSTALL.md b/INSTALL.md
deleted file mode 100755
index 7955ad7423..0000000000
--- a/INSTALL.md
+++ /dev/null
@@ -1,314 +0,0 @@
-
-
-
-# Installation
-
-To install Extension for Scikit-learn*, use one of the following scenarios:
-
-- [Before You Begin](#before-you-begin)
-- [Install via PIP](#install-via-pip)
-  - [Install from PyPI Channel (recommended by default)](#install-from-pypi-channel-recommended-by-default)
-- [Install via conda](#install-via-conda)
-  - [Install from Conda-Forge Channel](#install-from-conda-forge-channel)
-  - [Install from Intel conda Channel](#install-from-intel-conda-channel)
-- [Build from Sources](#build-from-sources)
-  - [Prerequisites](#prerequisites)
-  - [Configure the Build with Environment Variables](#configure-the-build-with-environment-variables)
-  - [Build Extension for Scikit-learn](#build-extension-for-scikit-learn)
-- [Build from Sources with `conda-build`](#build-from-sources-with-conda-build)
-  - [Prerequisites for `conda-build`](#prerequisites-for-conda-build)
-  - [Build Extension for Scikit-learn with `conda-build`](#build-extension-for-scikit-learn-with-conda-build)
-- [Next Steps](#next-steps)
-
-
-## Before You Begin
-
-Check [System](https://uxlfoundation.github.io/scikit-learn-intelex/latest/system-requirements.html) and [Memory](https://uxlfoundation.github.io/scikit-learn-intelex/latest/memory-requirements.html) Requirements.
-
-## Supported Configurations
-
-* Operating systems: Linux*, Windows*
-* Python versions: 3.9 through 3.13
-* Devices: CPU, GPU
-* Distribution channels:
-  * PyPI
-  * Conda-Forge Channel
-  * Intel conda Channel (https://software.repos.intel.com/python/conda/)
-
-## Install via PIP
-
-To prevent version conflicts, create and activate a new environment:
-
-- On Linux:
-
-  ```bash
-  python -m venv env
-  source env/bin/activate
-  ```
-
-- On Windows:
-
-  ```bash
-  python -m venv env
-  .\env\Scripts\activate
-  ```
-
-### Install from PyPI Channel (recommended by default)
-
-Install `scikit-learn-intelex`:
-
-```bash
-pip install scikit-learn-intelex
-```
-
-## Install via conda
-
-To prevent version conflicts, we recommend creating and activating a new environment.
-
-### Install from Conda-Forge Channel
-
-- Install into a newly created environment (recommended):
-
-  ```bash
-  conda create -n sklex -c conda-forge --override-channels scikit-learn-intelex
-  conda activate sklex
-  ```
-
-- Install into your current environment:
-
-  ```bash
-  conda install -c conda-forge scikit-learn-intelex
-  ```
-
-### Install from Intel conda Channel
-
-We recommend this installation for users of the Intel® Distribution for Python.
-
-- Install into a newly created environment (recommended):
-
-  ```bash
-  conda create -n sklex -c https://software.repos.intel.com/python/conda/ -c conda-forge --override-channels scikit-learn-intelex
-  conda activate sklex
-  ```
-
-- Install into your current environment:
-
-  ```bash
-  conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge scikit-learn-intelex
-  ```
-
-**Note:** packages from the Intel channel are meant to be used together with dependencies from the **conda-forge** channel, and might not
-work correctly when used in an environment where packages from the `anaconda` default channel have been installed. It is
-advisable to use the [miniforge](https://github.com/conda-forge/miniforge) installer for `conda`/`mamba`, as it comes with
-`conda-forge` as the only default channel.
-
-
-## Build from Sources
-Extension for Scikit-learn* is easily built from the sources, with the majority of the necessary prerequisites available through conda or pip.
-
-The package is available for Windows* OS, Linux* OS, and macOS*.
-
-Multi-node (distributed) and streaming support can be disabled if needed.
-
-The build process (using setup.py) happens in 4 stages:
-1. Creating C++ and Cython sources from oneDAL C++ headers
-2. Building oneDAL Python interfaces via cmake and pybind11
-3. Running Cython on generated sources
-4. Compiling and linking them
-
-### Prerequisites
-* Python version >= 3.9
-* Jinja2
-* Cython
-* Numpy
-* cmake and pybind11
-* A C++ compiler with C++11 support
-* Clang-Format version >=14
-* [oneAPI Data Analytics Library (oneDAL)](https://github.com/uxlfoundation/oneDAL) version 2021.1 or later, but be mindful that **the oneDAL version must be <= that of scikit-learn-intelex** (it's backwards compatible but not forwards compatible).
-  * You can use the pre-built `dal-devel` conda package from the conda-forge channel
-* MPI (optional, needed for distributed mode)
-  * You can use the pre-built `impi_rt` and `impi-devel` conda packages from the conda-forge channel
-* A DPC++ compiler (optional, needed for DPC++ interfaces)
-  * Note that this also requires a oneDAL build with DPC++ enabled.
-
-### Configure the Build with Environment Variables
-* ``SKLEARNEX_VERSION``: sets the package version
-* ``DALROOT``: sets the oneAPI Data Analytics Library path
-* ``MPIROOT``: sets the path to the MPI library that will be used for distributed mode support. If this variable is not set but `I_MPI_ROOT` is found, `I_MPI_ROOT` will be used instead. Not used when `NO_DIST=1` is set.
-* ``NO_DIST``: set to '1', 'yes', or the like to build without support for distributed mode
-* ``NO_STREAM``: set to '1', 'yes', or the like to build without support for streaming mode
-* ``NO_DPC``: set to '1', 'yes', or the like to build without support for the oneDAL DPC++ interfaces
-* ``OFF_ONEDAL_IFACE``: set to '1' to build without support for the oneDAL interfaces
-* ``MAKEFLAGS``: the last `-j` flag determines the number of threads for building the onedal extension. It defaults to the number of CPU threads when not set.
-
-**Note:** in order to use distributed mode, `mpi4py` is also required, and it needs to be built with the same MPI backend as scikit-learn-intelex.
-**Note:** the `-j` flag in the ``MAKEFLAGS`` environment variable is superseded by the ``--parallel`` and `-j` command-line flags in the `setup.py` modes that support them.
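For example, a CPU-only build configuration might look like the following (the paths shown are hypothetical placeholders - point them at your own installations):

```shell
# Hypothetical path - adjust to where oneDAL is actually installed.
export DALROOT=/opt/intel/oneapi/dal/latest
export NO_DIST=1         # no distributed mode, so MPIROOT is not needed
export NO_DPC=1          # skip the DPC++ (GPU) interfaces
export MAKEFLAGS="-j8"   # compile the onedal extension with 8 threads
```

With these variables exported, the `setup.py` commands in the next section pick them up automatically.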
-
-
-### Build Extension for Scikit-learn
-
-- To install the package:
-
-  ```bash
-  cd
-  python setup.py install
-  ```
-
-- To install the package in development mode:
-
-  ```bash
-  cd
-  python setup.py develop
-  ```
-
-- To install scikit-learn-intelex without checking for dependencies:
-
-  ```bash
-  cd
-  python setup.py install --single-version-externally-managed --record=record.txt
-  ```
-  ```bash
-  cd
-  python setup.py develop --no-deps
-  ```
-
-Where:
-
-* The `--single-version-externally-managed` and `--no-deps` flags are required so that daal4py is not downloaded after the installation of Extension for Scikit-learn.
-* The `develop` mode does not install the package but creates a `.egg-link` in the deployment directory
-back to the project source-code directory. That way, you can edit the source code and see the changes
-without reinstalling the package after a small change.
-* `--single-version-externally-managed` is an option for Python packages that instructs the setuptools module to create a package that the host's package manager can easily manage.
-
-- To build the python module without installing it:
-
-  ```bash
-  cd
-  python setup.py build_ext --inplace --force
-  python setup.py build
-  ```
-
-**Note 1:** the `daal4py` extension module which is built through `build_ext` does not use any kind of build caching for incremental compilation. For development purposes, one might want to use it together with `ccache`, for example by setting `export CXX="ccache icpx"`.
-
-**Note 2:** the `setup.py` file accepts an optional argument `--abs-rpath` on Linux (for all of `build`/`install`/`develop`/etc.) which makes it add the absolute path to oneDAL's shared objects (.so files) to the rpath of the scikit-learn-intelex extension's shared object files in order to load them automatically.
This is not necessary when installing from pip or conda, but can be helpful for development purposes when using a from-source build of oneDAL that resides in a custom folder, as it won't assume that oneDAL's files will be found under default system paths. Example: - -```shell -python setup.py build_ext --inplace --force --abs-rpath -python setup.py build --abs-rpath -``` - -**Note:** when building `scikit-learn-intelex` from source with this option, it will use the oneDAL library with which it was compiled. oneDAL has dependencies on other libraries such as TBB, which is also distributed as a python package through `pip` and as a `conda` package. By default, a conda environment will first try to load TBB from its own packages if it is installed in the environment, which might cause issues if oneDAL was compiled with a system TBB instead of a conda one. In such cases, it is advised to either uninstall TBB from pip/conda (it will be loaded from the oneDAL library which links to it), or modify the order of search paths in environment variables like `${LD_LIBRARY_PATH}`. - -### Using LLD as linker - -By default, the setup script adds additional linkage arguments on Linux, such as strong stack protection. These are not supported by all linkers - in particular, they are not supported by LLVM's LLD linker. If using LLD as linker (for example, by setting environment variable `LDFLAGS="-fuse-ld=lld"`), then one must additionally pass argument `--using-lld` to the setup command. Example: - -```shell -CC=clang CXX=clang++ LDFLAGS="-fuse-ld=lld" python setup.py build_ext --inplace --force --abs-rpath --using-lld -CC=clang CXX=clang++ LDFLAGS="-fuse-ld=lld" python setup.py build --abs-rpath --using-lld -``` - -Note that passing argument `--using-lld` does not make the script use LLD as linker, only makes it avoid adding options that are not supported by it. 
- -### Debug Builds - -To build modules with debugging symbols and assertions enabled, pass argument `--debug` to the setup command - e.g.: - -```shell -python setup.py build_ext --inplace --force --abs-rpath --debug -python setup.py build --abs-rpath --debug -``` - -_**Note:** on Windows, this will only add debugging symbols for the `onedal` extension modules, but not for the `daal4py` extension module._ - -### Building with ASAN - -In order to use AddressSanitizer (ASan) together with `scikit-learn-intelex`, it's necessary to: -* Build both oneDAL and scikit-learn-intelex with ASan and with debug symbols (otherwise error traces will not be very informative). -* Preload the ASan runtime when executing the Python process that imports `scikit-learn-intelex`. -* Optionally, configure Python to use `malloc` as default allocator to reduce the number of false-positive leak reports. - -See the instructions on the oneDAL repository for building the library from source with ASAN enabled: -https://github.com/uxlfoundation/oneDAL/blob/main/INSTALL.md - -When building `scikit-learn-intelex`, the system's default compiler is used unless specified otherwise through variables such as `$CXX`. In order to avoid issues with incompatible runtimes of ASan, one might want to change the compiler to ICX if oneDAL was built with ICX (the default for it). - -The compiler and flags to build with both ASan and debug symbols can be controlled through environment variables - **assuming a Linux system** (ASan on Windows has not been tested): -```shell -export CC="icx -fsanitize=address -g" -export CXX="icpx -fsanitize=address -g" -``` - -_Hint: the Cython module `daal4py` that gets built through `build_ext` does not do incremental compilation, so one might want to add `ccache` into the compiler call for development purposes - e.g. `CXX="ccache icx -fsanitize=address -g"`._ - -The ASan runtime used by ICX is the same as the one by Clang. 
It's possible to preload the ASan runtime for GNU if that's the system's default through e.g. `LD_PRELOAD=libasan.so` or similar. However, one might need to specifically pass the paths from Clang to get the same ASan runtime as for oneDAL if that is not the system's default compiler: -```shell -export LD_PRELOAD="$(clang -print-file-name=libclang_rt.asan-x86_64.so)" -``` - -_Note: this requires both `clang` and its runtime libraries to be installed. If using toolkits from `conda-forge`, then using `libclang_rt` requires installing package `compiler-rt`, in addition to `clang` and `clangxx`._ - -Then, the Python memory allocator can be set to `malloc` like this: -```shell -export PYTHONMALLOC=malloc -``` - -Putting it all together, the earlier examples building the library in-place and executing a python file with it become as follows: -```shell -source -CC="ccache icx -fsanitize=address -g" CXX="ccache icpx -fsanitize=address -g" python setup.py build_ext --inplace --force --abs-rpath -CC="icx -fsanitize=address -g" CXX="icpx -fsanitize=address -g" python setup.py build --abs-rpath -LD_PRELOAD="$(clang -print-file-name=libclang_rt.asan-x86_64.so)" PYTHONMALLOC=malloc PYTHONPATH=$(pwd) python -``` - -_Be aware that ASan is known to generate many false-positive reports of memory leaks when used with oneDAL, NumPy, and SciPy._ - -### Building with other sanitizers - -UBSan can be used in a similar way as ASan in scikit-learn-intelex when oneDAL is built with this sanitizer, by using `-fsanitize=undefined` instead, but getting Python to load the required runtime might require using LLD as linker when compiling scikit-learn (see argument `--using-lld` for more details), and might require loading a different compiler runtime, such as `libclang_rt.ubsan_standalone-x86_64.so`. - -Other sanitizers such as MSan which provide only static-link files with no runtimes are not possible to use with scikit-learn-intelex, unless Python itself is also compiled with them. 
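Putting the preload pieces together defensively, here is a sketch that resolves the runtime only when `clang` is actually on the `PATH` (note that `-print-file-name` echoes the bare name back unchanged when the file cannot be found, so the existence check matters):

```shell
# Resolve the ASan runtime path if clang is available (hypothetical setup).
if command -v clang >/dev/null 2>&1; then
  asan_rt="$(clang -print-file-name=libclang_rt.asan-x86_64.so)"
  # Only preload when the resolved path is a real file.
  [ -f "$asan_rt" ] && export LD_PRELOAD="$asan_rt"
fi
# Reduce false-positive leak reports from Python's own allocator.
export PYTHONMALLOC=malloc
```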
-
-## Build from Sources with `conda-build`
-
-Extension for Scikit-learn* is easily built from the sources using a single command and the `conda-build` utility.
-
-### Prerequisites for `conda-build`
-
-* any `conda` distribution (`miniforge` is recommended)
-* `conda-build` and `conda-verify` installed in a conda environment
-* (Windows only) Microsoft Visual Studio*
-* (optional) Intel(R) oneAPI DPC++/C++ Compiler
-
-The `conda-build` config requires the **2022** version of Microsoft Visual Studio* by default; you can specify another version in `conda-recipe/conda_build_config.yaml` if needed.
-
-In order to enable DPC++ interfaces support on Windows, you need to set the `DPCPPROOT` environment variable to point to the DPC++/C++ Compiler distribution.
-The conda-forge distribution of the DPC++ compiler is used by default on Linux, but you can still set your own distribution via the `DPCPPROOT` variable.
-
-### Build Extension for Scikit-learn with `conda-build`
-
-Create and verify the `scikit-learn-intelex` conda package with the following command, executed from the root of the sklearnex repo:
-
-```bash
-conda build .
-```
-
-## Next Steps
-
-- [Learn what patching is and how to patch scikit-learn](https://uxlfoundation.github.io/scikit-learn-intelex/latest/what-is-patching.html)
-- [Start using scikit-learn-intelex](https://uxlfoundation.github.io/scikit-learn-intelex/latest/quick-start.html)
diff --git a/README.md b/README.md
index 63b691c2d9..07cc419539 100755
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@

Speed up your scikit-learn applications for CPUs and GPUs across single- and multi-node configurations -[Releases](https://github.com/uxlfoundation/scikit-learn-intelex/releases)   |   [Documentation](https://uxlfoundation.github.io/scikit-learn-intelex/)   |   [Examples](https://github.com/uxlfoundation/scikit-learn-intelex/tree/master/examples/notebooks)   |   [Support](SUPPORT.md)   |  [License](https://github.com/uxlfoundation/scikit-learn-intelex/blob/master/LICENSE)    +[Releases](https://github.com/uxlfoundation/scikit-learn-intelex/releases)   |   [Documentation](https://uxlfoundation.github.io/scikit-learn-intelex/)   |   [Examples](https://uxlfoundation.github.io/scikit-learn-intelex/latest/samples.html)   |   [Support](https://uxlfoundation.github.io/scikit-learn-intelex/development/support.html)   |  [License](https://github.com/uxlfoundation/scikit-learn-intelex/blob/master/LICENSE)    [![Build Status](https://dev.azure.com/daal/daal4py/_apis/build/status/CI?branchName=main)](https://dev.azure.com/daal/daal4py/_build/latest?definitionId=9&branchName=main) @@ -131,7 +131,7 @@ To install Extension for Scikit-learn, run: pip install scikit-learn-intelex ``` -Package is also offered through other channels such as conda-forge. See all installation instructions in the [Installation Guide](https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/INSTALL.md). +Package is also offered through other channels such as conda-forge. See all installation instructions in the [Installation Guide](https://uxlfoundation.github.io/scikit-learn-intelex/latest/quick-start.html#installation). 
## Integration @@ -170,7 +170,7 @@ from sklearn.cluster import DBSCAN as stockDBSCAN * [Documentation and Tutorials](https://uxlfoundation.github.io/scikit-learn-intelex/latest/index.html) * [Release Notes](https://github.com/uxlfoundation/scikit-learn-intelex/releases) * [Medium Blogs](https://uxlfoundation.github.io/scikit-learn-intelex/latest/blogs.html) -* [Code of Conduct](https://github.com/uxlfoundation/scikit-learn-intelex/blob/master/CODE_OF_CONDUCT.md) +* [Code of Conduct](https://uxlfoundation.github.io/scikit-learn-intelex/development/code-of-conduct.html) ### Extension and oneDAL @@ -186,7 +186,7 @@ Acceleration in patched scikit-learn classes is achieved by replacing calls to s ## How to Contribute -We welcome community contributions, check our [Contributing Guidelines](https://github.com/uxlfoundation/scikit-learn-intelex/blob/master/CONTRIBUTING.md) to learn more. +We welcome community contributions, check our [Contributing Guidelines](https://uxlfoundation.github.io/scikit-learn-intelex/development/contribute.html) to learn more. ------------------------------------------------------------------------ \* The Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. diff --git a/SUPPORT.md b/SUPPORT.md deleted file mode 100644 index 553c55c6d8..0000000000 --- a/SUPPORT.md +++ /dev/null @@ -1,42 +0,0 @@ - - -# Extension for Scikit-learn Support - -We are committed to providing support and assistance to help you make the most out of Extension for Scikit-learn. -Use the following methods if you face any challenges. - - -## Issues - -If you have a problem, check out the [GitHub Issues](https://github.com/uxlfoundation/scikit-learn-intelex/issues) to see if the issue you want to address is already reported. -You may find users that have encountered the same bug or have similar ideas for changes or updates. 
-
-You can use issues to report a problem, make a feature request, or add comments on an existing issue.
-
-## Discussions
-
-Visit the [GitHub Discussions](https://github.com/uxlfoundation/scikit-learn-intelex/discussions) to engage with the community, ask questions, or help others.
-
-## Forum
-
-Ask questions about Extension for Scikit-learn on our [Forum](https://community.intel.com/t5/Intel-Distribution-for-Python/bd-p/distribution-python).
-Make sure to provide all relevant details so that we can help you as soon as possible.
-
-## Email
-
-Reach out to us privately via [email](mailto:onedal.maintainers@intel.com).
diff --git a/conda-recipe/run_test.sh b/conda-recipe/run_test.sh
index 61e96fb018..f94d233137 100755
--- a/conda-recipe/run_test.sh
+++ b/conda-recipe/run_test.sh
@@ -26,7 +26,7 @@ while [[ count -ne 0 ]]; do
 done
 
 if [[ count -eq 0 ]]; then
-    echo "run_test.bat did not find the required testing directories"
+    echo "run_test.sh did not find the required testing directories"
     exit 1
 fi
diff --git a/daal4py/README.md b/daal4py/README.md
index 5e16d11099..75a40b722e 100755
--- a/daal4py/README.md
+++ b/daal4py/README.md
@@ -14,53 +14,4 @@
 ~ limitations under the License.
-->
-# daal4py - A Convenient Python API to the oneAPI Data Analytics Library
-[![Build Status](https://dev.azure.com/daal/daal4py/_apis/build/status/CI?branchName=main)](https://dev.azure.com/daal/daal4py/_build/latest?definitionId=9&branchName=main)
-[![Coverity Scan Build Status](https://scan.coverity.com/projects/21716/badge.svg)](https://scan.coverity.com/projects/daal4py)
-[![Join the community on GitHub Discussions](https://badgen.net/badge/join%20the%20discussion/on%20github/black?icon=github)](https://github.com/uxlfoundation/scikit-learn-intelex/discussions)
-[![PyPI Version](https://img.shields.io/pypi/v/daal4py)](https://pypi.org/project/daal4py/)
-[![Conda Version](https://img.shields.io/conda/vn/conda-forge/daal4py)](https://anaconda.org/conda-forge/daal4py)
-
-**IMPORTANT NOTICE**: `daal4py` has been merged into `scikit-learn-intelex`. As of version 2025.0, it is distributed as an additional importable module within the package `scikit-learn-intelex` instead of being a separate package. The last standalone release of `daal4py` was version 2024.7, and this standalone package will not receive further updates.
-
-A simplified API to the oneAPI Data Analytics Library that allows for fast usage of the framework, suited for data scientists and machine learning users. Built to provide an abstraction over the oneAPI Data Analytics Library for either direct usage or integration into one's own framework.
-
-Note: For the most part, `daal4py` is used as an internal backend within the Scikit-Learn extension, and it is highly recommended to use `sklearnex` instead. Nevertheless, some functionalities from `daal4py` can still be of use, and the module can still be imported directly (`import daal4py`) after installing `scikit-learn-intelex`.
-
-## 👀 Follow us on Medium
-
-We publish blogs on Medium, so [follow us](https://medium.com/intel-analytics-software/tagged/machine-learning) to learn tips and tricks for more efficient data analysis with the help of daal4py.
Here are our latest blogs:
-
-- [Intel Gives Scikit-Learn the Performance Boost Data Scientists Need](https://medium.com/intel-analytics-software/intel-gives-scikit-learn-the-performance-boost-data-scientists-need-42eb47c80b18)
-- [From Hours to Minutes: 600x Faster SVM](https://medium.com/intel-analytics-software/from-hours-to-minutes-600x-faster-svm-647f904c31ae)
-- [Improve the Performance of XGBoost and LightGBM Inference](https://medium.com/intel-analytics-software/improving-the-performance-of-xgboost-and-lightgbm-inference-3b542c03447e)
-- [Accelerate Kaggle Challenges Using Intel AI Analytics Toolkit](https://medium.com/intel-analytics-software/accelerate-kaggle-challenges-using-intel-ai-analytics-toolkit-beb148f66d5a)
-- [Accelerate Your scikit-learn Applications](https://medium.com/intel-analytics-software/improving-the-performance-of-xgboost-and-lightgbm-inference-3b542c03447e)
-- [Accelerate Linear Models for Machine Learning](https://medium.com/intel-analytics-software/accelerating-linear-models-for-machine-learning-5a75ff50a0fe)
-- [Accelerate K-Means Clustering](https://medium.com/intel-analytics-software/accelerate-k-means-clustering-6385088788a1)
-
-## 🔗 Important links
-- [Documentation](https://uxlfoundation.github.io/scikit-learn-intelex/latest/about_daal4py.html)
-- [Building from Sources](https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/daal4py/INSTALL.md)
-- [About oneAPI Data Analytics Library](https://github.com/uxlfoundation/oneDAL)
-
-## 💬 Support
-
-Report issues, ask questions, and provide suggestions using:
-
-- [GitHub Issues](https://github.com/uxlfoundation/scikit-learn-intelex/issues)
-- [GitHub Discussions](https://github.com/uxlfoundation/scikit-learn-intelex/discussions)
-- [Forum](https://community.intel.com/t5/Intel-Distribution-for-Python/bd-p/distribution-python)
-
-You may reach out to project maintainers privately at onedal.maintainers@intel.com
-
-# 🛠 Installation
-
-Daal4Py is distributed as part
of scikit-learn-intelex, which itself is distributed through different channels.
-
-See the [installation instructions for scikit-learn-intelex](https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/INSTALL.md) for details.
-
-
-# ⚠️ Scikit-learn patching
-
-Scikit-learn patching functionality in daal4py was deprecated and moved to a separate package - [Extension for Scikit-learn*](https://github.com/uxlfoundation/scikit-learn-intelex). All future updates for the patching will be available in Extension for Scikit-learn only. Please use that package instead of daal4py for Scikit-learn acceleration.
+Doc page moved to: https://uxlfoundation.github.io/scikit-learn-intelex/latest/about_daal4py.html
diff --git a/doc/sources/algorithms.rst b/doc/sources/algorithms.rst
index 77d4c168ec..15121deba9 100755
--- a/doc/sources/algorithms.rst
+++ b/doc/sources/algorithms.rst
@@ -653,8 +653,4 @@ Scikit-learn Tests
 ------------------
 
 Monkey-patched scikit-learn classes and functions pass scikit-learn's own test
-suite, with few exceptions, specified in `deselected_tests.yaml
-`__.
-
-See the file `scikit-learn-tests.md `__
-for instructions about how to execute the scikit-learn test suite under patching.
+suite, with few exceptions - see :ref:`conformance_tests` for details.
diff --git a/doc/sources/building-from-source.rst b/doc/sources/building-from-source.rst
new file mode 100644
index 0000000000..9eda3d6153
--- /dev/null
+++ b/doc/sources/building-from-source.rst
@@ -0,0 +1,360 @@
+.. Copyright contributors to the oneDAL project
+..
+.. Licensed under the Apache License, Version 2.0 (the "License");
+.. you may not use this file except in compliance with the License.
+.. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. include:: substitutions.rst + +==================== +Building from Source +==================== + +Components +---------- + +The |sklearnex| as a library works mostly as a frontend to the |onedal| by leveraging it as a backend for |sklearn| calls. In order to build the |sklearnex|, it's necessary to have a version of the |onedal| as a shared library already built somewhere along with its headers - for example, by using the Python packages ``dal`` + ``dal-devel`` (conda) / ``daal`` + ``daal-devel`` (PyPI), or the system-wide `offline installer `__, or by `building oneDAL from source `__. + +.. note:: Python packages ``dal`` (conda) and ``daal`` (PyPI) provide the same components, but due to naming availability in these repositories, they are distributed under different names. + +As a library, the |sklearnex| consists of a Python codebase with Python extension modules written in C++ and Cython, with some of those modules being optional. These extension modules require compilation before being used, for which a C++ compiler along other dependencies are required. In the case of GPU-related modules, a SYCL compiler (such as `Intel's DPC++ `__) is required, and in the case of distributed mode, whether on CPU or on GPU, an `MPI `__ backend is required, such as `Intel's MPI `__. + +The extension modules are as follows: + +- ``daal4py``: the source code for this module is auto-generated from the headers of the |onedal| as a Cython file through the code under the folder `generator `__, along with other C++ source files. This module is mandatory. It provides the necessary bindings for the DAAL interface - see :doc:`about_daal4py` for details. 
It will also contain the necessary MPI bindings for distributed computations on CPU if building with distributed mode (see :doc:`distributed_daal4py` for details), and the necessary bindings for streaming mode if that functionality is built. +- ``_onedal_py_host``: this module provides PyBind11-generated bindings over the oneAPI interface of the |onedal| for CPU (host). This module is mandatory. +- ``_onedal_py_dpc``: this module provides PyBind11-generated bindings over the oneAPI interface of the |onedal| for GPU (DPC++). This module is optional, and requires a SYCL compiler. If the oneDAL backend is compiled from source, it must also have been built with its DPC++ component in order to build this module. See :doc:`oneapi-gpu` for more information. +- ``_onedal_py_spmd`` (Linux*-only): this module provides PyBind11-generated bindings over SPMD implementations (distributed mode on GPU) using the oneAPI interface of the |onedal| - see :doc:`distributed-mode` for details. This module is optional, and requires both a SYCL compiler and an MPI backend, along with its headers. It requires the ``_onedal_py_dpc`` module to also be built. + +**Note that all of the optional components are built by default** (see the rest of this page for how to enable or disable specific components). + +Build Requirements +------------------ + +In order to build the library from source, the repository provides a file `dependencies-dev `__ with locked versions of mandatory dependencies, which is used in CI jobs. Note however that this file does not contain the necessary dependencies for distributed mode, nor does it contain compiler-related dependencies, and it is not strictly necessary to install the exact same versions as in that file for local development purposes. + +Python dependencies +~~~~~~~~~~~~~~~~~~~ + +To install the necessary Python dependencies: + +- Using ``conda``: + +.. code-block:: bash + + conda install -c conda-forge numpy cython jinja2 pybind11 "setuptools<=79" + +- Using ``pip``: + +..
code-block:: bash + + pip install numpy cython jinja2 pybind11 "setuptools<=79" + +.. hint:: Using the compiled library after building it has a different set of requirements, such as the |sklearn| package along with its dependencies. Executing the tests also adds additional dependencies such as ``pytest``. + +Non-Python dependencies +~~~~~~~~~~~~~~~~~~~~~~~ + +Apart from Python libraries and from the |onedal|, the following dependencies are needed in order to compile the |sklearnex|: + +- A C++ compiler. +- clang-format. +- CMake. +- A DPC++ compiler (required for GPU components). +- An MPI backend and its headers (required for distributed components). + +The easiest way to install the necessary dependencies that are not Python libraries is with conda. + +- On Linux*: + +.. code-block:: bash + + conda install -c conda-forge \ + cmake clang-format cxx-compiler `# mandatory dependencies` \ + dpcpp-cpp-rt dpcpp_linux-64 `# required for GPU mode` \ + impi-devel impi_rt `# required for distributed mode` + +- On Windows*: + +.. code-block:: bash + + conda install -c conda-forge ^ + cmake clang-format cxx-compiler ^ + dpcpp-cpp-rt dpcpp_win-64 ^ + impi-devel impi_rt + +Some of these dependencies can also be installed from PyPI: + +.. code-block:: bash + + pip install clang-format impi-devel impi_rt + +Note however that, if installing Intel's MPI from PyPI instead of from conda, it will be necessary to manually set the environment variable ``$MPIROOT``, while the conda distribution of Intel's MPI comes with an activation script that sets up this variable. + +Instructions +------------ + +Setting environment variables +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Before compiling the |sklearnex|, it's necessary to set up some environment variables to point to the installation paths of dependencies. 
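The sections that follow describe each variable in detail; as a quick sketch, a typical conda-based setup on Linux* might look like the following (the paths shown here are hypothetical examples, not defaults assumed by the build script):

```shell
# Hypothetical example paths - adjust to match your installation.
export DALROOT="$HOME/miniforge3/envs/sklex"   # prefix containing lib/ with the oneDAL shared objects
export MPIROOT="$DALROOT"                      # MPI prefix; only needed for distributed mode
```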
+ +OneDAL +****** + +An environment variable ``$DALROOT`` must be set to the path containing the |onedal| library, such that the shared objects (``.so`` / ``.dll``) will be findable under the path ``$DALROOT/lib``. This environment variable can be set in different ways: + + - If using an offline installer for the |onedal|, this variable will be set automatically when sourcing the general activation script for oneAPI products, which can be done as follows, assuming a Linux* system: + + .. code-block:: bash + + source /opt/intel/oneapi/setvars.sh + + - If building the |onedal| from source, it will be set automatically when sourcing the generated environment activation script - see the `instructions on the oneDAL repository `__ for more details. + + - Otherwise, the variable can be set manually. For example, if installing oneDAL through ``conda``, assuming a Linux* system: + + .. code-block:: bash + + export DALROOT="$CONDA_PREFIX" + +.. important:: If the |onedal| is not under a default system path, in order to be able to load it after compiling the |sklearnex|, its path must be added to an environment variable such as ``$LD_LIBRARY_PATH``, or the |sklearnex| must be built with argument ``--abs-rpath`` (see rest of this document for details). + +MPI +*** + +If building with distributed mode, an environment variable ``$MPIROOT`` must be set to the path containing the MPI library, such that the shared objects (such as ``libmpi.so``) will be findable under ``$MPIROOT/lib`` and the headers under ``$MPIROOT/include``. Alternatively, environment variable ``$I_MPI_ROOT``, which is used by Intel's MPI, will be used if it is defined while ``$MPIROOT`` isn't. If using Intel's MPI, this variable can be set in different ways: + +- If installing IMPI (Intel's MPI) from conda, the variable will be set automatically upon activation of the conda environment. 
+- If using an offline installer for IMPI, this variable will be set automatically when sourcing the general activation script for oneAPI products, which can be done as follows, assuming a Linux* system: + + .. code-block:: bash + + source /opt/intel/oneapi/setvars.sh + +- Otherwise, the variable can be set manually. For example, if installing some MPI other than IMPI through ``conda``, assuming a Linux* system: + + .. code-block:: bash + + export MPIROOT="$CONDA_PREFIX" + +.. _build_script: + +Build using ``setup.py`` +~~~~~~~~~~~~~~~~~~~~~~~~ + +With all of the necessary requirements and environment variables already set up, the library can be installed from source as follows: + +.. code-block:: bash + + python setup.py install + +.. hint:: See the rest of this document for build-time options, such as disabling distributed mode or disabling GPU mode. + +To install it in development mode: + +.. code-block:: bash + + python setup.py develop + +To build the extensions in-place without installing (recommended for local development): + +.. code-block:: bash + + python setup.py build_ext --inplace --force # builds daal4py + python setup.py build # builds onedal extension modules + +.. hint:: If building the library in-place without installing, it's then necessary to set environment variable ``$PYTHONPATH`` to point to the root of the repository in order to be able to import the modules in Python. + +Build using conda +~~~~~~~~~~~~~~~~~ + +The |sklearnex| can also be easily built from source with a single command using ``conda-build``. + +Requirements +************ + +The following are required in order to use ``conda-build``: + +- Any ``conda`` distribution (`Miniforge `__ is recommended). +- ``conda-build`` and ``conda-verify`` packages installed in a conda environment: + + .. code-block:: bash + + conda install -c conda-forge conda-build conda-verify + +- On Windows*, an **external** installation of the MSVC compiler **version 2022** is required by default.
Other versions can be specified in `conda-recipe/conda_build_config.yaml `__ if needed. +- Optionally, for DPC++ (GPU) support on Windows*, environment variable ``%DPCPPROOT%`` must be set to point to the DPC++ compiler path. + +Instructions +************ + +To create and verify the conda package for this library, execute the following command from the root of the repository: + +.. code-block:: bash + + conda build . + +Build-time Options +------------------ + +The setup script accepts many configurable options, some controlled through environment variables and others through command line arguments. For example: + +.. code-block:: bash + + NO_DIST=1 python setup.py build_ext --inplace --force --abs-rpath + +Additionally, the tools used by the build backend can also be passed custom configurations through environment variables such as ``$CXX``, ``$CXXFLAGS``, ``$LDFLAGS``, etc. For example: + +.. code-block:: bash + + NO_DIST=1 LDFLAGS="-fuse-ld=lld" python setup.py build --using-lld + +Environment variables +~~~~~~~~~~~~~~~~~~~~~ + +The following environment variables can be used to control aspects of the setup: + +- ``SKLEARNEX_VERSION``: sets the package version. +- ``DALROOT``: sets the |onedal| path. +- ``MPIROOT``: sets the path to the MPI library. If this variable is not set but ``I_MPI_ROOT`` is found, ``I_MPI_ROOT`` will be used instead. Ignored when ``NO_DIST=1``. +- ``NO_DIST``: set to '1', 'yes', or similar to build without support for distributed mode. +- ``NO_STREAM``: set to '1', 'yes', or similar to build without support for streaming mode. +- ``NO_DPC``: set to '1', 'yes', or similar to build without support for the oneDAL DPC++ interfaces. +- ``OFF_ONEDAL_IFACE``: set to '1' to build without the support of oneDAL interfaces. +- ``MAKEFLAGS``: the last ``-j`` flag determines the number of threads for building the onedal extension. It defaults to the number of CPU threads when not set. + +..
note:: The ``-j`` flag in the ``MAKEFLAGS`` environment variable is superseded by the ``--parallel`` and ``-j`` command line flags in the ``setup.py`` modes that support them. + +Command line arguments +~~~~~~~~~~~~~~~~~~~~~~ + +The following additional arguments are accepted in calls to the ``setup.py`` script: + +- ``--abs-rpath`` (Linux*-only): adds the absolute path to the |onedal| shared objects (``.so`` files) to the rpath of the |sklearnex| shared object files in order to load them automatically. This is not necessary when installing through ``pip`` or ``conda``, but can be helpful for development purposes when using a from-source build of the |onedal| that resides in a custom folder, as it won't assume that its files will be found under default system paths. +- ``--debug``: builds modules with debugging symbols and assertions enabled. Note that on Windows*, this will only add debugging symbols for the ``_onedal_py`` extension modules, but not for the ``daal4py`` extension module. +- ``--using-lld`` (Linux*-only): makes the setup script avoid passing arguments that are not supported by LLVM's LLD linker, such as strong stack protection. This flag is required when building with the LLD linker (which can be achieved by setting environment variable ``$LDFLAGS="-fuse-ld=lld"``), but note that it **does not make the build script use LLD** - it only avoids adding arguments that LLD doesn't support. + +Apart from these, standard arguments recognized by the build libraries can also be passed in the same call - for example, to install without checking for dependencies: + +.. code-block:: bash + + python setup.py install --single-version-externally-managed --record=record.txt + python setup.py develop --no-deps + + +Tips +---- + +Incremental Compilation +~~~~~~~~~~~~~~~~~~~~~~~ + +The compiled modules are a mixture of Cython and PyBind11.
Compilation of the PyBind11 modules is managed through CMake, which offers incremental and parallel compilation, but compilation of the Cython module ``daal4py`` is managed through ``setuptools``, which lacks these features and, in addition, compiles on a single thread since the module consists of a single large file. Thus, by default, a call to ``python setup.py build`` can take a long time to finish, with most of that time spent in the single-threaded ``daal4py`` compilation. + +For local development, one can speed up builds by using ``ccache`` to avoid recompiling ``daal4py`` modules across multiple calls to ``setup.py``. While the build script doesn't have any explicit option for ``ccache``, it can be configured to use it by setting the compiler to something that would execute under it. Example: + +.. code-block:: bash + + CC="ccache icx" CXX="ccache icpx" python setup.py build_ext --inplace --force + CC="ccache icx" CXX="ccache icpx" python setup.py build + +Leave components out +~~~~~~~~~~~~~~~~~~~~ + +When it comes to local development, in many cases the features being developed do not involve an SPMD or GPU component. In such cases, it's faster to compile without those options, and it's likewise usually faster to use the LLD linker and lower the optimization level for the library: + +.. code-block:: bash + + NO_DPC=1 NO_DIST=1 CC="ccache icx -O0" CXX="ccache icpx -O0" LDFLAGS="-fuse-ld=lld" \ + python setup.py build_ext --inplace --force --abs-rpath --using-lld + NO_DPC=1 NO_DIST=1 CC="ccache icx -O0" CXX="ccache icpx -O0" LDFLAGS="-fuse-ld=lld" \ + python setup.py build --abs-rpath --using-lld + +Cleaning the build folder +~~~~~~~~~~~~~~~~~~~~~~~~~ + +When building from source, temporary artifacts are created under a ``/build`` folder.
Since some modules use CMake, which is designed for incremental compilation, it will leave pre-compiled objects that it will try to reuse if further builds are executed without modifying the same input files. + +However, note that CMake's logic does not consider compatibility of these leftover objects, so for example, if one first compiles the library with a given Python version, and then tries to compile it from the same folder using a different Python version, the leftover artifacts will be incompatible, but CMake will still try to reuse them and fail in the process, with a non-informative error message. The same issue might happen, for example, if some modules are enabled or disabled across different calls to the ``setup.py`` script. + +If experiencing issues during compilation, try removing the existing ``/build`` folder to see if it solves the issue: + +.. code-block:: bash + + rm -Rf build + +TBB runtimes +~~~~~~~~~~~~ + +When building with the ``--abs-rpath`` option, the compiled modules will load the |onedal| library version with which they were compiled. |onedal| has dependencies on other libraries such as `TBB `__, which is also distributed as a Python package through ``pip`` and as a ``conda`` package. + +By default, a conda environment will first try to load TBB from its own packages if it is installed in the environment, which might cause issues if the |onedal| was compiled with a system TBB instead of a conda one. + +In such cases, it is advised to either uninstall TBB from ``pip``/``conda`` (it will be loaded from the |onedal| library which links to it), or modify the order of search paths in environment variables like ``$LD_LIBRARY_PATH`` to prefer the one with which the |onedal| was compiled instead of the one from ``conda``.
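As a rough sketch of how to diagnose such clashes on a Linux* system (the paths and module filename below are assumptions - they vary by installation and Python version), one can inspect the loader's search-path ordering and which ``libtbb`` a compiled extension actually resolves to:

```shell
# Show the loader search path in priority order; the directory holding the
# intended libtbb should come before the conda environment's lib directory.
printf '%s\n' "${LD_LIBRARY_PATH:-<unset>}" | tr ':' '\n'

# Hypothetical adjustment: prefer a system oneTBB over the conda one.
export LD_LIBRARY_PATH="/opt/intel/oneapi/tbb/latest/lib:${LD_LIBRARY_PATH}"

# To check which libtbb an extension module resolves to, e.g.:
# ldd onedal/_onedal_py_host*.so | grep -i tbb
```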
+ +Building with sanitizers +------------------------ + +Building with ASan +~~~~~~~~~~~~~~~~~~ + +In order to use AddressSanitizer (ASan) together with the |sklearnex|, it's necessary to: + +- Build both the |onedal| and the |sklearnex| with ASan and with debugging symbols (otherwise error traces will not be very informative). +- Preload the ASan runtime when executing the Python process that imports ``sklearnex`` or ``daal4py``. +- Optionally, configure Python to use ``malloc`` as the default allocator to reduce the number of false-positive leak reports. + +See the `instructions on the oneDAL repository `__ for building the library from source with ASan enabled. + +When building this library, the system's default compiler is used unless specified otherwise through variables such as ``$CXX``. In order to avoid issues with incompatible ASan runtimes, one might want to change the compiler to ICX if the |onedal| was built with ICX (the default for it). + +The compiler and flags to build with both ASan and debug symbols can be controlled through environment variables - **assuming a Linux\* system** (ASan on Windows* has not been tested): + +.. code-block:: bash + + export CC="icx -fsanitize=address -g" + export CXX="icpx -fsanitize=address -g" + +.. hint:: The Cython module ``daal4py`` that gets built through ``build_ext`` does not do incremental compilation, so one might want to add ``ccache`` into the compiler call for development purposes - e.g. ``CXX="ccache icx -fsanitize=address -g"``. + +The ASan runtime used by ICX is the same as the one used by Clang. If GCC is the system's default compiler, its ASan runtime can be preloaded through e.g. ``$LD_PRELOAD=libasan.so`` or similar. However, if Clang is not the system's default compiler, one might need to explicitly pass the paths from Clang to get the same ASan runtime as for oneDAL: + +.. code-block:: bash + + export LD_PRELOAD="$(clang -print-file-name=libclang_rt.asan-x86_64.so)" + +..
note:: This requires both ``clang`` and its runtime libraries to be installed. If using toolkits from ``conda-forge``, then using ``libclang_rt`` requires installing package ``compiler-rt``, in addition to ``clang`` and ``clangxx``. + +Then, the Python memory allocator can be set to ``malloc`` like this: + +.. code-block:: bash + + export PYTHONMALLOC=malloc + + +Putting it all together, the earlier examples building the library in-place and executing a python file with it become as follows: + +.. code-block:: bash + + source + CC="ccache icx -fsanitize=address -g" CXX="ccache icpx -fsanitize=address -g" \ + python setup.py build_ext --inplace --force --abs-rpath + CC="icx -fsanitize=address -g" CXX="icpx -fsanitize=address -g" \ + python setup.py build --abs-rpath + LD_PRELOAD="$(clang -print-file-name=libclang_rt.asan-x86_64.so)" \ + PYTHONMALLOC=malloc PYTHONPATH=$(pwd) \ + python + +.. note:: Be aware that ASan is known to generate many false-positive reports of memory leaks when used with the |onedal|, NumPy, and SciPy. + +Building with other sanitizers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +UBSan can be used in a similar way as ASan in this library when the |onedal| is built with this sanitizer, by using ``-fsanitize=undefined`` instead, but getting Python to load the required runtime might require using LLD as linker when compiling this library (see argument ``--using-lld`` for more details), and might require loading a different compiler runtime, such as ``libclang_rt.ubsan_standalone-x86_64.so``. diff --git a/CODE_OF_CONDUCT.md b/doc/sources/code-of-conduct.rst similarity index 60% rename from CODE_OF_CONDUCT.md rename to doc/sources/code-of-conduct.rst index 1ad20877e1..b261d2eeda 100644 --- a/CODE_OF_CONDUCT.md +++ b/doc/sources/code-of-conduct.rst @@ -1,22 +1,27 @@ - - -# Contributor Covenant Code of Conduct - -## Our Pledge +.. Copyright contributors to the oneDAL project +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. 
you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. include:: substitutions.rst + +=============== +Code of Conduct +=============== + +Contributor Covenant Code of Conduct +------------------------------------ + +Our Pledge +~~~~~~~~~~ In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and @@ -25,29 +30,31 @@ size, disability, ethnicity, sex characteristics, gender identity and expression level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. 
-## Our Standards +Our Standards +~~~~~~~~~~~~~ Examples of behavior that contributes to creating a positive environment include: -* Using welcoming and inclusive language -* Being respectful of differing viewpoints and experiences -* Gracefully accepting constructive criticism -* Focusing on what is best for the community -* Showing empathy towards other community members +- Using welcoming and inclusive language +- Being respectful of differing viewpoints and experiences +- Gracefully accepting constructive criticism +- Focusing on what is best for the community +- Showing empathy towards other community members Examples of unacceptable behavior by participants include: -* The use of sexualized language or imagery and unwelcome sexual attention or - advances -* Trolling, insulting/derogatory comments, and personal or political attacks -* Public or private harassment -* Publishing others' private information, such as a physical or electronic - address, without explicit permission -* Other conduct which could reasonably be considered inappropriate in a - professional setting +- The use of sexualized language or imagery and unwelcome sexual attention or + advances +- Trolling, insulting/derogatory comments, and personal or political attacks +- Public or private harassment +- Publishing others' private information, such as a physical or electronic + address, without explicit permission +- Other conduct which could reasonably be considered inappropriate in a + professional setting -## Our Responsibilities +Our Responsibilities +~~~~~~~~~~~~~~~~~~~~ Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in @@ -59,7 +66,8 @@ that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 
-## Scope +Scope +~~~~~ This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of @@ -68,7 +76,8 @@ address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. -## Enforcement +Enforcement +~~~~~~~~~~~ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at onedal.maintainers@intel.com. All @@ -81,12 +90,11 @@ Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. -## Attribution +Attribution +~~~~~~~~~~~ -This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +This Code of Conduct is adapted from the `Contributor Covenant homepage `__, version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html -[homepage]: https://www.contributor-covenant.org - For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq diff --git a/doc/sources/distributed-mode.rst b/doc/sources/distributed-mode.rst index 73e55839c9..a4d2226048 100644 --- a/doc/sources/distributed-mode.rst +++ b/doc/sources/distributed-mode.rst @@ -27,8 +27,7 @@ also provide distributed, multi-GPU computing capabilities via integration with match those of GPU computing, along with an MPI backend of your choice (`Intel MPI recommended `_, available via the ``impi_rt`` python/conda package) and the |mpi4py| python package. If using |sklearnex| -`installed from sources `_, -ensure that the spmd_backend is built. +:doc:`installed from sources `, ensure that the SPMD backend is built. .. 
important:: SPMD mode requires the |mpi4py| package used at runtime to be compiled with the same MPI backend as the |sklearnex|, or with an ABI-compatible MPI backend. The PyPI and Conda distributions of |sklearnex| are both built with Intel's MPI as backend, which follows the MPICH ABI, and hence require an |mpi4py| also built with either Intel's MPI, or with another MPICH-compatible MPI backend (such as MPICH itself) - versions of |mpi4py| built with Intel's MPI can be installed as follows: diff --git a/doc/sources/distributed_daal4py.rst b/doc/sources/distributed_daal4py.rst index 362fdd7fd9..289ff355d7 100644 --- a/doc/sources/distributed_daal4py.rst +++ b/doc/sources/distributed_daal4py.rst @@ -49,8 +49,8 @@ same algorithms to much larger problem sizes. conda install -c conda-forge impi_rt mpi=*=impi - Using distributed mode with non-MPICH-compatible backends such as OpenMPI requires compiling the - library from source with that backend. + Using distributed mode with non-MPICH-compatible backends such as OpenMPI requires + :doc:`compiling the library from source ` with that backend. See the docs for :ref:`SPMD mode ` for more details. diff --git a/doc/sources/ideas.rst b/doc/sources/ideas.rst index ffcab7f2be..cb3fa18c0c 100644 --- a/doc/sources/ideas.rst +++ b/doc/sources/ideas.rst @@ -12,9 +12,9 @@ .. See the License for the specific language governing permissions and .. limitations under the License. -##### -Ideas -##### +####################### +Ideas for Contributions +####################### As an open-source project, we welcome community contributions to Extension for Scikit-learn. This document suggests contribution directions which we consider good introductory projects with meaningful diff --git a/doc/sources/index.rst b/doc/sources/index.rst index 2b405c8d10..978846581c 100755 --- a/doc/sources/index.rst +++ b/doc/sources/index.rst @@ -153,7 +153,7 @@ See :ref:`oneapi_gpu` for other ways of executing on GPU. ..
toctree:: - :caption: Get Started + :caption: Getting Started :hidden: :maxdepth: 3 @@ -162,9 +162,9 @@ See :ref:`oneapi_gpu` for other ways of executing on GPU. kaggle.rst .. toctree:: - :caption: Developer Guide + :caption: Documentation topics :hidden: - :maxdepth: 2 + :maxdepth: 4 algorithms.rst oneapi-gpu.rst @@ -187,6 +187,16 @@ See :ref:`oneapi_gpu` for other ways of executing on GPU. about_daal4py.rst daal4py.rst +.. toctree:: + :caption: Development guides + :hidden: + + building-from-source.rst + tests.rst + contribute.rst + topics-for-contributors.rst + ideas.rst + .. toctree:: :caption: Performance :hidden: @@ -207,7 +217,6 @@ See :ref:`oneapi_gpu` for other ways of executing on GPU. :hidden: :maxdepth: 2 - Support - contribute.rst - ideas.rst + support.rst + code-of-conduct.rst license.rst diff --git a/doc/sources/quick-start.rst b/doc/sources/quick-start.rst index 8e7c94959c..86db3f510c 100644 --- a/doc/sources/quick-start.rst +++ b/doc/sources/quick-start.rst @@ -284,7 +284,7 @@ To prevent version conflicts, we recommend installing ``scikit-learn-intelex`` i Build from Sources ********************** -See `Installation instructions `_ to build |sklearnex| from the sources. +See :doc:`building-from-source` for details. Install Intel*(R) AI Tools **************************** diff --git a/doc/sources/support.rst b/doc/sources/support.rst index 447227d729..e75672dfff 100644 --- a/doc/sources/support.rst +++ b/doc/sources/support.rst @@ -12,21 +12,34 @@ .. See the License for the specific language governing permissions and .. limitations under the License. -###################################################### -Extension for Scikit-learn Support -###################################################### - - -We are committed to providing support and assistance to help you make the most out of Extension for Scikit-learn. 
+####### +Support +####### +We are committed to providing support and assistance to help you make the most out of Extension for Scikit-learn. Use the following methods if you face any challenges. Issues ----------------------------------- - -If you have a problem, check out the `GitHub Issues `_ to see if the issue you want to address is already reported. +------ +If you have a problem, check out the `GitHub Issues `__ to see if the issue you want to address is already reported. You may find users that have encountered the same bug or have similar ideas for changes or updates. You can use issues to report a problem, make a feature request, or add comments on an existing issue. + +Discussions +----------- + +Visit the `GitHub Discussions `__ to engage with the community, ask questions, or help others. + +Forum +----- + +Ask questions about Extension for Scikit-learn on our `Forum `__. +Make sure to provide all relevant details, so we can help you as soon as possible. + +Email +----- + +Reach out to us privately via: onedal.maintainers@intel.com. diff --git a/doc/sources/tests.rst b/doc/sources/tests.rst new file mode 100644 index 0000000000..0dfda343da --- /dev/null +++ b/doc/sources/tests.rst @@ -0,0 +1,221 @@ +.. Copyright contributors to the oneDAL project +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +..
include:: substitutions.rst + +============= +Running Tests +============= + +Overview +-------- + +The |sklearnex| contains a test suite consisting of smoke tests around patching along with unit tests, written in a mixture of Python's ``unittest`` (for legacy interfaces) and ``pytest``. However, all of the tests are executed with ``pytest`` as the runner. Apart from the tests, code examples are also executed, but are not thoroughly checked for correctness - only for running without errors. + +Running test scripts +-------------------- + +Requirements +~~~~~~~~~~~~ + +As the library is designed with optional components and integrates with external packages that are optional by design, executing the tests involves additional dependencies, some of which are mandatory and some of which are optional. + +The mandatory dependencies for tests with locked versions of packages are listed under a file `requirements-test.txt `__, but it is not strictly necessary to have the exact versions listed there. Those locked requirements can be installed with ``pip`` as follows: + +.. code-block:: bash + + pip install -r requirements-test.txt + +Some tests will only execute depending on the availability of optional dependencies at runtime. Other optional dependencies that will trigger additional tests can be installed as follows, assuming a Linux* system: + +.. code-block:: bash + + pip install \ + dpctl `# for GPU functionalities` \ + dpnp `# for array API and GPU functionalities` + + pip install --index-url https://software.repos.intel.com/python/pypi \ + torch `# for array API` + + pip install --index-url https://software.repos.intel.com/python/pypi \ + mpi4py impi_rt `# for distributed mode, be sure to install from Intel's index` \ + && pip install pytest-mpi `# also required, but not from Intel's index` + +.. warning:: It might not be possible to install all of the test dependencies simultaneously in the same Python environment.
In particular, dependencies ``torch`` and ``dpctl`` / ``dpnp`` are likely not to be installable in a compatible way in the same environment if using pre-built distributions. Try using different Python environments for each set of dependencies to test. + +.. warning:: If installing dependencies for distributed mode from ``pip``, be sure to install ``mpi4py`` from the Intel ``pip`` index to ensure that it uses a compatible MPI backend. See :doc:`distributed-mode` for details. + +Executing tests +~~~~~~~~~~~~~~~ + +In order to run the whole test suite, the following script can be used on Linux*: + +.. code-block:: bash + + conda-recipe/run_test.sh + +.. warning:: This script must be executed from the root of the repository. + +It also comes with an analog for Windows*: + +.. code-block:: console + + call conda-recipe/run_test.bat + +Individual test files or tests can be executed through direct calls to ``pytest``, with different options (such as increased verbosity or stopping at the first failure) - for example: + +.. code-block:: bash + + pytest sklearnex/ensemble/tests/test_forest.py + +.. hint:: If executing these from the root of the repository, there might be naming clashes between the folders and the installed Python modules. It might be helpful to :ref:`build the library extensions in-place ` and set ``$PYTHONPATH`` to avoid problems. + +Configurable options +~~~~~~~~~~~~~~~~~~~~ + +The files ``run_test.sh`` and ``run_test.bat`` offer configurable behaviors through environment variables and command line arguments: + +- Environment variable ``$NO_DIST``, if set, skips the distributed mode tests. Note that executing these tests requires additional dependencies, otherwise they will be skipped either way. +- Environment variable ``$PYTHON`` can be used to set a Python interpreter under an MPI runner to execute distributed tests on Windows* - for example: ``set "PYTHON=mpiexec -n 2 python"``.
**This variable is required for distributed mode tests on Windows\*** - if not set, ``NO_DIST`` will be automatically set to 1. + + - On Linux*, this same variable can be used to set the Python interpreter that will run the tests for patching functionality. +- Passing argument ``--json-report`` will generate JSON reports of each test component under path ``/.pytest_reports``. Note that, if the folder is not empty, existing files will be deleted. +- Environment variable ``$COVERAGE_RCFILE``, if set, will generate coverage reports under the path specified by this variable. + +Running distributed mode examples +--------------------------------- + +A helper script `tests/run_examples.py `__ is provided for executing the `code examples `__ for distributed mode on both GPU (see :doc:`distributed-mode`) and CPU (see :doc:`distributed_daal4py`). + +This script is not executed as part of the regular test suite, although other scripts might execute the examples in non-distributed mode. + +Executing these distributed mode examples requires all of the optional dependencies for distributed mode tests. With those installed, the script can be executed as follows: + +.. code-block:: bash + + python tests/run_examples.py + +.. warning:: This script needs to be executed with the root of the repository as the working directory. The script will modify the working directory when it launches subprocesses, so if using environment variables like ``$PYTHONPATH``, these need to be set as absolute paths (not relative). + +.. _conformance_tests: + +Scikit-learn's test suite +------------------------- + +The |sklearnex| is regularly tested for correctness through the test suite of |sklearn| itself executed with patching applied, referred to throughout the CI jobs and files as 'conformance testing'. + +Executing tests +~~~~~~~~~~~~~~~ + +To execute the |sklearn| conformance tests, the following script can be used: + +.. 
code-block:: bash + + ./.ci/scripts/run_sklearn_tests.sh + + +Note that some tests are known to produce failures - for example, :obj:`sklearn.linear_model.LinearRegression` allows an argument ``copy_X``, and one of their tests checks that passing ``copy_X=False`` modifies the 'X' input in-place, while the |sklearnex| never modifies this data regardless of the argument ``copy_X``, hence the test would show a failure under a patched call to |sklearn|, even though the results do not change. + +Cases that are known to fail are not executed during these conformance tests. The list of deselected tests can be found under `deselected_tests.yaml `__. + +Selecting tests +*************** + +Individual tests can be executed through the underlying ``.py`` file that the ``.sh`` script executes, and other custom selections or deselections can be made on-the-fly there through environment variables - for example: + +.. code-block:: bash + + SELECTED_TESTS=all DESELECTED_TESTS="" python .ci/scripts/run_sklearn_tests.py + +The environment variables ``SELECTED_TESTS`` and ``DESELECTED_TESTS`` accept space-separated names of tests from the test suite of |sklearn|, as PyTest would take them if executed from the root of the repository. For example, in order to execute the test named `test_classification_toy `__ from the file ``ensemble/tests/test_forest.py`` `from the scikit-learn repository `__, the following can be used: + +.. code-block:: bash + + SELECTED_TESTS="ensemble/tests/test_forest.py::test_classification_toy" DESELECTED_TESTS="" \ + python .ci/scripts/run_sklearn_tests.py + +Note that these are passed to the ``pytest`` call, so other forms of pattern matching accepted by PyTest can also be used. + + +.. note:: If building the extension modules in-place :ref:`per the instructions here `, ``$PYTHONPATH`` must also be set for this script to work.
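As an illustration of the node-ID format that these variables expect (path to the test file relative to the |sklearn| package root, ``::``, and the test name, optionally followed by a parametrization in square brackets), the format can be sketched with a small helper - note that ``make_node_id`` is hypothetical and not part of the repository; it only makes the format explicit:

```python
# Hypothetical helper illustrating the pytest node-ID format accepted by
# SELECTED_TESTS / DESELECTED_TESTS; not part of the repository.
def make_node_id(path: str, test: str, params: str = "") -> str:
    """Compose a node ID like 'ensemble/tests/test_forest.py::test_classification_toy'."""
    node_id = f"{path}::{test}"
    if params:
        # Parametrized case, e.g. 'test_foo[poisson]'
        node_id += f"[{params}]"
    return node_id


# The environment variables take several such entries separated by spaces:
selected = " ".join(
    [
        make_node_id("ensemble/tests/test_forest.py", "test_classification_toy"),
        make_node_id("ensemble/tests/test_forest.py", "test_probability"),
    ]
)
```

The resulting string can then be exported as ``SELECTED_TESTS`` before invoking the ``.py`` runner.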
+ +Further arguments to pytest can be supplied by passing them as arguments to the ``.py`` runner - for example: + +.. code-block:: bash + + SELECTED_TESTS=all DESELECTED_TESTS="" python .ci/scripts/run_sklearn_tests.py -x + +GPU mode +******** + +The tests can also be made to run on GPU, either by passing argument ``gpu`` to ``run_sklearn_tests.sh``, or by passing argument ``--device `` to ``run_sklearn_tests.py`` - for example: + +.. code-block:: bash + + ./.ci/scripts/run_sklearn_tests.sh gpu + +Preview mode +************ + +Note that :doc:`preview mode ` is not tested by default - in order to test it, it is necessary to set the environment variable ``SKLEARNEX_PREVIEW=1`` to enable patching of such functionalities before executing either of these scripts (``.sh`` / ``.py``). By default, the ``.sh`` script will take care of deselecting tests that require preview mode for patching when this environment variable is not set. + +Producing a test report +~~~~~~~~~~~~~~~~~~~~~~~ + +Optionally, a JSON report of the results can be produced (requires package ``pytest-json-report``) by setting an environment variable ``JSON_REPORT_FILE``, indicating the location where to produce a JSON output file - note that the test runner changes the PyTest root directory, so the path should be specified as an absolute path, otherwise the file will get written into the ``site-packages`` folder for ``sklearn``: + +.. code-block:: bash + + SELECTED_TESTS=all \ + DESELECTED_TESTS="" \ + JSON_REPORT_FILE="$(pwd)/sklearn_test_results.json" \ + python .ci/scripts/run_sklearn_tests.py + + +Comparing test reports +********************** + +A small utility to compare two JSON test reports is provided under `tests/util_compare_json_reports.py `__, which can be useful for example when comparing changes before and after a given commit. + +The file is a Python script which produces a new JSON output file highlighting the tests that had different outcomes between two JSON reports.
It needs to be executed with the following arguments, prefixed with two dashes and with the value passed after an equal sign (e.g. ``--arg1=value``): + +- ``json1``: path to a first JSON report file from ``pytest-json-report``. +- ``json2``: path to a second JSON report file from ``pytest-json-report``. +- ``name1``: name that the tests from the first file will use as JSON keys in the generated output file. +- ``name2``: name that the tests from the second file will use as JSON keys in the generated output file. +- ``output``: file name under which to save the resulting JSON file that highlights the differences. + +Example: + +.. code-block:: bash + + python tests/util_compare_json_reports.py \ + --json1=logs_before.json \ + --json2=logs_after.json \ + --name1="before" \ + --name2="after" \ + --output="diffs_before_after.json" + + +The result will be a new JSON file which will contain only entries for tests that were present in both files and which had different outcomes, with a structure as follows: + +.. code-block:: + + "test_name": { # taken from 'nodeid' in the pytest json reports + <name1>: { # taken from argument 'name1' + ... # json from entry in pytest report under 'tests', minus key 'nodeid' + }, + <name2>: { # taken from argument 'name2' + ... # json from entry in pytest report under 'tests', minus key 'nodeid' + } + } diff --git a/doc/sources/topics-for-contributors.rst b/doc/sources/topics-for-contributors.rst new file mode 100644 index 0000000000..6efb509136 --- /dev/null +++ b/doc/sources/topics-for-contributors.rst @@ -0,0 +1,253 @@ +.. Copyright contributors to the oneDAL project +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. +.. include:: substitutions.rst + +======================= +Topics for Contributors +======================= + +Adding an estimator +------------------- + +Estimator classes in the |sklearnex| are wrappers over algorithms from the |onedal|. In order to add a new estimator, an example class ``DummyEstimator`` is available in the library, along with code comments and tests which explain how it should work. Estimators span multiple files, ranging from C++ wrappers built with PyBind11, to direct wrappers in the ``onedal/`` module, scikit-learn-conformant wrappers over those in the ``sklearnex/`` module, direct tests, configurations for general tests, and others. + +Example estimator +~~~~~~~~~~~~~~~~~ + +The following files and folders might be of help when looking at how the example ``DummyEstimator`` works and what is needed of an estimator: + +- Files under folder `onedal/dummy/ `__. +- Files under folder `sklearnex/dummy/ `__. + +The following files might also require changes after adding a new estimator - look out for the "dummy" keyword: + +- Import-related files: + + - `onedal/dal.cpp `__. + - `onedal/__init__.py `__. + - `setup.py `__. + - `sklearnex/__init__.py `__. + - `sklearnex/dispatcher.py `__. + +- Test-related files: + + - `sklearnex/tests/utils/base.py `__. + - `sklearnex/tests/test_common.py `__. + - `sklearnex/tests/test_memory_usage.py `__. + - `sklearnex/tests/test_n_jobs_support.py `__. + - `sklearnex/tests/test_patching.py `__. + - `sklearnex/tests/test_run_to_run_stability.py `__. + - `.ci/scripts/select_sklearn_tests.py `__. + +.. note:: The library contains many classes with legacy code from previous designs that do not work in the same way as the ``DummyEstimator`` class, such as classes based on ``daal4py``.
New estimators should nevertheless not mimic those, and should instead follow the design of ``DummyEstimator``. + +.. tip:: Another good reference example for how estimators should be implemented is :obj:`sklearn.linear_model.LinearRegression` from the ``sklearnex`` module. + +For estimators that depend on functionality which is only exposed through ``daal4py``, an internal wrapper akin to the files under ``onedal/`` must first be created under ``daal4py/sklearn``, and then imported in a corresponding class in ``onedal/``. Note that new functionalities in the |onedal| are meant to be introduced through the oneAPI interface, so only legacy functionalities should ever need to go through this route. + +Version compatibilities +----------------------- + +OneDAL +~~~~~~ + +The |sklearnex| is intended to be backwards-compatible with different versions of the |onedal|, but not forwards-compatible except within a major release series - meaning: it is meant to run with a version of the |onedal| that is lower than or equal to the version of the |sklearnex|, such that ``onedal==2025.0`` + ``sklearnex==2025.0`` and ``onedal==2025.0`` + ``sklearnex==2025.2`` should both work correctly, even though the latter might not expose the same functionalities with ``onedal==2025.0`` as with ``onedal==2025.2``. + +This is achieved with conditional runtime checks of the library versions to determine whether a given class, function, or similar should be defined, through the provided function ``daal_check_version``, which accepts a tuple as argument containing the major version number, the ``"P"`` string (other possibilities for this parameter are not used anymore), and the minor version **multiplied by 100**. So for example, if a given piece of code requires ``onedal>=2025.2``, the function should be called as follows: + +.. code-block:: python + + if daal_check_version((2025, "P", 200)): + ...  # code branch for onedal>=2025.2 + else: + ...  # code branch for onedal<2025.2 + +.. 
hint:: This helper is meant for usage in both source code and tests. + +In C++ code, the macro ``ONEDAL_VERSION`` should be checked at compile-time for conditional code inclusions or exclusions. This macro contains a single integer composed of the major version, followed by the minor version as two digits, and the patch version as another two digits. For example, if a given piece of code requires ``onedal>=2025.2``, the check would be as follows: + +.. code-block:: cpp + + #if defined(ONEDAL_VERSION) && ONEDAL_VERSION >= 20250200 + // code for newer version + #else + // code for older version + #endif + +Scikit-learn +~~~~~~~~~~~~ + +The |sklearnex| is intended to be compatible with multiple versions of |sklearn|. In order to achieve this compatibility, conditional runtime checks for the version of |sklearn| are executed in order to offer different code paths for different versions, through function ``sklearn_check_version``, which accepts a string with the major and minor version as recognized by ``pip``. For example, in order to have different code branches depending on ``sklearn>=1.7`` (which would also trigger for ``sklearn==1.7.2``, for example), the following can be used: + +.. code-block:: python + + if sklearn_check_version("1.7"): + ...  # code branch for sklearn>=1.7 + else: + ...  # code branch for sklearn<1.7 + +Test helpers +------------ + +Note that not all estimators offer the same functionalities, and thus tests should be designed accordingly. The tests provide some custom marks, fixtures, and helpers that one might want to use in some cases: + +- ``@pytest.mark.allow_sklearn_fallback``: will avoid having tests fail when they end up calling procedures from |sklearn| instead of from the |onedal|. This can be helpful for example when testing that some corner case falls back correctly when it should.
+- ``onedal.tests.utils._dataframes_support._as_numpy``: this function can be used to convert an input array or data frame to NumPy, regardless of whether it lives on host or on device, and regardless of array API support. + +Tests with optional dependencies +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Tests that require optional dependencies in order to execute should include conditional skip logic via ``@pytest.mark.skipif``. The test files are meant to be executable without the optional dependencies being installed, so those dependencies should be imported conditionally or inside a ``try`` + ``except ImportError`` block. + +SPMD tests +~~~~~~~~~~ + +Tests that involve distributed mode functionalities should rely on ``pytest-mpi`` and need to be marked with ``@pytest.mark.mpi``. + +Running benchmarks +------------------ + +As this library aims to offer accelerated versions of algorithms, when adding or modifying estimators and related helper functions, it is usually helpful - and in many cases required - to conduct benchmarks to assess the performance implications of the changes, whether against |sklearn| or against the current version of the |sklearnex|. + +Benchmarks are usually conducted through the `scikit-learn_bench `__ tool, which lives in a different repository. See the instructions in that repository for how to run the appropriate benchmarks. + +Results from benchmarks are usually shared as a relative improvement over the baseline being compared against, which will be available in the sheets of the generated ``.xlsx`` comparison reports from that repository. Usually, the geometric mean is used as a final number, but changes for individual datasets and estimator methods are typically still of interest within a given pull request.
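As a brief illustration of why the geometric mean is the conventional summary for such relative results: speedup ratios compound multiplicatively, so the geometric mean gives the single ratio that, applied uniformly to every case, would have the same overall effect. A minimal sketch (the ratios below are made up for illustration):

```python
import math

# Made-up per-case speedup ratios (baseline time / new time);
# values above 1.0 mean the new version is faster.
speedups = [1.8, 2.5, 0.9, 3.1]

# Geometric mean: exp of the mean of the logs, equivalent to the
# n-th root of the product of the ratios.
geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Unlike the arithmetic mean, a 2x speedup and a 0.5x slowdown
# cancel out exactly under the geometric mean.
assert math.isclose(math.exp((math.log(2.0) + math.log(0.5)) / 2), 1.0)
```

The per-case ratios remain worth reporting alongside the summary, since a single aggregate can hide a regression on one dataset offset by gains on others.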
+ +Building the documentation +-------------------------- + +The source code for the documentation being rendered here is available from the same repository as the library's source code, and hosted on GitHub pages through automated deployments. The documentation is written for Sphinx, which takes some docstrings from the classes and functions in the library to render them. + +Thus, building the documentation from source requires being able to import the library in the same Python environment that is building the documentation, in addition to having all of the Python packages used by the Sphinx build script, such as Sphinx itself and the Sphinx extensions used throughout these docs. + +Building documentation locally +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For development purposes, it's helpful to build the docs locally to inspect them offline without deploying, based on the current version of the source code instead of a public release version. This can be done using the provided scripts in this repository. + +Requirements +************ + +Since the documentation is built with Sphinx, the scripts for building it require a Python environment with documentation-related packages installed. The locked requirements (and note that in many cases specific versions of the dependencies might be needed) are available in file `requirements-doc.txt `__. They can be installed from the root of the repository as follows: + +.. code-block:: bash + + pip install -r requirements-doc.txt + +.. tip:: It's advised to create a separate Python environment for building the docs due to the locked requirements and version conflicts with what's used for the tests. + +Instructions +************ + +With the necessary dependencies installed, the docs can then be built locally **on Linux\*** by executing the following script **from the root of the repository**: + +.. code-block:: bash + + ./doc/build-doc.sh + +.. 
note:: The script accepts additional arguments and environment variables which are used for the versioned doc pages hosted on GitHub pages. Those are not meant to be used for local development. + +The script will copy over necessary files to the docs folder and make calls to Sphinx to build the docs as HTML. After that script is executed for the first time, if no new embedded notebooks / examples from ``.py`` files have been added, the docs can be built without the script using the provided ``Makefile``: + +.. code-block:: bash + + cd doc + make clean + make html + +.. note:: The docs can be built on Windows* using the file ``make.bat``, but be aware that it will not render everything correctly if the commands from ``build-doc.sh`` that copy files haven't been executed. + +Copyright headers +----------------- + +Each new file added to the project must include the following copyright notice - note that this project is closely tied to the |onedal| and hence shares the same copyright header. The following copyright headers should be used: + +- For Python and YAML files: + + .. code-block:: python + + # ============================================================================== + # Copyright contributors to the oneDAL project + # + # Licensed under the Apache License, Version 2.0 (the "License"); + # you may not use this file except in compliance with the License. + # You may obtain a copy of the License at + # + # http://www.apache.org/licenses/LICENSE-2.0 + # + # Unless required by applicable law or agreed to in writing, software + # distributed under the License is distributed on an "AS IS" BASIS, + # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + # See the License for the specific language governing permissions and + # limitations under the License. + # ============================================================================== + +- For Markdown files (the same header text, inside an HTML comment): + + .. code-block:: + + <!-- + Copyright contributors to the oneDAL project + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + --> + +- For JavaScript files: + + .. 
code-block:: javascript + + // Copyright contributors to the oneDAL project + // + // Licensed under the Apache License, Version 2.0 (the "License"); + // you may not use this file except in compliance with the License. + // You may obtain a copy of the License at + // + // http://www.apache.org/licenses/LICENSE-2.0 + // + // Unless required by applicable law or agreed to in writing, software + // distributed under the License is distributed on an "AS IS" BASIS, + // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + // See the License for the specific language governing permissions and + // limitations under the License. + +- For rst files: + + .. code-block:: + + .. Copyright contributors to the oneDAL project + .. + .. Licensed under the Apache License, Version 2.0 (the "License"); + .. you may not use this file except in compliance with the License. + .. You may obtain a copy of the License at + .. + .. http://www.apache.org/licenses/LICENSE-2.0 + .. + .. Unless required by applicable law or agreed to in writing, software + .. distributed under the License is distributed on an "AS IS" BASIS, + .. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + .. See the License for the specific language governing permissions and + .. limitations under the License. + +If for some reason it doesn't make sense to include this copyright header in a text-based file (e.g. json files), said file needs to be added to the `exclusion list `__, but this should be a rare occurrence. diff --git a/scikit-learn-tests.md b/scikit-learn-tests.md deleted file mode 100644 index 4e4a121aeb..0000000000 --- a/scikit-learn-tests.md +++ /dev/null @@ -1,95 +0,0 @@ - - -# Running the scikit-learn test suite - -The Extension for scikit-learn* is regularly tested for correctness through the test suite of scikit-learn* itself executed with patching applied, referred throughout the CI jobs and files as 'conformance testing'. 
- -To execute the scikit-learn* conformance tests, the following script can be used: - -```shell -./.ci/scripts/run_sklearn_tests.sh -``` - -Note that some tests are known to produce failures - for example, scikit-learn's [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) allows an argument `copy_X`, and one of their tests checks that passing `copy_X=False` modifies the 'X' input in-place, while the extension never modifies this data regardless of the argument `copy_X`, and hence the test would show a failure under a patched scikit-learn*, even though the results do not change. - -Cases that are known to fail are not executed during these conformance test. The list of deselected tests can be found under [deselected_tests.yaml](https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/deselected_tests.yaml). - -Individual tests can be executed through the underlying `.py` file that the `.sh` script executes, and other custom selections or deselections can be changed on-the-fly there through usage of environment variables - for example: - -```shell -SELECTED_TESTS=all DESELECTED_TESTS="" python .ci/scripts/run_sklearn_tests.py -``` - -_**Note:** If building the extension modules in-place [per the instructions here](https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/INSTALL.md#build-intelr-extension-for-scikit-learn), it requires also setting `$PYTHONPATH` for this script._ - -Further arguments to pytest can be supplied by passing them as arguments to the `.py` runner - for example -```shell -SELECTED_TESTS=all DESELECTED_TESTS="" python .ci/scripts/run_sklearn_tests.py -x -``` - -The tests can also be made to run on GPU, either by passing argument `gpu` to `run_sklearn_tests.sh`, or by passing argument `--device ` to `run_sklearn_tests.py` - example: -```shell -./.ci/scripts/run_sklearn_tests.sh gpu -``` - -Note that [preview 
mode](https://uxlfoundation.github.io/scikit-learn-intelex/latest/preview.html) is not tested by default - in order to test it, it's necessary to set environment variable `SKLEARNEX_PREVIEW=1` to enable patching of such functionalities before executing either of these scripts (`.sh` / `.py`). The `.sh` script by default will take care of deselecting tests that require preview mode for patching when this environment variable is not set. - -## Producing a test report - -Optionally, a JSON report of the results can be produced (requires package `pytest-json-report`) by setting an environment variable `JSON_REPORT_FILE`, indicating the location where to produce a JSON output file - note that the test runner changes the PyTest root directory, so it should be specified as an absolute path, or otherwise will get written into the `site-packages` folder for `sklearn`: - -```shell -SELECTED_TESTS=all \ -DESELECTED_TESTS="" \ -JSON_REPORT_FILE="$(pwd)/sklearn_test_results.json" \ - python .ci/scripts/run_sklearn_tests.py -``` - -## Comparing test reports - -A small utility to compare two JSON test reports is provided under [tests/util_compare_json_reports.py](https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/tests/util_compare_json_reports.py), which can be useful for example when comparing changes before and after a given commit. - -The file is a python script which produces a new JSON output file highlighting the tests that had different outcomes between two JSON reports. It needs to be executed with the following arguments, prefixed with two dashes and with the value passed after an equal sign (e.g. `--arg1=value`): - -* `json1`: path to a first JSON report file from `pytest-json-report`. -* `json2`: path to a second JSON report file from `pytest-json-report`. -* `name1`: name that the tests from the first file will use as JSON keys in the generated output file. 
-* `name2`: name that the tests from the second file will use as JSON keys in the generated output file. -* `output`: file name where to save the result JSON file that highlights the differences. - -Example: -```shell -python tests/util_compare_json_reports.py \ - --json1=logs_before.json \ - --json2=logs_after.json \ - --name1="before" \ - --name2="after" \ - --output="diffs_before_after.json" -``` - -The result will be a new JSON file which will contain only entries for tests that were present in both files and which had different outcomes, with a structure as follows: -``` -"test_name": { # taken from 'nodeid' in the pytest json reports - : { # taken from argument 'name1' - ... # json from entry in pytest report under 'tests', minus key 'nodeid' - }, - : { # taken from argument 'name2' - ... # json from entry in pytest report under 'tests', minus key 'nodeid' - } -} -``` diff --git a/tests/util_compare_json_reports.py b/tests/util_compare_json_reports.py index 273feca844..db687bdc1f 100644 --- a/tests/util_compare_json_reports.py +++ b/tests/util_compare_json_reports.py @@ -14,9 +14,8 @@ # limitations under the License. # ============================================================================== -# Note: see file 'scikit-learn-tests.md' for instructions about usage -# of this script: -# https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/scikit-learn-tests.md +# Note: see the docs for instructions about usage of this script: +# https://uxlfoundation.github.io/scikit-learn-intelex/development/tests.html#scikit-learn-s-test-suite import json import sys from typing import Any