diff --git a/.ci/AGENTS.md b/.ci/AGENTS.md new file mode 100644 index 0000000000..7ff23cb67e --- /dev/null +++ b/.ci/AGENTS.md @@ -0,0 +1,103 @@ +# AGENTS.md - CI/CD Infrastructure (.ci/) + +## Purpose +CI/CD infrastructure for building, testing, and releasing Intel Extension for Scikit-learn across multiple platforms. + +## Key Files for Agents +- `.ci/pipeline/ci.yml` - Main CI orchestrator +- `.ci/pipeline/build-and-test-*.yml` - Platform-specific builds +- `.ci/pipeline/linting.yml` - Code quality enforcement +- `.ci/scripts/` - Automation utilities + +## Platform Support +- **Linux/macOS**: Uses conda, Intel DPC++ compiler, MPI support +- **Windows**: Visual Studio 2022, conda-forge packages +- **GPU**: Intel GPU support via DPC++/SYCL (dpctl, dpnp packages) + +## Quality Gates +- **Linting**: black, isort, clang-format, numpydoc validation +- **Testing**: pytest with cross-platform compatibility +- **Coverage**: codecov integration with threshold enforcement + +## Build Dependencies +- **oneDAL**: Downloads nightly builds from upstream oneDAL repo +- **Python**: Matrix testing across Python 3.9-3.13 (verified in .ci/pipeline/ci.yml) +- **sklearn**: Multiple version compatibility (1.0-1.7) +- **GPU Libraries**: dpctl, dpnp for Intel GPU acceleration + +## Release Process +- **Automated**: Dynamic matrix generation for PyPI/conda releases +- **Multi-channel**: Both PyPI wheels and conda packages +- **Quality**: Automated sklearn compatibility testing before release + +## Local Development Setup + +### Quality Tools Configuration (from pyproject.toml) +```bash +# Code formatting +black --line-length 90 +isort --profile black --line-length 90 + +# C++ formatting +clang-format --style=file + +# Documentation validation +numpydoc-validation +``` + +### Build Dependencies Download +```bash +# oneDAL nightly builds (from .github/workflows/ci.yml) +# Automatically downloads from uxlfoundation/oneDAL nightly builds +# Sets DALROOT to downloaded oneDAL location +``` + +### Platform-Specific Build Commands + +**Linux/macOS** (from .ci/pipeline/build-and-test-lnx.yml): +```bash +# Install DPC++ compiler +bash .ci/scripts/install_dpcpp.sh + +# Set up environment +source /opt/intel/oneapi/compiler/latest/env/vars.sh +export DPCPPROOT=/opt/intel/oneapi/compiler/latest + +# Create conda environment +conda create -q -y -n CB -c conda-forge python=3.11 mpich pyyaml +conda activate CB +pip install -r dependencies-dev + +# Build +./conda-recipe/build.sh +``` + +**Windows** (from .ci/pipeline/build-and-test-win.yml): +```batch +# Visual Studio setup +call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall" x64 + +# Build +call conda-recipe\bld.bat +``` + +### Environment Variables for Development +```bash +# From setup.py and CI scripts +export DALROOT=/path/to/onedal # Required +export DPCPPROOT=/opt/intel/oneapi/compiler/latest # For GPU support +export MPIROOT=/path/to/mpi # For distributed computing +export NO_DPC=1 # Disable GPU support +export NO_DIST=1 # Disable distributed computing +export SKLEARNEX_VERSION=2024.7.0 # Version override +export MAKEFLAGS="-j$(nproc)" # Parallel build +``` + +## For AI Agents +- Follow established build templates +- Respect quality gates (linting, testing, coverage) +- Use platform-specific configurations appropriately +- Test across supported Python/sklearn version combinations +- Set required environment variables (DALROOT, DPCPPROOT, MPIROOT) +- Use conda environments to avoid dependency conflicts +- Run pre-commit hooks before 
submitting changes \ No newline at end of file diff --git a/.github/.licenserc.yaml b/.github/.licenserc.yaml index d78de976b8..e486f6e7bd 100644 --- a/.github/.licenserc.yaml +++ b/.github/.licenserc.yaml @@ -67,9 +67,11 @@ header: - '.github/CODEOWNERS' - '.github/Pull_Request_template.md' - '.github/renovate.json' + - '.github/instructions/*.md' # Specific files - 'setup.cfg' - 'LICENSE' + - 'AGENTS.md' # External copies of copyrighted work - 'onedal/datatypes/dlpack/dlpack.h' comment: never diff --git a/.github/instructions/build-config.instructions.md b/.github/instructions/build-config.instructions.md new file mode 100644 index 0000000000..d23ea797ae --- /dev/null +++ b/.github/instructions/build-config.instructions.md @@ -0,0 +1,88 @@ +# Build Configuration Files + +## Core Build Files +- `setup.py`: Main build script (500+ lines, complex configuration) +- `pyproject.toml`: Python project metadata + linting configuration +- `dependencies-dev`: Build-time dependencies (Cython, numpy, pybind11, cmake) +- `requirements-test.txt`: Test dependencies with version constraints +- `conda-recipe/meta.yaml`: Conda package build configuration + +## Environment Variables (Critical) +```bash +# MANDATORY for building +export DALROOT=/path/to/onedal # oneDAL installation path (required) + +# OPTIONAL but commonly needed +export MPIROOT=/path/to/mpi # MPI for distributed features +export NO_DIST=1 # Disable distributed mode +export NO_DPC=1 # Disable GPU/SYCL support +export NO_STREAM=1 # Disable streaming mode +export DEBUG_BUILD=1 # Debug symbols + no optimization +export MAKEFLAGS=-j$(nproc) # Parallel build threads +``` + +## Build Process (4 Stages) +1. **Code Generation**: oneDAL C++ headers → Python/Cython sources +2. **oneDAL Bindings**: cmake + pybind11 compilation +3. **Cython Processing**: .pyx files → C++ sources +4. **Final Compilation**: Link everything into Python extensions + +## Dependencies +**Build Dependencies (dependencies-dev):** +- Cython==3.1.1 (exact version required) +- numpy>=2.0 (version varies by Python version) +- pybind11==2.13.6 +- cmake==4.0.2 +- setuptools==79.0.1 + +**Runtime Dependencies:** +- Intel oneDAL 2021.1+ (backwards compatible) +- numpy (version-specific, see requirements-test.txt) +- scikit-learn 1.0-1.7 (see compatibility matrix) + +## Build Commands +```bash +# Development build (RECOMMENDED) +python setup.py develop # Creates .egg-link, editable + +# Production builds +python setup.py install # Full install +python setup.py build_ext --inplace --force # Extensions only + +# Special flags (Linux) +python setup.py build --abs-rpath # Absolute RPATH for custom oneDAL + +# Conda build +conda build . 
# Uses conda-recipe/meta.yaml
+```
+
+## Common Build Issues
+```bash
+# oneDAL not found
+RuntimeError: "Not set DALROOT variable"
+→ Solution: export DALROOT=/path/to/onedal
+
+# MPI required but missing
+ValueError: "'MPIROOT' is not set, cannot build with distributed mode"
+→ Solution: export NO_DIST=1 or set MPIROOT
+
+# Cython version mismatch
+→ Solution: pip install Cython==3.1.1 (exact version)
+
+# Linking issues (Linux)
+→ Solution: Use --abs-rpath flag
+```
+
+## CI/CD Configuration
+- **GitHub Actions**: `.github/workflows/ci.yml`
+- **Azure DevOps**: `.ci/pipeline/ci.yml` (main CI system)
+- **Pre-commit**: `.pre-commit-config.yaml` (code quality)
+
+Build timeouts: 120 minutes in CI (can be slow due to oneDAL compilation)
+
+## Related Instructions
+- `general.instructions.md` - Quick start build commands
+- `src.instructions.md` - C++/Cython build details
+- `tests.instructions.md` - Testing after successful builds
+
+For platform-specific build details, see `.ci/AGENTS.md`
\ No newline at end of file
diff --git a/.github/instructions/daal4py.instructions.md b/.github/instructions/daal4py.instructions.md
new file mode 100644
index 0000000000..8efdf6860d
--- /dev/null
+++ b/.github/instructions/daal4py.instructions.md
@@ -0,0 +1,49 @@
+# daal4py/* - Direct oneDAL Python Bindings
+
+## Purpose
+Direct Python bindings to Intel oneDAL for maximum performance and model builders for XGBoost/LightGBM conversion.
+
+## Three Sub-APIs
+1. **Native oneDAL**: `import daal4py as d4p` - Direct algorithm access
+2. **sklearn-compatible**: `from daal4py.sklearn import ...` - sklearn API with oneDAL backend
+3. **Model Builders**: `from daal4py.mb import convert_model` - External model conversion
+
+## API Overview
+
+For detailed native oneDAL patterns and model builders, see [daal4py/AGENTS.md](../../daal4py/AGENTS.md).
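+
+The sklearn-compatible sub-API (item 2 above) keeps the standard estimator interface while running oneDAL underneath; a minimal sketch:
+```python
+import numpy as np
+from daal4py.sklearn.cluster import DBSCAN  # sklearn API, oneDAL backend
+
+X = np.random.RandomState(0).random((100, 4))
+labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
+```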
+ +**Basic Pattern**: +```python +import daal4py as d4p +algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) +result = algorithm.compute(data) +``` + +**Model Conversion**: +```python +from daal4py.mb import convert_model +d4p_model = convert_model(xgb_model) # 10-100x faster inference +``` + +## Testing +```bash +# Native daal4py tests +pytest --verbose --pyargs daal4py +pytest tests/test_daal4py_examples.py # Native API examples +pytest tests/test_model_builders.py # Model conversion tests + +# sklearn compatibility in daal4py +pytest daal4py/sklearn/tests/ # sklearn-compatible API +``` + +## Development Notes +- Native API provides direct oneDAL algorithm access (fastest performance) +- sklearn-compatible API in `daal4py/sklearn/` maintains full sklearn compatibility +- Model builders enable oneDAL inference for models trained with other frameworks + +## Related Instructions +- `general.instructions.md` - Repository setup and build requirements +- `onedal.instructions.md` - Low-level backend that daal4py wraps +- `src.instructions.md` - Core C++/Cython implementation details +- `tests.instructions.md` - Testing native oneDAL algorithms +- See `daal4py/AGENTS.md` for detailed algorithm usage patterns \ No newline at end of file diff --git a/.github/instructions/general.instructions.md b/.github/instructions/general.instructions.md new file mode 100644 index 0000000000..522bc5ec42 --- /dev/null +++ b/.github/instructions/general.instructions.md @@ -0,0 +1,58 @@ +# General Repository Instructions - Intel Extension for Scikit-learn + +## Repository Overview + +**Intel Extension for Scikit-learn** (scikit-learn-intelex) accelerates scikit-learn by 10-100x using Intel oneDAL. Zero code changes required for existing sklearn applications. + +- **Languages**: Python (70%), C++ (25%), Cython (5%) +- **Architecture**: 4-layer system (sklearnex → daal4py → onedal → Intel oneDAL C++) +- **Platforms**: Linux, Windows, macOS; CPU (x86_64, ARM), GPU (Intel via SYCL) +- **Python**: 3.9-3.13 supported + +## Quick Start + +**Build Setup**: See [build-config.instructions.md](build-config.instructions.md) for complete details. +```bash +export DALROOT=/path/to/onedal +python setup.py develop +``` + +**Testing**: See [tests.instructions.md](tests.instructions.md) for comprehensive testing. +```bash +pytest --verbose --pyargs sklearnex +``` + +**Code Quality**: +```bash +pre-commit run --all-files +``` + +## Code Standards + +- **Python**: Black (line-length=90) + isort +- **C++**: clang-format version ≥14 +- **Commits**: Must be signed-off (`git commit -s`) +- **Documentation**: numpydoc format + +## Common Issues & Solutions + +```bash +# Build failures +export NO_DIST=1 # Disable distributed mode if MPI issues +export NO_DPC=1 # Disable GPU if driver issues +python setup.py build_ext --inplace --force --abs-rpath # Linux linking + +# Import/path issues +export PYTHONPATH=$(pwd) # Add repo to path +python setup.py develop # Ensure editable install +``` + +## Related Instructions +- `sklearnex.instructions.md` - Primary sklearn interface and patching +- `daal4py.instructions.md` - Direct oneDAL bindings and model builders +- `onedal.instructions.md` - Low-level C++ bindings +- `src.instructions.md` - Core C++/Cython implementation +- `tests.instructions.md` - Testing infrastructure and validation +- `build-config.instructions.md` - Build system and environment setup + +For detailed implementation guides, see the corresponding AGENTS.md files in each directory. 
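+
+A quick way to confirm the build and patching work end to end (a minimal sketch):
+```python
+from sklearnex import patch_sklearn, sklearn_is_patched
+
+patch_sklearn()
+assert sklearn_is_patched()  # sklearn estimators now dispatch to oneDAL
+
+from sklearn.cluster import DBSCAN  # import after patching to get the accelerated class
+```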
\ No newline at end of file
diff --git a/.github/instructions/onedal.instructions.md b/.github/instructions/onedal.instructions.md
new file mode 100644
index 0000000000..8ca8f7b303
--- /dev/null
+++ b/.github/instructions/onedal.instructions.md
@@ -0,0 +1,63 @@
+# onedal/* - Low-Level C++ Bindings
+
+## Purpose
+Pybind11-based C++ bindings providing the bridge between Python and the Intel oneDAL C++ library.
+
+## Key Components
+- `datatypes/`: Memory management and array conversions (NumPy, SYCL USM, DLPack)
+- `common/`: Policy management, device selection, serialization
+- `*/`: Algorithm-specific implementations (cluster/, decomposition/, linear_model/, etc.)
+- `spmd/`: Distributed computing interfaces
+
+## Memory Management
+```python
+# Zero-copy conversions handled automatically
+import numpy as np
+from onedal.cluster import DBSCAN
+
+# NumPy arrays converted to oneDAL tables without copying
+X = np.random.random((1000, 10))
+model = DBSCAN().fit(X)  # Automatic NumPy → oneDAL conversion
+```
+
+## Device Context
+
+For comprehensive device management, see [onedal/AGENTS.md](../../onedal/AGENTS.md).
+
+```python
+import dpctl
+with dpctl.device_context("gpu:0"):
+    model = DBSCAN().fit(X)
+```
+
+## Algorithm Structure
+- Each algorithm module follows a consistent pattern:
+  - `fit()` method for training
+  - `predict()` method for inference (where applicable)
+  - Parameters match the oneDAL C++ API
+  - Results as Python objects with named attributes
+
+## Testing
+```bash
+# Low-level onedal tests
+pytest onedal/tests/  # Core functionality
+pytest onedal/datatypes/tests/  # Memory management
+pytest onedal/common/tests/  # Device/policy tests
+
+# Algorithm-specific tests
+pytest onedal/cluster/tests/test_dbscan.py  # DBSCAN implementation
+pytest onedal/linear_model/tests/  # Linear models
+```
+
+## Development Notes
+- Direct interface to the oneDAL C++ API through pybind11
+- Handles memory management between Python/C++ automatically
+- Provides the foundation for both the daal4py and sklearnex layers
+- SPMD module enables distributed computing with MPI
+
+## Related Instructions
+- `general.instructions.md` - Repository setup and build requirements
+- `src.instructions.md` - C++/Cython implementation that uses onedal
+- `sklearnex.instructions.md` - High-level layer built on onedal
+- `daal4py.instructions.md` - Alternative interface to onedal
+- See `onedal/AGENTS.md` for detailed technical implementation
\ No newline at end of file
diff --git a/.github/instructions/sklearnex.instructions.md b/.github/instructions/sklearnex.instructions.md
new file mode 100644
index 0000000000..db0afc1776
--- /dev/null
+++ b/.github/instructions/sklearnex.instructions.md
@@ -0,0 +1,55 @@
+# sklearnex/* - Primary sklearn-compatible Interface
+
+## Purpose
+Primary user interface for sklearn acceleration with a patching system and device offloading.
+
+## Key Files & Functions
+- `dispatcher.py`: Patching system (`get_patch_map_core` line 36)
+- `_device_offload.py`: GPU/CPU dispatch (`dispatch` function line 72)
+- `_config.py`: Global configuration (target_offload, allow_fallback_to_host)
+- `base.py`: oneDALEstimator base class for all accelerated algorithms
+
+## Usage Patterns
+
+**Global Patching (Most Common):**
+```python
+from sklearnex import patch_sklearn
+patch_sklearn()  # All sklearn imports now accelerated
+from sklearn.cluster import DBSCAN  # Uses oneDAL implementation
+```
+
+**Selective Patching:**
+```python
+patch_sklearn(["DBSCAN", "KMeans"])  # Only specific algorithms
+```
+
+**Direct Import (No Patching):**
+```python
+from sklearnex.cluster import DBSCAN  # Always oneDAL implementation
+```
+
+**Device Control**: See [sklearnex/AGENTS.md](../../sklearnex/AGENTS.md) for comprehensive device configuration.
+```python
+from sklearnex import config_context
+with config_context(target_offload="gpu:0"):
+    model.fit(X, y)
+```
+
+## Testing
+```bash
+# sklearnex-specific tests
+pytest --verbose --pyargs sklearnex
+pytest sklearnex/tests/test_patching.py  # Core patching functionality
+pytest sklearnex/tests/test_config.py  # Configuration system
+```
+
+## Development Notes
+- All sklearn-compatible algorithms inherit from `base.oneDALEstimator`
+- Fallback to original sklearn if the oneDAL implementation is unavailable
+- Device offloading requires Intel GPU drivers and the SYCL runtime
+
+## Related Instructions
+- `general.instructions.md` - Repository setup and build requirements
+- `onedal.instructions.md` - Low-level backend that sklearnex uses
+- `tests.instructions.md` - Testing the sklearn compatibility layer
+- See `sklearnex/AGENTS.md` for detailed module information
\ No newline at end of file
diff --git a/.github/instructions/src.instructions.md b/.github/instructions/src.instructions.md
new file mode 100644
index 0000000000..fb3ab01086
--- /dev/null
+++ b/.github/instructions/src.instructions.md
@@ -0,0 +1,63 @@
+# src/* - Core C++/Cython Implementation
+
+## Purpose
+Core C++/Cython implementation layer providing the foundation for the entire stack.
+
+## Key Files
+- `daal4py.cpp`: Main Cython interface to oneDAL
+- `daal4py.h`: C++ headers and type definitions
+- `*_builder.pyx`: Model builder implementations (XGBoost, LightGBM conversion)
+- `gettree.pyx`: Tree model extraction utilities
+- `mpi/`: Distributed computing infrastructure
+
+## Architecture
+- **Cython Interface**: `daal4py.cpp` provides the Python↔C++ bridge
+- **Memory Management**: `npy4daal.h` handles NumPy array conversions
+- **Distributed Computing**: MPI-based implementations in `mpi/`
+- **Model Builders**: Cython implementations for external model conversion
+
+## Build Process
+1. **Code Generation**: Python scripts generate C++ from oneDAL headers
+2. **Cython Compilation**: `.pyx` files compiled to C++
+3. **C++ Compilation**: Link with oneDAL libraries
+4. **Extension Creation**: Python extension modules
+
+## Development Workflow
+
+See [build-config.instructions.md](build-config.instructions.md) for environment setup.
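+
+Because there is no incremental compilation (see Development Notes below), a compiler cache pays off on repeated rebuilds; a sketch assuming `ccache` is installed and a GCC toolchain:
+```bash
+# Hypothetical setup: route compiler invocations through ccache
+export CC="ccache gcc"
+export CXX="ccache g++"
+```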
+ +```bash +# Rebuild after C++/Cython changes +python setup.py build_ext --inplace --force +``` + +## MPI/Distributed Features +- Located in `src/mpi/` +- Requires MPI installation (`MPIROOT` environment variable) +- Enable with `mpi4py` for distributed sklearn operations +- Disable with `NO_DIST=1` if MPI unavailable + +## Testing +```bash +# Test distributed features (requires MPI) +mpirun -n 2 python -m pytest tests/test_daal4py_spmd_examples.py + +# Test model builders +pytest tests/test_model_builders.py + +# Test core functionality +pytest tests/test_daal4py_serialization.py +``` + +## Development Notes +- No incremental compilation - full rebuild required for changes +- Use `ccache` for faster development builds +- ASan builds supported for debugging (see INSTALL.md) +- C++ code must follow clang-format style + +## Related Instructions +- `general.instructions.md` - Repository setup and build requirements +- `build-config.instructions.md` - Build system and compilation details +- `onedal.instructions.md` - Python bindings that src/ implements +- `daal4py.instructions.md` - Higher-level API built on src/ +- See `src/AGENTS.md` for detailed implementation guides \ No newline at end of file diff --git a/.github/instructions/tests.instructions.md b/.github/instructions/tests.instructions.md new file mode 100644 index 0000000000..6f57ce7116 --- /dev/null +++ b/.github/instructions/tests.instructions.md @@ -0,0 +1,84 @@ +# tests/* - Testing Infrastructure + +## Test Structure +- `tests/`: Legacy daal4py tests and examples +- Individual module tests in respective directories (sklearnex/tests/, onedal/tests/, etc.) +- `deselected_tests.yaml`: Tests skipped in CI due to platform/dependency issues + +## Test Execution Order (CRITICAL) + +**Preparation**: +```bash +pip install -r requirements-test.txt +``` + +**Core Test Suites** (run in order): +```bash +pytest --verbose -s tests/ # Legacy daal4py tests +pytest --verbose --pyargs daal4py # Native oneDAL API tests +pytest --verbose --pyargs sklearnex # sklearn compatibility tests +``` + +**Specific Categories**: +```bash +pytest tests/test_daal4py_examples.py # Native API examples +pytest tests/test_model_builders.py # XGBoost/LightGBM conversion +pytest tests/test_daal4py_spmd_examples.py # Distributed computing (requires MPI) +``` + +## Test Configuration +```bash +# Environment for testing +export COVERAGE_RCFILE=$(readlink -f .coveragerc) # Coverage configuration +export NO_DIST=1 # Disable distributed tests +export NO_DPC=1 # Disable GPU tests + +# Memory-intensive tests may require >8GB RAM +# GPU tests require Intel GPU + drivers +# Distributed tests require MPI setup (mpirun -n 2 pytest ...) 
+``` + +## Test Categories + +**Core Functionality:** +- `test_daal4py_examples.py`: Native oneDAL algorithm usage +- `test_estimators.py`: Algorithm parameter validation +- `test_printing.py`: Output formatting and verbose mode + +**Compatibility:** +- `test_examples_sklearnex.py`: sklearn compatibility validation +- `test_npy.py`: NumPy array handling + +**Advanced Features:** +- `test_model_builders.py`: External model conversion (XGBoost/LightGBM/CatBoost) +- `test_daal4py_serialization.py`: Model save/load functionality +- `test_daal4py_spmd_examples.py`: Distributed computing with MPI + +## Deselected Tests +Tests in `deselected_tests.yaml` are skipped in CI due to: +- Platform-specific issues (Windows/Linux differences) +- Hardware requirements (GPU, specific CPU features) +- External dependencies (MPI, specific library versions) +- Memory constraints (large dataset tests) + +## Development Testing +```bash +# Quick development tests (subset) +pytest tests/test_estimators.py # Parameter validation +pytest sklearnex/tests/test_patching.py # Core patching + +# Memory/performance tests +pytest --maxfail=1 tests/ # Stop on first failure + +# Coverage testing +pytest --cov=sklearnex --cov=daal4py --cov=onedal +``` + +## Related Instructions +- `general.instructions.md` - Repository setup and core testing commands +- `sklearnex.instructions.md` - Testing sklearn compatibility layer +- `daal4py.instructions.md` - Testing native oneDAL algorithms +- `onedal.instructions.md` - Testing low-level bindings +- `src.instructions.md` - Testing C++/Cython core and distributed features + +See individual module AGENTS.md files for module-specific testing details. \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..4153815ffe --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,213 @@ +# AGENTS.md - Intel Extension for Scikit-learn + +## Quick Context +- **Purpose**: Accelerate scikit-learn using Intel oneDAL optimizations +- **License**: Apache 2.0 +- **Languages**: Python, C++, Cython +- **Platforms**: CPU (x86_64, ARM), GPU (Intel via SYCL) + +## Architecture (4 Layers) +``` +User Apps → sklearnex/ → daal4py/ → onedal/ → Intel oneDAL C++ +``` + +**Key Layer Functions:** +- `sklearnex/`: sklearn API compatibility + patching +- `daal4py/`: Direct oneDAL access + model builders +- `onedal/`: Pybind11 bindings + memory management +- `src/`: C++/Cython core implementation + +## Entry Points by Use Case + +**For sklearn acceleration:** +```python +from sklearnex import patch_sklearn; patch_sklearn() +# OR direct import +from sklearnex.cluster import DBSCAN +``` + +**For native oneDAL performance:** +```python +import daal4py as d4p +algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) +``` + +**For model conversion:** +```python +from daal4py.mb import convert_model +d4p_model = convert_model(xgb_model) # XGBoost→oneDAL +``` + +## Accelerated Algorithms +- **Clustering**: DBSCAN, K-Means +- **Classification**: SVM, RandomForest, LogisticRegression, NaiveBayes +- **Regression**: LinearRegression, Ridge, Lasso, ElasticNet, SVR +- **Decomposition**: PCA, IncrementalPCA +- **Neighbors**: KNeighbors (classification/regression) +- **Preprocessing**: Scalers, normalizers + +## Device Configuration +```python +from sklearnex import config_context + +# GPU offloading +with config_context(target_offload="gpu:0"): + model.fit(X, y) + +# Force CPU +with config_context(target_offload="cpu"): + model.fit(X, y) +``` + +## Performance Patterns +- **Memory**: Zero-copy 
NumPy↔oneDAL, SYCL USM for GPU +- **Parallelism**: Intel TBB threading, MPI distributed, SIMD vectorization +- **Fallbacks**: oneDAL → sklearn → error cascade + +## Key Files for AI Agents +- `sklearnex/dispatcher.py`: Patching system (line 36: `get_patch_map_core`) +- `sklearnex/_device_offload.py`: Device dispatch (line 72: `dispatch`) +- `onedal/__init__.py`: Backend selection +- `daal4py/__init__.py`: Native API entry +- `src/`: C++/Cython core (distributed computing, memory management) + +## Development Environment Setup + +### Prerequisites +- **Python**: 3.9-3.13 (verified in setup.py classifiers and README.md badges) +- **oneDAL**: 2021.1+ (backwards compatible, verified in INSTALL.md) +- **Dependencies**: Cython==3.1.1, Jinja2==3.1.6, numpy>=2.0.1, pybind11==2.13.6, cmake==4.0.2 (verified in dependencies-dev file) + +### Build Commands +```bash +# Development setup +pip install -r dependencies-dev # Verified: contains Cython, Jinja2, numpy, pybind11, cmake +export DALROOT=/path/to/onedal # Required (verified in setup.py:53-59) +export MPIROOT=/path/to/mpi # For distributed support (verified in setup.py:95-100) +python setup.py develop # Development mode + +# Environment options +export NO_DPC=1 # Disable GPU support +export NO_DIST=1 # Disable distributed computing +export NO_STREAM=1 # Disable streaming mode +``` + +### Testing Strategy +```bash +# Core test suites (from conda-recipe/run_test.sh) +pytest --verbose -s tests/ # Legacy tests +pytest --verbose --pyargs daal4py # Native oneDAL tests +pytest --verbose --pyargs sklearnex # sklearn compatibility +pytest --verbose --pyargs onedal # Low-level backend +pytest --verbose .ci/scripts/test_global_patch.py # Global patching + +# Distributed testing (requires MPI) +mpirun -n 4 python tests/helper_mpi_tests.py pytest -k spmd --with-mpi --pyargs sklearnex +``` + +## Performance Expectations + +### Benchmarked Speedups +- **General**: 10-100X acceleration (verified in README.md) +- **Training**: Up to 100x speedup mentioned in README.md +- **Inference**: Significant speedup, model builders claim 10-100x for converted models +- **Range**: 1-3 orders of magnitude improvement depending on algorithm/dataset +- **Note**: Specific 27x/36x figures not found in current codebase, general 10-100X claims verified + +### Algorithm Support Decision Matrix + +**oneDAL Acceleration Criteria** (verified in sklearnex/cluster/dbscan.py:108-138): +```python +def _onedal_supported(self, method_name, *data): + # Data requirements (verified in DBSCAN implementation) + - Dense data only (not sp.issparse(X)) + - Supported dtypes: float32, float64 + - Contiguous memory layout preferred + + # Algorithm-specific constraints (verified in actual code) + - DBSCAN: algorithm in ["auto", "brute"], metric="euclidean" or "minkowski" with p=2 + - Parameter compatibility checks via PatchingConditionsChain +``` + +**GPU Support Status** (from sklearnex/AGENTS.md): +- **Full GPU**: DBSCAN, K-Means, PCA, KNeighbors +- **Limited GPU**: LogisticRegression (2024.1+), SVM +- **CPU Only**: RandomForest, Ridge, IncrementalPCA + +### Error Handling and Fallback Strategy + +**Fallback Chain** (verified in onedal/_config.py:45-50): +```python +# Configuration controls fallback behavior +_default_global_config = { + "target_offload": "auto", # Auto device selection + "allow_fallback_to_host": False, # GPU → CPU fallback + "allow_sklearn_after_onedal": True, # oneDAL → sklearn fallback + "use_raw_input": False, # Raw input usage +} +``` + +**Fallback Triggers**: +1. 
**Unsupported data**: Sparse matrices, unsupported dtypes
+2. **Unsupported parameters**: Algorithm-specific limitations
+3. **Hardware constraints**: GPU memory limits, device unavailability
+4. **Runtime errors**: oneDAL computation failures
+
+### Memory Management Patterns
+
+**Critical Requirements** (from sklearnex/utils/validation.py):
+```python
+# oneDAL requires contiguous data - copying avoided for performance
+def _onedal_supported_format(X, xp):
+    return is_contiguous(X)  # C-contiguous preferred
+```
+
+**Data Layout**:
+- **Contiguous arrays**: Required for zero-copy operations
+- **Data types**: float32/float64 preferred, automatic conversion when needed
+- **Memory layout**: C-contiguous > Fortran-contiguous > non-contiguous
+
+### GPU Hardware Requirements
+
+**Supported Intel GPUs**:
+- **Integrated**: Intel UHD Graphics, Intel Iris Xe
+- **Discrete**: Intel Arc A370M, Arc B580, Arc series
+- **Requirements**: SYCL/DPC++ support, Intel oneAPI toolkit
+- **Memory**: Unified Shared Memory (USM) support for zero-copy operations
+
+### Version Compatibility
+
+**Supported Versions** (verified in README.md badges and setup.py):
+- **Python**: 3.9, 3.10, 3.11, 3.12, 3.13 (verified in setup.py:609-613)
+- **scikit-learn**: 1.0, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 (verified in README.md badge)
+- **oneDAL**: 2021.1+ (backwards compatible only, verified in INSTALL.md)
+
+### Code Generation vs Manual Implementation
+
+**When to use generator/** (from INSTALL.md build process):
+1. **Automatic**: C++ headers → Python bindings (stage 1 of 4-stage build)
+2. **Manual Python**: Direct sklearn interface implementations
+3. **Generator changes**: Required for new oneDAL algorithms not yet wrapped
+4. **Python changes**: Sufficient for parameter handling, validation, sklearn compatibility
+
+### SPMD (Distributed) Usage Guidelines
+
+**When to use SPMD** (from tests/helper_mpi_tests.py, conda-recipe/run_test.sh):
+- **Large datasets**: When single-node memory insufficient
+- **Supported algorithms**: DBSCAN, K-Means, PCA, Linear Regression
+- **Setup**: Requires MPI (Intel MPI or OpenMPI), mpi4py
+- **Testing**: `mpirun -n 4` for validation
+
+**MPI Requirements** (from setup.py):
+```python
+mpi_root = os.environ.get("MPIROOT", os.environ.get("I_MPI_ROOT"))
+# Required unless NO_DIST=1
+```
+
+## Component Documentation
+- `sklearnex/AGENTS.md`: API patterns, device offloading
+- `daal4py/AGENTS.md`: Native oneDAL bindings, model builders
+- `onedal/AGENTS.md`: Pybind11 implementation, memory management
+- `src/AGENTS.md`: C++/Cython core, distributed computing
+- `examples/AGENTS.md`: Usage patterns (113 scripts, 19 notebooks)
+- `tests/AGENTS.md`: Testing infrastructure, validation patterns
\ No newline at end of file
diff --git a/daal4py/AGENTS.md b/daal4py/AGENTS.md
new file mode 100644
index 0000000000..306a66ab6b
--- /dev/null
+++ b/daal4py/AGENTS.md
@@ -0,0 +1,433 @@
+# AGENTS.md - daal4py Package
+
+## Purpose
+**Direct Python bindings to Intel oneDAL** for maximum performance
+
+## Three APIs
+1. **Native oneDAL**: `import daal4py as d4p`
+2. **sklearn-compatible**: `from daal4py.sklearn import ...`
+3.
**Model Builders**: `from daal4py.mb import convert_model` + +## Native oneDAL API Usage + +**Basic Pattern:** +```python +import daal4py as d4p +import numpy as np + +# Create algorithm +algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) + +# Run computation +result = algorithm.compute(data) + +# Access results +cluster_labels = result.assignments +core_indices = result.coreIndices +``` + +**Common Algorithms:** +```python +# Clustering +d4p.dbscan(epsilon=0.5, minObservations=5) +d4p.kmeans(nClusters=3, maxIterations=300) + +# Decomposition +d4p.pca(method="defaultDense") +d4p.svd(method="defaultDense") + +# Linear Models +d4p.linear_regression_training() +d4p.ridge_regression_training(ridgeParameters=1.0) +``` + +## sklearn-Compatible API + +**Usage:** +```python +from daal4py.sklearn.cluster import DBSCAN +from daal4py.sklearn.linear_model import Ridge + +# Use like normal sklearn +clusterer = DBSCAN(eps=0.5, min_samples=5) +labels = clusterer.fit_predict(X) +``` + +**Patching System:** +```python +from daal4py.sklearn.monkeypatch import patch_sklearn +patch_sklearn() # Replace sklearn algorithms with daal4py versions +``` + +## Model Builders (`mb/`) + +**Purpose**: Convert external ML models to oneDAL for faster inference + +**Supported Frameworks:** +```python +from daal4py.mb import convert_model + +# XGBoost/LightGBM/CatBoost → oneDAL +externalModel = xgb.XGBClassifier().fit(X, y) +d4p_model = convert_model(externalModel) + +# Use oneDAL for fast prediction +predictions = d4p_model.predict(X_test) +prob = d4p_model.predict_proba(X_test) +``` + +**Benefits**: 10-100x faster inference than original models + +### 3. Monkeypatch System (`sklearn/monkeypatch/`) + +**Purpose**: Original patching mechanism for scikit-learn replacement + +**Core Implementation** (`dispatcher.py:57-200`): +```python +@lru_cache(maxsize=None) +def _get_map_of_algorithms(): + mapping = { + "pca": [[(decomposition_module, "PCA", PCA_daal4py), None]], + "kmeans": [[(cluster_module, "KMeans", KMeans_daal4py), None]], + "dbscan": [[(cluster_module, "DBSCAN", DBSCAN_daal4py), None]], + # ... complete algorithm mapping + } + return mapping +``` + +**Patching Functions**: +- `patch_sklearn()`: Replace sklearn algorithms with daal4py versions +- `unpatch_sklearn()`: Restore original sklearn implementations +- `get_patch_map()`: Retrieve current algorithm mappings +- `enable_patching()`: Context-based patching control + +**Condition Checking**: +```python +def _daal4py_check_supported(estimator, method_name, *data): + # Check data characteristics (density, dtypes, shape) + # Check algorithm parameters + # Check oneDAL version compatibility + # Return boolean + condition chain +``` + +### 4. Model Builders (`mb/`) + +**Purpose**: Convert external ML library models to oneDAL for accelerated inference + +#### Tree-Based Models (`tree_based_builders.py`) + +**Supported Libraries**: +- **XGBoost**: Gradient boosting framework +- **LightGBM**: Microsoft gradient boosting +- **CatBoost**: Yandex gradient boosting +- **Treelite**: Universal tree model format + +**Implementation Pattern**: +```python +class GBTDAALModel(GBTDAALBaseModel): + def __init__(self, model): + # 1. Extract model parameters and structure + # 2. Convert to oneDAL tree format + # 3. Create oneDAL inference model + + def predict(self, X): + # Use oneDAL optimized prediction + + def predict_proba(self, X): + # Probabilistic predictions for classification +``` + +**Conversion Process**: +1. **Tree Extraction**: Parse external model tree structures +2. 
**Parameter Mapping**: Convert hyperparameters to oneDAL format
+3. **Model Creation**: Build oneDAL gradient boosting model
+4. **Validation**: Verify numerical equivalence with original model
+
+#### Logistic Regression Models (`logistic_regression_builders.py`)
+
+**Supported Sources**:
+- sklearn LogisticRegression (binary/multinomial)
+- sklearn SGDClassifier (with log loss)
+- Direct coefficient specification
+
+**Features**:
+- Binary and multinomial classification
+- Coefficient and intercept preservation
+- oneDAL optimized prediction pipeline
+
+### 5. Distributed Computing (SPMD)
+
+**Purpose**: Single Program Multiple Data parallel processing across multiple nodes
+
+**Implementation Location**:
+- C++ Headers: `src/dist_*.h` files
+- Examples: `examples/daal4py/*_spmd.py`
+
+**Architecture**:
+```cpp
+// C++ distributed computing framework (src/dist_custom.h)
+template <typename Algo>  // parameter list reconstructed; name illustrative
+class dist {
+    // MPI communication primitives
+    // Data serialization/deserialization
+    // Distributed algorithm coordination
+};
+```
+
+**Supported Algorithms**:
+- **DBSCAN**: `dist_dbscan.h` - Distributed density clustering
+- **K-Means**: `dist_kmeans.h` - Distributed centroid-based clustering
+- **Linear Regression**: Distributed least squares
+- **PCA**: Distributed principal component analysis
+- **Covariance**: Distributed covariance matrix computation
+
+**SPMD Usage Pattern**:
+```python
+import daal4py as d4p
+
+# Initialize distributed backend
+d4p.daalinit()
+
+# Distributed algorithm execution
+result = algorithm.compute(local_data_chunk)
+
+# Finalize and collect results
+d4p.daalfini()
+```
+
+**MPI Integration**:
+- Automatic rank and size detection
+- Efficient data distribution strategies
+- Collective communication operations
+- Fault tolerance and load balancing
+
+## Performance Optimization Strategies
+
+### 1. Memory Management
+
+**Zero-Copy Operations**:
+- Direct NumPy array access via `make2d()` utility
+- In-place data transformations where possible
+- Efficient C++ ↔ Python data exchange
+
+**Memory Layout Optimization**:
+```python
+# Efficient data preparation (daal4py/sklearn/_utils.py)
+def make2d(X):
+    if X.ndim == 1:
+        X = X.reshape(1, -1)
+    return np.ascontiguousarray(X, dtype=np.float64)
+```
+
+### 2. Algorithmic Optimizations
+
+**Solver Selection**:
+- Analytical solutions for overdetermined systems
+- Iterative methods for large-scale problems
+- Specialized algorithms for sparse data
+
+**Parallel Execution**:
+- Intel TBB threading for shared-memory parallelism
+- MPI for distributed-memory parallelism
+- Vectorization via Intel SIMD instructions
+
+### 3. Data Type Optimization
+
+**Precision Selection**:
+```python
+def getFPType(X):
+    """Determine optimal floating-point precision"""
+    if hasattr(X, 'dtype'):
+        if X.dtype == np.float32:
+            return "float"
+        else:
+            return "double"
+    return "double"  # Default to double precision
+```
+
+### 4.
Condition-Based Optimization
+
+**Patching Conditions** (Pattern across all algorithms):
+```python
+def _daal4py_supported(self, method_name, *data):
+    conditions = PatchingConditionsChain("daal4py.algorithm.method")
+
+    # Data characteristics
+    conditions.and_condition(not sp.issparse(data[0]), "Sparse not supported")
+    conditions.and_condition(data[0].dtype in [np.float32, np.float64], "Invalid dtype")
+
+    # Algorithm parameters
+    conditions.and_condition(self.metric == "euclidean", "Only euclidean metric")
+    conditions.and_condition(self.algorithm == "auto", "Algorithm must be auto")
+
+    return conditions
+```
+
+## Integration Architecture
+
+### With oneDAL C++ Library
+
+**Direct Binding Layer**:
+- Cython-based C++ wrapper generation
+- Template instantiation for algorithm variants
+- Exception handling and error propagation
+- Memory management coordination
+
+**Algorithm Instantiation Pattern**:
+```cpp
+// C++ algorithm instantiation (generated via Cython)
+daal::algorithms::dbscan::Batch<> algorithm;
+algorithm.parameter.epsilon = eps;
+algorithm.parameter.minObservations = min_samples;
+algorithm.input.set(daal::algorithms::dbscan::data, numericTable);
+algorithm.compute();
+daal::algorithms::dbscan::ResultPtr result = algorithm.getResult();
+```
+
+### With sklearnex Package
+
+**Layered Architecture**:
+1. **sklearnex**: High-level API with device offloading
+2. **daal4py**: Core algorithms and patching
+3. **oneDAL**: Low-level optimized implementations
+
+**API Delegation**:
+```python
+# sklearnex delegates to daal4py for compatible cases
+if _is_daal4py_supported():
+    return daal4py_algorithm.fit(X, y)
+else:
+    return sklearn_algorithm.fit(X, y)
+```
+
+### With External Libraries
+
+**Model Conversion Pipeline**:
+```python
+# XGBoost → oneDAL conversion example
+def get_gbt_model_from_xgboost(xgb_model):
+    # 1. Extract XGBoost JSON representation
+    # 2. Parse tree structures and parameters
+    # 3. Convert to oneDAL tree format
+    # 4. Create oneDAL gradient boosting model
+    # 5. Return optimized prediction interface
+```
+
+## Error Handling and Fallbacks
+
+### Exception Management
+
+**oneDAL Error Handling**:
+- C++ exception translation to Python
+- Detailed error messages with context
+- Graceful degradation to sklearn when possible
+
+**Common Error Patterns**:
+```python
+try:
+    result = daal4py_algorithm.compute(data)
+except RuntimeError as e:
+    if "not supported" in str(e):
+        # Fallback to sklearn
+        return sklearn_algorithm.fit(X, y)
+    else:
+        raise
+```
+
+### Validation and Checks
+
+**Input Validation**:
+- Data type and shape verification
+- Parameter range checking
+- Memory layout validation
+- Feature name consistency
+
+**Compatibility Checking**:
+- oneDAL version requirements
+- Algorithm parameter support
+- Hardware capability detection
+
+## Development Guidelines
+
+### Adding New Algorithms
+
+1. **Create Native Wrapper**:
+   ```python
+   def _daal_algorithm(X, y=None, **params):
+       # Convert inputs to oneDAL format
+       # Configure oneDAL algorithm
+       # Execute computation
+       # Convert results to expected format
+   ```
+
+2. **Implement sklearn Interface**:
+   ```python
+   class Algorithm(sklearn_Algorithm):
+       def fit(self, X, y=None):
+           return self._daal_fit(X, y)
+   ```
+
+3. **Add to Dispatcher**:
+   ```python
+   # Update monkeypatch/dispatcher.py
+   mapping["algorithm"] = [[(module, "Algorithm", Algorithm_daal4py), None]]
+   ```
+
+4.
**Create Tests**: + ```python + # Numerical accuracy tests + # Performance benchmarks + # Edge case validation + ``` + +### Performance Optimization Guidelines + +- **Minimize Data Copies**: Use views and in-place operations +- **Leverage oneDAL Optimizations**: Choose appropriate algorithms and parameters +- **Profile Memory Usage**: Monitor peak memory consumption +- **Validate Numerically**: Ensure mathematical correctness +- **Benchmark Performance**: Measure against sklearn baselines + +### Distributed Computing Guidelines + +- **Design for Scalability**: Consider communication overhead +- **Handle Data Distribution**: Implement efficient partitioning +- **Manage Dependencies**: Coordinate between nodes +- **Test at Scale**: Validate with realistic data sizes + +## File Location Reference + +### Core Implementation +- `daal4py/__init__.py:53-73` - Core binding imports and initialization +- `daal4py/sklearn/monkeypatch/dispatcher.py:57-200` - Algorithm mapping system +- `src/daal4py.cpp` - Main C++/Cython implementation +- `src/dist_*.h` - Distributed computing headers + +### Algorithm Examples +- `daal4py/sklearn/cluster/dbscan.py:35-56` - DBSCAN oneDAL integration +- `daal4py/sklearn/linear_model/_linear.py` - Linear regression implementation +- `daal4py/sklearn/decomposition/_pca.py` - PCA with oneDAL optimization + +### Model Builders +- `daal4py/mb/tree_based_builders.py:65-200` - GBT model conversion +- `daal4py/mb/logistic_regression_builders.py` - LogReg model conversion +- `daal4py/mb/gbt_convertors.py` - External library integration + +### Distributed Computing +- `examples/daal4py/*_spmd.py` - SPMD usage examples +- `src/dist_dbscan.h:28-100` - Distributed DBSCAN implementation +- `src/mpi/` - MPI communication layer + +## AI Agent Development Guidelines + +When working with daal4py, AI agents should: + +1. **Understand the Native API**: Recognize direct oneDAL algorithm access patterns +2. **Respect Performance Requirements**: Maintain zero-copy operations where possible +3. **Handle Distributed Computing**: Account for MPI coordination and data distribution +4. **Validate Numerically**: Ensure algorithmic correctness against sklearn +5. **Consider Memory Constraints**: Monitor memory usage in large-scale scenarios +6. **Test Across Platforms**: Validate on different hardware configurations +7. **Document Performance**: Clearly specify optimization benefits and limitations +8. **Maintain Compatibility**: Preserve sklearn API contracts and behavior + +The daal4py package represents the performance-critical foundation of the Intel Extension for Scikit-learn, providing both the algorithmic engine and the compatibility layer that enables seamless acceleration of existing scikit-learn workflows. \ No newline at end of file diff --git a/doc/AGENTS.md b/doc/AGENTS.md new file mode 100644 index 0000000000..93899a1727 --- /dev/null +++ b/doc/AGENTS.md @@ -0,0 +1,37 @@ +# AGENTS.md - Documentation (doc/) + +## Purpose +Sphinx-based documentation generation system for Intel Extension for Scikit-learn. 
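+
+Docstrings are pulled in by autodoc and must follow the numpydoc layout; an illustrative example (function name hypothetical):
+```python
+def compute_speedup(baseline, accelerated):
+    """Return the speedup factor of an accelerated run.
+
+    Parameters
+    ----------
+    baseline : float
+        Wall-clock time of the stock scikit-learn run, in seconds.
+    accelerated : float
+        Wall-clock time of the oneDAL-accelerated run, in seconds.
+
+    Returns
+    -------
+    float
+        Ratio ``baseline / accelerated``.
+    """
+    return baseline / accelerated
+```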
+ +## Key Files for Agents +- `doc/sources/conf.py` - Sphinx configuration with extensions +- `doc/build-doc.sh` - Documentation build automation +- `doc/sources/algorithms.rst` - Algorithm support matrix +- `doc/sources/daal4py.rst` - API reference with autodoc + +## Build System +- **Sphinx Extensions**: autodoc, nbsphinx, intersphinx, napoleon +- **Notebook Integration**: Jupyter notebooks included via nbsphinx +- **Cross-References**: Links to sklearn, numpy, pandas documentation +- **GitHub Pages**: Automated deployment on releases + +## Content Structure +- **User Guides**: Quick start, performance optimization +- **API Reference**: Auto-generated from docstrings +- **Examples**: Real-world applications (kaggle/, notebooks/) +- **Developer Docs**: Distributed computing, contribution guidelines + +## Build Commands +```bash +# Local development +make html + +# Production deployment +./build-doc.sh --gh-pages +``` + +## For AI Agents +- Use reStructuredText format for documentation +- Include proper docstrings for autodoc generation +- Test documentation builds locally before submitting +- Maintain cross-references and intersphinx links \ No newline at end of file diff --git a/examples/AGENTS.md b/examples/AGENTS.md new file mode 100644 index 0000000000..15725c369e --- /dev/null +++ b/examples/AGENTS.md @@ -0,0 +1,62 @@ +# AGENTS.md - Examples (examples/) + +## Purpose +113 Python scripts and 19 Jupyter notebooks demonstrating Intel Extension for Scikit-learn usage patterns. + +## Directory Structure +- `daal4py/` - Native oneDAL API examples (80+ scripts) +- `sklearnex/` - Accelerated sklearn examples (25+ scripts) +- `mb/` - Model builder examples (XGBoost/LightGBM/CatBoost conversion) +- `notebooks/` - Jupyter tutorials with real datasets +- `utils/` - Utility functions + +## Key Usage Patterns + +### Native oneDAL API +```python +import daal4py as d4p +algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) +result = algorithm.compute(data) +``` + +### Accelerated sklearn +```python +from sklearnex import patch_sklearn +patch_sklearn() # All sklearn imports now accelerated +from sklearn.cluster import DBSCAN +``` + +### GPU Acceleration +```python +from sklearnex import config_context +with config_context(target_offload="gpu:0"): + model.fit(X, y) +``` + +### Distributed Computing +```python +import daal4py as d4p +d4p.daalinit() # Initialize MPI +# ... distributed computation +d4p.daalfini() # Cleanup +``` + +### Model Conversion +```python +from daal4py.mb import convert_model +d4p_model = convert_model(xgb_model) # 10-100x faster inference +``` + +## Algorithm Categories +- **Clustering**: DBSCAN, K-Means +- **Linear Models**: Linear/Ridge/Logistic regression +- **Ensemble**: Random Forest, Gradient boosting +- **Decomposition**: PCA, SVD +- **Statistics**: Moments, covariance +- **SVM**: Classification and regression + +## For AI Agents +- Use examples as templates for new implementations +- Follow established patterns for performance optimization +- Include both sklearn and oneDAL performance comparisons +- Test examples across CPU/GPU configurations \ No newline at end of file diff --git a/generator/AGENTS.md b/generator/AGENTS.md new file mode 100644 index 0000000000..3b24eba7b0 --- /dev/null +++ b/generator/AGENTS.md @@ -0,0 +1,80 @@ +# AGENTS.md - Code Generator (generator/) + +## Purpose +Automated code generation system that creates Python bindings for oneDAL algorithms through C++ header parsing and Jinja2 templates. 
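+
+The pipeline is template-driven; conceptually, generation works like this (a toy sketch, not the actual templates in `wrapper_gen.py`):
+```python
+from jinja2 import Template
+
+# Hypothetical miniature of the real Cython-wrapper templates
+stub = Template("def {{ algo }}({{ params | join(', ') }}): ...")
+print(stub.render(algo="dbscan", params=["epsilon", "minObservations"]))
+# -> def dbscan(epsilon, minObservations): ...
+```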
+ +## Key Files +- `gen_daal4py.py` - Main orchestrator (1274 lines) +- `parse.py` - C++ header parser (727 lines) +- `wrapper_gen.py` - Jinja2 template engine (1626 lines) +- `wrappers.py` - Algorithm metadata configuration (1028 lines) +- `format.py` - Type conversion utilities (287 lines) + +## Generation Pipeline +1. **Header Parsing**: Extract classes, enums, templates from oneDAL C++ headers +2. **Metadata Processing**: Filter algorithms, handle required parameters +3. **Template Generation**: Create Cython wrappers using Jinja2 templates +4. **Code Output**: Generate Python API with proper type conversion + +## Algorithm Configuration +```python +# Required parameters for algorithms +required = { + "algorithms::dbscan": [("epsilon", "fptype"), ("minObservations", "size_t")], + "algorithms::kmeans": [("nClusters", "size_t"), ("maxIterations", "size_t")], + # ... 40+ algorithm configurations +} +``` + +## Template System +- **Jinja2 Templates**: Generate consistent Cython wrappers +- **Type Mapping**: Python ↔ C++ type conversion +- **Error Handling**: Input validation and exception handling +- **Memory Management**: Proper C++ object lifecycle + +## When to Modify Generator vs Python Code + +### Modify Generator (`wrappers.py`) When: +```python +# Adding new oneDAL algorithms not yet wrapped +required = { + "algorithms::new_algorithm": [("param1", "size_t"), ("param2", "double")] +} + +# Changing algorithm parameter requirements +no_constructor = { + "algorithms::special_case": {"param": ["type", "default_value"]} +} +``` + +### Direct Python Implementation When: +- Adding sklearn interface compatibility layers +- Implementing parameter validation and conversion +- Creating custom error handling or fallback logic +- Adding utility functions that don't require C++ bindings + +### Build Process Integration +```bash +# Generator runs in stage 1 of 4-stage build (from INSTALL.md) +# 1. Creating C++ and Cython sources from oneDAL C++ headers +# 2. Building oneDAL Python interfaces via cmake and pybind11 +# 3. Running Cython on generated sources +# 4. Compiling and linking them + +# Force regeneration during development +python setup.py build_ext --inplace --force +``` + +### Debugging Generated Code +- Generated files appear in build directories +- Check `generated_sources/` for Cython output +- Use `print()` statements in `wrapper_gen.py` templates for debugging +- Template variables available: `{{ns}}`, `{{algo}}`, `{{args_decl}}`, etc. + +## For AI Agents +- Generator runs automatically during build +- Modify `wrappers.py` to add new algorithm configurations +- Templates in `wrapper_gen.py` handle code patterns +- Type mappings in `format.py` for new data types +- Test generation changes with `python setup.py build_ext --inplace --force` +- Use direct Python implementation for sklearn compatibility layers \ No newline at end of file diff --git a/onedal/AGENTS.md b/onedal/AGENTS.md new file mode 100644 index 0000000000..0bb09f3a3e --- /dev/null +++ b/onedal/AGENTS.md @@ -0,0 +1,54 @@ +# AGENTS.md - oneDAL Backend (onedal/) + +## Purpose +Low-level Python bindings to Intel oneDAL using pybind11, providing CPU/GPU execution and memory management. 
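+
+Usage mirrors the estimator API of the layers above it; a minimal sketch:
+```python
+import numpy as np
+from onedal.cluster import DBSCAN  # pybind11-backed implementation
+
+X = np.random.RandomState(0).random((1000, 10))
+model = DBSCAN().fit(X)  # NumPy -> oneDAL table conversion happens automatically
+```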
+ +## Key Components +- `__init__.py` - Backend selection (DPC++/Host) +- `_config.py` - Thread-local configuration +- `_device_offload.py` - Device dispatch utilities +- `common/` - Core infrastructure and policies +- `datatypes/` - Data conversion (NumPy, SYCL USM, DLPack) +- Algorithm modules: `cluster/`, `linear_model/`, `decomposition/`, etc. + +## Backend System +```python +# Automatic backend selection +try: + import onedal._onedal_py_dpc # GPU backend +except ImportError: + import onedal._onedal_py_host # CPU backend +``` + +## Configuration +```python +from onedal import config_context + +# GPU acceleration +with config_context(target_offload="gpu:0"): + model.fit(X, y) + +# Auto device selection (default) +with config_context(target_offload="auto"): + model.fit(X, y) # Uses data location to choose device +``` + +## Data Conversion +- **NumPy**: Zero-copy conversion via `to_table()` +- **SYCL USM**: GPU memory sharing (`__sycl_usm_array_interface__`) +- **DLPack**: Cross-framework tensor exchange + +## Algorithm Categories +- **Clustering**: DBSCAN, K-Means +- **Linear Models**: Linear/Ridge/Logistic regression +- **Decomposition**: PCA, Incremental PCA +- **SVM**: SVC, SVR with kernel methods +- **Ensemble**: Random Forest +- **Statistics**: Basic statistics, covariance + +## For AI Agents +- Use `config_context` for device selection +- Prefer zero-copy operations with `to_table()` +- Handle CPU/GPU fallback gracefully +- Monitor memory usage on GPU +- Test across different device configurations \ No newline at end of file diff --git a/sklearnex/AGENTS.md b/sklearnex/AGENTS.md new file mode 100644 index 0000000000..dc0d3a0740 --- /dev/null +++ b/sklearnex/AGENTS.md @@ -0,0 +1,126 @@ +# AGENTS.md - sklearnex Package + +## Purpose +**Primary sklearn-compatible interface** with oneDAL acceleration + +## Core Files +- `dispatcher.py`: Patching system (`get_patch_map_core` line 36) +- `_config.py`: Configuration (`target_offload`, `allow_fallback_to_host`) +- `_device_offload.py`: Device dispatch (`dispatch` function line 72) +- `base.py`: oneDALEstimator base class + +## Usage Patterns + +**Global Patching:** +```python +from sklearnex import patch_sklearn +patch_sklearn() # All sklearn imports use oneDAL +from sklearn.cluster import DBSCAN # Now accelerated +``` + +**Selective Patching:** +```python +patch_sklearn(["DBSCAN", "KMeans"]) # Only specific algorithms +``` + +**Direct Import:** +```python +from sklearnex.cluster import DBSCAN # Always accelerated +``` + +**Status Check:** +```python +from sklearnex import sklearn_is_patched +print(sklearn_is_patched()) # True/False +``` + +## Configuration API + +**Device Control:** +```python +from sklearnex import config_context + +# GPU acceleration +with config_context(target_offload="gpu:0"): + model.fit(X, y) + +# Force CPU +with config_context(target_offload="cpu"): + model.fit(X, y) + +# Auto device selection +with config_context(target_offload="auto"): # Default + model.fit(X, y) +``` + +**Fallback Control:** +```python +# Allow CPU fallback when GPU fails +with config_context(allow_fallback_to_host=True): + model.fit(X_gpu, y_gpu) + +# Allow sklearn fallback when oneDAL fails +with config_context(allow_sklearn_after_onedal=True): + model.fit(X, y) +``` + +## Algorithm Support Conditions + +**Implementation Pattern:** +```python +class Algorithm(BaseAlgorithm, oneDALEstimator, _sklearn_Algorithm): + def _onedal_cpu_supported(self, method_name, *data): + # Check data types, parameters, etc. 
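+        # Typical checks chained before returning (illustrative, not exhaustive):
+        #   dense input only, float32/float64 dtypes, and supported parameter
+        #   values, each registered via and_condition(...) on the chain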
+ return PatchingConditionsChain("sklearnex.algorithm") + + def _onedal_gpu_supported(self, method_name, *data): + # Check GPU-specific requirements + return PatchingConditionsChain("sklearnex.algorithm.gpu") +``` + +**Dispatch Flow:** +1. Check `_onedal_gpu_supported()` → Use GPU oneDAL +2. Check `_onedal_cpu_supported()` → Use CPU oneDAL +3. Fallback → Use original sklearn + +## Algorithm Categories + +**Supported Algorithms with oneDAL:** +- **Clustering**: DBSCAN, K-Means +- **Linear Models**: LogisticRegression, Ridge, LinearRegression +- **Ensemble**: RandomForestClassifier/Regressor +- **Decomposition**: PCA, IncrementalPCA +- **Neighbors**: KNeighborsClassifier/Regressor +- **SVM**: SVC, SVR, NuSVC, NuSVR + +**GPU Support Status:** +- **Full GPU**: DBSCAN, K-Means, PCA, KNeighbors +- **Limited GPU**: LogisticRegression (2024.1+), SVM +- **CPU Only**: RandomForest, Ridge, IncrementalPCA + +## Key Implementation Files +- `sklearnex/dispatcher.py:36` - `get_patch_map_core()` function +- `sklearnex/_device_offload.py:72` - `dispatch()` function +- `sklearnex/_config.py` - Configuration API +- `sklearnex/base.py` - oneDALEstimator base class + +## Distributed Computing (SPMD) +**Location**: `sklearnex/spmd/` +**Usage**: Same API, distributed across MPI nodes +```python +from sklearnex.spmd.cluster import DBSCAN # Distributed version +``` + +## Preview Features +**Activation**: `export SKLEARNEX_PREVIEW=1` +**Location**: `sklearnex/preview/` +**Content**: Experimental algorithms, enhanced covariance, advanced PCA + +## Error Handling +**Fallback Chain**: oneDAL GPU → oneDAL CPU → sklearn → Error + +**Common Fallback Triggers:** +- Sparse data (most algorithms don't support) +- Unsupported parameters +- GPU memory limits +- Wrong data types \ No newline at end of file diff --git a/src/AGENTS.md b/src/AGENTS.md new file mode 100644 index 0000000000..4a3504f1f5 --- /dev/null +++ b/src/AGENTS.md @@ -0,0 +1,49 @@ +# AGENTS.md - Core Implementation (src/) + +## Purpose +C++/Cython implementation providing direct Python bindings to Intel oneDAL with zero-overhead access, memory management, and distributed computing. + +## Key Files +- `daal4py.cpp/.h` - Main C++ interface and NumPy integration +- `npy4daal.h` - NumPy-oneDAL conversion utilities +- `gbt_model_builder.pyx` - Gradient boosting tree builder +- `gettree.pyx` - Tree visitor patterns (sklearn compatibility) +- `transceiver.h` - Communication abstraction for distributed computing +- `dist_*.h` - Distributed algorithm implementations (DBSCAN, K-Means) +- `pickling.h` - Serialization support + +## Core Features + +### Memory Management +```cpp +// Zero-copy NumPy integration with thread-safe reference counting +class NumpyDeleter : public daal::services::DeleterIface { + // GIL-protected cleanup of Python objects +}; +``` + +### Distributed Computing +```cpp +// MPI-based communication layer +class transceiver_iface { + virtual void gather(...) = 0; + virtual void bcast(...) = 0; + virtual void reduce_all(...) 
= 0; +}; +``` + +### Tree Model Building +```cython +# Cython interface for external model conversion +cdef class gbt_classification_model_builder: + def create_tree(self, n_nodes, class_label) + def add_split(self, feature_index, threshold) + def add_leaf(self, response, cover) +``` + +## For AI Agents +- src/ contains performance-critical C++/Cython code +- Use existing patterns for memory management (zero-copy, GIL protection) +- Distributed algorithms follow map-reduce patterns +- Model builders enable external framework integration (XGBoost→oneDAL) +- Maintain thread safety and cross-platform compatibility \ No newline at end of file diff --git a/tests/AGENTS.md b/tests/AGENTS.md new file mode 100644 index 0000000000..6f09f8d58d --- /dev/null +++ b/tests/AGENTS.md @@ -0,0 +1,117 @@ +# AGENTS.md - Testing Infrastructure (tests/) + +## Purpose +Comprehensive validation infrastructure ensuring numerical accuracy, performance compliance, and cross-platform reliability. + +## Key Test Modules +- `test_daal4py_examples.py` - Native daal4py algorithm validation +- `test_model_builders.py` - External framework integration (XGBoost/LightGBM) +- `test_daal4py_spmd_examples.py` - Distributed computing validation +- `test_estimators.py` - sklearn compatibility validation +- `test_npy.py` - NumPy data type validation +- `run_examples.py` - Cross-platform example execution +- `unittest_data/` - Reference datasets for validation + +## Validation Patterns + +### Numerical Accuracy +```python +# Standard tolerance for floating-point comparisons +np.testing.assert_allclose(actual, expected, atol=1e-05) + +# Matrix reconstruction validation (SVD/QR) +np.testing.assert_allclose(original, reconstructed) +``` + +### Model Builder Testing +```python +# XGBoost conversion accuracy +xgb_predictions = xgb_model.predict(X) +d4p_predictions = convert_model(xgb_model).predict(X) +np.testing.assert_allclose(xgb_predictions, d4p_predictions) +``` + +### Performance Validation +```python +# Execution time limits +@dataclass +class Config: + timeout_cpu_seconds: int = 170 # Default (verified in tests/test_daal4py_examples.py) + # Extended timeouts for complex algorithms +``` + +### Distributed Testing +```python +# MPI-aware testing with proper rank coordination +@unittest.skipUnless(MPI.COMM_WORLD.size > 1, "Not running in distributed mode") +def test_spmd_algorithm(self): + # Distributed algorithm validation +``` + +## Cross-Platform Support +- **OS Detection**: Windows, Linux, macOS compatibility +- **Device Requirements**: CPU/GPU availability checking +- **Dependency Management**: Graceful skipping for missing libraries + +## Test Execution Commands + +### Local Development Testing +```bash +# Complete test suite (verified in conda-recipe/run_test.sh) +pytest --verbose -s tests/ # Legacy/integration tests +pytest --verbose --pyargs daal4py # Native oneDAL API tests +pytest --verbose --pyargs sklearnex # sklearn compatibility tests +pytest --verbose --pyargs onedal # Low-level backend tests +pytest --verbose .ci/scripts/test_global_patch.py # Global patching validation + +# With coverage reporting +pytest --cov=onedal --cov=sklearnex --cov-config=.coveragerc --cov-branch +``` + +### Distributed (SPMD) Testing +```bash +# Requires MPI setup and NO_DIST!=1 +mpirun -n 4 python tests/helper_mpi_tests.py \ + pytest -k spmd --with-mpi --verbose --pyargs sklearnex + +mpirun -n 4 python tests/helper_mpi_tests.py \ + pytest --verbose -s tests/test_daal4py_spmd_examples.py +``` + +### Performance Validation +```python +# 
Timeout configuration patterns (from test_daal4py_examples.py) +@dataclass +class Config: + timeout_cpu_seconds: int = 170 # Default timeout + # Algorithm-specific overrides: + # - gradient_boosted_classification: 480s + # - complex algorithms: extended timeouts +``` + +### Dependencies and Platform Testing +```python +# Graceful dependency handling (from run_examples.py) +def has_deps(rule): + for rule_item in rule: + try: + importlib.import_module(rule_item) + except ImportError: + return False + return True + +# Platform detection +IS_WIN = plt.system() == "Windows" +IS_MAC = plt.system() == "Darwin" +IS_LIN = plt.system() == "Linux" +``` + +## For AI Agents +- Use `np.testing.assert_allclose(atol=1e-05)` for numerical validation +- Configure appropriate timeouts based on algorithm complexity +- Handle missing dependencies gracefully with `skipTest()` +- Test both sklearn compatibility and numerical accuracy +- Validate model conversion maintains prediction accuracy +- Run distributed tests with `mpirun -n 4` for SPMD algorithms +- Check hardware requirements before GPU tests +- Use coverage reporting for development validation \ No newline at end of file
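+
+Putting several of these guidelines together, a minimal test sketch (dataset and parameters illustrative):
+```python
+import numpy as np
+import unittest
+
+class TestModelBuilderAccuracy(unittest.TestCase):
+    def test_xgboost_conversion(self):
+        try:
+            import xgboost as xgb
+        except ImportError:
+            self.skipTest("xgboost not installed")  # graceful dependency handling
+        from daal4py.mb import convert_model
+
+        rng = np.random.RandomState(0)
+        X = rng.random((200, 5))
+        y = (X[:, 0] > 0.5).astype(int)
+        model = xgb.XGBClassifier(n_estimators=10).fit(X, y)
+
+        # The converted model must reproduce the original predictions
+        d4p_model = convert_model(model)
+        np.testing.assert_allclose(d4p_model.predict(X), model.predict(X), atol=1e-05)
+```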