diff --git a/.ci/AGENTS.md b/.ci/AGENTS.md new file mode 100644 index 0000000000..7ff23cb67e --- /dev/null +++ b/.ci/AGENTS.md @@ -0,0 +1,103 @@ +# AGENTS.md - CI/CD Infrastructure (.ci/) + +## Purpose +CI/CD infrastructure for building, testing, and releasing Intel Extension for Scikit-learn across multiple platforms. + +## Key Files for Agents +- `.ci/pipeline/ci.yml` - Main CI orchestrator +- `.ci/pipeline/build-and-test-*.yml` - Platform-specific builds +- `.ci/pipeline/linting.yml` - Code quality enforcement +- `.ci/scripts/` - Automation utilities + +## Platform Support +- **Linux/macOS**: Uses conda, Intel DPC++ compiler, MPI support +- **Windows**: Visual Studio 2022, conda-forge packages +- **GPU**: Intel GPU support via DPC++/SYCL (dpctl, dpnp packages) + +## Quality Gates +- **Linting**: black, isort, clang-format, numpydoc validation +- **Testing**: pytest with cross-platform compatibility +- **Coverage**: codecov integration with threshold enforcement + +## Build Dependencies +- **oneDAL**: Downloads nightly builds from upstream oneDAL repo +- **Python**: Matrix testing across Python 3.9-3.13 (verified in .ci/pipeline/ci.yml) +- **sklearn**: Multiple version compatibility (1.0-1.7) +- **GPU Libraries**: dpctl, dpnp for Intel GPU acceleration + +## Release Process +- **Automated**: Dynamic matrix generation for PyPI/conda releases +- **Multi-channel**: Both PyPI wheels and conda packages +- **Quality**: Automated sklearn compatibility testing before release + +## Local Development Setup + +### Quality Tools Configuration (from pyproject.toml) +```bash +# Code formatting +black --line-length 90 +isort --profile black --line-length 90 + +# C++ formatting +clang-format --style=file + +# Documentation validation +numpydoc-validation +``` + +### Build Dependencies Download +```bash +# oneDAL nightly builds (from .github/workflows/ci.yml) +# Automatically downloads from uxlfoundation/oneDAL nightly builds +# Sets DALROOT to downloaded oneDAL location +``` + +### Platform-Specific Build Commands + +**Linux/macOS** (from .ci/pipeline/build-and-test-lnx.yml): +```bash +# Install DPC++ compiler +bash .ci/scripts/install_dpcpp.sh + +# Set up environment +source /opt/intel/oneapi/compiler/latest/env/vars.sh +export DPCPPROOT=/opt/intel/oneapi/compiler/latest + +# Create conda environment +conda create -q -y -n CB -c conda-forge python=3.11 mpich pyyaml +conda activate CB +pip install -r dependencies-dev + +# Build +./conda-recipe/build.sh +``` + +**Windows** (from .ci/pipeline/build-and-test-win.yml): +```batch +# Visual Studio setup +call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall" x64 + +# Build +call conda-recipe\bld.bat +``` + +### Environment Variables for Development +```bash +# From setup.py and CI scripts +export DALROOT=/path/to/onedal # Required +export DPCPPROOT=/opt/intel/oneapi/compiler/latest # For GPU support +export MPIROOT=/path/to/mpi # For distributed computing +export NO_DPC=1 # Disable GPU support +export NO_DIST=1 # Disable distributed computing +export SKLEARNEX_VERSION=2024.7.0 # Version override +export MAKEFLAGS="-j$(nproc)" # Parallel build +``` + +## For AI Agents +- Follow established build templates +- Respect quality gates (linting, testing, coverage) +- Use platform-specific configurations appropriately +- Test across supported Python/sklearn version combinations +- Set required environment variables (DALROOT, DPCPPROOT, MPIROOT) +- Use conda environments to avoid dependency conflicts +- Run pre-commit hooks before 
submitting changes \ No newline at end of file diff --git a/.github/.licenserc.yaml b/.github/.licenserc.yaml index d78de976b8..e486f6e7bd 100644 --- a/.github/.licenserc.yaml +++ b/.github/.licenserc.yaml @@ -67,9 +67,11 @@ header: - '.github/CODEOWNERS' - '.github/Pull_Request_template.md' - '.github/renovate.json' + - '.github/instructions/*.md' # Specific files - 'setup.cfg' - 'LICENSE' + - 'AGENTS.md' # External copies of copyrighted work - 'onedal/datatypes/dlpack/dlpack.h' comment: never diff --git a/.github/instructions/build-config.instructions.md b/.github/instructions/build-config.instructions.md new file mode 100644 index 0000000000..d23ea797ae --- /dev/null +++ b/.github/instructions/build-config.instructions.md @@ -0,0 +1,88 @@ +# Build Configuration Files + +## Core Build Files +- `setup.py`: Main build script (500+ lines, complex configuration) +- `pyproject.toml`: Python project metadata + linting configuration +- `dependencies-dev`: Build-time dependencies (Cython, numpy, pybind11, cmake) +- `requirements-test.txt`: Test dependencies with version constraints +- `conda-recipe/meta.yaml`: Conda package build configuration + +## Environment Variables (Critical) +```bash +# MANDATORY for building +export DALROOT=/path/to/onedal # oneDAL installation path (required) + +# OPTIONAL but commonly needed +export MPIROOT=/path/to/mpi # MPI for distributed features +export NO_DIST=1 # Disable distributed mode +export NO_DPC=1 # Disable GPU/SYCL support +export NO_STREAM=1 # Disable streaming mode +export DEBUG_BUILD=1 # Debug symbols + no optimization +export MAKEFLAGS=-j$(nproc) # Parallel build threads +``` + +## Build Process (4 Stages) +1. **Code Generation**: oneDAL C++ headers → Python/Cython sources +2. **oneDAL Bindings**: cmake + pybind11 compilation +3. **Cython Processing**: .pyx files → C++ sources +4. **Final Compilation**: Link everything into Python extensions + +## Dependencies +**Build Dependencies (dependencies-dev):** +- Cython==3.1.1 (exact version required) +- numpy>=2.0 (version varies by Python version) +- pybind11==2.13.6 +- cmake==4.0.2 +- setuptools==79.0.1 + +**Runtime Dependencies:** +- Intel oneDAL 2021.1+ (backwards compatible) +- numpy (version-specific, see requirements-test.txt) +- scikit-learn 1.0-1.7 (see compatibility matrix) + +## Build Commands +```bash +# Development build (RECOMMENDED) +python setup.py develop # Creates .egg-link, editable + +# Production builds +python setup.py install # Full install +python setup.py build_ext --inplace --force # Extensions only + +# Special flags (Linux) +python setup.py build --abs-rpath # Absolute RPATH for custom oneDAL + +# Conda build +conda build . 
# Uses conda-recipe/meta.yaml
+```
+
+## Common Build Issues
+```bash
+# oneDAL not found
+RuntimeError: "Not set DALROOT variable"
+→ Solution: export DALROOT=/path/to/onedal
+
+# MPI required but missing
+ValueError: "'MPIROOT' is not set, cannot build with distributed mode"
+→ Solution: export NO_DIST=1 or set MPIROOT
+
+# Cython version mismatch
+→ Solution: pip install Cython==3.1.1 (exact version)
+
+# Linking issues (Linux)
+→ Solution: Use --abs-rpath flag
+```
+
+## CI/CD Configuration
+- **GitHub Actions**: `.github/workflows/ci.yml`
+- **Azure DevOps**: `.ci/pipeline/ci.yml` (main CI system)
+- **Pre-commit**: `.pre-commit-config.yaml` (code quality)
+
+Build timeouts: 120 minutes in CI (can be slow due to oneDAL compilation)
+
+## Related Instructions
+- `general.instructions.md` - Quick start build commands
+- `src.instructions.md` - C++/Cython build details
+- `tests.instructions.md` - Testing after successful builds
+
+For platform-specific build details, see `.ci/AGENTS.md`
\ No newline at end of file
diff --git a/.github/instructions/daal4py.instructions.md b/.github/instructions/daal4py.instructions.md
new file mode 100644
index 0000000000..8efdf6860d
--- /dev/null
+++ b/.github/instructions/daal4py.instructions.md
@@ -0,0 +1,49 @@
+# daal4py/* - Direct oneDAL Python Bindings
+
+## Purpose
+Direct Python bindings to Intel oneDAL for maximum performance and model builders for XGBoost/LightGBM conversion.
+
+## Three Sub-APIs
+1. **Native oneDAL**: `import daal4py as d4p` - Direct algorithm access
+2. **sklearn-compatible**: `from daal4py.sklearn import ...` - sklearn API with oneDAL backend
+3. **Model Builders**: `from daal4py.mb import convert_model` - External model conversion
+
+## API Overview
+
+For detailed native oneDAL patterns and model builders, see [daal4py/AGENTS.md](../../daal4py/AGENTS.md).
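+
+The sklearn-compatible sub-API (item 2 above) keeps the standard estimator interface while running oneDAL underneath; a minimal sketch:
+```python
+import numpy as np
+from daal4py.sklearn.cluster import DBSCAN  # sklearn API, oneDAL backend
+
+X = np.random.RandomState(0).random((100, 4))
+labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
+```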
+ +**Basic Pattern**: +```python +import daal4py as d4p +algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) +result = algorithm.compute(data) +``` + +**Model Conversion**: +```python +from daal4py.mb import convert_model +d4p_model = convert_model(xgb_model) # 10-100x faster inference +``` + +## Testing +```bash +# Native daal4py tests +pytest --verbose --pyargs daal4py +pytest tests/test_daal4py_examples.py # Native API examples +pytest tests/test_model_builders.py # Model conversion tests + +# sklearn compatibility in daal4py +pytest daal4py/sklearn/tests/ # sklearn-compatible API +``` + +## Development Notes +- Native API provides direct oneDAL algorithm access (fastest performance) +- sklearn-compatible API in `daal4py/sklearn/` maintains full sklearn compatibility +- Model builders enable oneDAL inference for models trained with other frameworks + +## Related Instructions +- `general.instructions.md` - Repository setup and build requirements +- `onedal.instructions.md` - Low-level backend that daal4py wraps +- `src.instructions.md` - Core C++/Cython implementation details +- `tests.instructions.md` - Testing native oneDAL algorithms +- See `daal4py/AGENTS.md` for detailed algorithm usage patterns \ No newline at end of file diff --git a/.github/instructions/general.instructions.md b/.github/instructions/general.instructions.md new file mode 100644 index 0000000000..522bc5ec42 --- /dev/null +++ b/.github/instructions/general.instructions.md @@ -0,0 +1,58 @@ +# General Repository Instructions - Intel Extension for Scikit-learn + +## Repository Overview + +**Intel Extension for Scikit-learn** (scikit-learn-intelex) accelerates scikit-learn by 10-100x using Intel oneDAL. Zero code changes required for existing sklearn applications. + +- **Languages**: Python (70%), C++ (25%), Cython (5%) +- **Architecture**: 4-layer system (sklearnex → daal4py → onedal → Intel oneDAL C++) +- **Platforms**: Linux, Windows, macOS; CPU (x86_64, ARM), GPU (Intel via SYCL) +- **Python**: 3.9-3.13 supported + +## Quick Start + +**Build Setup**: See [build-config.instructions.md](build-config.instructions.md) for complete details. +```bash +export DALROOT=/path/to/onedal +python setup.py develop +``` + +**Testing**: See [tests.instructions.md](tests.instructions.md) for comprehensive testing. +```bash +pytest --verbose --pyargs sklearnex +``` + +**Code Quality**: +```bash +pre-commit run --all-files +``` + +## Code Standards + +- **Python**: Black (line-length=90) + isort +- **C++**: clang-format version ≥14 +- **Commits**: Must be signed-off (`git commit -s`) +- **Documentation**: numpydoc format + +## Common Issues & Solutions + +```bash +# Build failures +export NO_DIST=1 # Disable distributed mode if MPI issues +export NO_DPC=1 # Disable GPU if driver issues +python setup.py build_ext --inplace --force --abs-rpath # Linux linking + +# Import/path issues +export PYTHONPATH=$(pwd) # Add repo to path +python setup.py develop # Ensure editable install +``` + +## Related Instructions +- `sklearnex.instructions.md` - Primary sklearn interface and patching +- `daal4py.instructions.md` - Direct oneDAL bindings and model builders +- `onedal.instructions.md` - Low-level C++ bindings +- `src.instructions.md` - Core C++/Cython implementation +- `tests.instructions.md` - Testing infrastructure and validation +- `build-config.instructions.md` - Build system and environment setup + +For detailed implementation guides, see the corresponding AGENTS.md files in each directory. 
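+
+A quick way to confirm the build and patching work end to end (a minimal sketch):
+```python
+from sklearnex import patch_sklearn, sklearn_is_patched
+
+patch_sklearn()
+assert sklearn_is_patched()  # sklearn estimators now dispatch to oneDAL
+
+from sklearn.cluster import DBSCAN  # import after patching to get the accelerated class
+```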
\ No newline at end of file
diff --git a/.github/instructions/onedal.instructions.md b/.github/instructions/onedal.instructions.md
new file mode 100644
index 0000000000..8ca8f7b303
--- /dev/null
+++ b/.github/instructions/onedal.instructions.md
@@ -0,0 +1,63 @@
+# onedal/* - Low-Level C++ Bindings
+
+## Purpose
+Pybind11-based C++ bindings providing the bridge between Python and the Intel oneDAL C++ library.
+
+## Key Components
+- `datatypes/`: Memory management and array conversions (NumPy, SYCL USM, DLPack)
+- `common/`: Policy management, device selection, serialization
+- `*/`: Algorithm-specific implementations (cluster/, decomposition/, linear_model/, etc.)
+- `spmd/`: Distributed computing interfaces
+
+## Memory Management
+```python
+# Zero-copy conversions handled automatically
+import numpy as np
+from onedal.cluster import DBSCAN
+
+# NumPy arrays converted to oneDAL tables without copying
+X = np.random.random((1000, 10))
+model = DBSCAN().fit(X)  # Automatic NumPy → oneDAL conversion
+```
+
+## Device Context
+
+For comprehensive device management, see [onedal/AGENTS.md](../../onedal/AGENTS.md).
+
+```python
+import dpctl
+with dpctl.device_context("gpu:0"):
+    model = DBSCAN().fit(X)
+```
+
+## Algorithm Structure
+- Each algorithm module follows a consistent pattern:
+  - `fit()` method for training
+  - `predict()` method for inference (where applicable)
+  - Parameters match the oneDAL C++ API
+  - Results as Python objects with named attributes
+
+## Testing
+```bash
+# Low-level onedal tests
+pytest onedal/tests/  # Core functionality
+pytest onedal/datatypes/tests/  # Memory management
+pytest onedal/common/tests/  # Device/policy tests
+
+# Algorithm-specific tests
+pytest onedal/cluster/tests/test_dbscan.py  # DBSCAN implementation
+pytest onedal/linear_model/tests/  # Linear models
+```
+
+## Development Notes
+- Direct interface to the oneDAL C++ API through pybind11
+- Handles memory management between Python/C++ automatically
+- Provides the foundation for both the daal4py and sklearnex layers
+- SPMD module enables distributed computing with MPI
+
+## Related Instructions
+- `general.instructions.md` - Repository setup and build requirements
+- `src.instructions.md` - C++/Cython implementation that uses onedal
+- `sklearnex.instructions.md` - High-level layer built on onedal
+- `daal4py.instructions.md` - Alternative interface to onedal
+- See `onedal/AGENTS.md` for detailed technical implementation
\ No newline at end of file
diff --git a/.github/instructions/sklearnex.instructions.md b/.github/instructions/sklearnex.instructions.md
new file mode 100644
index 0000000000..db0afc1776
--- /dev/null
+++ b/.github/instructions/sklearnex.instructions.md
@@ -0,0 +1,55 @@
+# sklearnex/* - Primary sklearn-compatible Interface
+
+## Purpose
+Primary user interface for sklearn acceleration with a patching system and device offloading.
+
+## Key Files & Functions
+- `dispatcher.py`: Patching system (`get_patch_map_core` line 36)
+- `_device_offload.py`: GPU/CPU dispatch (`dispatch` function line 72)
+- `_config.py`: Global configuration (target_offload, allow_fallback_to_host)
+- `base.py`: oneDALEstimator base class for all accelerated algorithms
+
+## Usage Patterns
+
+**Global Patching (Most Common):**
+```python
+from sklearnex import patch_sklearn
+patch_sklearn()  # All sklearn imports now accelerated
+from sklearn.cluster import DBSCAN  # Uses oneDAL implementation
+```
+
+**Selective Patching:**
+```python
+patch_sklearn(["DBSCAN", "KMeans"])  # Only specific algorithms
+```
+
+**Direct Import (No Patching):**
+```python
+from sklearnex.cluster import DBSCAN  # Always oneDAL implementation
+```
+
+**Device Control**: See [sklearnex/AGENTS.md](../../sklearnex/AGENTS.md) for comprehensive device configuration.
+```python
+from sklearnex import config_context
+with config_context(target_offload="gpu:0"):
+    model.fit(X, y)
+```
+
+## Testing
+```bash
+# sklearnex-specific tests
+pytest --verbose --pyargs sklearnex
+pytest sklearnex/tests/test_patching.py  # Core patching functionality
+pytest sklearnex/tests/test_config.py  # Configuration system
+```
+
+## Development Notes
+- All sklearn-compatible algorithms inherit from `base.oneDALEstimator`
+- Fallback to original sklearn if the oneDAL implementation is unavailable
+- Device offloading requires Intel GPU drivers and the SYCL runtime
+
+## Related Instructions
+- `general.instructions.md` - Repository setup and build requirements
+- `onedal.instructions.md` - Low-level backend that sklearnex uses
+- `tests.instructions.md` - Testing the sklearn compatibility layer
+- See `sklearnex/AGENTS.md` for detailed module information
\ No newline at end of file
diff --git a/.github/instructions/src.instructions.md b/.github/instructions/src.instructions.md
new file mode 100644
index 0000000000..fb3ab01086
--- /dev/null
+++ b/.github/instructions/src.instructions.md
@@ -0,0 +1,63 @@
+# src/* - Core C++/Cython Implementation
+
+## Purpose
+Core C++/Cython implementation layer providing the foundation for the entire stack.
+
+## Key Files
+- `daal4py.cpp`: Main Cython interface to oneDAL
+- `daal4py.h`: C++ headers and type definitions
+- `*_builder.pyx`: Model builder implementations (XGBoost, LightGBM conversion)
+- `gettree.pyx`: Tree model extraction utilities
+- `mpi/`: Distributed computing infrastructure
+
+## Architecture
+- **Cython Interface**: `daal4py.cpp` provides the Python↔C++ bridge
+- **Memory Management**: `npy4daal.h` handles NumPy array conversions
+- **Distributed Computing**: MPI-based implementations in `mpi/`
+- **Model Builders**: Cython implementations for external model conversion
+
+## Build Process
+1. **Code Generation**: Python scripts generate C++ from oneDAL headers
+2. **Cython Compilation**: `.pyx` files compiled to C++
+3. **C++ Compilation**: Link with oneDAL libraries
+4. **Extension Creation**: Python extension modules
+
+## Development Workflow
+
+See [build-config.instructions.md](build-config.instructions.md) for environment setup.
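+
+Because there is no incremental compilation (see Development Notes below), a compiler cache pays off on repeated rebuilds; a sketch assuming `ccache` is installed and a GCC toolchain:
+```bash
+# Hypothetical setup: route compiler invocations through ccache
+export CC="ccache gcc"
+export CXX="ccache g++"
+```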
+ +```bash +# Rebuild after C++/Cython changes +python setup.py build_ext --inplace --force +``` + +## MPI/Distributed Features +- Located in `src/mpi/` +- Requires MPI installation (`MPIROOT` environment variable) +- Enable with `mpi4py` for distributed sklearn operations +- Disable with `NO_DIST=1` if MPI unavailable + +## Testing +```bash +# Test distributed features (requires MPI) +mpirun -n 2 python -m pytest tests/test_daal4py_spmd_examples.py + +# Test model builders +pytest tests/test_model_builders.py + +# Test core functionality +pytest tests/test_daal4py_serialization.py +``` + +## Development Notes +- No incremental compilation - full rebuild required for changes +- Use `ccache` for faster development builds +- ASan builds supported for debugging (see INSTALL.md) +- C++ code must follow clang-format style + +## Related Instructions +- `general.instructions.md` - Repository setup and build requirements +- `build-config.instructions.md` - Build system and compilation details +- `onedal.instructions.md` - Python bindings that src/ implements +- `daal4py.instructions.md` - Higher-level API built on src/ +- See `src/AGENTS.md` for detailed implementation guides \ No newline at end of file diff --git a/.github/instructions/tests.instructions.md b/.github/instructions/tests.instructions.md new file mode 100644 index 0000000000..6f57ce7116 --- /dev/null +++ b/.github/instructions/tests.instructions.md @@ -0,0 +1,84 @@ +# tests/* - Testing Infrastructure + +## Test Structure +- `tests/`: Legacy daal4py tests and examples +- Individual module tests in respective directories (sklearnex/tests/, onedal/tests/, etc.) +- `deselected_tests.yaml`: Tests skipped in CI due to platform/dependency issues + +## Test Execution Order (CRITICAL) + +**Preparation**: +```bash +pip install -r requirements-test.txt +``` + +**Core Test Suites** (run in order): +```bash +pytest --verbose -s tests/ # Legacy daal4py tests +pytest --verbose --pyargs daal4py # Native oneDAL API tests +pytest --verbose --pyargs sklearnex # sklearn compatibility tests +``` + +**Specific Categories**: +```bash +pytest tests/test_daal4py_examples.py # Native API examples +pytest tests/test_model_builders.py # XGBoost/LightGBM conversion +pytest tests/test_daal4py_spmd_examples.py # Distributed computing (requires MPI) +``` + +## Test Configuration +```bash +# Environment for testing +export COVERAGE_RCFILE=$(readlink -f .coveragerc) # Coverage configuration +export NO_DIST=1 # Disable distributed tests +export NO_DPC=1 # Disable GPU tests + +# Memory-intensive tests may require >8GB RAM +# GPU tests require Intel GPU + drivers +# Distributed tests require MPI setup (mpirun -n 2 pytest ...) 
+``` + +## Test Categories + +**Core Functionality:** +- `test_daal4py_examples.py`: Native oneDAL algorithm usage +- `test_estimators.py`: Algorithm parameter validation +- `test_printing.py`: Output formatting and verbose mode + +**Compatibility:** +- `test_examples_sklearnex.py`: sklearn compatibility validation +- `test_npy.py`: NumPy array handling + +**Advanced Features:** +- `test_model_builders.py`: External model conversion (XGBoost/LightGBM/CatBoost) +- `test_daal4py_serialization.py`: Model save/load functionality +- `test_daal4py_spmd_examples.py`: Distributed computing with MPI + +## Deselected Tests +Tests in `deselected_tests.yaml` are skipped in CI due to: +- Platform-specific issues (Windows/Linux differences) +- Hardware requirements (GPU, specific CPU features) +- External dependencies (MPI, specific library versions) +- Memory constraints (large dataset tests) + +## Development Testing +```bash +# Quick development tests (subset) +pytest tests/test_estimators.py # Parameter validation +pytest sklearnex/tests/test_patching.py # Core patching + +# Memory/performance tests +pytest --maxfail=1 tests/ # Stop on first failure + +# Coverage testing +pytest --cov=sklearnex --cov=daal4py --cov=onedal +``` + +## Related Instructions +- `general.instructions.md` - Repository setup and core testing commands +- `sklearnex.instructions.md` - Testing sklearn compatibility layer +- `daal4py.instructions.md` - Testing native oneDAL algorithms +- `onedal.instructions.md` - Testing low-level bindings +- `src.instructions.md` - Testing C++/Cython core and distributed features + +See individual module AGENTS.md files for module-specific testing details. \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..4153815ffe --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,213 @@ +# AGENTS.md - Intel Extension for Scikit-learn + +## Quick Context +- **Purpose**: Accelerate scikit-learn using Intel oneDAL optimizations +- **License**: Apache 2.0 +- **Languages**: Python, C++, Cython +- **Platforms**: CPU (x86_64, ARM), GPU (Intel via SYCL) + +## Architecture (4 Layers) +``` +User Apps → sklearnex/ → daal4py/ → onedal/ → Intel oneDAL C++ +``` + +**Key Layer Functions:** +- `sklearnex/`: sklearn API compatibility + patching +- `daal4py/`: Direct oneDAL access + model builders +- `onedal/`: Pybind11 bindings + memory management +- `src/`: C++/Cython core implementation + +## Entry Points by Use Case + +**For sklearn acceleration:** +```python +from sklearnex import patch_sklearn; patch_sklearn() +# OR direct import +from sklearnex.cluster import DBSCAN +``` + +**For native oneDAL performance:** +```python +import daal4py as d4p +algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) +``` + +**For model conversion:** +```python +from daal4py.mb import convert_model +d4p_model = convert_model(xgb_model) # XGBoost→oneDAL +``` + +## Accelerated Algorithms +- **Clustering**: DBSCAN, K-Means +- **Classification**: SVM, RandomForest, LogisticRegression, NaiveBayes +- **Regression**: LinearRegression, Ridge, Lasso, ElasticNet, SVR +- **Decomposition**: PCA, IncrementalPCA +- **Neighbors**: KNeighbors (classification/regression) +- **Preprocessing**: Scalers, normalizers + +## Device Configuration +```python +from sklearnex import config_context + +# GPU offloading +with config_context(target_offload="gpu:0"): + model.fit(X, y) + +# Force CPU +with config_context(target_offload="cpu"): + model.fit(X, y) +``` + +## Performance Patterns +- **Memory**: Zero-copy 
NumPy↔oneDAL, SYCL USM for GPU +- **Parallelism**: Intel TBB threading, MPI distributed, SIMD vectorization +- **Fallbacks**: oneDAL → sklearn → error cascade + +## Key Files for AI Agents +- `sklearnex/dispatcher.py`: Patching system (line 36: `get_patch_map_core`) +- `sklearnex/_device_offload.py`: Device dispatch (line 72: `dispatch`) +- `onedal/__init__.py`: Backend selection +- `daal4py/__init__.py`: Native API entry +- `src/`: C++/Cython core (distributed computing, memory management) + +## Development Environment Setup + +### Prerequisites +- **Python**: 3.9-3.13 (verified in setup.py classifiers and README.md badges) +- **oneDAL**: 2021.1+ (backwards compatible, verified in INSTALL.md) +- **Dependencies**: Cython==3.1.1, Jinja2==3.1.6, numpy>=2.0.1, pybind11==2.13.6, cmake==4.0.2 (verified in dependencies-dev file) + +### Build Commands +```bash +# Development setup +pip install -r dependencies-dev # Verified: contains Cython, Jinja2, numpy, pybind11, cmake +export DALROOT=/path/to/onedal # Required (verified in setup.py:53-59) +export MPIROOT=/path/to/mpi # For distributed support (verified in setup.py:95-100) +python setup.py develop # Development mode + +# Environment options +export NO_DPC=1 # Disable GPU support +export NO_DIST=1 # Disable distributed computing +export NO_STREAM=1 # Disable streaming mode +``` + +### Testing Strategy +```bash +# Core test suites (from conda-recipe/run_test.sh) +pytest --verbose -s tests/ # Legacy tests +pytest --verbose --pyargs daal4py # Native oneDAL tests +pytest --verbose --pyargs sklearnex # sklearn compatibility +pytest --verbose --pyargs onedal # Low-level backend +pytest --verbose .ci/scripts/test_global_patch.py # Global patching + +# Distributed testing (requires MPI) +mpirun -n 4 python tests/helper_mpi_tests.py pytest -k spmd --with-mpi --pyargs sklearnex +``` + +## Performance Expectations + +### Benchmarked Speedups +- **General**: 10-100X acceleration (verified in README.md) +- **Training**: Up to 100x speedup mentioned in README.md +- **Inference**: Significant speedup, model builders claim 10-100x for converted models +- **Range**: 1-3 orders of magnitude improvement depending on algorithm/dataset +- **Note**: Specific 27x/36x figures not found in current codebase, general 10-100X claims verified + +### Algorithm Support Decision Matrix + +**oneDAL Acceleration Criteria** (verified in sklearnex/cluster/dbscan.py:108-138): +```python +def _onedal_supported(self, method_name, *data): + # Data requirements (verified in DBSCAN implementation) + - Dense data only (not sp.issparse(X)) + - Supported dtypes: float32, float64 + - Contiguous memory layout preferred + + # Algorithm-specific constraints (verified in actual code) + - DBSCAN: algorithm in ["auto", "brute"], metric="euclidean" or "minkowski" with p=2 + - Parameter compatibility checks via PatchingConditionsChain +``` + +**GPU Support Status** (from sklearnex/AGENTS.md): +- **Full GPU**: DBSCAN, K-Means, PCA, KNeighbors +- **Limited GPU**: LogisticRegression (2024.1+), SVM +- **CPU Only**: RandomForest, Ridge, IncrementalPCA + +### Error Handling and Fallback Strategy + +**Fallback Chain** (verified in onedal/_config.py:45-50): +```python +# Configuration controls fallback behavior +_default_global_config = { + "target_offload": "auto", # Auto device selection + "allow_fallback_to_host": False, # GPU → CPU fallback + "allow_sklearn_after_onedal": True, # oneDAL → sklearn fallback + "use_raw_input": False, # Raw input usage +} +``` + +**Fallback Triggers**: +1. 
**Unsupported data**: Sparse matrices, unsupported dtypes
+2. **Unsupported parameters**: Algorithm-specific limitations
+3. **Hardware constraints**: GPU memory limits, device unavailability
+4. **Runtime errors**: oneDAL computation failures
+
+### Memory Management Patterns
+
+**Critical Requirements** (from sklearnex/utils/validation.py):
+```python
+# oneDAL requires contiguous data - copying avoided for performance
+def _onedal_supported_format(X, xp):
+    return is_contiguous(X)  # C-contiguous preferred
+```
+
+**Data Layout**:
+- **Contiguous arrays**: Required for zero-copy operations
+- **Data types**: float32/float64 preferred, automatic conversion when needed
+- **Memory layout**: C-contiguous > Fortran-contiguous > non-contiguous
+
+### GPU Hardware Requirements
+
+**Supported Intel GPUs**:
+- **Integrated**: Intel UHD Graphics, Intel Iris Xe
+- **Discrete**: Intel Arc A370M, Arc B580, Arc series
+- **Requirements**: SYCL/DPC++ support, Intel oneAPI toolkit
+- **Memory**: Unified Shared Memory (USM) support for zero-copy operations
+
+### Version Compatibility
+
+**Supported Versions** (verified in README.md badges and setup.py):
+- **Python**: 3.9, 3.10, 3.11, 3.12, 3.13 (verified in setup.py:609-613)
+- **scikit-learn**: 1.0, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 (verified in README.md badge)
+- **oneDAL**: 2021.1+ (backwards compatible only, verified in INSTALL.md)
+
+### Code Generation vs Manual Implementation
+
+**When to use generator/** (from INSTALL.md build process):
+1. **Automatic**: C++ headers → Python bindings (stage 1 of 4-stage build)
+2. **Manual Python**: Direct sklearn interface implementations
+3. **Generator changes**: Required for new oneDAL algorithms not yet wrapped
+4. **Python changes**: Sufficient for parameter handling, validation, sklearn compatibility
+
+### SPMD (Distributed) Usage Guidelines
+
+**When to use SPMD** (from tests/helper_mpi_tests.py, conda-recipe/run_test.sh):
+- **Large datasets**: When single-node memory insufficient
+- **Supported algorithms**: DBSCAN, K-Means, PCA, Linear Regression
+- **Setup**: Requires MPI (Intel MPI or OpenMPI), mpi4py
+- **Testing**: `mpirun -n 4` for validation
+
+**MPI Requirements** (from setup.py):
+```python
+mpi_root = os.environ.get("MPIROOT", os.environ.get("I_MPI_ROOT"))
+# Required unless NO_DIST=1
+```
+
+## Component Documentation
+- `sklearnex/AGENTS.md`: API patterns, device offloading
+- `daal4py/AGENTS.md`: Native oneDAL bindings, model builders
+- `onedal/AGENTS.md`: Pybind11 implementation, memory management
+- `src/AGENTS.md`: C++/Cython core, distributed computing
+- `examples/AGENTS.md`: Usage patterns (113 scripts, 19 notebooks)
+- `tests/AGENTS.md`: Testing infrastructure, validation patterns
\ No newline at end of file
diff --git a/daal4py/AGENTS.md b/daal4py/AGENTS.md
new file mode 100644
index 0000000000..306a66ab6b
--- /dev/null
+++ b/daal4py/AGENTS.md
@@ -0,0 +1,433 @@
+# AGENTS.md - daal4py Package
+
+## Purpose
+**Direct Python bindings to Intel oneDAL** for maximum performance
+
+## Three APIs
+1. **Native oneDAL**: `import daal4py as d4p`
+2. **sklearn-compatible**: `from daal4py.sklearn import ...`
+3.
**Model Builders**: `from daal4py.mb import convert_model` + +## Native oneDAL API Usage + +**Basic Pattern:** +```python +import daal4py as d4p +import numpy as np + +# Create algorithm +algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) + +# Run computation +result = algorithm.compute(data) + +# Access results +cluster_labels = result.assignments +core_indices = result.coreIndices +``` + +**Common Algorithms:** +```python +# Clustering +d4p.dbscan(epsilon=0.5, minObservations=5) +d4p.kmeans(nClusters=3, maxIterations=300) + +# Decomposition +d4p.pca(method="defaultDense") +d4p.svd(method="defaultDense") + +# Linear Models +d4p.linear_regression_training() +d4p.ridge_regression_training(ridgeParameters=1.0) +``` + +## sklearn-Compatible API + +**Usage:** +```python +from daal4py.sklearn.cluster import DBSCAN +from daal4py.sklearn.linear_model import Ridge + +# Use like normal sklearn +clusterer = DBSCAN(eps=0.5, min_samples=5) +labels = clusterer.fit_predict(X) +``` + +**Patching System:** +```python +from daal4py.sklearn.monkeypatch import patch_sklearn +patch_sklearn() # Replace sklearn algorithms with daal4py versions +``` + +## Model Builders (`mb/`) + +**Purpose**: Convert external ML models to oneDAL for faster inference + +**Supported Frameworks:** +```python +from daal4py.mb import convert_model + +# XGBoost/LightGBM/CatBoost → oneDAL +externalModel = xgb.XGBClassifier().fit(X, y) +d4p_model = convert_model(externalModel) + +# Use oneDAL for fast prediction +predictions = d4p_model.predict(X_test) +prob = d4p_model.predict_proba(X_test) +``` + +**Benefits**: 10-100x faster inference than original models + +### 3. Monkeypatch System (`sklearn/monkeypatch/`) + +**Purpose**: Original patching mechanism for scikit-learn replacement + +**Core Implementation** (`dispatcher.py:57-200`): +```python +@lru_cache(maxsize=None) +def _get_map_of_algorithms(): + mapping = { + "pca": [[(decomposition_module, "PCA", PCA_daal4py), None]], + "kmeans": [[(cluster_module, "KMeans", KMeans_daal4py), None]], + "dbscan": [[(cluster_module, "DBSCAN", DBSCAN_daal4py), None]], + # ... complete algorithm mapping + } + return mapping +``` + +**Patching Functions**: +- `patch_sklearn()`: Replace sklearn algorithms with daal4py versions +- `unpatch_sklearn()`: Restore original sklearn implementations +- `get_patch_map()`: Retrieve current algorithm mappings +- `enable_patching()`: Context-based patching control + +**Condition Checking**: +```python +def _daal4py_check_supported(estimator, method_name, *data): + # Check data characteristics (density, dtypes, shape) + # Check algorithm parameters + # Check oneDAL version compatibility + # Return boolean + condition chain +``` + +### 4. Model Builders (`mb/`) + +**Purpose**: Convert external ML library models to oneDAL for accelerated inference + +#### Tree-Based Models (`tree_based_builders.py`) + +**Supported Libraries**: +- **XGBoost**: Gradient boosting framework +- **LightGBM**: Microsoft gradient boosting +- **CatBoost**: Yandex gradient boosting +- **Treelite**: Universal tree model format + +**Implementation Pattern**: +```python +class GBTDAALModel(GBTDAALBaseModel): + def __init__(self, model): + # 1. Extract model parameters and structure + # 2. Convert to oneDAL tree format + # 3. Create oneDAL inference model + + def predict(self, X): + # Use oneDAL optimized prediction + + def predict_proba(self, X): + # Probabilistic predictions for classification +``` + +**Conversion Process**: +1. **Tree Extraction**: Parse external model tree structures +2. 
**Parameter Mapping**: Convert hyperparameters to oneDAL format
+3. **Model Creation**: Build oneDAL gradient boosting model
+4. **Validation**: Verify numerical equivalence with original model
+
+#### Logistic Regression Models (`logistic_regression_builders.py`)
+
+**Supported Sources**:
+- sklearn LogisticRegression (binary/multinomial)
+- sklearn SGDClassifier (with log loss)
+- Direct coefficient specification
+
+**Features**:
+- Binary and multinomial classification
+- Coefficient and intercept preservation
+- oneDAL optimized prediction pipeline
+
+### 5. Distributed Computing (SPMD)
+
+**Purpose**: Single Program Multiple Data parallel processing across multiple nodes
+
+**Implementation Location**:
+- C++ Headers: `src/dist_*.h` files
+- Examples: `examples/daal4py/*_spmd.py`
+
+**Architecture**:
+```cpp
+// C++ distributed computing framework (src/dist_custom.h)
+template <typename Algo>  // parameter list reconstructed; name illustrative
+class dist {
+    // MPI communication primitives
+    // Data serialization/deserialization
+    // Distributed algorithm coordination
+};
+```
+
+**Supported Algorithms**:
+- **DBSCAN**: `dist_dbscan.h` - Distributed density clustering
+- **K-Means**: `dist_kmeans.h` - Distributed centroid-based clustering
+- **Linear Regression**: Distributed least squares
+- **PCA**: Distributed principal component analysis
+- **Covariance**: Distributed covariance matrix computation
+
+**SPMD Usage Pattern**:
+```python
+import daal4py as d4p
+
+# Initialize distributed backend
+d4p.daalinit()
+
+# Distributed algorithm execution
+result = algorithm.compute(local_data_chunk)
+
+# Finalize and collect results
+d4p.daalfini()
+```
+
+**MPI Integration**:
+- Automatic rank and size detection
+- Efficient data distribution strategies
+- Collective communication operations
+- Fault tolerance and load balancing
+
+## Performance Optimization Strategies
+
+### 1. Memory Management
+
+**Zero-Copy Operations**:
+- Direct NumPy array access via `make2d()` utility
+- In-place data transformations where possible
+- Efficient C++ ↔ Python data exchange
+
+**Memory Layout Optimization**:
+```python
+# Efficient data preparation (daal4py/sklearn/_utils.py)
+def make2d(X):
+    if X.ndim == 1:
+        X = X.reshape(1, -1)
+    return np.ascontiguousarray(X, dtype=np.float64)
+```
+
+### 2. Algorithmic Optimizations
+
+**Solver Selection**:
+- Analytical solutions for overdetermined systems
+- Iterative methods for large-scale problems
+- Specialized algorithms for sparse data
+
+**Parallel Execution**:
+- Intel TBB threading for shared-memory parallelism
+- MPI for distributed-memory parallelism
+- Vectorization via Intel SIMD instructions
+
+### 3. Data Type Optimization
+
+**Precision Selection**:
+```python
+def getFPType(X):
+    """Determine optimal floating-point precision"""
+    if hasattr(X, 'dtype'):
+        if X.dtype == np.float32:
+            return "float"
+        else:
+            return "double"
+    return "double"  # Default to double precision
+```
+
+### 4.
Condition-Based Optimization
+
+**Patching Conditions** (Pattern across all algorithms):
+```python
+def _daal4py_supported(self, method_name, *data):
+    conditions = PatchingConditionsChain("daal4py.algorithm.method")
+
+    # Data characteristics
+    conditions.and_condition(not sp.issparse(data[0]), "Sparse not supported")
+    conditions.and_condition(data[0].dtype in [np.float32, np.float64], "Invalid dtype")
+
+    # Algorithm parameters
+    conditions.and_condition(self.metric == "euclidean", "Only euclidean metric")
+    conditions.and_condition(self.algorithm == "auto", "Algorithm must be auto")
+
+    return conditions
+```
+
+## Integration Architecture
+
+### With oneDAL C++ Library
+
+**Direct Binding Layer**:
+- Cython-based C++ wrapper generation
+- Template instantiation for algorithm variants
+- Exception handling and error propagation
+- Memory management coordination
+
+**Algorithm Instantiation Pattern**:
+```cpp
+// C++ algorithm instantiation (generated via Cython)
+daal::algorithms::dbscan::Batch<> algorithm;
+algorithm.parameter.epsilon = eps;
+algorithm.parameter.minObservations = min_samples;
+algorithm.input.set(daal::algorithms::dbscan::data, numericTable);
+algorithm.compute();
+daal::algorithms::dbscan::ResultPtr result = algorithm.getResult();
+```
+
+### With sklearnex Package
+
+**Layered Architecture**:
+1. **sklearnex**: High-level API with device offloading
+2. **daal4py**: Core algorithms and patching
+3. **oneDAL**: Low-level optimized implementations
+
+**API Delegation**:
+```python
+# sklearnex delegates to daal4py for compatible cases
+if _is_daal4py_supported():
+    return daal4py_algorithm.fit(X, y)
+else:
+    return sklearn_algorithm.fit(X, y)
+```
+
+### With External Libraries
+
+**Model Conversion Pipeline**:
+```python
+# XGBoost → oneDAL conversion example
+def get_gbt_model_from_xgboost(xgb_model):
+    # 1. Extract XGBoost JSON representation
+    # 2. Parse tree structures and parameters
+    # 3. Convert to oneDAL tree format
+    # 4. Create oneDAL gradient boosting model
+    # 5. Return optimized prediction interface
+```
+
+## Error Handling and Fallbacks
+
+### Exception Management
+
+**oneDAL Error Handling**:
+- C++ exception translation to Python
+- Detailed error messages with context
+- Graceful degradation to sklearn when possible
+
+**Common Error Patterns**:
+```python
+try:
+    result = daal4py_algorithm.compute(data)
+except RuntimeError as e:
+    if "not supported" in str(e):
+        # Fallback to sklearn
+        return sklearn_algorithm.fit(X, y)
+    else:
+        raise
+```
+
+### Validation and Checks
+
+**Input Validation**:
+- Data type and shape verification
+- Parameter range checking
+- Memory layout validation
+- Feature name consistency
+
+**Compatibility Checking**:
+- oneDAL version requirements
+- Algorithm parameter support
+- Hardware capability detection
+
+## Development Guidelines
+
+### Adding New Algorithms
+
+1. **Create Native Wrapper**:
+   ```python
+   def _daal_algorithm(X, y=None, **params):
+       # Convert inputs to oneDAL format
+       # Configure oneDAL algorithm
+       # Execute computation
+       # Convert results to expected format
+   ```
+
+2. **Implement sklearn Interface**:
+   ```python
+   class Algorithm(sklearn_Algorithm):
+       def fit(self, X, y=None):
+           return self._daal_fit(X, y)
+   ```
+
+3. **Add to Dispatcher**:
+   ```python
+   # Update monkeypatch/dispatcher.py
+   mapping["algorithm"] = [[(module, "Algorithm", Algorithm_daal4py), None]]
+   ```
+
+4.
**Create Tests**: + ```python + # Numerical accuracy tests + # Performance benchmarks + # Edge case validation + ``` + +### Performance Optimization Guidelines + +- **Minimize Data Copies**: Use views and in-place operations +- **Leverage oneDAL Optimizations**: Choose appropriate algorithms and parameters +- **Profile Memory Usage**: Monitor peak memory consumption +- **Validate Numerically**: Ensure mathematical correctness +- **Benchmark Performance**: Measure against sklearn baselines + +### Distributed Computing Guidelines + +- **Design for Scalability**: Consider communication overhead +- **Handle Data Distribution**: Implement efficient partitioning +- **Manage Dependencies**: Coordinate between nodes +- **Test at Scale**: Validate with realistic data sizes + +## File Location Reference + +### Core Implementation +- `daal4py/__init__.py:53-73` - Core binding imports and initialization +- `daal4py/sklearn/monkeypatch/dispatcher.py:57-200` - Algorithm mapping system +- `src/daal4py.cpp` - Main C++/Cython implementation +- `src/dist_*.h` - Distributed computing headers + +### Algorithm Examples +- `daal4py/sklearn/cluster/dbscan.py:35-56` - DBSCAN oneDAL integration +- `daal4py/sklearn/linear_model/_linear.py` - Linear regression implementation +- `daal4py/sklearn/decomposition/_pca.py` - PCA with oneDAL optimization + +### Model Builders +- `daal4py/mb/tree_based_builders.py:65-200` - GBT model conversion +- `daal4py/mb/logistic_regression_builders.py` - LogReg model conversion +- `daal4py/mb/gbt_convertors.py` - External library integration + +### Distributed Computing +- `examples/daal4py/*_spmd.py` - SPMD usage examples +- `src/dist_dbscan.h:28-100` - Distributed DBSCAN implementation +- `src/mpi/` - MPI communication layer + +## AI Agent Development Guidelines + +When working with daal4py, AI agents should: + +1. **Understand the Native API**: Recognize direct oneDAL algorithm access patterns +2. **Respect Performance Requirements**: Maintain zero-copy operations where possible +3. **Handle Distributed Computing**: Account for MPI coordination and data distribution +4. **Validate Numerically**: Ensure algorithmic correctness against sklearn +5. **Consider Memory Constraints**: Monitor memory usage in large-scale scenarios +6. **Test Across Platforms**: Validate on different hardware configurations +7. **Document Performance**: Clearly specify optimization benefits and limitations +8. **Maintain Compatibility**: Preserve sklearn API contracts and behavior + +The daal4py package represents the performance-critical foundation of the Intel Extension for Scikit-learn, providing both the algorithmic engine and the compatibility layer that enables seamless acceleration of existing scikit-learn workflows. \ No newline at end of file diff --git a/doc/AGENTS.md b/doc/AGENTS.md new file mode 100644 index 0000000000..93899a1727 --- /dev/null +++ b/doc/AGENTS.md @@ -0,0 +1,37 @@ +# AGENTS.md - Documentation (doc/) + +## Purpose +Sphinx-based documentation generation system for Intel Extension for Scikit-learn. 
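+
+Docstrings are pulled in by autodoc and must follow the numpydoc layout; an illustrative example (function name hypothetical):
+```python
+def compute_speedup(baseline, accelerated):
+    """Return the speedup factor of an accelerated run.
+
+    Parameters
+    ----------
+    baseline : float
+        Wall-clock time of the stock scikit-learn run, in seconds.
+    accelerated : float
+        Wall-clock time of the oneDAL-accelerated run, in seconds.
+
+    Returns
+    -------
+    float
+        Ratio ``baseline / accelerated``.
+    """
+    return baseline / accelerated
+```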
+ +## Key Files for Agents +- `doc/sources/conf.py` - Sphinx configuration with extensions +- `doc/build-doc.sh` - Documentation build automation +- `doc/sources/algorithms.rst` - Algorithm support matrix +- `doc/sources/daal4py.rst` - API reference with autodoc + +## Build System +- **Sphinx Extensions**: autodoc, nbsphinx, intersphinx, napoleon +- **Notebook Integration**: Jupyter notebooks included via nbsphinx +- **Cross-References**: Links to sklearn, numpy, pandas documentation +- **GitHub Pages**: Automated deployment on releases + +## Content Structure +- **User Guides**: Quick start, performance optimization +- **API Reference**: Auto-generated from docstrings +- **Examples**: Real-world applications (kaggle/, notebooks/) +- **Developer Docs**: Distributed computing, contribution guidelines + +## Build Commands +```bash +# Local development +make html + +# Production deployment +./build-doc.sh --gh-pages +``` + +## For AI Agents +- Use reStructuredText format for documentation +- Include proper docstrings for autodoc generation +- Test documentation builds locally before submitting +- Maintain cross-references and intersphinx links \ No newline at end of file diff --git a/examples/AGENTS.md b/examples/AGENTS.md new file mode 100644 index 0000000000..15725c369e --- /dev/null +++ b/examples/AGENTS.md @@ -0,0 +1,62 @@ +# AGENTS.md - Examples (examples/) + +## Purpose +113 Python scripts and 19 Jupyter notebooks demonstrating Intel Extension for Scikit-learn usage patterns. + +## Directory Structure +- `daal4py/` - Native oneDAL API examples (80+ scripts) +- `sklearnex/` - Accelerated sklearn examples (25+ scripts) +- `mb/` - Model builder examples (XGBoost/LightGBM/CatBoost conversion) +- `notebooks/` - Jupyter tutorials with real datasets +- `utils/` - Utility functions + +## Key Usage Patterns + +### Native oneDAL API +```python +import daal4py as d4p +algorithm = d4p.dbscan(epsilon=0.5, minObservations=5) +result = algorithm.compute(data) +``` + +### Accelerated sklearn +```python +from sklearnex import patch_sklearn +patch_sklearn() # All sklearn imports now accelerated +from sklearn.cluster import DBSCAN +``` + +### GPU Acceleration +```python +from sklearnex import config_context +with config_context(target_offload="gpu:0"): + model.fit(X, y) +``` + +### Distributed Computing +```python +import daal4py as d4p +d4p.daalinit() # Initialize MPI +# ... distributed computation +d4p.daalfini() # Cleanup +``` + +### Model Conversion +```python +from daal4py.mb import convert_model +d4p_model = convert_model(xgb_model) # 10-100x faster inference +``` + +## Algorithm Categories +- **Clustering**: DBSCAN, K-Means +- **Linear Models**: Linear/Ridge/Logistic regression +- **Ensemble**: Random Forest, Gradient boosting +- **Decomposition**: PCA, SVD +- **Statistics**: Moments, covariance +- **SVM**: Classification and regression + +## For AI Agents +- Use examples as templates for new implementations +- Follow established patterns for performance optimization +- Include both sklearn and oneDAL performance comparisons +- Test examples across CPU/GPU configurations \ No newline at end of file diff --git a/generator/AGENTS.md b/generator/AGENTS.md new file mode 100644 index 0000000000..3b24eba7b0 --- /dev/null +++ b/generator/AGENTS.md @@ -0,0 +1,80 @@ +# AGENTS.md - Code Generator (generator/) + +## Purpose +Automated code generation system that creates Python bindings for oneDAL algorithms through C++ header parsing and Jinja2 templates. 
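+
+The pipeline is template-driven; conceptually, generation works like this (a toy sketch, not the actual templates in `wrapper_gen.py`):
+```python
+from jinja2 import Template
+
+# Hypothetical miniature of the real Cython-wrapper templates
+stub = Template("def {{ algo }}({{ params | join(', ') }}): ...")
+print(stub.render(algo="dbscan", params=["epsilon", "minObservations"]))
+# -> def dbscan(epsilon, minObservations): ...
+```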
+ +## Key Files +- `gen_daal4py.py` - Main orchestrator (1274 lines) +- `parse.py` - C++ header parser (727 lines) +- `wrapper_gen.py` - Jinja2 template engine (1626 lines) +- `wrappers.py` - Algorithm metadata configuration (1028 lines) +- `format.py` - Type conversion utilities (287 lines) + +## Generation Pipeline +1. **Header Parsing**: Extract classes, enums, templates from oneDAL C++ headers +2. **Metadata Processing**: Filter algorithms, handle required parameters +3. **Template Generation**: Create Cython wrappers using Jinja2 templates +4. **Code Output**: Generate Python API with proper type conversion + +## Algorithm Configuration +```python +# Required parameters for algorithms +required = { + "algorithms::dbscan": [("epsilon", "fptype"), ("minObservations", "size_t")], + "algorithms::kmeans": [("nClusters", "size_t"), ("maxIterations", "size_t")], + # ... 40+ algorithm configurations +} +``` + +## Template System +- **Jinja2 Templates**: Generate consistent Cython wrappers +- **Type Mapping**: Python ↔ C++ type conversion +- **Error Handling**: Input validation and exception handling +- **Memory Management**: Proper C++ object lifecycle + +## When to Modify Generator vs Python Code + +### Modify Generator (`wrappers.py`) When: +```python +# Adding new oneDAL algorithms not yet wrapped +required = { + "algorithms::new_algorithm": [("param1", "size_t"), ("param2", "double")] +} + +# Changing algorithm parameter requirements +no_constructor = { + "algorithms::special_case": {"param": ["type", "default_value"]} +} +``` + +### Direct Python Implementation When: +- Adding sklearn interface compatibility layers +- Implementing parameter validation and conversion +- Creating custom error handling or fallback logic +- Adding utility functions that don't require C++ bindings + +### Build Process Integration +```bash +# Generator runs in stage 1 of 4-stage build (from INSTALL.md) +# 1. Creating C++ and Cython sources from oneDAL C++ headers +# 2. Building oneDAL Python interfaces via cmake and pybind11 +# 3. Running Cython on generated sources +# 4. Compiling and linking them + +# Force regeneration during development +python setup.py build_ext --inplace --force +``` + +### Debugging Generated Code +- Generated files appear in build directories +- Check `generated_sources/` for Cython output +- Use `print()` statements in `wrapper_gen.py` templates for debugging +- Template variables available: `{{ns}}`, `{{algo}}`, `{{args_decl}}`, etc. + +## For AI Agents +- Generator runs automatically during build +- Modify `wrappers.py` to add new algorithm configurations +- Templates in `wrapper_gen.py` handle code patterns +- Type mappings in `format.py` for new data types +- Test generation changes with `python setup.py build_ext --inplace --force` +- Use direct Python implementation for sklearn compatibility layers \ No newline at end of file diff --git a/onedal/AGENTS.md b/onedal/AGENTS.md new file mode 100644 index 0000000000..0bb09f3a3e --- /dev/null +++ b/onedal/AGENTS.md @@ -0,0 +1,54 @@ +# AGENTS.md - oneDAL Backend (onedal/) + +## Purpose +Low-level Python bindings to Intel oneDAL using pybind11, providing CPU/GPU execution and memory management. 
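+
+Usage mirrors the estimator API of the layers above it; a minimal sketch:
+```python
+import numpy as np
+from onedal.cluster import DBSCAN  # pybind11-backed implementation
+
+X = np.random.RandomState(0).random((1000, 10))
+model = DBSCAN().fit(X)  # NumPy -> oneDAL table conversion happens automatically
+```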
+ +## Key Components +- `__init__.py` - Backend selection (DPC++/Host) +- `_config.py` - Thread-local configuration +- `_device_offload.py` - Device dispatch utilities +- `common/` - Core infrastructure and policies +- `datatypes/` - Data conversion (NumPy, SYCL USM, DLPack) +- Algorithm modules: `cluster/`, `linear_model/`, `decomposition/`, etc. + +## Backend System +```python +# Automatic backend selection +try: + import onedal._onedal_py_dpc # GPU backend +except ImportError: + import onedal._onedal_py_host # CPU backend +``` + +## Configuration +```python +from onedal import config_context + +# GPU acceleration +with config_context(target_offload="gpu:0"): + model.fit(X, y) + +# Auto device selection (default) +with config_context(target_offload="auto"): + model.fit(X, y) # Uses data location to choose device +``` + +## Data Conversion +- **NumPy**: Zero-copy conversion via `to_table()` +- **SYCL USM**: GPU memory sharing (`__sycl_usm_array_interface__`) +- **DLPack**: Cross-framework tensor exchange + +## Algorithm Categories +- **Clustering**: DBSCAN, K-Means +- **Linear Models**: Linear/Ridge/Logistic regression +- **Decomposition**: PCA, Incremental PCA +- **SVM**: SVC, SVR with kernel methods +- **Ensemble**: Random Forest +- **Statistics**: Basic statistics, covariance + +## For AI Agents +- Use `config_context` for device selection +- Prefer zero-copy operations with `to_table()` +- Handle CPU/GPU fallback gracefully +- Monitor memory usage on GPU +- Test across different device configurations \ No newline at end of file diff --git a/sklearnex/AGENTS.md b/sklearnex/AGENTS.md new file mode 100644 index 0000000000..dc0d3a0740 --- /dev/null +++ b/sklearnex/AGENTS.md @@ -0,0 +1,126 @@ +# AGENTS.md - sklearnex Package + +## Purpose +**Primary sklearn-compatible interface** with oneDAL acceleration + +## Core Files +- `dispatcher.py`: Patching system (`get_patch_map_core` line 36) +- `_config.py`: Configuration (`target_offload`, `allow_fallback_to_host`) +- `_device_offload.py`: Device dispatch (`dispatch` function line 72) +- `base.py`: oneDALEstimator base class + +## Usage Patterns + +**Global Patching:** +```python +from sklearnex import patch_sklearn +patch_sklearn() # All sklearn imports use oneDAL +from sklearn.cluster import DBSCAN # Now accelerated +``` + +**Selective Patching:** +```python +patch_sklearn(["DBSCAN", "KMeans"]) # Only specific algorithms +``` + +**Direct Import:** +```python +from sklearnex.cluster import DBSCAN # Always accelerated +``` + +**Status Check:** +```python +from sklearnex import sklearn_is_patched +print(sklearn_is_patched()) # True/False +``` + +## Configuration API + +**Device Control:** +```python +from sklearnex import config_context + +# GPU acceleration +with config_context(target_offload="gpu:0"): + model.fit(X, y) + +# Force CPU +with config_context(target_offload="cpu"): + model.fit(X, y) + +# Auto device selection +with config_context(target_offload="auto"): # Default + model.fit(X, y) +``` + +**Fallback Control:** +```python +# Allow CPU fallback when GPU fails +with config_context(allow_fallback_to_host=True): + model.fit(X_gpu, y_gpu) + +# Allow sklearn fallback when oneDAL fails +with config_context(allow_sklearn_after_onedal=True): + model.fit(X, y) +``` + +## Algorithm Support Conditions + +**Implementation Pattern:** +```python +class Algorithm(BaseAlgorithm, oneDALEstimator, _sklearn_Algorithm): + def _onedal_cpu_supported(self, method_name, *data): + # Check data types, parameters, etc. 
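+        # Typical checks chained before returning (illustrative, not exhaustive):
+        #   dense input only, float32/float64 dtypes, and supported parameter
+        #   values, each registered via and_condition(...) on the chain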
+ return PatchingConditionsChain("sklearnex.algorithm") + + def _onedal_gpu_supported(self, method_name, *data): + # Check GPU-specific requirements + return PatchingConditionsChain("sklearnex.algorithm.gpu") +``` + +**Dispatch Flow:** +1. Check `_onedal_gpu_supported()` → Use GPU oneDAL +2. Check `_onedal_cpu_supported()` → Use CPU oneDAL +3. Fallback → Use original sklearn + +## Algorithm Categories + +**Supported Algorithms with oneDAL:** +- **Clustering**: DBSCAN, K-Means +- **Linear Models**: LogisticRegression, Ridge, LinearRegression +- **Ensemble**: RandomForestClassifier/Regressor +- **Decomposition**: PCA, IncrementalPCA +- **Neighbors**: KNeighborsClassifier/Regressor +- **SVM**: SVC, SVR, NuSVC, NuSVR + +**GPU Support Status:** +- **Full GPU**: DBSCAN, K-Means, PCA, KNeighbors +- **Limited GPU**: LogisticRegression (2024.1+), SVM +- **CPU Only**: RandomForest, Ridge, IncrementalPCA + +## Key Implementation Files +- `sklearnex/dispatcher.py:36` - `get_patch_map_core()` function +- `sklearnex/_device_offload.py:72` - `dispatch()` function +- `sklearnex/_config.py` - Configuration API +- `sklearnex/base.py` - oneDALEstimator base class + +## Distributed Computing (SPMD) +**Location**: `sklearnex/spmd/` +**Usage**: Same API, distributed across MPI nodes +```python +from sklearnex.spmd.cluster import DBSCAN # Distributed version +``` + +## Preview Features +**Activation**: `export SKLEARNEX_PREVIEW=1` +**Location**: `sklearnex/preview/` +**Content**: Experimental algorithms, enhanced covariance, advanced PCA + +## Error Handling +**Fallback Chain**: oneDAL GPU → oneDAL CPU → sklearn → Error + +**Common Fallback Triggers:** +- Sparse data (most algorithms don't support) +- Unsupported parameters +- GPU memory limits +- Wrong data types \ No newline at end of file diff --git a/src/AGENTS.md b/src/AGENTS.md new file mode 100644 index 0000000000..4a3504f1f5 --- /dev/null +++ b/src/AGENTS.md @@ -0,0 +1,49 @@ +# AGENTS.md - Core Implementation (src/) + +## Purpose +C++/Cython implementation providing direct Python bindings to Intel oneDAL with zero-overhead access, memory management, and distributed computing. + +## Key Files +- `daal4py.cpp/.h` - Main C++ interface and NumPy integration +- `npy4daal.h` - NumPy-oneDAL conversion utilities +- `gbt_model_builder.pyx` - Gradient boosting tree builder +- `gettree.pyx` - Tree visitor patterns (sklearn compatibility) +- `transceiver.h` - Communication abstraction for distributed computing +- `dist_*.h` - Distributed algorithm implementations (DBSCAN, K-Means) +- `pickling.h` - Serialization support + +## Core Features + +### Memory Management +```cpp +// Zero-copy NumPy integration with thread-safe reference counting +class NumpyDeleter : public daal::services::DeleterIface { + // GIL-protected cleanup of Python objects +}; +``` + +### Distributed Computing +```cpp +// MPI-based communication layer +class transceiver_iface { + virtual void gather(...) = 0; + virtual void bcast(...) = 0; + virtual void reduce_all(...) 
= 0; +}; +``` + +### Tree Model Building +```cython +# Cython interface for external model conversion +cdef class gbt_classification_model_builder: + def create_tree(self, n_nodes, class_label) + def add_split(self, feature_index, threshold) + def add_leaf(self, response, cover) +``` + +## For AI Agents +- src/ contains performance-critical C++/Cython code +- Use existing patterns for memory management (zero-copy, GIL protection) +- Distributed algorithms follow map-reduce patterns +- Model builders enable external framework integration (XGBoost→oneDAL) +- Maintain thread safety and cross-platform compatibility \ No newline at end of file diff --git a/tests/AGENTS.md b/tests/AGENTS.md new file mode 100644 index 0000000000..6f09f8d58d --- /dev/null +++ b/tests/AGENTS.md @@ -0,0 +1,117 @@ +# AGENTS.md - Testing Infrastructure (tests/) + +## Purpose +Comprehensive validation infrastructure ensuring numerical accuracy, performance compliance, and cross-platform reliability. + +## Key Test Modules +- `test_daal4py_examples.py` - Native daal4py algorithm validation +- `test_model_builders.py` - External framework integration (XGBoost/LightGBM) +- `test_daal4py_spmd_examples.py` - Distributed computing validation +- `test_estimators.py` - sklearn compatibility validation +- `test_npy.py` - NumPy data type validation +- `run_examples.py` - Cross-platform example execution +- `unittest_data/` - Reference datasets for validation + +## Validation Patterns + +### Numerical Accuracy +```python +# Standard tolerance for floating-point comparisons +np.testing.assert_allclose(actual, expected, atol=1e-05) + +# Matrix reconstruction validation (SVD/QR) +np.testing.assert_allclose(original, reconstructed) +``` + +### Model Builder Testing +```python +# XGBoost conversion accuracy +xgb_predictions = xgb_model.predict(X) +d4p_predictions = convert_model(xgb_model).predict(X) +np.testing.assert_allclose(xgb_predictions, d4p_predictions) +``` + +### Performance Validation +```python +# Execution time limits +@dataclass +class Config: + timeout_cpu_seconds: int = 170 # Default (verified in tests/test_daal4py_examples.py) + # Extended timeouts for complex algorithms +``` + +### Distributed Testing +```python +# MPI-aware testing with proper rank coordination +@unittest.skipUnless(MPI.COMM_WORLD.size > 1, "Not running in distributed mode") +def test_spmd_algorithm(self): + # Distributed algorithm validation +``` + +## Cross-Platform Support +- **OS Detection**: Windows, Linux, macOS compatibility +- **Device Requirements**: CPU/GPU availability checking +- **Dependency Management**: Graceful skipping for missing libraries + +## Test Execution Commands + +### Local Development Testing +```bash +# Complete test suite (verified in conda-recipe/run_test.sh) +pytest --verbose -s tests/ # Legacy/integration tests +pytest --verbose --pyargs daal4py # Native oneDAL API tests +pytest --verbose --pyargs sklearnex # sklearn compatibility tests +pytest --verbose --pyargs onedal # Low-level backend tests +pytest --verbose .ci/scripts/test_global_patch.py # Global patching validation + +# With coverage reporting +pytest --cov=onedal --cov=sklearnex --cov-config=.coveragerc --cov-branch +``` + +### Distributed (SPMD) Testing +```bash +# Requires MPI setup and NO_DIST!=1 +mpirun -n 4 python tests/helper_mpi_tests.py \ + pytest -k spmd --with-mpi --verbose --pyargs sklearnex + +mpirun -n 4 python tests/helper_mpi_tests.py \ + pytest --verbose -s tests/test_daal4py_spmd_examples.py +``` + +### Performance Validation +```python +# 
Timeout configuration patterns (from test_daal4py_examples.py) +@dataclass +class Config: + timeout_cpu_seconds: int = 170 # Default timeout + # Algorithm-specific overrides: + # - gradient_boosted_classification: 480s + # - complex algorithms: extended timeouts +``` + +### Dependencies and Platform Testing +```python +# Graceful dependency handling (from run_examples.py) +def has_deps(rule): + for rule_item in rule: + try: + importlib.import_module(rule_item) + except ImportError: + return False + return True + +# Platform detection +IS_WIN = plt.system() == "Windows" +IS_MAC = plt.system() == "Darwin" +IS_LIN = plt.system() == "Linux" +``` + +## For AI Agents +- Use `np.testing.assert_allclose(atol=1e-05)` for numerical validation +- Configure appropriate timeouts based on algorithm complexity +- Handle missing dependencies gracefully with `skipTest()` +- Test both sklearn compatibility and numerical accuracy +- Validate model conversion maintains prediction accuracy +- Run distributed tests with `mpirun -n 4` for SPMD algorithms +- Check hardware requirements before GPU tests +- Use coverage reporting for development validation \ No newline at end of file
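+
+Putting several of these guidelines together, a minimal test sketch (dataset and parameters illustrative):
+```python
+import numpy as np
+import unittest
+
+class TestModelBuilderAccuracy(unittest.TestCase):
+    def test_xgboost_conversion(self):
+        try:
+            import xgboost as xgb
+        except ImportError:
+            self.skipTest("xgboost not installed")  # graceful dependency handling
+        from daal4py.mb import convert_model
+
+        rng = np.random.RandomState(0)
+        X = rng.random((200, 5))
+        y = (X[:, 0] > 0.5).astype(int)
+        model = xgb.XGBClassifier(n_estimators=10).fit(X, y)
+
+        # The converted model must reproduce the original predictions
+        d4p_model = convert_model(model)
+        np.testing.assert_allclose(d4p_model.predict(X), model.predict(X), atol=1e-05)
+```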