From 19051b9f13cc836f67399cef0ef5ad63d872f790 Mon Sep 17 00:00:00 2001
From: Ethan Glaser
Date: Wed, 30 Jul 2025 12:08:22 -0700
Subject: [PATCH 1/4] doc: local trees parameter documentation

---
 doc/sources/algorithms.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/sources/algorithms.rst b/doc/sources/algorithms.rst
index d6e6a37dd5..f063d7aea8 100755
--- a/doc/sources/algorithms.rst
+++ b/doc/sources/algorithms.rst
@@ -473,6 +473,10 @@ Classification
        - ``criterion`` != `'gini'`
        - ``oob_score`` = `True`
        - ``sample_weight`` != `None`
+
+       **Additional parameters:**
+
+       - ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. `n_estimators` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
      - Multi-output and sparse data are not supported
    * - :obj:`sklearn.ensemble.ExtraTreesClassifier`
      - All parameters are supported except:
@@ -525,6 +529,10 @@ Regression
        - ``criterion`` != `'mse'`
        - ``oob_score`` = `True`
        - ``sample_weight`` != `None`
+
+       **Additional parameters:**
+
+       - ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. `n_estimators` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
      - Multi-output and sparse data are not supported
    * - :obj:`sklearn.ensemble.ExtraTreesRegressor`
      - All parameters are supported except:

From 1242c791da2b74840ff2bae671804bba9ddef36c Mon Sep 17 00:00:00 2001
From: ethanglaser <42726565+ethanglaser@users.noreply.github.com>
Date: Wed, 30 Jul 2025 12:13:21 -0700
Subject: [PATCH 2/4] Update doc/sources/algorithms.rst

---
 doc/sources/algorithms.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/sources/algorithms.rst b/doc/sources/algorithms.rst
index f063d7aea8..93211d72ee 100755
--- a/doc/sources/algorithms.rst
+++ b/doc/sources/algorithms.rst
@@ -476,7 +476,7 @@ Classification
 
        **Additional parameters:**
 
-       - ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. `n_estimators` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
+       - ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. ``n_estimators`` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
      - Multi-output and sparse data are not supported
    * - :obj:`sklearn.ensemble.ExtraTreesClassifier`
      - All parameters are supported except:

From f27908967df8ee1aa6a468e8a0f275f47afd1b29 Mon Sep 17 00:00:00 2001
From: ethanglaser <42726565+ethanglaser@users.noreply.github.com>
Date: Wed, 30 Jul 2025 12:13:39 -0700
Subject: [PATCH 3/4] Update doc/sources/algorithms.rst

---
 doc/sources/algorithms.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/sources/algorithms.rst b/doc/sources/algorithms.rst
index 93211d72ee..42b917b6e4 100755
--- a/doc/sources/algorithms.rst
+++ b/doc/sources/algorithms.rst
@@ -532,7 +532,7 @@ Regression
 
        **Additional parameters:**
 
-       - ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. `n_estimators` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
+       - ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. ``n_estimators`` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
      - Multi-output and sparse data are not supported
    * - :obj:`sklearn.ensemble.ExtraTreesRegressor`
      - All parameters are supported except:

From 1d075e57db44a51651fd475b412dc6b798d0fceb Mon Sep 17 00:00:00 2001
From: y
Date: Fri, 26 Sep 2025 11:01:25 -0700
Subject: [PATCH 4/4] add local_trees_mode details to tuning guide

---
 doc/sources/guide/acceleration.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/sources/guide/acceleration.rst b/doc/sources/guide/acceleration.rst
index ea368b4029..63df27f89d 100644
--- a/doc/sources/guide/acceleration.rst
+++ b/doc/sources/guide/acceleration.rst
@@ -73,3 +73,10 @@
 times, especially for larger data sets. However, due to the reduced fidelity of the data, the
 resulting model can present worse performance metrics compared to a model trained on the original data.
 In such cases, the number of bins can be increased with the ``max_bins`` parameter.
+
+Another parameter that can improve performance at large scale for Random Forest,
+specifically the ``sklearnex.spmd.ensemble`` ``RandomForestClassifier`` and
+``RandomForestRegressor`` classes, is ``local_trees_mode``. This uses an
+alternative backend that is more conducive to scalability when running on more
+GPUs. The default is ``False``, but setting it to ``True`` enables this functionality.
+This parameter is only available in the ``spmd`` module, for multi-GPU use.
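
Because the patches above document ``n_estimators`` as a per-rank count when ``local_trees_mode`` is enabled, the value usually has to be resized when switching modes. The following is a minimal sketch of that arithmetic, not part of the patch series. It assumes that the merged model holds ``n_estimators`` times the number of ranks, and that ``local_trees_mode`` is passed as a constructor keyword; the helper names ``per_rank_estimators`` and ``make_local_trees_forest`` are hypothetical.

```python
"""Sketch: sizing n_estimators for the SPMD random forest's local_trees_mode."""


def per_rank_estimators(target_total_trees: int, n_ranks: int) -> int:
    """Trees each rank should train so the merged model reaches the target.

    Assumption: with local_trees_mode=True the merged forest holds
    n_estimators * n_ranks trees, since n_estimators is documented as per rank.
    """
    if target_total_trees < 1 or n_ranks < 1:
        raise ValueError("target_total_trees and n_ranks must be positive")
    # Ceiling division: round up so the merged forest is never below target.
    return -(-target_total_trees // n_ranks)


def make_local_trees_forest(target_total_trees: int, n_ranks: int):
    """Build the SPMD classifier with local trees mode enabled.

    The import is deferred because sklearnex.spmd needs a multi-GPU
    environment, which may not be installed where this file is read.
    Assumption: local_trees_mode is accepted as a constructor keyword.
    """
    from sklearnex.spmd.ensemble import RandomForestClassifier

    return RandomForestClassifier(
        n_estimators=per_rank_estimators(target_total_trees, n_ranks),
        local_trees_mode=True,
    )


if __name__ == "__main__":
    # 100 trees overall on 8 ranks: 13 trees per rank, 104 after merging.
    print(per_rank_estimators(100, 8))
```

Rounding up keeps the merged forest at or above the requested size; whether a slightly larger forest is acceptable depends on the workload, and since the mode is described as experimental, accuracy should be compared against the default mode before relying on it.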