ethanglaser
Contributor

@ethanglaser ethanglaser commented Jul 30, 2025

Description

Follow-up to #2615 (and uxlfoundation/oneDAL#3139). Adds documentation of an additional parameter for the SPMD forest estimators. Open to discussion on the best way to do this, since I don't believe we have any prior references for it.

image

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.


codecov bot commented Jul 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Flag | Coverage Δ |
|---|---|
| azure | ? |
| github | 73.19% <ø> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.
see 30 files with indirect coverage changes

@david-cortes-intel
Contributor

@ethanglaser The only section that I'm aware of where extra parameters are documented is here:
https://uxlfoundation.github.io/scikit-learn-intelex/2025.7/guide/acceleration.html#random-forest

The title of the doc section doesn't match at all with the contents, but perhaps you could put it there for now next to the other extra parameters of decision trees, and then later we can revisit the structuring of the docs.

increased with the ``max_bins`` parameter.

Another parameter that can improve performance at large scale for Random Forest,
specifically the ``sklearnex.spmd.ensemble`` ``RandomForestClassifier`` and
Contributor

Could use links to the sklearn docs of the classes here, as done elsewhere - e.g. :obj:`sklearn.ensemble.RandomForestClassifier`


**Additional parameters:**

- ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. ``n_estimators`` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
Contributor

I'd say this is not very descriptive.

  • Does it mean that the result has n_estimators*n_ranks trees?
  • Does the data get moved across ranks, or does each rank use the data that it owns?
  • Maybe could also refer to them as 'rank/nodes' as otherwise it might not be immediately clear what a 'rank' here refers to.
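
To make the first two questions concrete, here is a minimal plain-Python sketch of one possible reading of the description. It is not the oneDAL or sklearnex implementation. It assumes that each rank trains `n_estimators` trees only on the data it owns and that the merged model simply concatenates them, which is exactly what the docs would need to confirm or correct. The helper names (`train_stump`, `fit_local_trees`, `predict`) are hypothetical, and the "trees" are toy threshold stumps.

```python
import random


def train_stump(rows, rng):
    """Toy 'tree': the mean feature value of a bootstrap sample, used as a threshold."""
    sample = [rng.choice(rows) for _ in rows]
    return sum(x for x, _ in sample) / len(sample)


def fit_local_trees(shards, n_estimators, seed=0):
    """Assumed behavior: every rank fits n_estimators stumps on its own shard only,
    then all stumps are concatenated into a single model."""
    merged = []
    for rank, rows in enumerate(shards):
        rng = random.Random(seed + rank)  # independent randomness per simulated rank
        merged.extend(train_stump(rows, rng) for _ in range(n_estimators))
    return merged


def predict(forest, x):
    """Majority vote over all merged stumps."""
    votes = sum(1 for threshold in forest if x > threshold)
    return int(votes * 2 > len(forest))


# Simulate 4 ranks, each owning a disjoint quarter of the data (no data movement).
data_rng = random.Random(42)
data = [(x, int(x > 0.5)) for x in (data_rng.random() for _ in range(400))]
n_ranks = 4
shards = [data[r::n_ranks] for r in range(n_ranks)]

forest = fit_local_trees(shards, n_estimators=10)
print(len(forest))  # n_estimators * n_ranks under this assumed reading
```

If the real behavior differs from this reading (for example, if data is exchanged between ranks or trees are pruned when merging), the parameter description should state that explicitly.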

Contributor Author

Ideally we could point to oneDAL docs, where this functionality was implemented. @Alexandr-Solovev can we get this documented in oneDAL?
