doc: local trees parameter documentation #2636
base: main
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@ethanglaser The only section I'm aware of where extra parameters are documented is here: The title of that doc section doesn't match its contents at all, but perhaps you could put it there for now, next to the other extra parameters of decision trees, and we can revisit the structure of the docs later.
increased with the ``max_bins`` parameter.

Another parameter that can improve performance at large scale for Random Forest,
specifically the ``sklearnex.spmd.ensemble`` ``RandomForestClassifier`` and
Could use links to the sklearn docs of the classes here, as done elsewhere - e.g. :obj:`sklearn.ensemble.RandomForestClassifier`
**Additional parameters:**

- ``local_trees_mode`` (bool, default=False): Enables local trees mode for distributed training. ``n_estimators`` is per rank, with isolated learning occurring on each processor before merging into a single model. This mode is experimental but scales better than default. This parameter is specific to the SPMD implementation and is not present in the standard scikit-learn API.
I'd say this is not very descriptive.
- Does it mean that the result has `n_estimators*n_ranks` trees?
- Does the data get moved across ranks, or does each rank use the data that it owns?
- Maybe could also refer to them as 'rank/nodes', as otherwise it might not be immediately clear what a 'rank' refers to here.
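
To make the first two questions concrete, here is a toy, single-process sketch of one possible reading of the description: each rank trains `n_estimators` trees only on the data it owns, and the per-rank forests are then concatenated. This is an assumption for illustration, not the oneDAL implementation. The helper names (`train_stump`, `train_local_forest`, `predict`), the one-split "trees", and the round-robin sharding are all hypothetical, and whether the real merged model ends up with `n_estimators*n_ranks` trees is exactly what still needs confirming.

```python
import random
from collections import Counter


def train_stump(rows, rng):
    """Fit a one-split 'tree' on a bootstrap sample of (x, label) rows."""
    sample = [rng.choice(rows) for _ in rows]
    threshold = sum(x for x, _ in sample) / len(sample)
    fallback = Counter(y for _, y in sample).most_common(1)[0][0]
    left = Counter(y for x, y in sample if x <= threshold)
    right = Counter(y for x, y in sample if x > threshold)
    left_label = left.most_common(1)[0][0] if left else fallback
    right_label = right.most_common(1)[0][0] if right else fallback
    return threshold, left_label, right_label


def train_local_forest(rows, n_estimators, seed):
    """Train n_estimators stumps using only this rank's rows (no data exchange)."""
    rng = random.Random(seed)
    return [train_stump(rows, rng) for _ in range(n_estimators)]


def predict(forest, x):
    """Majority vote over all stumps in the forest."""
    votes = Counter(left if x <= t else right for t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical setup: 4 ranks, each owning a disjoint shard of the data.
n_ranks = 4
n_estimators = 10  # interpreted here as "per rank" (assumption)
data = [(i / 100, int(i >= 50)) for i in range(100)]
shards = [data[rank::n_ranks] for rank in range(n_ranks)]

# Isolated learning on each rank, then a merge by simple concatenation.
local_forests = [
    train_local_forest(shard, n_estimators, seed=rank)
    for rank, shard in enumerate(shards)
]
merged = [tree for forest in local_forests for tree in forest]

print(len(merged))  # 40 == n_estimators * n_ranks under this reading
```

If this reading is correct, the docs could state the total tree count explicitly and note that no rows move between ranks during training. If it is not, the sketch at least shows which sentence of the description is ambiguous.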
Ideally we could point to oneDAL docs, where this functionality was implemented. @Alexandr-Solovev can we get this documented in oneDAL?
Description
Follow-up to #2615 (and uxlfoundation/oneDAL#3139). Adds documentation of an additional parameter to the SPMD forest estimators. Open to discussion on the best way to do this, since I don't believe we have any prior references for it.
Checklist to comply with before moving PR from draft:
PR completeness and readability