
Commit 967c39b

Implementation of Gradient Boosting Nearest Neighbors (GBNN) (#99)
This PR adds support for using ensembles of `GradientBoostingRegressor` and `GradientBoostingClassifier` models in a kNN context. As far as we're aware, this is the first implementation to pair gradient-boosting models with nearest-neighbor imputation, but it is a natural extension of RFNN, first developed by Crookston and Finley and implemented in `sknnr`. A separate gradient-boosting estimator is built for each target feature present in `y` (or `y_fit`), and the type of estimator (regression or classification) is determined by the data type of each target: floating-point and integer targets use regression estimators, while string and `pd.Categorical` targets use classification estimators. As with RFNN, node IDs are captured for the reference samples used to fit the model. New samples are run through the fitted estimators and their node IDs are compared to reference node IDs using a weighted Hamming distance to identify nearest neighbors.

An abbreviated history of the commits on this PR is listed below:

- Add `GBNodeTransformer`. Follows the same implementation pattern as `RFNodeTransformer` and uses @aazuspan's logic for setting gradient boosting model tree weights based on loss reduction.
- Include transformer tests for `GBNodeTransformer`. This commit also fixes one error and one warning in `TreeNodeTransformer` found during testing. In `transform`, returning `X` from the call to `_validate_data` is necessary because `GradientBoostingRegressor.apply` expects `X` to be a numpy array (it uses `.shape`). In `_fit`, returning `X` from the call to `_validate_data` returns `X` as a numpy array and removes the warning about fitting with feature names.
- Add `GBNNRegressor`. Follows the same implementation pattern as `RFNNRegressor`, but does not yet have a `tree_weights` parameter that allows user control over setting gradient boosting model tree weights.
- Reorganize tree weighting functions in `GBNodeTransformer`. Move `delta_loss` into a separate function that calculates tree weights for a single gradient boosting model, and add `tree_weighting_method` as an argument to `GBNodeTransformer` with `delta_loss` and `uniform` choices.
- Add `GBNNRegressor` to tests with regression data.
- Handle weights and nodes for multi-class GB classifiers. Gradient boosting classifiers with a multiclass target (i.e. more than two distinct labels) behave differently from either continuous regression or binary classification. At each iteration, a separate tree is built for each class, such that the final node matrix for a multi-class forest has shape (`n_samples`, `n_estimators`, `n_classes`). To fit the NN paradigm with Hamming distance search, these forests must be accommodated. This commit makes the following changes:
  1. Adds a new estimator attribute called `n_trees_per_iteration_`, a list of size `self.n_forests_` that captures the number of parallel trees created per iteration. For GB multi-class forests, this is set to `n_classes` (> 2); for all other forests it is set to 1.
  2. Adjusts the data structure of the estimator attribute `tree_weights_`. Previously, this was a 2D numpy array of shape (`n_forests_`, `n_estimators`), but because multiclass forests have (`n_estimators` * `n_classes`) trees, the weights shape may vary from forest to forest. It is now a list of size `n_forests_` containing numpy arrays, each with shape (`n_estimators` * `n_trees_per_iteration_[i]`,).
  3. Modifies forest weights when applied to multi-class forests. Because there are more trees in these forests, the forest weight needs to be adjusted (i.e. divided by `n_trees_per_iteration_[i]`) when calculating the final weight. This ensures that each forest continues to have the user-specified (or equal) weight.
  4. Removes `ensemble_delta_loss` and consolidates it into `GBNodeTransformer._set_tree_weights`.
- Coarsen classes in regression test to avoid numerical imprecision. The `test_estimators_with_mixed_type_forests` test was using two different features: Total_BA and the species with the maximum basal area for each sample. The latter produced a column with 11 classes, some with only 1 or 2 presence records. When building the GB forests, there were numerical precision issues such that the actual trees built differed between Windows and Unix. The maximum-species feature has now been reduced to three classes: AGBR, TSHE (the two most commonly abundant species), and anything else (OTHER).
- Standardize Hamming weights to sum to 1.0 across all forests. There are two levels of standardization in this commit: first, forest weights are standardized such that their sum is 1.0; second, tree weights within each forest are standardized such that their sum is equal to their corresponding (standardized) forest weight. In the case of a multi-class GBNN classifier, each class's tree weights will be (forest_weight / n_classes).
- Replace hard-coded value for `factor` with sklearn logic. This commit replaces the multiplicative factor used to scale the initial loss calculation. Previously, we used a hard-coded value of 2; the logic is now based on scikit-learn's implementation in `BaseGradientBoosting._fit_stages`. This required regenerating regression files for two test instances because they use a `HalfMultinomialLoss`, which should have a factor of 1.
- Rename `delta_loss` to `train_improvement` and document intended use.
- Add documentation pages for `GBNN` / `GBNodeTransformer`.
- Better error checking on user-supplied forest weights.
- Force `algorithm` to `brute` for tree-based methods. Hamming distance is best suited to brute-force search, and other algorithms (e.g. `BallTree`, `KDTree`) are not compatible. Remove `algorithm` and `leaf_size` (only applicable to `BallTree` and `KDTree`) as user parameters.
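The two-level weight standardization described in the commit history above can be sketched in plain numpy. The function name and data layout here are illustrative only, not the actual `sknnr` internals:

```python
import numpy as np


def standardize_hamming_weights(forest_weights, tree_weights):
    """Two-level standardization of per-tree Hamming weights.

    forest_weights: relative weight of each forest, shape (n_forests,).
    tree_weights: list of n_forests arrays, one raw weight per tree. A
    multi-class GB forest contributes n_estimators * n_classes trees, so
    array lengths may differ between forests.
    """
    # Level 1: forest weights are standardized to sum to 1.0.
    forest_weights = np.asarray(forest_weights, dtype=float)
    forest_weights = forest_weights / forest_weights.sum()

    # Level 2: tree weights within each forest are standardized to sum to
    # the forest's standardized weight, so each forest keeps its
    # user-specified (or equal) share of the total Hamming weight.
    standardized = []
    for fw, tw in zip(forest_weights, tree_weights):
        tw = np.asarray(tw, dtype=float)
        standardized.append(fw * tw / tw.sum())
    return standardized


# Two forests: a regressor with 3 trees and a 3-class classifier with
# 2 boosting iterations (2 * 3 = 6 trees).
weights = standardize_hamming_weights(
    forest_weights=[1.0, 1.0],
    tree_weights=[[0.5, 0.3, 0.2], [0.1] * 6],
)
print([w.sum() for w in weights])  # each forest contributes ~0.5 of the total
```

With uniform raw tree weights in the multi-class forest, each class's trees end up sharing forest_weight / n_classes, matching the behavior described above.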
1 parent d9c1512 commit 967c39b

30 files changed: +955 −81 lines changed

docs/abbreviations.md

Lines changed: 1 addition & 0 deletions

@@ -2,5 +2,6 @@
 *[MSN]: Most Similar Neighbor
 *[kNN]: k-nearest neighbor
 *[RFNN]: Random Forest Nearest Neighbor
+*[GBNN]: Gradient Boosting Nearest Neighbor
 *[CCorA]: Canonical Correlation Analysis
 *[CCA]: Canonical Correspondence Analysis

docs/mkdocs.yml

Lines changed: 2 additions & 0 deletions

@@ -15,12 +15,14 @@ nav:
       - GNNRegressor: api/estimators/gnn.md
       - MSNRegressor: api/estimators/msn.md
       - RFNNRegressor: api/estimators/rfnn.md
+      - GBNNRegressor: api/estimators/gbnn.md
   - Transformers:
       - StandardScalerWithDOF: api/transformers/standardscalerwithdof.md
       - MahalanobisTransformer: api/transformers/mahalanobis.md
       - CCATransformer: api/transformers/cca.md
       - CCorATransformer: api/transformers/ccora.md
       - RFNodeTransformer: api/transformers/rfnode.md
+      - GBNodeTransformer: api/transformers/gbnode.md
   - Datasets:
       - Dataset: api/datasets/dataset.md
       - "Moscow Mountain / St. Joes": api/datasets/moscow_stjoes.md

docs/pages/api/estimators/gbnn.md

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+::: sknnr.GBNNRegressor

docs/pages/api/transformers/gbnode.md

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+::: sknnr.transformers.GBNodeTransformer

docs/pages/usage.md

Lines changed: 10 additions & 6 deletions

@@ -1,13 +1,14 @@
 ## Estimators
 
-`sknnr` provides six estimators that are fully compatible, drop-in replacements for `scikit-learn` estimators:
+`sknnr` provides seven estimators that are fully compatible, drop-in replacements for `scikit-learn` estimators:
 
 - [RawKNNRegressor](api/estimators/raw.md)
 - [EuclideanKNNRegressor](api/estimators/euclidean.md)
 - [MahalanobisKNNRegressor](api/estimators/mahalanobis.md)
 - [GNNRegressor](api/estimators/gnn.md)
 - [MSNRegressor](api/estimators/msn.md)
 - [RFNNRegressor](api/estimators/rfnn.md)
+- [GBNNRegressor](api/estimators/gbnn.md)
 
 These estimators can be used like any other `sklearn` regressor (or [classifier](#regression-and-classification))[^sklearn-docs].
 
@@ -128,11 +129,11 @@ print(y.loc[neighbor_ids[0]])
 
 ### Y-Fit Data
 
-The [GNNRegressor](api/estimators/gnn.md), [MSNRegressor](api/estimators/msn.md), and [RFNNRegressor](api/estimators/rfnn.md) estimators can be fit with `X` and `y` data, but they also accept an optional `y_fit` parameter. If provided, `y_fit` is used to fit the ordination transformer while `y` is used to fit the kNN regressor.
+The [GNNRegressor](api/estimators/gnn.md), [MSNRegressor](api/estimators/msn.md), [RFNNRegressor](api/estimators/rfnn.md), and [GBNNRegressor](api/estimators/gbnn.md) estimators can be fit with `X` and `y` data, but they also accept an optional `y_fit` parameter. If provided, `y_fit` is used to fit the transformer while `y` is used to fit the kNN regressor.
 
-In forest attribute estimation, the underlying ordination transformations for two of these estimators (CCA for GNN and CCorA for MSN) typically use a matrix of species abundances or presence/absence information to relate the species data to environmental covariates, but often the user wants predictions based not on these features, but rather attributes that describe forest structure (e.g. biomass) or composition (e.g. species richness). In this case, the species matrix would be specified as `y_fit` and the stand attributes would be specified as `y`.
+In forest attribute estimation, the underlying transformations for two of these estimators (CCA for GNN and CCorA for MSN) typically use a matrix of species abundances or presence/absence information to relate the species data to environmental covariates, but often the user wants predictions based not on these features, but rather attributes that describe forest structure (e.g. biomass) or composition (e.g. species richness). In this case, the species matrix would be specified as `y_fit` and the stand attributes would be specified as `y`.
 
-For RFNN, the `y_fit` parameter can be used to specify the attributes for which individual random forests will be created (one forest per feature). As with GNN and MSN, the `y` parameter can then be used to specify the attributes that will be predicted by the nearest neighbors.
+For RFNN and GBNN, the `y_fit` parameter can be used to specify the attributes for which individual forests will be created (one forest per feature). As with GNN and MSN, the `y` parameter can then be used to specify the attributes that will be predicted by the nearest neighbors.
 
 ```python
 from sknnr import GNNRegressor
@@ -152,9 +153,11 @@ est = GNNRegressor(n_components=3).fit(X, y)
 
 The maximum number of components depends on the input data and the estimator. Specifying `n_components` greater than the maximum number of components will raise an error.
 
-### RFNN Distance Metric
+### RFNN and GBNN Distance Metric
 
-For all estimators other than [RFNNRegressor](api/estimators/rfnn.md), the distance metric used to determine nearest neighbors is the Euclidean distance between samples in the transformed space. RFNN, on the other hand, first builds a random forest for each feature in the `y` (or `y_fit`) matrix and then captures the node IDs (_not_ values) for each sample on every forest and tree. The distance between samples is calculated using [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance), which captures the number of node IDs that are different between the target and reference samples and then divided by the total number of nodes. Therefore, a target and reference sample that share _all_ node IDs would have a distance of 0, whereas a target and reference sample that share _no_ node IDs would have a distance of 1.
+For all estimators other than [RFNNRegressor](api/estimators/rfnn.md) and [GBNNRegressor](api/estimators/gbnn.md), the distance metric used to determine nearest neighbors is the Euclidean distance between samples in the transformed space. RFNN and GBNN, on the other hand, first build a forest for each feature in the `y` (or `y_fit`) matrix and then capture the node IDs (_not_ values) for each sample on every forest and tree. The distance between samples is calculated using the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance): the number of node IDs that differ between the target and reference samples, divided by the total number of nodes. Therefore, a target and reference sample that share _all_ node IDs would have a distance of 0, whereas a target and reference sample that share _no_ node IDs would have a distance of 1.
+
+Additionally, GBNN allows users to specify the `tree_weighting_method` parameter, which applies weights to the Hamming distance calculation based on the importance of the tree stage in training. When `tree_weighting_method` is set to `"train_improvement"`, tree stages that contribute more to reducing training loss are weighted more heavily. When `tree_weighting_method` is set to `"uniform"`, all trees are weighted equally.
 
 ### Custom Transformers
 
@@ -174,6 +177,7 @@ print(cca.fit_transform(X, y))
 - [CCATransformer](api/transformers/cca.md)
 - [CCorATransformer](api/transformers/ccora.md)
 - [RFNodeTransformer](api/transformers/rfnode.md)
+- [GBNodeTransformer](api/transformers/gbnode.md)
 
 ## Datasets
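The distance described in the usage docs above is easy to state concretely. Below is a minimal numpy sketch of a weighted Hamming distance over node IDs; the function name and layout are hypothetical, not `sknnr` API:

```python
import numpy as np


def weighted_hamming(target_nodes, reference_nodes, tree_weights=None):
    """Weighted Hamming distance between two vectors of leaf-node IDs.

    With uniform weights this is the fraction of trees whose node IDs
    differ, so identical vectors give 0.0 and fully disjoint ones give 1.0.
    """
    target_nodes = np.asarray(target_nodes)
    reference_nodes = np.asarray(reference_nodes)
    if tree_weights is None:
        # Uniform weighting: each tree contributes 1 / n_trees.
        tree_weights = np.full(target_nodes.shape, 1.0 / target_nodes.size)
    mismatch = target_nodes != reference_nodes
    return float(np.sum(np.asarray(tree_weights)[mismatch]))


ref = [3, 7, 7, 12]                          # node IDs for a reference sample
print(weighted_hamming([3, 7, 7, 12], ref))  # 0.0 -- all node IDs shared
print(weighted_hamming([4, 8, 9, 13], ref))  # 1.0 -- no node IDs shared
print(weighted_hamming([3, 7, 9, 13], ref))  # 0.5 -- half the node IDs differ
```

Passing non-uniform `tree_weights` (e.g. the standardized train-improvement weights) turns this into the weighted variant GBNN uses.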

src/sknnr/__init__.py

Lines changed: 2 additions & 0 deletions

@@ -1,6 +1,7 @@
 from .__about__ import __version__  # noqa: F401
 from ._base import RawKNNRegressor
 from ._euclidean import EuclideanKNNRegressor
+from ._gbnn import GBNNRegressor
 from ._gnn import GNNRegressor
 from ._mahalanobis import MahalanobisKNNRegressor
 from ._msn import MSNRegressor
@@ -13,4 +14,5 @@
     "MSNRegressor",
     "GNNRegressor",
     "RFNNRegressor",
+    "GBNNRegressor",
 ]

src/sknnr/_gbnn.py

Lines changed: 246 additions & 0 deletions (new file)

@@ -0,0 +1,246 @@
from __future__ import annotations

from typing import Callable, Literal

from numpy.typing import ArrayLike
from sklearn.base import BaseEstimator, TransformerMixin

from ._weighted_trees import WeightedTreesNNRegressor
from .transformers import GBNodeTransformer


class GBNNRegressor(WeightedTreesNNRegressor):
    """
    Regression using Gradient Boosting Nearest Neighbors (GBNN) imputation.

    New data is predicted by similarity of its node indexes to training
    set node indexes when run through multiple univariate gradient boosting
    models. A gradient boosting model is fit to each target in the training
    set and node indexes are captured for each tree in each forest for each
    training sample. Node indexes are then captured for inference data and
    distance is calculated as the dissimilarity between node indexes.

    Gradient boosting models are constructed using either scikit-learn's
    `GradientBoostingRegressor` or `GradientBoostingClassifier` classes based on
    the data type of each target (`y` or `y_fit`) in the training set. If the
    target is numeric (e.g. `int` or `float`), a `GradientBoostingRegressor` is
    used. If the target is categorical (e.g. `str` or `pd.Categorical`), a
    `GradientBoostingClassifier` is used. The
    `sknnr.transformers.GBNodeTransformer` class is responsible for constructing
    the gradient boosting models and capturing the node indexes.

    See `sklearn.neighbors.KNeighborsRegressor` for more detail on
    parameters associated with nearest neighbors. See
    `sklearn.ensemble.GradientBoostingRegressor` and
    `sklearn.ensemble.GradientBoostingClassifier` for more detail on parameters
    associated with gradient boosting. Note that some parameters (e.g.
    `loss` and `alpha`) are specified separately for regression and
    classification and have `_reg` and `_clf` suffixes.

    Parameters
    ----------
    loss_reg : {"squared_error", "absolute_error", "huber", "quantile"},
            default="squared_error"
        Loss function to be optimized for regression.
    loss_clf : {"log_loss", "exponential"}, default="log_loss"
        The loss function to be used for classification.
    learning_rate : float, default=0.1
        Learning rate shrinks the contribution of each tree by `learning_rate`.
    n_estimators : int, default=100
        The number of boosting stages to perform.
    subsample : float, default=1.0
        The fraction of samples to be used for fitting the individual base
        learners.
    criterion : {"friedman_mse", "squared_error"}, default="friedman_mse"
        The function to measure the quality of a split.
    min_samples_split : int or float, default=2
        The minimum number of samples required to split an internal node.
    min_samples_leaf : int or float, default=1
        The minimum number of samples required to be at a leaf node.
    min_weight_fraction_leaf : float, default=0.0
        The minimum weighted fraction of the sum total of weights (of all the
        input samples) required to be at a leaf node.
    max_depth : int or None, default=3
        Maximum depth of the individual regression estimators.
    min_impurity_decrease : float, default=0.0
        A node will be split if this split induces a decrease of the impurity
        greater than or equal to this value.
    init : estimator, "zero" or None, default=None
        An estimator object that is used to compute the initial predictions.
    random_state : int, RandomState instance or None, default=None
        Controls the random seed given to each Tree estimator at each boosting
        iteration.
    max_features : {"sqrt", "log2"}, int or float, default=None
        The number of features to consider when looking for the best split.
    alpha_reg : float, default=0.9
        The alpha-quantile of the huber loss function and the quantile loss
        function.
    verbose : int, default=0
        Enable verbose output.
    max_leaf_nodes : int or None, default=None
        Grow trees with `max_leaf_nodes` in best-first fashion.
    warm_start : bool, default=False
        When set to `True`, reuse the solution of the previous call to fit and
        add more estimators to the ensemble, otherwise, just erase the previous
        solution.
    validation_fraction : float, default=0.1
        The proportion of training data to set aside as validation set for
        early stopping.
    n_iter_no_change : int or None, default=None
        `n_iter_no_change` is used to decide if early stopping will be used to
        terminate training when validation score is not improving.
    tol : float, default=1e-4
        Tolerance for the early stopping.
    ccp_alpha : non-negative float, default=0.0
        Complexity parameter used for Minimal Cost-Complexity Pruning.
    forest_weights : {"uniform"} or array-like of shape (n_targets,),
            default="uniform"
        Weights assigned to each target in the training set when calculating
        Hamming distance between node indexes. This allows for differential
        weighting of targets when calculating distances. Note that all trees
        associated with a target will receive the same weight. If "uniform",
        each forest is assigned equal weight.
    tree_weighting_method : {"train_improvement", "uniform"},
            default="train_improvement"
        The method used to weight the trees in each gradient boosting model.
    n_neighbors : int, default=5
        Number of neighbors to use by default for `kneighbors` queries.
    weights : {"uniform", "distance"}, callable or None, default="uniform"
        Weight function used in prediction.
    n_jobs : int or None, default=None
        The number of jobs to run in parallel.

    Attributes
    ----------
    effective_metric_ : str
        Always set to "hamming".
    effective_metric_params_ : dict
        Always empty.
    hamming_weights_ : np.array
        When `fit`, provides the weights on each tree in each forest when
        calculating the Hamming distance.
    independent_prediction_ : np.array
        When `fit`, provides the prediction for training data not allowing
        self-assignment during neighbor search.
    independent_score_ : float
        When `fit`, the mean coefficient of determination of the independent
        prediction across all features.
    n_features_in_ : int
        Number of features that the transformer outputs. This is equal to the
        number of features in `y` (or `y_fit`) * `n_estimators_per_forest`.
    n_samples_fit_ : int
        Number of samples in the fitted data.
    transformer_ : GBNodeTransformer
        The fitted transformer which holds the built gradient boosting models
        for each feature.
    y_fit_ : np.array or pd.DataFrame
        When `y_fit` is passed to `fit`, the data used to construct the
        individual gradient boosting models. Note that all `y` data is used
        for prediction.

    Notes
    -----
    The `tree_weighting_method` parameter determines how the trees in each
    forest are weighted when calculating distances between node indexes.
    If `tree_weighting_method` is set to "train_improvement", tree weights are
    calculated as a function of the change in loss between successive trees
    in the gradient boosting estimator. As such, weights are directly
    proportional to the loss function specified, and the user may want to
    choose the appropriate loss function (i.e. `loss_reg` or `loss_clf`)
    for their task.

    If `tree_weighting_method` is set to "uniform", all trees are weighted
    equally.
    """

    def __init__(
        self,
        *,
        loss_reg: Literal[
            "squared_error", "absolute_error", "huber", "quantile"
        ] = "squared_error",
        loss_clf: Literal["log_loss", "exponential"] = "log_loss",
        learning_rate: float = 0.1,
        n_estimators: int = 100,
        subsample: float = 1.0,
        criterion: Literal["friedman_mse", "squared_error"] = "friedman_mse",
        min_samples_split: int | float = 2,
        min_samples_leaf: int | float = 1,
        min_weight_fraction_leaf: float = 0.0,
        max_depth: int | None = 3,
        min_impurity_decrease: float = 0.0,
        init: BaseEstimator | Literal["zero"] | None = None,
        random_state: int | None = None,
        max_features: Literal["sqrt", "log2"] | int | float | None = None,
        alpha_reg: float = 0.9,
        verbose: int = 0,
        max_leaf_nodes: int | None = None,
        warm_start: bool = False,
        validation_fraction: float = 0.1,
        n_iter_no_change: int | None = None,
        tol: float = 0.0001,
        ccp_alpha: float = 0.0,
        forest_weights: Literal["uniform"] | ArrayLike[float] = "uniform",
        tree_weighting_method: Literal[
            "train_improvement", "uniform"
        ] = "train_improvement",
        n_neighbors: int = 5,
        weights: Literal["uniform", "distance"] | Callable = "uniform",
        n_jobs: int | None = None,
    ):
        self.loss_reg = loss_reg
        self.loss_clf = loss_clf
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.subsample = subsample
        self.criterion = criterion
        self.min_samples_split = min_samples_split
        self.min_samples_leaf = min_samples_leaf
        self.min_weight_fraction_leaf = min_weight_fraction_leaf
        self.max_depth = max_depth
        self.min_impurity_decrease = min_impurity_decrease
        self.init = init
        self.random_state = random_state
        self.max_features = max_features
        self.alpha_reg = alpha_reg
        self.verbose = verbose
        self.max_leaf_nodes = max_leaf_nodes
        self.warm_start = warm_start
        self.validation_fraction = validation_fraction
        self.n_iter_no_change = n_iter_no_change
        self.tol = tol
        self.ccp_alpha = ccp_alpha
        self.forest_weights = forest_weights
        self.tree_weighting_method = tree_weighting_method

        super().__init__(
            n_neighbors=n_neighbors,
            weights=weights,
            n_jobs=n_jobs,
        )

    def _get_transformer(self) -> TransformerMixin:
        return GBNodeTransformer(
            loss_reg=self.loss_reg,
            loss_clf=self.loss_clf,
            learning_rate=self.learning_rate,
            n_estimators=self.n_estimators,
            subsample=self.subsample,
            criterion=self.criterion,
            min_samples_split=self.min_samples_split,
            min_samples_leaf=self.min_samples_leaf,
            min_weight_fraction_leaf=self.min_weight_fraction_leaf,
            max_depth=self.max_depth,
            min_impurity_decrease=self.min_impurity_decrease,
            init=self.init,
            random_state=self.random_state,
            max_features=self.max_features,
            alpha_reg=self.alpha_reg,
            verbose=self.verbose,
            max_leaf_nodes=self.max_leaf_nodes,
            warm_start=self.warm_start,
            validation_fraction=self.validation_fraction,
            n_iter_no_change=self.n_iter_no_change,
            tol=self.tol,
            ccp_alpha=self.ccp_alpha,
            tree_weighting_method=self.tree_weighting_method,
        )
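The "train_improvement" weighting described in the Notes section of the docstring above can be illustrated with a short sketch. It assumes a loss curve shaped like scikit-learn's `GradientBoostingRegressor.train_score_` (training loss after each boosting stage); the loss-specific `factor` applied to the initial loss in `GBNodeTransformer` is deliberately omitted, so this is a simplified sketch rather than the actual `_set_tree_weights` implementation:

```python
import numpy as np


def train_improvement_weights(initial_loss, train_score):
    """Per-tree weights proportional to each stage's training-loss reduction.

    `train_score` mimics sklearn's `GradientBoostingRegressor.train_score_`:
    the training loss after each boosting stage. The improvement of stage i
    is the drop in loss relative to the previous stage (the initial loss for
    stage 0). Negative improvements are clipped so weights stay non-negative.
    """
    losses = np.concatenate([[initial_loss], np.asarray(train_score, dtype=float)])
    improvement = np.clip(-np.diff(losses), 0.0, None)
    total = improvement.sum()
    # Fall back to uniform weights if no stage improved the training loss.
    if total == 0.0:
        return np.full(len(train_score), 1.0 / len(train_score))
    return improvement / total


# A loss curve that drops quickly and then flattens out: early stages get
# most of the weight (~[0.615, 0.308, 0.077, 0.0]).
w = train_improvement_weights(10.0, [6.0, 4.0, 3.5, 3.5])
print(w)
```

Because the improvements are expressed in the units of the chosen loss, the resulting weights depend directly on `loss_reg` / `loss_clf`, which is why the docstring advises choosing the loss appropriate to the task.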

src/sknnr/_rfnn.py

Lines changed: 0 additions & 8 deletions

@@ -109,10 +109,6 @@ class RFNNRegressor(WeightedTreesNNRegressor):
         Number of neighbors to use by default for `kneighbors` queries.
     weights : {"uniform", "distance"}, callable or None, default="uniform"
         Weight function used in prediction.
-    algorithm : {"auto", "ball_tree", "kd_tree", "brute"}, default="auto"
-        Algorithm used to compute the nearest neighbors.
-    leaf_size : int, default=30
-        Leaf size passed to `BallTree` or `KDTree`.
 
     Attributes
     ----------
@@ -184,8 +180,6 @@ def __init__(
         forest_weights: Literal["uniform"] | ArrayLike[float] = "uniform",
         n_neighbors: int = 5,
         weights: Literal["uniform", "distance"] | Callable = "uniform",
-        algorithm: Literal["auto", "ball_tree", "kd_tree", "brute"] = "auto",
-        leaf_size: int = 30,
     ):
         self.n_estimators = n_estimators
         self.criterion_reg = criterion_reg
@@ -213,8 +207,6 @@ def __init__(
         super().__init__(
             n_neighbors=n_neighbors,
             weights=weights,
-            algorithm=algorithm,
-            leaf_size=leaf_size,
             n_jobs=self.n_jobs,
         )
