Bug Report & Proposed Fix
Component: river/ensemble/stacking.py
Class: StackingClassifier
Library: River
Summary
This report proposes API consistency and feature safety fixes for StackingClassifier. The current implementation contains issues that may lead to broken method chaining, silent feature corruption, and inconsistent capability reporting.
These fixes are backward compatible and improve robustness for streaming ensemble learning.
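For context, a minimal construction of the class under discussion. The model choices are illustrative; the constructor arguments follow the attributes referenced throughout this report (meta_classifier, include_features):

```python
from river import ensemble, linear_model, tree

model = ensemble.StackingClassifier(
    [linear_model.LogisticRegression(), tree.HoeffdingTreeClassifier()],
    meta_classifier=linear_model.LogisticRegression(),
)
```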
Issue 1: learn_one Does Not Return self
Problem
River estimators are expected to return self from learn_one to support method chaining and API consistency.
The current implementation of learn_one ends without a return statement, so the method returns None.
Impact
Breaks usage patterns such as:
```python
model.learn_one(x, y).predict_proba_one(x)
```

Fix
Add a return statement at the end of learn_one:
```python
self.meta_classifier.learn_one(oof, y)
return self
```

Issue 2: Feature Name Collision When include_features=True
Problem
Original features are merged directly into the meta-feature dictionary:
```python
if self.include_features:
    oof.update(x)
```

If an input feature name matches a stacking feature (e.g. "oof_0_True"), it will silently overwrite the prediction feature.
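A minimal sketch of the failure mode, using a contrived raw feature whose name collides with a stacking key:

```python
oof = {"oof_0_True": 0.87}  # meta-feature: base model 0's probability for class True
x = {"oof_0_True": 42.0}    # unlucky raw input feature with the same name

oof.update(x)
print(oof)  # {'oof_0_True': 42.0} -> the prediction feature is gone
```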
Impact
Silent corruption of the meta-model's training data, leading to degraded or unstable performance.
Fix
Namespace original features to prevent collision:
```python
if self.include_features:
    oof.update({f"orig_{k}": v for k, v in x.items()})
```
Issue 3: Meta-Feature Space Drift with New Classes
Problem
Base classifiers may not output probabilities for unseen classes. When a new class appears later in the stream, new meta-features are introduced dynamically.
Impact
A non-stationary feature space for the meta-classifier may slow convergence and introduce instability.
Suggested Improvement
Ensure probabilities for all known classes are included, defaulting to 0.0 when absent. This may require standardized class tracking across River classifiers.
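A hedged sketch of one way to do this; the classes_seen set and stable_oof helper are hypothetical, not part of River's API:

```python
classes_seen = set()  # hypothetical running record of labels observed so far

def stable_oof(base_models, x, y=None):
    """Build meta-features with one entry per (model, known class) pair."""
    if y is not None:
        classes_seen.add(y)
    oof = {}
    for i, clf in enumerate(base_models):
        y_pred = clf.predict_proba_one(x)
        for c in classes_seen:
            # Default to 0.0 so existing keys never disappear between updates
            oof[f"oof_{i}_{c}"] = y_pred.get(c, 0.0)
    return oof
```

This keeps the meta-feature space stable in one direction: once a key is introduced, it appears in every subsequent update, so the meta-classifier never sees a feature vanish.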
Issue 4: _multiclass Property May Misrepresent Capability
Problem
The current implementation relies only on the meta-classifier to determine multiclass capability:
```python
@property
def _multiclass(self):
    return self.meta_classifier._multiclass
```

This ignores the capabilities of the base models.
Impact
The property may advertise the ensemble as multiclass-capable even though some base models are binary-only, or vice versa.
Fix
Use both base and meta models to determine capability:
```python
@property
def _multiclass(self):
    return (
        all(getattr(model, "_multiclass", False) for model in self)
        and getattr(self.meta_classifier, "_multiclass", False)
    )
```
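Note that getattr with a False default is deliberately conservative: a model that does not declare _multiclass is treated as binary-only. A toy illustration with hypothetical stand-in classes:

```python
class MultiOK:
    _multiclass = True

class Undeclared:  # no _multiclass attribute at all
    pass

models = [MultiOK(), Undeclared()]
print(all(getattr(m, "_multiclass", False) for m in models))  # False
```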
Proposed Patch (Combined)

```python
def learn_one(self, x, y):
    oof = {}
    for i, clf in enumerate(self):
        # Collect each base model's predictions before it trains on (x, y)
        y_pred = clf.predict_proba_one(x)
        for k, p in y_pred.items():
            oof[f"oof_{i}_{k}"] = p
        clf.learn_one(x, y)
    if self.include_features:
        # Namespace raw features so they cannot clobber stacking keys
        oof.update({f"orig_{k}": v for k, v in x.items()})
    self.meta_classifier.learn_one(oof, y)
    return self

@property
def _multiclass(self):
    return (
        all(getattr(model, "_multiclass", False) for model in self)
        and getattr(self.meta_classifier, "_multiclass", False)
    )
```
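With the patch applied, a quick end-to-end check; the dataset and model choices are illustrative, and any River binary classification stream would do:

```python
from river import datasets, ensemble, linear_model, naive_bayes

model = ensemble.StackingClassifier(
    [linear_model.LogisticRegression(), naive_bayes.GaussianNB()],
    meta_classifier=linear_model.LogisticRegression(),
)

for x, y in datasets.Phishing():
    # Chaining works because learn_one now returns self
    proba = model.learn_one(x, y).predict_proba_one(x)
```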
Benefits of This Fix

✓ Restores River API consistency
✓ Prevents silent feature overwrites
✓ Improves stability in streaming classification
✓ More accurate capability reporting
These changes improve reliability while keeping the behavior aligned with River's online learning design principles.