learn_one Should Return self (API Consistency) #1752

@codeMaestro78

Bug Report & Proposed Fix

Component: river/ensemble/stacking.py
Class: StackingClassifier
Library: River


Summary

This report proposes API consistency and feature safety fixes for StackingClassifier. The current implementation contains issues that may lead to broken method chaining, silent feature corruption, and inconsistent capability reporting.

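These fixes keep the public method signatures unchanged and improve robustness for streaming ensemble learning. The feature prefix proposed in Issue 2 does rename the inputs seen by the meta-classifier, so a meta-classifier trained before the change would see new feature names.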


Issue 1: learn_one Does Not Return self

Problem

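River estimators have historically returned self from learn_one, which supports method chaining. Note that the River 0.19 release notes state that learn_one and learn_many no longer return anything, so whether this is a bug depends on the River version being targeted.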

The current implementation of StackingClassifier.learn_one ends without a return statement, so the call evaluates to None.

Impact

Breaks usage patterns such as:

model.learn_one(x, y).predict_proba_one(x)

Fix

Add a return statement at the end of learn_one:

self.meta_classifier.learn_one(oof, y)
return self
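
The effect of the missing return can be reproduced without River. The sketch below uses two hypothetical toy classes (NoReturnModel and ChainableModel, neither of which is part of River) that differ only in whether learn_one returns self:

```python
class NoReturnModel:
    """Toy stand-in whose learn_one falls off the end and returns None."""

    def __init__(self):
        self.n_seen = 0

    def learn_one(self, x, y):
        self.n_seen += 1  # no return statement, so the call evaluates to None

    def predict_proba_one(self, x):
        return {True: 0.5, False: 0.5}


class ChainableModel(NoReturnModel):
    """Same toy model, but learn_one returns self so calls can be chained."""

    def learn_one(self, x, y):
        super().learn_one(x, y)
        return self


x, y = {"a": 1.0}, True

# Chaining works when learn_one returns self.
chained = ChainableModel().learn_one(x, y).predict_proba_one(x)

# Chaining raises AttributeError when learn_one returns None.
try:
    NoReturnModel().learn_one(x, y).predict_proba_one(x)
    chain_error = None
except AttributeError as exc:
    chain_error = exc
```

In the second case the model still learns from the sample; only the chained call on the None result fails.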

Issue 2: Feature Name Collision When include_features=True

Problem

Original features are merged directly into the meta-feature dictionary:

if self.include_features:
    oof.update(x)

If an input feature name matches a stacking feature (e.g. "oof_0_True"), it will overwrite the prediction feature silently.

Impact

Silent corruption of meta-model training data leading to degraded or unstable performance.

Fix

Namespace original features to prevent collision:

if self.include_features:
    oof.update({f"orig_{k}": v for k, v in x.items()})
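
A minimal illustration of the collision and of the prefixed merge, using plain dictionaries rather than a real StackingClassifier (the feature names and values are made up for the example):

```python
# Meta-features produced by the base models, using the oof_{i}_{class}
# naming from this report.
oof = {"oof_0_True": 0.9, "oof_0_False": 0.1}

# A raw input that happens to reuse one of those names.
x = {"oof_0_True": 123.0, "age": 42}

# Current behaviour: the raw value replaces the base model's probability.
merged_unsafe = dict(oof)
merged_unsafe.update(x)

# Proposed behaviour: keys prefixed with orig_ cannot clash with oof_ keys.
merged_safe = dict(oof)
merged_safe.update({f"orig_{k}": v for k, v in x.items()})
```

With the prefix, merged_safe keeps both the base model's probability under "oof_0_True" and the raw value under "orig_oof_0_True".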

Issue 3: Meta-Feature Space Drift with New Classes

Problem

Base classifiers may not output probabilities for unseen classes. When a new class appears later in the stream, new meta-features are introduced dynamically.

Impact

Non-stationary feature space for the meta-classifier may slow convergence and introduce instability.

Suggested Improvement

Ensure probabilities for all known classes are included, defaulting to 0.0 when absent. This may require standardized class tracking across River classifiers.
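
One possible shape for this, as a rough sketch only: the helper padded_meta_features and the known_classes list are hypothetical and not part of River, and the labels are invented for the example.

```python
def padded_meta_features(index, y_pred, known_classes):
    """Build oof_{index}_{label} features, using 0.0 for labels missing from y_pred.

    Labels present in y_pred but absent from known_classes are ignored here.
    """
    return {
        f"oof_{index}_{label}": y_pred.get(label, 0.0)
        for label in known_classes
    }


# Hypothetical class tracker: labels in order of first appearance in the stream.
known_classes = []
for y in ["cat", "dog", "cat", "bird"]:
    if y not in known_classes:
        known_classes.append(y)

# A base model that has not yet produced a probability for "bird".
y_pred = {"cat": 0.7, "dog": 0.3}
features = padded_meta_features(0, y_pred, known_classes)
```

This keeps the meta-features consistent across base models once a class is known. It does not remove drift entirely, since samples seen before a class first appeared still lacked that feature.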


Issue 4: _multiclass Property May Misrepresent Capability

Problem

The current implementation relies only on the meta-classifier to determine multiclass capability:

@property
def _multiclass(self):
    return self.meta_classifier._multiclass

This ignores the capabilities of base models.

Impact

May incorrectly advertise the ensemble as binary-only or multiclass when underlying models differ.

Fix

Use both base and meta models to determine capability:

@property
def _multiclass(self):
    return (
        all(getattr(model, "_multiclass", False) for model in self)
        and getattr(self.meta_classifier, "_multiclass", False)
    )
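
The behaviour of this check can be explored in isolation. The sketch below restates the same expression as a free function over hypothetical stand-in classes (Multi, Binary, NoFlag), none of which are River classes:

```python
class Multi:
    _multiclass = True


class Binary:
    _multiclass = False


class NoFlag:
    """Stand-in for a model that does not define _multiclass at all."""


def ensemble_is_multiclass(base_models, meta_classifier):
    """Same logic as the proposed property, written as a free function."""
    return (
        all(getattr(model, "_multiclass", False) for model in base_models)
        and getattr(meta_classifier, "_multiclass", False)
    )


all_multi = ensemble_is_multiclass([Multi(), Multi()], Multi())
one_binary_base = ensemble_is_multiclass([Multi(), Binary()], Multi())
binary_meta = ensemble_is_multiclass([Multi()], Binary())
missing_flag = ensemble_is_multiclass([NoFlag()], Multi())

# all() over an empty iterable is True, so an ensemble with no base models
# would be reported according to the meta-classifier alone.
empty_ensemble = ensemble_is_multiclass([], Multi())
```

Defaulting to False in getattr is the conservative choice: a model that does not declare the flag is treated as binary-only.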

Proposed Patch (Combined)

def learn_one(self, x, y):
    oof = {}

    for i, clf in enumerate(self):
        y_pred = clf.predict_proba_one(x)
        for k, p in y_pred.items():
            oof[f"oof_{i}_{k}"] = p
        clf.learn_one(x, y)

    if self.include_features:
        oof.update({f"orig_{k}": v for k, v in x.items()})

    self.meta_classifier.learn_one(oof, y)
    return self

@property
def _multiclass(self):
    return (
        all(getattr(model, "_multiclass", False) for model in self)
        and getattr(self.meta_classifier, "_multiclass", False)
    )
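
One caveat about the combined patch: it only changes learn_one. Assuming predict_proba_one builds its meta-features with the same unprefixed oof.update(x) call (that method is not shown in this report), it would need the same orig_ prefix, otherwise the meta-classifier would be trained and queried on different feature names. A shared helper is one way to keep both paths aligned; build_meta_features below is a hypothetical name, not an existing River function.

```python
def build_meta_features(base_probas, x, include_features):
    """Hypothetical helper: build the meta-classifier input in one place."""
    oof = {
        f"oof_{i}_{k}": p
        for i, y_pred in enumerate(base_probas)
        for k, p in y_pred.items()
    }
    if include_features:
        # Same prefix as in the proposed learn_one patch.
        oof.update({f"orig_{k}": v for k, v in x.items()})
    return oof


x = {"age": 42}
base_probas = [{True: 0.8, False: 0.2}, {True: 0.6, False: 0.4}]

# Both code paths call the same helper, so the key sets match.
train_features = build_meta_features(base_probas, x, include_features=True)
predict_features = build_meta_features(base_probas, x, include_features=True)
```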

Benefits of This Fix

✔ Restores River API consistency
✔ Prevents silent feature overwrites
✔ Improves stability in streaming classification
✔ More accurate capability reporting


These changes improve reliability while keeping the behavior aligned with River's online learning design principles.
