Fix sklearn 1.7+ compatibility: BaseEstimator type detection for ensemble #1512

Copilot · 2026-01-22T23:27:57Z

sklearn 1.7+ changed estimator type detection: is_regressor() and is_classifier() now rely on get_tags() instead of checking the _estimator_type attribute. Because FLAML's BaseEstimator inherits from ClassifierMixin, FLAML regression estimators were incorrectly tagged as classifiers, which caused StackingRegressor to reject them during ensemble construction.

Changes

flaml/automl/model.py: Override __sklearn_tags__() in BaseEstimator so that it dynamically returns the correct tags based on the _estimator_type instance attribute set during initialization
test/automl/test_sklearn_17_compat.py: Add compatibility tests for regression and classification estimators with sklearn's is_regressor()/is_classifier() checks
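
The override described in the changes above can be sketched roughly as follows. This is a minimal illustration rather than FLAML's actual code: TaskAwareEstimator is a hypothetical stand-in class, and the sketch assumes scikit-learn 1.6 or later, where __sklearn_tags__ and sklearn.utils.RegressorTags are available.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, is_classifier, is_regressor


class TaskAwareEstimator(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper whose base class is always ClassifierMixin."""

    def __init__(self, task="classification"):
        self.task = task
        # Instance attribute recording the real estimator type for this task.
        self._estimator_type = "regressor" if task == "regression" else "classifier"

    def __sklearn_tags__(self):
        # Start from the tags produced by the mixin chain (classifier by default).
        tags = super().__sklearn_tags__()
        if self._estimator_type == "regressor":
            # Imported lazily so the module still imports on older scikit-learn.
            from sklearn.utils import RegressorTags

            tags.estimator_type = "regressor"
            tags.classifier_tags = None
            tags.regressor_tags = RegressorTags()
        return tags


print(is_regressor(TaskAwareEstimator(task="regression")))
print(is_classifier(TaskAwareEstimator(task="classification")))
```

On scikit-learn releases that never call __sklearn_tags__, the method in this sketch is simply unused and the _estimator_type attribute continues to drive detection.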

Example

from sklearn.base import is_regressor
from flaml.automl.model import ExtraTreesEstimator

# Before: returns False (incorrect)
# After: returns True (correct)
is_regressor(ExtraTreesEstimator(task='regression'))

The fix is backward compatible with sklearn < 1.7, where the old _estimator_type attribute mechanism still works.

Original prompt

This section details the original issue you should resolve

<issue_title>[Bug]: Regression ensemble fails with scikit-learn 1.7.2: ValueError: The estimator ExtraTreesEstimator should be a regressor</issue_title>
<issue_description>### Describe the bug

When running flaml.AutoML with task="regression" and ensemble=True, the AutoML search completes and then fails during the ensemble construction step (stacking). The failure happens inside scikit-learn StackingRegressor.fit() with:

ValueError: The estimator ExtraTreesEstimator should be a regressor.

This occurs even though the training matrix contains only numeric features (all float64, no object columns, no categorical dtype). I also see the warning:

Using passthrough=False for ensemble because the data contain categorical features.

…but the input data passed to fit() is purely numeric.

Steps to reproduce

Option A (minimal check, likely root cause)

Step 1: Install
pip install flaml==2.5.0 scikit-learn==1.7.2 numpy pandas lightgbm xgboost
Step 2: Run
from sklearn.base import is_regressor
from flaml.automl.model import ExtraTreesEstimator

print("is_regressor(ExtraTreesEstimator()):", is_regressor(ExtraTreesEstimator()))
Step 3: Observe
The output is False (expected: True). This makes scikit-learn stacking validation fail when ExtraTreesEstimator is used as a base regressor.

Option B (end-to-end AutoML failure during ensemble building)

Step 1: Install
pip install flaml==2.5.0 scikit-learn==1.7.2 numpy pandas lightgbm xgboost catboost
Step 2: Run
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from flaml import AutoML

# Numeric-only regression dataset
rng = np.random.RandomState(42)
X = pd.DataFrame(rng.randn(5000, 30), columns=[f"f{i}" for i in range(30)]).astype("float64")
y = X["f0"] * 3.0 - X["f1"] * 2.0 + rng.randn(5000) * 0.1

print("dtypes (should be float64 only):")
print(X.dtypes)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

automl = AutoML()
automl.fit(
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,
    y_val=y_val,
    task="regression",
    metric="rmse",
    eval_method="holdout",
    ensemble=True,
    estimator_list=["extra_tree", "rf", "xgboost", "lgbm"],
    time_budget=60,
    verbose=3,
    seed=42,
)
Step 3: Observe
The run crashes during ensemble building with:
ValueError: The estimator ExtraTreesEstimator should be a regressor.

Model Used

flaml.AutoML with task="regression", eval_method="holdout", ensemble=True, and a mixed estimator list including extra_tree (plus rf, xgboost, lgbm).

Expected Behavior

I expected FLAML to successfully build the ensemble model for regression, or (at minimum) skip any estimator that cannot be used as a regressor in stacking instead of raising a hard error.

Screenshots and logs

Relevant log lines and stack trace excerpt:
[flaml.automl.logger: ...] INFO - Building ensemble with tuned estimators
[flaml.automl.logger: ...] WARNING - Using passthrough=False for ensemble because the data contain categorical features.

ValueError: The estimator ExtraTreesEstimator should be a regressor.
  ...
  File .../sklearn/ensemble/_base.py:237, in _BaseHeterogeneousEnsemble._validate_estimators
    raise ValueError("The estimator ExtraTreesEstimator should be a regressor.")
Additional Information

FLAML Version: 2.5.0

scikit-learn Version: 1.7.2

Python Version: 3.10 (Conda env: azureml_py310_sdkv2)

Operating System: Linux (Azure ML Compute environment)

Data: numeric-only features (all float64, no categorical/object columns)</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes [Bug]: Regression ensemble fails with scikit-learn 1.7.2: ValueError: The estimator ExtraTreesEstimator should be a regressor #1511

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

Initial plan

d54ed78

Copilot AI assigned Copilot and thinkall Jan 22, 2026

Copilot started work on behalf of thinkall January 22, 2026 23:28

Copilot AI and others added 2 commits January 22, 2026 23:33

Fix ExtraTreesEstimator regression ensemble error with sklearn 1.7+

ba917be

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

Address code review feedback: improve __sklearn_tags__ implementation

a75632d

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix regression ensemble issue with ExtraTreesEstimator~~ Fix sklearn 1.7+ compatibility: BaseEstimator type detection for ensemble Jan 22, 2026

Copilot AI requested a review from thinkall January 22, 2026 23:38

Copilot finished work on behalf of thinkall January 22, 2026 23:38

thinkall marked this pull request as ready for review January 23, 2026 02:10

thinkall added 2 commits January 23, 2026 02:16

Fix format error

233de25

Emphasize pre-commit

b8d1cb2

jianglibigdata approved these changes Jan 23, 2026

thinkall merged commit fc4efe3 into main Jan 23, 2026
16 checks passed

thinkall deleted the copilot/fix-regression-ensemble-error branch January 23, 2026 02:21

3 participants
