
Implement get_feature_names_out in BinningProcess #382

@josp70

Summary

Please add a get_feature_names_out method to optbinning.binning.binning_process.BinningProcess so that it works with scikit-learn components (e.g., Pipeline, ColumnTransformer) that rely on get_feature_names_out for downstream introspection and for composing transformers.
This is needed whenever BinningProcess is used inside a ColumnTransformer or Pipeline and a parent component calls get_feature_names_out() to assemble the final output schema.

Use case

When composing preprocessing pipelines, scikit-learn expects each transformer to expose the names of the columns it outputs. Composite estimators such as Pipeline and ColumnTransformer, and tools built on top of them, call get_feature_names_out() on their child transformers to resolve the output schema. Because BinningProcess does not currently implement this method, the parent pipeline cannot compute its overall list of feature names.
As a result, we have to monkey-patch the method onto BinningProcess at runtime to keep using it in production pipelines.
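
For reference, the kind of runtime patch described above might look roughly like the sketch below. It is not the code we run in production and not a proposed implementation. It assumes that a fitted BinningProcess exposes get_support(names=True) returning the names of the variables it keeps in its output. The helper patch_feature_names_out and the stand-in class FakeBinningProcess are hypothetical names, introduced only so the snippet runs without optbinning installed.

```python
def patch_feature_names_out(transformer_cls):
    """Attach a get_feature_names_out method to a transformer class at runtime."""

    def get_feature_names_out(self, input_features=None):
        # Assumption: the fitted transformer exposes get_support(names=True),
        # returning the names of the variables it keeps in its output.
        return list(self.get_support(names=True))

    transformer_cls.get_feature_names_out = get_feature_names_out
    return transformer_cls


class FakeBinningProcess:
    """Hypothetical stand-in for BinningProcess, used only to exercise the patch."""

    def __init__(self, variable_names):
        self.variable_names = variable_names

    def get_support(self, names=False):
        # Pretend every variable is selected.
        if names:
            return list(self.variable_names)
        return [True] * len(self.variable_names)


patch_feature_names_out(FakeBinningProcess)
names = FakeBinningProcess(["crim", "zn"]).get_feature_names_out()
```

In a real pipeline the call would presumably be patch_feature_names_out(BinningProcess), made before the ColumnTransformer is built. scikit-learn transformers conventionally return a NumPy array of string objects, so a native implementation would likely wrap the result with numpy.asarray(..., dtype=object).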

Current behavior

Calling get_feature_names_out() on a ColumnTransformer/Pipeline that includes a BinningProcess step raises an error because BinningProcess lacks that method.

from optbinning import BinningProcess

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
import pandas as pd

# Load dataset from URL
url = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
data = pd.read_csv(url)

variable_names = [x for x in data.columns if x != "medv"]
X = data[variable_names]
y = data["medv"]

num_cols = [x for x in variable_names if x != "chas"]
cat_cols = ["chas"]

bp = BinningProcess(variable_names=variable_names,
                    categorical_variables=cat_cols)

ct = ColumnTransformer(
    transformers=[
        ("optbin", bp, variable_names)
    ],
    remainder="drop",
)

pipe = Pipeline([("prep", ct)])

pipe.fit(X, y)

# Fails today because BinningProcess doesn’t implement get_feature_names_out
pipe.get_feature_names_out()

This fails with:

AttributeError: Transformer optbin (type BinningProcess) does not provide get_feature_names_out.
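
The failure mode can be modelled with a simplified sketch of how a composite transformer gathers names from its children. This is an illustration, not scikit-learn's actual code: collect_feature_names, WithNames, and WithoutNames are hypothetical names, and the only behaviours carried over from scikit-learn are the error message shown above and the default "<step>__<column>" prefixing used by ColumnTransformer.

```python
def collect_feature_names(transformers):
    """Simplified model of a composite transformer gathering output names.

    `transformers` is a list of (name, transformer, columns) tuples, mirroring
    the ColumnTransformer constructor argument of the same name.
    """
    names = []
    for name, transformer, columns in transformers:
        if not hasattr(transformer, "get_feature_names_out"):
            raise AttributeError(
                f"Transformer {name} (type {type(transformer).__name__}) "
                "does not provide get_feature_names_out."
            )
        # Prefix each child name with the step name, as ColumnTransformer does
        # by default (verbose_feature_names_out=True).
        child_names = transformer.get_feature_names_out(columns)
        names.extend(f"{name}__{col}" for col in child_names)
    return names


class WithNames:
    """Hypothetical transformer that passes its input column names through."""

    def get_feature_names_out(self, input_features=None):
        return list(input_features)


class WithoutNames:
    """Hypothetical transformer lacking the method, like BinningProcess today."""


ok = collect_feature_names([("optbin", WithNames(), ["crim", "zn"])])

try:
    collect_feature_names([("optbin", WithoutNames(), ["crim", "zn"])])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

Under this model, once BinningProcess provides the method, the pipeline in the reproduction would be expected to return names of the form optbin__<variable> for each variable the binning process keeps.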

Metadata

Labels: enhancement
