Summary
Please add a get_feature_names_out method to optbinning.binning.binning_process.BinningProcess so that it works with scikit-learn components (e.g., Pipeline, ColumnTransformer) that rely on get_feature_names_out to report their output column names.
This capability is needed when BinningProcess is used inside a ColumnTransformer/Pipeline and a parent component calls get_feature_names_out() to assemble the final schema.
Use case
When composing preprocessing pipelines, scikit‑learn expects each transformer to expose the names of the columns it outputs. Many components (including Pipeline, ColumnTransformer, and tools built atop them) call get_feature_names_out() to resolve the output schema. Because BinningProcess does not currently implement this method, the parent pipeline cannot compute its global feature name list.
As a result, we have to monkey‑patch a method at runtime to continue using BinningProcess in production pipelines.
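A minimal sketch of what such a runtime patch could look like (not necessarily the one in use). It assumes the transformer keeps its input names in a `variable_names` attribute and emits one output column per input variable, unchanged and in the same order; that would not hold if variables are dropped by selection criteria. `FakeBinningProcess` and `patch_feature_names_out` are hypothetical names used only so the sketch runs without optbinning installed.

```python
def _get_feature_names_out(self, input_features=None):
    """Return output column names for a transformer that keeps one column per input.

    Assumption: the transformer stores its input names in `variable_names`
    and passes them through unchanged and in the same order.
    """
    names = self.variable_names if input_features is None else input_features
    return [str(name) for name in names]


def patch_feature_names_out(transformer_cls):
    """Attach get_feature_names_out to a class only if it does not define one."""
    if not hasattr(transformer_cls, "get_feature_names_out"):
        transformer_cls.get_feature_names_out = _get_feature_names_out
    return transformer_cls


# Hypothetical stand-in for BinningProcess, used only for illustration.
class FakeBinningProcess:
    def __init__(self, variable_names):
        self.variable_names = list(variable_names)


patch_feature_names_out(FakeBinningProcess)
bp = FakeBinningProcess(["crim", "zn", "chas"])
print(bp.get_feature_names_out())  # ['crim', 'zn', 'chas']
```

With the real class, the equivalent would be calling `patch_feature_names_out(BinningProcess)` before building the pipeline. scikit-learn's own transformers return a NumPy array of strings from this method, so a built-in implementation would probably wrap the result accordingly rather than return a plain list.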
Current behavior
Calling get_feature_names_out() on a ColumnTransformer/Pipeline that includes a BinningProcess step raises an error because BinningProcess lacks that method.
from optbinning import BinningProcess
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
import pandas as pd
# Load dataset from URL
url = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
data = pd.read_csv(url)
variable_names = [x for x in data.columns if x != "medv"]
X = data[variable_names]
y = data["medv"]
num_cols = [x for x in variable_names if x != "chas"]
cat_cols = ["chas"]
bp = BinningProcess(variable_names=variable_names,
                    categorical_variables=cat_cols)
ct = ColumnTransformer(
    transformers=[
        ("optbin", bp, variable_names)
    ],
    remainder="drop",
)
pipe = Pipeline([("prep", ct)])
pipe.fit(X, y)
# Fails today because BinningProcess doesn’t implement get_feature_names_out
pipe.get_feature_names_out()
The call fails with:
AttributeError: Transformer optbin (type BinningProcess) does not provide get_feature_names_out.