Returning a numpy array in one hot encoder #442

@adriencrtr

Description

Expected Behavior

Even if category_encoders.one_hot.OneHotEncoder doesn't encode any features, we would expect it to convert a pd.DataFrame into a numpy.ndarray when the parameter return_df=False is set.

Actual Behavior

When category_encoders.one_hot.OneHotEncoder is given a dataframe with only numerical features, so that the cols parameter is empty, and return_df=False is set, the fit_transform method still returns a pd.DataFrame object.

Steps to Reproduce the Problem

import numpy as np
import pandas as pd

from category_encoders.one_hot import OneHotEncoder

rng = np.random.RandomState(42)

This works

n_rows = 100

col1 = rng.rand(n_rows) * 100
col2 = rng.randint(1, 100, n_rows)
col3 = rng.choice([True, False], n_rows)
modalities = ['A', 'B', 'C', 'D']
col4 = rng.choice(modalities, n_rows)

df = pd.DataFrame({
    'Numeric1': col1,
    'Numeric2': col2,
    'Boolean': col3,
    'Object': col4
})

encoder = OneHotEncoder(
    cols=df.select_dtypes(include=["object", "bool"]).columns,
    return_df=False,
    handle_missing='return_nan'
)
X = encoder.fit_transform(df)
type(X)

Out: numpy.ndarray

This is the unexpected behavior

data = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=200)
df = pd.DataFrame(data=data, columns=["Column 1", "Column 2"])

encoder = OneHotEncoder(
    cols=df.select_dtypes(include=["object", "bool"]).columns,
    return_df=False,
    handle_missing='return_nan'
)
X = encoder.fit_transform(df)
type(X)

Out: pandas.core.frame.DataFrame
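
Possible workaround

Until the encoder honors return_df=False when cols is empty, one option is to coerce the result explicitly after calling fit_transform. The sketch below is only an illustration: ensure_ndarray is a hypothetical helper name, not part of category_encoders, and the small DataFrame stands in for what encoder.fit_transform(df) returns in the numeric-only case.

```python
import numpy as np
import pandas as pd


def ensure_ndarray(X):
    """Return X as a numpy.ndarray (hypothetical helper, not a library API)."""
    if isinstance(X, pd.DataFrame):
        # DataFrame.to_numpy() drops the index and column labels.
        return X.to_numpy()
    # Already array-like: np.asarray avoids a copy when X is an ndarray.
    return np.asarray(X)


# Stand-in for the DataFrame returned by encoder.fit_transform(df)
# when the input has only numerical columns and cols is empty.
df_numeric = pd.DataFrame({"Column 1": [0.1, 0.2], "Column 2": [1.5, 2.5]})

X = ensure_ndarray(df_numeric)
print(type(X))
```

In the reproduction above this would be used as X = ensure_ndarray(encoder.fit_transform(df)), so downstream code receives an array whether or not any column was encoded.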

Specifications

  • Version: 2.6.3
  • Platform: macOS Sonoma 14.6.1
