Multidimensional/composite target encoding #429

@sunishchal-recentive

I would like to do target encoding on the composite of multiple columns, but the current functionality only allows a single column to be encoded.

For example: I have a column named product and another named color, and I'd like a unique target encoding value for each product+color combination. Currently, I can only get an encoding for each unique product and each unique color separately.

The workaround would be to concatenate the column values together and then target encode, but that is a bit clunky and leads to some unnecessary categorical features in my dataframe. Let me know if this is something worth raising a PR for.
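To make the workaround concrete, here is a minimal sketch using only pandas. The data, the `_` separator, and the `product_color` / `product_color_te` column names are made up for illustration, and the unsmoothed group mean is just a stand-in for whatever the real target encoder computes.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "product": ["shirt", "shirt", "shirt", "hat", "hat", "hat"],
    "color": ["red", "red", "blue", "red", "blue", "blue"],
    "target": [1, 0, 1, 0, 1, 1],
})

# Workaround: build one composite categorical column by string concatenation.
df["product_color"] = df["product"].astype(str) + "_" + df["color"].astype(str)

# Plain (unsmoothed) mean of the target per composite value,
# standing in for a real target encoder.
df["product_color_te"] = df.groupby("product_color")["target"].transform("mean")

print(df[["product", "color", "product_color", "product_color_te"]])
```

The extra `product_color` column is the "unnecessary categorical feature" mentioned above: it has to exist in the dataframe only so that the encoder has a single column to work on.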

The implementation I'm thinking of is to optionally allow a new argument called something like composite_cols (open to better naming suggestions). This argument would be a list of lists, where each inner list gives the column names to be concatenated together, and each element of the outer list defines one composite column. If it is passed, the values are converted to strings and concatenated before being passed into the encoder the same way as regular cols. Each composite column can be named as the concatenation of its component column names.
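A rough sketch of the preprocessing step this would involve, written as a standalone helper rather than as part of the encoder. The function name `add_composite_cols` and the `sep` parameter are hypothetical, and joining the column names with a separator is one possible reading of "the concatenation of all its component column names".

```python
import pandas as pd


def add_composite_cols(df, composite_cols, sep="_"):
    """Hypothetical helper: add one string column per group of column names.

    composite_cols is a list of lists; each inner list names the columns
    whose values are cast to str and joined into one composite column.
    Returns a copy of df plus the list of new column names.
    """
    out = df.copy()
    names = []
    for group in composite_cols:
        name = sep.join(group)  # e.g. ["product", "color"] -> "product_color"
        out[name] = out[list(group)].astype(str).apply(sep.join, axis=1)
        names.append(name)
    return out, names


# Toy data, invented for illustration only.
example = pd.DataFrame({
    "product": ["shirt", "hat"],
    "color": ["red", "blue"],
    "size": [1, 2],
})
encoded_input, composite_names = add_composite_cols(
    example, [["product", "color"], ["product", "size"]]
)
print(composite_names)
print(encoded_input)
```

The returned names could then be handed to the encoder alongside (or instead of) the regular cols, so the encoding logic itself would not need to change.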
