[ENH] Automatically convert non-string categorical data in attributes_arff_from_df #1489

@alphaleporus

Is your feature request related to a problem? Please describe.
Currently, attributes_arff_from_df raises a ValueError if a pandas DataFrame contains a categorical column with non-string values (e.g., integers [0, 1]). The user is forced to manually cast these to strings before passing the DataFrame.
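For reference, the manual cast currently required looks roughly like the sketch below. It uses only pandas; the DataFrame and the column name `label` are made up for illustration, and the call to attributes_arff_from_df itself is left out.

```python
import pandas as pd

# Illustrative data: a categorical column whose categories are integers.
df = pd.DataFrame({"label": pd.Categorical([0, 1, 0, 1])})

# Manual workaround: rename each category to its string form.
# The column keeps its categorical dtype; only the category labels change.
df["label"] = df["label"].cat.rename_categories(str)

categories_after = list(df["label"].cat.categories)
```

After this step the categories are strings, which is the form the function is described as accepting.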

Describe the solution you'd like
Instead of raising an error immediately, the function should attempt to automatically convert the categories to strings using .astype(str). This improves UX for users working with mixed-type or integer-encoded categorical data.
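A minimal sketch of the proposed behavior, written as a standalone helper rather than as a patch to the library. The name `stringify_categories` is hypothetical, and checking the categories with `pd.api.types.infer_dtype` is an assumption about how the non-string case might be detected, not a description of the existing implementation.

```python
import pandas as pd


def stringify_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose categorical columns have string categories.

    Hypothetical helper for illustration; not part of any library API.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not isinstance(col.dtype, pd.CategoricalDtype):
            continue  # leave non-categorical columns untouched
        # infer_dtype returns "string" when every category is a str.
        if pd.api.types.infer_dtype(col.cat.categories) != "string":
            out[name] = col.cat.rename_categories(str)
    return out


example = pd.DataFrame(
    {
        "int_cat": pd.Categorical([0, 1, 0]),
        "str_cat": pd.Categorical(["x", "y", "x"]),
        "value": [1.0, 2.0, 3.0],
    }
)
converted = stringify_categories(example)
```

One case to decide on: if two distinct categories map to the same string (for example the integer 1 and the string "1"), `rename_categories` raises a ValueError because categories must be unique, so an automatic conversion would still need an error path for that input.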

Describe alternatives you've considered
Keep raising the error, but improve the message. However, automatic conversion is more user-friendly, since ARFF expects string nominals anyway.

Additional context
I have a fix implemented locally and can submit a PR.
