Potential Issue with the ExcelFormer Example #460

@mtreca

Hi all!

While stepping through the ExcelFormer example script, I noticed that this line fails with my custom dataset.

As far as I can tell, this happens because CatToNumTransform appends _{i} suffixes to the categorical feature names in its statistics, but does not rename the corresponding columns in the TensorFrame it outputs. As a result, the mutual_info_sort.transformed_stats passed to ExcelFormer on line 107 contains categorical column names with the _{i} suffix, while the actual TensorFrame does not.
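
One way to confirm this kind of mismatch is to compare the keys of the statistics dictionary with the column names of the TensorFrame. The sketch below uses plain Python lists as stand-ins for both; `find_name_mismatches` and the sample column names are hypothetical and not part of torch_frame.

```python
def find_name_mismatches(stats_keys, frame_col_names):
    """Return (names only in stats, names only in the frame), each sorted."""
    stats_set, frame_set = set(stats_keys), set(frame_col_names)
    return sorted(stats_set - frame_set), sorted(frame_set - stats_set)


# Stand-in data (assumed for illustration): the stats keys carry a "_0"
# suffix on categorical features, while the frame columns do not.
stats_keys = ["color_0", "shape_0", "age"]
frame_col_names = ["color", "shape", "age"]

stats_only, frame_only = find_name_mismatches(stats_keys, frame_col_names)
print(stats_only)  # ['color_0', 'shape_0']
print(frame_only)  # ['color', 'shape']
```

If both returned lists are empty, the statistics and the frame agree on column names.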

Case in point: running this snippet to manually rename the statistics back to their original names fixes the issue:

fixed_stats = cat_to_num.transformed_stats
for cat_feature in categorical_feature_names:
    stats = fixed_stats.pop(f"{cat_feature}_0")
    fixed_stats[cat_feature] = stats

That fix might not work for tasks other than binary classification, though, since it only handles the _0 suffix. The preferred fix would therefore be for CatToNumTransform to rename the columns of the TensorFrames it transforms.
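
In the meantime, a more defensive version of the workaround could check how many suffixed entries exist per feature before renaming, so that it fails loudly instead of silently mishandling the non-binary case. This is only a sketch on a plain dict: `rename_suffixed_stats` is a hypothetical helper, and it assumes the suffixed keys have the form `{name}_{i}` with an integer `i`.

```python
import re


def rename_suffixed_stats(stats, categorical_feature_names):
    """Return a copy of `stats` with each lone `{name}_{i}` key renamed to `{name}`.

    Raises ValueError if a feature has more than one suffixed entry,
    because a one-to-one rename is then not possible.
    """
    fixed = dict(stats)  # shallow copy, so the input dict is left untouched
    for name in categorical_feature_names:
        pattern = re.compile(rf"{re.escape(name)}_(\d+)")
        matches = [key for key in fixed if pattern.fullmatch(key)]
        if not matches:
            continue  # nothing to rename for this feature
        if len(matches) > 1:
            raise ValueError(
                f"{name!r} maps to {len(matches)} entries: {sorted(matches)}")
        fixed[name] = fixed.pop(matches[0])
    return fixed


# Stand-in statistics (assumed for illustration).
stats = {"color_0": {"mean": 0.4}, "age": {"mean": 31.0}}
fixed = rename_suffixed_stats(stats, ["color"])
print(sorted(fixed))  # ['age', 'color']
```

Unlike the snippet above, this version works on a copy, so the transform's own statistics are not mutated in place.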
