
ENH: Get rid of int_categorical dtype when n_outcomes <= 5 #66

@hmgaudecker

Description


Noted this pattern while reviewing:

    out["med_schwierigkeit_treppen_pl"] = object_to_int_categorical(
        raw_data["ple0004"],
        renaming={"[3] Gar nicht": 0, "[2] Ein wenig": 1, "[1] Stark": 2},
        ordered=True,
    )
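A minimal pandas sketch of the lookup problem follows. `object_to_int_categorical` is the project's own helper and its implementation is not shown here. `object_to_int_categorical_sketch` below is a hypothetical stand-in. It assumes the helper maps the raw strings through `renaming` and returns a categorical whose categories are the integer codes:

```python
import pandas as pd


def object_to_int_categorical_sketch(sr, renaming, ordered=False):
    """Hypothetical stand-in: map raw strings to ints, then make them categorical."""
    # Map each raw answer to its integer code; unmapped values would become NaN.
    codes = sr.map(renaming)
    dtype = pd.CategoricalDtype(categories=sorted(renaming.values()), ordered=ordered)
    return codes.astype(dtype)


raw = pd.Series(["[1] Stark", "[3] Gar nicht", "[2] Ein wenig"])
out = object_to_int_categorical_sketch(
    raw,
    renaming={"[3] Gar nicht": 0, "[2] Ein wenig": 1, "[1] Stark": 2},
    ordered=True,
)
# The categories are bare integers: without the renaming dict at hand,
# nothing says whether 2 means "Stark" or "Gar nicht".
print(out.cat.categories.tolist())  # [0, 1, 2]
```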

This is problematic because I'll always have to look up the mapping when actually using the variable: nothing in the data says whether 2 means "Stark" or "Gar nicht". It should read (I have not checked whether this is valid code):

    out["med_schwierigkeit_treppen_pl"] = object_to_categorical(
        raw_data["ple0004"],
        renaming={"[3] Gar nicht": "Gar nicht", "[2] Ein wenig": "Ein wenig", "[1] Stark": "Stark"},
        ordered=True,
    )
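If the proposed call works roughly as intended, the result could look like the sketch below. `object_to_categorical_sketch` is hypothetical and is not the project's `object_to_categorical`. It assumes that the order of `renaming` defines the category order, from lowest to highest:

```python
import pandas as pd


def object_to_categorical_sketch(sr, renaming, ordered=False):
    """Hypothetical stand-in: map raw strings to readable labels, keep dict order."""
    labels = sr.map(renaming)
    # Dict insertion order defines the category order (lowest to highest).
    dtype = pd.CategoricalDtype(categories=list(renaming.values()), ordered=ordered)
    return labels.astype(dtype)


raw = pd.Series(["[1] Stark", "[3] Gar nicht", "[2] Ein wenig"])
out = object_to_categorical_sketch(
    raw,
    renaming={"[3] Gar nicht": "Gar nicht", "[2] Ein wenig": "Ein wenig", "[1] Stark": "Stark"},
    ordered=True,
)
print(out.cat.categories.tolist())  # ['Gar nicht', 'Ein wenig', 'Stark']
# Ordered comparisons still work, but now read in plain language.
print((out > "Gar nicht").tolist())  # [True, False, True]
```

Taking the order from the dict rather than sorting the labels matters here: sorted alphabetically, "Ein wenig" would come before "Gar nicht".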

This applies to all occurrences of this pattern where the number of outcomes is small or all outcomes have meaningful names. In those cases, a variable should be either an int or a categorical with meaningful labels, not a categorical with integer categories.

Counterexample: Likert scales, i.e., questions that literally read "On a scale from 1 to 7, where 1 means ... and 7 means ..., what ...?" Here the number itself is the answer.
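For the Likert case, a plain integer seems the natural representation. A minimal sketch follows. The raw values are hypothetical and assumed to follow the same `"[k] ..."` pattern as above. The sketch also assumes the 1-7 item has no missing-value codes, since a negative code such as `[-1]` would not match the pattern:

```python
import pandas as pd

# Hypothetical raw answers to a 1-7 Likert item; only the endpoints carry text.
raw = pd.Series(["[1] lowest", "[4]", "[7] highest", "[5]"])

# Pull the leading number out of "[k] ..." and keep it as an integer.
scores = raw.str.extract(r"^\[(\d+)\]", expand=False).astype(int)

# Alternatively, keep the full 1-7 scale as an ordered categorical so that
# unobserved levels (2, 3, 6) are still part of the dtype.
likert = scores.astype(pd.CategoricalDtype(categories=list(range(1, 8)), ordered=True))

print(scores.tolist())  # [1, 4, 7, 5]
print(scores.mean())    # 4.25
```

The ordered categorical variant keeps unobserved levels in the dtype. In exchange, it loses arithmetic such as `.mean()`, which pandas does not support on categoricals.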

Edited after realising the Likert-scale case

Labels: bug (Something isn't working)
