find better way of handling nullable bool columns in pandas #672

@liamholmes31

What?
pandas' handling of nullable bool columns introduces some messy logic in our code (see link below):

https://github.com/azukds/tubular/blob/5bd08306ecc1768baf9347712fa8bf2d13eaf047/tubular/mapping.py#L396C1-L396C9

The issue comes from how pandas handles bool columns containing nulls. Depending on the backend, they can be:

  • put into the all-purpose 'object' dtype
  • for more recent backends, put into the nullable 'boolean' dtype (BooleanDtype), which supports nulls

The default behaviour when creating a column seems to be to use the object type, which we then have to cast to a better boolean type.
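
A minimal sketch of the behaviour described above, using a made-up three-value column rather than anything from tubular. It shows the default dtype when a bool column contains a null, and the explicit cast to the nullable dtype that we currently have to do ourselves:

```python
import pandas as pd

# Illustrative data only: a bool column containing a null.
s_default = pd.Series([True, False, None])
print(s_default.dtype)  # object

# Requesting the nullable extension dtype at construction time.
s_nullable = pd.Series([True, False, None], dtype="boolean")
print(s_nullable.dtype)  # boolean

# Casting an existing object column after the fact.
s_cast = s_default.astype("boolean")
print(s_cast.isna().sum())  # 1
```

The cast only works cleanly when the object column holds nothing but bools and nulls, so any real solution would probably need a check for that first.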

We should look into whether there's a way to force the better types to be used by default (this might require raising our minimum pandas version).

The below could be useful:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html
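
As a rough illustration of what `convert_dtypes` does with a nullable bool column (the frame and column names here are invented for the example, not taken from tubular):

```python
import pandas as pd

# Hypothetical frame: "flag" is a bool column with a null, "n" is a plain int column.
df = pd.DataFrame({"flag": [True, False, None], "n": [1, 2, 3]})
print(df["flag"].dtype)  # object

# convert_dtypes infers the best nullable dtype for each column.
converted = df.convert_dtypes()
print(converted["flag"].dtype)  # boolean
print(converted["n"].dtype)     # Int64
```

Note that this changes the other columns too (the int column becomes the nullable Int64 dtype), so part of the research would be deciding whether that side effect is acceptable or whether the conversion should be restricted to bool columns.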

Why?
tidy up handling of nullable booleans

How?
Will be a bit of a research piece to find the best solution!

Metadata

Labels: tech-debt (cleaning up legacy code, or making changes for maintainability purposes)
