Output formatting: the repr of the Categorical categories (quoted or unquoted strings?)

Because of the new string dtype, we also implicitly changed the representation of the unique categories in the Categorical dtype repr (aside from the object -> str change for the dtype):

>>> import pandas as pd
>>> pd.options.future.infer_string = False
>>> pd.Categorical(list("abca"))
['a', 'b', 'c', 'a']
Categories (3, object): ['a', 'b', 'c']
>>> pd.options.future.infer_string = True
>>> pd.Categorical(list("abca"))
['a', 'b', 'c', 'a']
Categories (3, str): [a, b, c]
So the actual array values are always quoted, but the list of unique categories in the dtype repr goes from ['a', 'b', 'c'] to [a, b, c].
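
To make the difference concrete, here is a minimal sketch of the two formatting styles in plain Python. `format_categories` is a hypothetical helper written for illustration only, not pandas' actual formatting code; it just assumes the change comes down to formatting each category with `repr()` (quoted) versus `str()` (unquoted).

```python
def format_categories(categories, quoted):
    """Hypothetical helper for illustration; not pandas' real implementation."""
    # repr() keeps the quotes around string values, str() drops them.
    formatter = repr if quoted else str
    return "[" + ", ".join(formatter(c) for c in categories) + "]"


categories = ["a", "b", "c"]
print(format_categories(categories, quoted=True))   # ['a', 'b', 'c']
print(format_categories(categories, quoted=False))  # [a, b, c]
```

The unquoted form cannot be pasted back as a Python list, and it does not distinguish the string "1" from the integer 1, which may be worth weighing when deciding which repr to keep.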

Brock already fixed a bunch of xfails in the tests because of this in https://github.com/pandas-dev/pandas/pull/61727. We also run into this issue in the failing doctests (https://github.com/pandas-dev/pandas/issues/61886).

@jbrockmendel mentioned there:

It isn't 100% obvious that the new repr for Categoricals is an improvement, but it's non-crazy.

I agree with that, and I also have no strong opinion either way.

But before we also go fixing doctests, let's confirm that we are OK with this change. If we don't have a strong opinion that it is an improvement, we could also leave it how it was originally, and avoid some breakage for downstream projects or users (e.g. those who also have doctests).
