BUG: creating Categorical from pandas Index/Series with "object" dtype infers string #62080
base: main
Changes from 15 commits: c4e1c18, e1a893d, cfa767f, c0ae870, 5188b81, b63a723, 0fb42cc, 9216954, 8f460ac, 87a54fe, cddc574, e83e4f9, 5ed039a, 9b4b2d9, 4855994, 1b81162
Hunk `@@ -454,6 +454,11 @@ def __init__(`

```python
                codes = arr.indices.to_numpy()
                dtype = CategoricalDtype(categories, values.dtype.pyarrow_dtype.ordered)
            else:
                # Check for pandas Series/Index with object dtype
                preserve_object_dtype = False
                if isinstance(values, (ABCSeries, ABCIndex)):
                    if getattr(values.dtype, "name", None) == "object":
                        preserve_object_dtype = True
                if not isinstance(values, ABCIndex):
                    # in particular RangeIndex xref test_index_equal_range_categories
                    values = sanitize_array(values, None)
```

Hunk `@@ -470,7 +475,13 @@ def __init__(`

```python
                            "by passing in a categories argument."
                        ) from err

                # we're inferring from values
                # If we should preserve object dtype, force categories to object dtype
                if preserve_object_dtype:
                    # Only preserve object dtype if not all elements are strings
                    if not all(isinstance(x, str) for x in categories):
```

Review comment on `if not all(isinstance(x, str) for x in categories):`

Comment: Why is this check necessary?

Reply: Always preserving object dtype for the categories when constructing a Categorical from a pandas Series or Index with dtype="object" is a behavioral change that affects a wide range of pandas internals and user-facing APIs.

```python
                        from pandas import Index

                        categories = Index(categories, dtype=object, copy=False)
                dtype = CategoricalDtype(categories, dtype.ordered)

        elif isinstance(values.dtype, CategoricalDtype):
```
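The decision added by this patch can be summarized as a standalone predicate. The following is a minimal pure-Python sketch, not pandas code: the function name `keeps_object_categories` and its parameters are hypothetical, and the `is_series_or_index` flag stands in for the `isinstance(values, (ABCSeries, ABCIndex))` check in the diff.

```python
from typing import Iterable, Optional


def keeps_object_categories(
    is_series_or_index: bool,
    dtype_name: Optional[str],
    categories: Iterable[object],
) -> bool:
    """Hypothetical mirror of the patch's logic (not a pandas API).

    Returns True when the inferred categories would be forced to object dtype.
    """
    # Step 1: the input must be a pandas Series/Index whose dtype name is "object"
    preserve_object_dtype = is_series_or_index and dtype_name == "object"
    # Step 2: only force object dtype if at least one category is not a str
    return preserve_object_dtype and not all(isinstance(x, str) for x in categories)


# Usage
print(keeps_object_categories(True, "object", ["a", 1]))    # → True (mixed values)
print(keeps_object_categories(True, "object", ["a", "b"]))  # → False (all strings)
```

Because `all()` over an empty iterable is `True`, this logic does not force object dtype when there are no categories at all (for example, an empty object-dtype input).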
Review comment: You can just check `values.dtype == object`.
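The suggestion relies on a NumPy dtype comparing equal to the Python type `object`. A minimal NumPy-only illustration of why the two checks agree for object arrays follows; the array contents are arbitrary, and pandas extension dtypes (such as `CategoricalDtype` or the string dtypes) are outside this sketch.

```python
import numpy as np

values = np.array(["a", 1], dtype=object)

# The check used in the patch: compare the dtype's name string
name_check = getattr(values.dtype, "name", None) == "object"
# The reviewer's suggestion: compare the dtype directly against `object`
eq_check = values.dtype == object

print(name_check, eq_check)  # → True True

# Both checks reject a non-object dtype
int_dtype = np.dtype("int64")
print(int_dtype.name == "object", int_dtype == object)  # → False False
```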