-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
CLN: Index.append() refactoring #16236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
5029592
d5c7d77
44800a4
528295d
554ee79
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,7 @@ | |
is_object_dtype, | ||
is_bool_dtype, | ||
is_dtype_equal, | ||
is_range, | ||
_NS_DTYPE, | ||
_TD_DTYPE) | ||
from pandas.core.dtypes.generic import ( | ||
|
@@ -45,6 +46,8 @@ def get_dtype_kinds(l): | |
# if to_concat contains different tz, | ||
# the result must be object dtype | ||
typ = str(arr.dtype) | ||
elif is_range(arr): | ||
|
||
typ = 'range' | ||
elif is_datetime64_dtype(dtype): | ||
typ = 'datetime' | ||
elif is_timedelta64_dtype(dtype): | ||
|
@@ -559,3 +562,38 @@ def convert_sparse(x, axis): | |
# coerce to object if needed | ||
result = result.astype('object') | ||
return result | ||
|
||
|
||
def _concat_indexes_same_dtype_rangeindex(indexes): | ||
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add a comment about what this is doing / guarantees and example would be nice as well. |
||
start = step = next = None | ||
|
||
for obj in indexes: | ||
if not len(obj): | ||
continue | ||
|
||
if start is None: | ||
# This is set by the first non-empty index | ||
start = obj._start | ||
if step is None and len(obj) > 1: | ||
step = obj._step | ||
elif step is None: | ||
# First non-empty index had only one element | ||
if obj._start == start: | ||
return _concat_index_asobject(indexes) | ||
step = obj._start - start | ||
|
||
non_consecutive = ((step != obj._step and len(obj) > 1) or | ||
(next is not None and obj._start != next)) | ||
if non_consecutive: | ||
# Not nice... but currently what happens in NumericIndex: | ||
|
||
return _concat_index_asobject(indexes) | ||
|
||
if step is not None: | ||
next = obj[-1] + step | ||
|
||
if start is None: | ||
start = obj._start | ||
step = obj._step | ||
stop = obj._stop if next is None else next | ||
return indexes[0].__class__(start, stop, step) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is ugly... but the only alternative I see (an import inside the function) is uglier. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. its ok here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ah no, this wouldn't play nice with the case when no range index is returned, ignore this |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1741,10 +1741,9 @@ def append(self, other): | |
names = set([obj.name for obj in to_concat]) | ||
name = None if len(names) > 1 else self.name | ||
|
||
if self.is_categorical(): | ||
# if calling index is category, don't check dtype of others | ||
from pandas.core.indexes.category import CategoricalIndex | ||
return CategoricalIndex._append_same_dtype(self, to_concat, name) | ||
return self._concat(to_concat, name) | ||
|
||
def _concat(self, to_concat, name): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you call this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, I think it would make more sense to change There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since it is only use by append, I prefer using append in the name, but no strong feelings There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are right that it's currently used only by append, but usually you expect x.append(y) to concatenate x to y or to elements of y; instead this only concatenates elements of y. So since you don't object I will go with my proposal. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
in the end it is used to concatenate both y to x, just that this is passed like that in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sure, I don't object to that. We can agree it is a concat operation used to implement appending: the switch happens when |
||
|
||
typs = _concat.get_dtype_kinds(to_concat) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not similar to the other methods, which detect a type. a RangeIndex is not a type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only called in
get_dtype_kinds
, see below, so I could have just usedisinstance(arr, ABCRangeIndex)
. But I tried to be coherent withis_categorical
andis_sparse
... what exactly is a "type"?!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are conflating a type with an Index. is_categorical will detect a categorical type which could be a Categorical (or the dtype=='category'), a CategoricalIndex happens to be of this type as well.
However RangeIndex is simply an Index subclassing Int64Index. its not a type (its dtype is int64). types can be the dtype of an Index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... still,
... so the type returned by
get_dtype_kinds
is already not the dtype, and not even thedtype.kind
.But anyway, I don't particularly care about changing
get_dtype_kinds
. We just need a method which can tell us whether two indexes can be natively concatenated: this is currentlyget_dtype_kinds
, but I can write a new one if you prefer.