PERF: Native version of is_hashable
#59565
Conversation
pandas/_libs/lib.pyx
Outdated
```cython
# fail this test.

# Reconsider this decision once this numpy bug is fixed:
# https://github.com/numpy/numpy/issues/5562
```
looks like this issue has been closed. maybe this can be reconsidered?
I've benchmarked a version using isinstance(obj, collections.abc.Hashable). It is considerably slower than the current pure Python implementation (~2x).

Can you provide some measurements on the perf impact?
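As an aside on the isinstance(obj, collections.abc.Hashable) approach mentioned above: besides being slower, it is also unreliable for containers. A small standard-library-only sketch (no pandas code) illustrates this:

```python
import collections.abc

# A tuple holding a list registers as Hashable (tuple defines __hash__),
# yet calling hash() on it raises, because its element is unhashable.
obj = ([],)
print(isinstance(obj, collections.abc.Hashable))  # True

try:
    hash(obj)
    hash_failed = False
except TypeError:
    hash_failed = True
print(hash_failed)  # True
```

So the isinstance check answers "does the type define __hash__?", not "will hash() actually succeed?", which is what is_hashable needs.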
Force-pushed from 215fe25 to 40f6965
I've benchmarked the current pure Python implementation against two different Cython versions, with the following results. Overall, the proposed Cython version is twice as fast as the pure Python version in the happy path and 10x faster in the exceptional path. The proposed Cython version is ~33% faster than the simple Cython version in the happy path and 10x faster in the exceptional path.

```python
def is_hashable(obj: object) -> bool:
    try:
        hash(obj)
    except TypeError:
        return False
    else:
        return True
```

```cython
cpdef bint c_is_hashable(object obj) noexcept:
    if PyLong_CheckExact(obj) or PyFloat_CheckExact(obj) or PyUnicode_CheckExact(obj):
        return True
    if PyTuple_CheckExact(obj) or PyFrozenSet_CheckExact(obj):
        for o in obj:
            if not c_is_hashable(o):
                return False
        return True
    if PyAnySet_CheckExact(obj) or PyDict_CheckExact(obj) or PyList_CheckExact(obj):
        return False
    try:
        hash(obj)
    except TypeError:
        return False
    else:
        return True

cpdef bint c_is_hashable2(object obj) noexcept:
    try:
        hash(obj)
    except TypeError:
        return False
    else:
        return True
```
```python
for obj in (2, "hello world", ("hello", "world", 1, 2.0), ([],), np.int64(5), []):
    print(repr(obj))
    %timeit is_hashable(obj)
    %timeit c_is_hashable(obj)
    %timeit c_is_hashable2(obj)
    print("")
```
```
2
58.8 ns ± 0.344 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
33.3 ns ± 0.333 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
47.9 ns ± 0.284 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

'hello world'
75.4 ns ± 0.703 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
33.7 ns ± 0.561 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
42.1 ns ± 0.508 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

('hello', 'world', 1, 2.0)
101 ns ± 0.421 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
46.8 ns ± 0.493 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
71.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

([],)
392 ns ± 2.84 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
39.4 ns ± 0.87 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
464 ns ± 1.95 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

np.int64(5)
66 ns ± 1.82 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
47.8 ns ± 1.08 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
46.6 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

[]
396 ns ± 6.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
36 ns ± 0.578 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
477 ns ± 4.88 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
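For reference, the %timeit magic above requires IPython. The pure-Python baseline can be reproduced with only the standard-library timeit module (the Cython variants need a compiled extension and are omitted here):

```python
import timeit

def is_hashable(obj: object) -> bool:
    # the current pure-Python implementation from the benchmark above
    try:
        hash(obj)
    except TypeError:
        return False
    else:
        return True

for obj in (2, "hello world", ("hello", "world", 1, 2.0), ([],), []):
    n = 100_000
    t = timeit.timeit(lambda: is_hashable(obj), number=n)
    print(f"{obj!r}: {t / n * 1e9:.1f} ns per call")
```

Note how the unhashable cases (([],) and []) pay for raising and catching a TypeError on every call, which is where the exceptional-path gap in the numbers above comes from.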
Force-pushed from 9c80d17 to befea90

Force-pushed from 59fca35 to f9a3950
pandas/_libs/lib.pyx
Outdated
```cython
    return util.is_float_object(obj)


cpdef bint is_hashable(object obj):
```
Does moving this to Cython make a difference for performance? I assume most of the gain is just from short circuiting a hash attempt through the isinstance checks?
I've done some benchmarking, but not addressing exactly this concern. However, the checks for tuple and frozenset will probably be slow in pure Python even using all().
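For illustration, here is a hedged sketch of what such a pure-Python version with all() might look like; this is not the PR's code, just an approximation of what the comment describes, with exact type comparisons mimicking the CheckExact semantics:

```python
def is_hashable_py(obj: object) -> bool:
    # exact type comparisons approximate the C-level CheckExact macros
    if type(obj) in (int, float, str):
        return True
    if type(obj) is tuple:
        # the all() call the comment refers to: recurse into the elements
        return all(is_hashable_py(o) for o in obj)
    if type(obj) in (set, dict, list):
        return False
    # fall back to actually attempting the hash
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable_py(("a", 1)))  # True
print(is_hashable_py(([],)))     # False
```

Each `type(obj) in (...)` test is itself a chain of Python-level comparisons, so per the comment above, even this short-circuiting version may not beat the plain try/except in pure Python.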
Force-pushed from f9a3950 to 5f79b59
```cython
is_long = PyLong_CheckExact(obj)
is_float = PyFloat_CheckExact(obj)
is_unicode = PyUnicode_CheckExact(obj)
is_tuple = PyTuple_CheckExact(obj)
```
Is there any particular reason to use CheckExact over Check for the container types? I think the latter is preferable, given subclasses of these aren't that uncommon.
The motivation for this is pandas' own FrozenList, which is hashable and a subclass of list. I agree that PyXXXX_Check(...) would be more intuitive, but due to corner cases like FrozenList I think we need the exact checks to maintain the correct behaviour.
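To make the FrozenList concern concrete without importing pandas, here is a minimal hypothetical list subclass that defines __hash__. A subclass-aware Check-style test would classify it as unhashable, while the exact check lets it fall through to the hash() attempt, which succeeds:

```python
class HashableList(list):
    """Hypothetical stand-in for pandas' FrozenList: a hashable list subclass."""
    def __hash__(self):
        return hash(tuple(self))

obj = HashableList([1, 2, 3])
print(isinstance(obj, list))         # True: a Check-style test matches it
print(hash(obj) == hash((1, 2, 3)))  # True: yet it is perfectly hashable
```

This works because defining __hash__ on the subclass overrides list's `__hash__ = None`, which is exactly the corner case the exact checks are protecting.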
```cython
# tuple and frozenset are hashable if and only if all elements are hashable
if is_tuple:
    for o in <tuple>obj:
```
Does the cast actually matter? I again would think Cython would just always defer to PySequence_GetItem for iteration.

If that's true, I suppose there could be some difference between PySequence_GetItem and a more exact PyTuple_GET_ITEM call that could be dispatched to, with the latter not having to deal with reference counting. However, the point of writing in Cython is to happily trade some performance compared to the raw C code for some higher-level abstractions.

Where that line really matters is up for debate, but I generally worry about trying to be too exact in this function. I'm also a little unclear why the non-hashable items can't short-circuit to quickly throw an error during a PyObject_Hash call. I would think the Python built-in tuple and frozenset should be able to do this faster than us?
Thanks!

I've looked into the generated C code for this version with the casts and a version without them. Assuming the C compiler is able to hoist some conditions out of the generated loops, the generated assembly for the two versions should be exactly the same. I went too far down the path of micro-optimisations.

As to whether the built-in tuple and frozenset can do this faster than us, I've looked into the implementations in CPython. tuple essentially calls the hash function on all the elements in a loop and does some bit-twiddling to combine the hashes into one hash. The C code of that loop is very similar to what Cython generates, but we save the calls to PyObject_Hash(...) on the elements, which at the micro level should give a nice speed-up. For frozenset I've discovered that it is always hashable, which makes perfect sense, as elements must be hashable to be put in a set. This discovery allows me to make some simplifications to the code.
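The frozenset observation is easy to verify from Python: construction already requires every element to be hashable, so any frozenset that exists can be hashed, and an unhashable element fails at construction time rather than at hash time:

```python
fs = frozenset({1, "a", (2, 3)})
h = hash(fs)  # never raises for an existing frozenset
print(type(h))  # <class 'int'>

try:
    frozenset([[]])  # a list element is rejected before the set even exists
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```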
CPython implementation of tuple: note that PyTuple_GET_ITEM(...) is essentially just indexing into the underlying array:
https://github.com/python/cpython/blob/0cba289870d5cd41f24b2f63b9480e4593aa2330/Include/cpython/tupleobject.h#L27
Is there anything user-facing that would be improved by this? I see the performance improvements in the benchmarks you posted, but being down in the range of nanoseconds I'm unsure if it's worth the added complexity.
Fair point! See the user-facing benchmark below. I cannot judge if it is worth it or not.

```python
>>> idx = pd.MultiIndex([[] for _ in range(20)], [[] for _ in range(20)])
>>> new_names = [f"new_name_{i}" for i in range(20)]
>>> %timeit idx.set_names(new_names, inplace=True)
8.68 μs ± 1.71 μs per loop (mean ± std. dev. of 7 runs, 100,000 loops each) <-- before
7.79 μs ± 3.86 μs per loop (mean ± std. dev. of 7 runs, 100,000 loops each) <-- after
```
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Thanks for the PR, but agreed that I'm not sure the maintenance burden of this new implementation is worth the minimal speedup. Thanks for the effort here, but closing.
Add native version of is_hashable(...) for some marginal performance improvements.