-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
PERF: Native version of is_hashable
#59565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
76e3cf3
8eb6ac4
e7e392b
3e44533
5f79b59
4e3ff48
1f51061
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,19 +19,30 @@ from cpython.datetime cimport ( | |
time, | ||
timedelta, | ||
) | ||
from cpython.dict cimport PyDict_CheckExact | ||
from cpython.float cimport PyFloat_CheckExact | ||
from cpython.iterator cimport PyIter_Check | ||
from cpython.list cimport PyList_CheckExact | ||
from cpython.long cimport PyLong_CheckExact | ||
from cpython.number cimport PyNumber_Check | ||
from cpython.object cimport ( | ||
Py_EQ, | ||
PyObject, | ||
PyObject_Hash, | ||
PyObject_RichCompareBool, | ||
) | ||
from cpython.ref cimport Py_INCREF | ||
from cpython.sequence cimport PySequence_Check | ||
from cpython.set cimport ( | ||
PyAnySet_CheckExact, | ||
PyFrozenSet_CheckExact, | ||
) | ||
from cpython.tuple cimport ( | ||
PyTuple_CheckExact, | ||
PyTuple_New, | ||
PyTuple_SET_ITEM, | ||
) | ||
from cpython.unicode cimport PyUnicode_CheckExact | ||
from cython cimport ( | ||
Py_ssize_t, | ||
floating, | ||
|
@@ -1089,6 +1100,79 @@ def is_float(obj: object) -> bool: | |
return util.is_float_object(obj) | ||
|
||
|
||
cpdef bint is_hashable(object obj): | ||
""" | ||
Return True if hash(obj) will succeed, False otherwise. | ||
|
||
Some types will pass a test against collections.abc.Hashable but fail when | ||
they are actually hashed with hash(). | ||
|
||
Distinguish between these and other types by trying the call to hash() and | ||
seeing if they raise TypeError. | ||
|
||
Returns | ||
------- | ||
bool | ||
|
||
Examples | ||
-------- | ||
>>> import collections | ||
>>> from pandas.api.types import is_hashable | ||
>>> a = ([],) | ||
>>> isinstance(a, collections.abc.Hashable) | ||
True | ||
>>> is_hashable(a) | ||
False | ||
""" | ||
cdef: | ||
bint is_none | ||
bint is_long | ||
bint is_float | ||
bint is_unicode | ||
bint is_tuple | ||
bint is_frozen_set | ||
bint is_dict | ||
bint is_list | ||
bint is_any_set | ||
|
||
# Perform all checks in order to be nice to the branch predictor | ||
is_none = obj is None | ||
is_long = PyLong_CheckExact(obj) | ||
is_float = PyFloat_CheckExact(obj) | ||
is_unicode = PyUnicode_CheckExact(obj) | ||
is_tuple = PyTuple_CheckExact(obj) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any particular reason to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The motivation for this is pandas' own |
||
is_frozen_set = PyFrozenSet_CheckExact(obj) | ||
is_dict = PyDict_CheckExact(obj) | ||
is_list = PyList_CheckExact(obj) | ||
is_any_set = PyAnySet_CheckExact(obj) | ||
|
||
if is_none or is_long or is_float or is_unicode: | ||
return True | ||
|
||
# tuple and frozenset is hashable if and only if all elements are hashable | ||
if is_tuple: | ||
for o in <tuple>obj: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does the cast actually matter? I again would think Cython would just always defer to If that's true, I suppose there could be some difference between Where that line really matters is up for debate, but I generally worry about trying to be too exact in this function. I'm also a little unclear why the non-hashable items can't short-circuit to quickly throw an error during a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! I've looked into the generated C-code for this version with the casts and a version without the cast. Assuming that the C-compiler is able to hoist some conditions out of the generated loops, the generated assembly for the two versions should be exactly the same. I went too far down the path of micro optimisations. As to whether the built-in CPython implementationtupleNote |
||
if not is_hashable(o): | ||
return False | ||
return True | ||
|
||
if is_frozen_set: | ||
for o in <frozenset>obj: | ||
if not is_hashable(o): | ||
return False | ||
return True | ||
|
||
if is_dict or is_list or is_any_set: | ||
return False | ||
|
||
try: | ||
PyObject_Hash(obj) | ||
except TypeError: | ||
return False | ||
else: | ||
return True | ||
|
||
|
||
def is_integer(obj: object) -> bool: | ||
""" | ||
Return True if given object is integer. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does moving this to Cython make a difference for performance? I assume most of the gain is just from short circuiting a hash attempt through the isinstance checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done some benchmarking - but not addressing exact this concern. However the checks for tuple and frozenset will probably be slow in pure Python even using all()
#59565 (comment)