
Conversation

Tolker-KU
Contributor

@Tolker-KU Tolker-KU commented Aug 20, 2024

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Add native version of is_hashable(...) for some marginal performance improvements.

@Tolker-KU Tolker-KU requested a review from WillAyd as a code owner August 20, 2024 18:43
@Tolker-KU Tolker-KU marked this pull request as draft August 20, 2024 18:50
# fail this test.

# Reconsider this decision once this numpy bug is fixed:
# https://github.com/numpy/numpy/issues/5562
Member

looks like this issue has been closed. maybe this can be reconsidered?

Contributor Author

@Tolker-KU Tolker-KU Aug 21, 2024

I've benchmarked a version using isinstance(obj, collections.abc.Hashable). It is considerably slower (~2x) than the current Python implementation.
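Beyond the speed difference, the ABC-based check only inspects whether the object's type defines __hash__, so it can disagree with hash() on containers of unhashable elements. A minimal illustration (not code from the PR):

```python
import collections.abc

# isinstance(..., Hashable) only checks that type(obj) defines __hash__,
# so a tuple containing an unhashable list still passes the check:
obj = ([],)
print(isinstance(obj, collections.abc.Hashable))  # True

# ...even though actually hashing it raises:
try:
    hash(obj)
except TypeError as exc:
    print(f"hash() failed: {exc}")
```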

@jbrockmendel
Member

Can you provide some measurements on the perf impact?

@Tolker-KU Tolker-KU force-pushed the feature/fast-is_hashable branch from 215fe25 to 40f6965 Compare August 21, 2024 19:14
@Tolker-KU
Contributor Author

Tolker-KU commented Aug 21, 2024

> Can you provide some measurements on the perf impact?

I've benchmarked the current pure Python implementation against two different Cython versions, with the following results. Overall, the proposed Cython version is twice as fast as the pure Python version on the happy path and 10x faster on the exceptional path. The proposed Cython version is also ~33% faster than the simple Cython version on the happy path and 10x faster on the exceptional path.

def is_hashable(obj: object) -> bool:
    try:
        hash(obj)
    except TypeError:
        return False
    else:
        return True


cpdef bint c_is_hashable(object obj) noexcept:
    if PyLong_CheckExact(obj) or PyFloat_CheckExact(obj) or PyUnicode_CheckExact(obj):
        return True

    if PyTuple_CheckExact(obj) or PyFrozenSet_CheckExact(obj):
        for o in obj:
            if not c_is_hashable(o):
                return False
        return True

    if PyAnySet_CheckExact(obj) or PyDict_CheckExact(obj) or PyList_CheckExact(obj):
        return False

    try:
        hash(obj)
    except TypeError:
        return False
    else:
        return True


cpdef bint c_is_hashable2(object obj) noexcept:
    try:
        hash(obj)
    except TypeError:
        return False
    else:
        return True


for obj in (2, "hello world", ("hello", "world", 1, 2.0), ([],), np.int64(5), []):
    print(repr(obj))
    %timeit is_hashable(obj)
    %timeit c_is_hashable(obj)
    %timeit c_is_hashable2(obj)
    print("")
 
>>> 2
>>> 58.8 ns ± 0.344 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 33.3 ns ± 0.333 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 47.9 ns ± 0.284 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

>>> 'hello world'
>>> 75.4 ns ± 0.703 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 33.7 ns ± 0.561 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 42.1 ns ± 0.508 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

>>> ('hello', 'world', 1, 2.0)
>>> 101 ns ± 0.421 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 46.8 ns ± 0.493 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 71.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

>>> ([],)
>>> 392 ns ± 2.84 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> 39.4 ns ± 0.87 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 464 ns ± 1.95 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

>>> np.int64(5)
>>> 66 ns ± 1.82 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 47.8 ns ± 1.08 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 46.6 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

>>> []
>>> 396 ns ± 6.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> 36 ns ± 0.578 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> 477 ns ± 4.88 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
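For readers without a Cython toolchain, the short-circuit logic of c_is_hashable can be sketched in pure Python using exact type checks. This is an illustration mirroring the *_CheckExact branches, not the PR's code:

```python
def is_hashable_sketch(obj: object) -> bool:
    """Pure-Python sketch of the Cython fast paths above."""
    t = type(obj)  # exact type, mirroring the *_CheckExact calls
    if t in (int, float, str):
        return True
    if t in (tuple, frozenset):
        # hashable iff every element is hashable
        return all(is_hashable_sketch(o) for o in obj)
    if t in (set, dict, list):
        return False
    # fall back to actually hashing, as in the original implementation
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

Note that bool is a subclass of int, so booleans fail the exact-type test and take the generic hash() fallback, just as they would with PyLong_CheckExact in the Cython version.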

@Tolker-KU Tolker-KU force-pushed the feature/fast-is_hashable branch 3 times, most recently from 9c80d17 to befea90 Compare August 21, 2024 19:35
@Tolker-KU Tolker-KU marked this pull request as ready for review August 22, 2024 04:01
@Tolker-KU Tolker-KU requested a review from mroeschke as a code owner August 22, 2024 04:01
@Tolker-KU Tolker-KU force-pushed the feature/fast-is_hashable branch from 59fca35 to f9a3950 Compare August 22, 2024 04:09
return util.is_float_object(obj)


cpdef bint is_hashable(object obj):
Member

Does moving this to Cython make a difference for performance? I assume most of the gain is just from short-circuiting a hash attempt through the isinstance checks?

Contributor Author

I've done some benchmarking, but not addressing exactly this concern. However, the checks for tuple and frozenset will probably be slow in pure Python, even using all()

#59565 (comment)

@rhshadrach rhshadrach added Performance Memory or execution speed performance hashing hash_pandas_object labels Aug 25, 2024
@Tolker-KU Tolker-KU force-pushed the feature/fast-is_hashable branch from f9a3950 to 5f79b59 Compare August 28, 2024 19:39
is_long = PyLong_CheckExact(obj)
is_float = PyFloat_CheckExact(obj)
is_unicode = PyUnicode_CheckExact(obj)
is_tuple = PyTuple_CheckExact(obj)
Member

Is there any particular reason to use CheckExact over Check for the container types? I think the latter is preferable, given subclasses of these aren't that uncommon

Contributor Author

The motivation for this is pandas' own FrozenList, which is hashable and a subclass of list. I agree that PyXXXX_Check(...) would be more intuitive, but due to corner cases like FrozenList I think we need the exact checks to maintain the correct behaviour
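The corner case can be demonstrated with a hypothetical stand-in for FrozenList (importing pandas internals is beside the point here):

```python
# Hypothetical stand-in for pandas' FrozenList: a hashable list subclass.
class HashableList(list):
    def __hash__(self):
        return hash(tuple(self))

obj = HashableList([1, 2, 3])

# PyList_Check corresponds to an isinstance check and would wrongly
# classify this object as unhashable:
print(isinstance(obj, list))          # True

# PyList_CheckExact corresponds to an exact type check and correctly
# falls through to the hash() attempt, which succeeds:
print(type(obj) is list)              # False
print(hash(obj) == hash((1, 2, 3)))   # True
```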


# tuple and frozenset are hashable if and only if all elements are hashable
if is_tuple:
    for o in <tuple>obj:
Member

Does the cast actually matter? I again would think Cython would just always defer to PySequence_GetItem for iteration.

If that's true, I suppose there could be some difference between PySequence_GetItem and a more exact PyTuple_GETITEM call that could be dispatched to, with the latter not having to deal with reference counting. However, the point of writing in Cython is to happily trade some performance compared to the raw C code for some higher level abstractions.

Where that line really matters is up for debate, but I generally worry about trying to be too exact in this function. I'm also a little unclear why the non-hashable items can't short-circuit to quickly throw an error during a PyObject_Hash call. I would think the Python built-in tuple and frozen set should be able to do this faster than us?

Contributor Author

Thanks!

I've looked into the generated C code for this version with the casts and for a version without the casts. Assuming that the C compiler is able to hoist some conditions out of the generated loops, the generated assembly for the two versions should be exactly the same. I went too far down the path of micro-optimisations.

As to whether the built-in tuple and frozenset can do this faster than us, I've looked into the implementations in CPython. tuple essentially calls the hash function on all the elements in a loop and does some bit-twiddling to combine the hashes into one hash. The C code of that loop is very similar to that generated by Cython, but we save calls to PyObject_Hash(...) on the elements, which at the micro level should give a nice speed-up. For frozenset I've discovered that it is always hashable, which makes perfect sense, as elements must be hashable to be put in a set. This discovery allows me to make some simplifications to the code.
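The frozenset observation can be checked directly from Python (a small illustration of the invariant, not code from the PR):

```python
# frozenset enforces element hashability at construction time, so any
# frozenset that exists is itself hashable:
try:
    frozenset([[]])  # unhashable element is rejected up front
except TypeError as exc:
    print(f"construction failed: {exc}")

# hashing an existing frozenset therefore always succeeds, and is
# order-independent:
print(hash(frozenset({1, 2})) == hash(frozenset({2, 1})))  # True
```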

Generated C-code

With casts: [screenshots of the generated C loop omitted]

Without casts: [screenshot of the generated C loop omitted]

CPython implementation

tuple

https://github.com/python/cpython/blob/0cba289870d5cd41f24b2f63b9480e4593aa2330/Objects/tupleobject.c#L318-L342

Note PyTuple_GET_ITEM(...) is essentially just indexing into the underlying array
https://github.com/python/cpython/blob/0cba289870d5cd41f24b2f63b9480e4593aa2330/Include/cpython/tupleobject.h#L27

@WillAyd
Member

WillAyd commented Aug 31, 2024

> I've benchmarked the current pure Python implementation against two different Cython versions with the following results.

Is there anything user-facing that would be improved by this? I see the performance improvements in the benchmarks you posted, but being down in the range of nanoseconds I'm unsure if it's worth the added complexity

@Tolker-KU
Contributor Author

> I've benchmarked the current pure Python implementation against two different Cython versions with the following results.
>
> Is there anything user-facing that would be improved by this? I see the performance improvements in the benchmarks you posted, but being down in the range of nanoseconds I'm unsure if it's worth the added complexity

Fair point! See the user-facing benchmark below; I cannot judge whether it is worth it or not.

>>> idx = pd.MultiIndex([[] for _ in range(20)], [[] for _ in range(20)])
>>> new_names = [f"new_name_{i}" for i in range(20)]
>>> %timeit idx.set_names(new_names, inplace=True)
8.68 μs ± 1.71 μs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)  <-- before
7.79 μs ± 3.86 μs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)  <-- after

Contributor

github-actions bot commented Oct 2, 2024

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 2, 2024
@mroeschke
Member

Thanks for the PR, but agreed that I'm not sure the maintenance burden of this new implementation is worth the minimal speedup. Thanks for the effort here, but closing.

@mroeschke mroeschke closed this Oct 29, 2024