Skip to content

Conversation

@visitorckw
Copy link
Contributor

@visitorckw visitorckw commented Sep 17, 2024

Optimize the keep_top_bit() function to use _Py_bit_length() instead of manually shifting bits. This allows for more efficient execution on hardware that supports specialized instructions such as bsr on x86. Additionally, the function is now marked as inline, as it is small and only used in one location, improving the potential for inlining by the compiler.

@visitorckw visitorckw force-pushed the optimize-keep-top-bit branch from dd1075d to 8014255 Compare September 17, 2024 14:23
else {
return 0;
}
#elif defined(_MSC_VER)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_BitScanReverse64() is not available on 32-bit Windows (x86, ARM). You should check if you're running 64-bit Windows.

@visitorckw visitorckw force-pushed the optimize-keep-top-bit branch from 8014255 to 2eb9e24 Compare September 17, 2024 14:42
@visitorckw
Copy link
Contributor Author

I noticed the warnings in the CI tests on Ubuntu related to unsigned/signed integer conversions. However, it seems most of them are not caused by my code changes. In the keep_top_bit function, I did cast the variable n, which is of type Py_ssize_t, to uint64_t. But since n will never be a negative integer, I believe we are safe.

}
#elif defined(_MSC_VER) && defined(_WIN64)
// _BitScanReverse64() is documented to search 64 bits.
Py_BUILD_ASSERT(sizeof(uint64_t) <= 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this assertion, I don't see its purpose.

msb += BIT_LENGTH_TABLE[x];
return msb;
#endif
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add tests for this function in Modules/_testinternalcapi.c? Copy check_bit_length() and test_bit_length() to make 64-bit variant and add your test_bit_length64() to module_functions.

Introduce a new function _Py_bit_length64() to handle 64-bit integers
efficiently. This function utilizes __builtin_clzll() for GCC/Clang and
_BitScanReverse64() for MSVC, ensuring compatibility across different
platforms. For compilers that do not support these intrinsics, a
fallback implementation using a lookup table is provided.

This addition is necessary for safely computing the bit length of
64-bit integers, which is particularly important when working with
types like Py_ssize_t on platforms where size_t is 64-bit, such as
Windows.
Introduce a new test function, test_bit_length64(), which verifies the
correctness of _Py_bit_length64().
…_bit_length()

Optimize the keep_top_bit() function to use _Py_bit_length() instead
of manually shifting bits. This allows for more efficient execution on
hardware that supports specialized instructions such as bsr on x86.
Additionally, the function is now marked as inline, as it is small and
only used in one location, improving the potential for inlining by the
compiler.
@visitorckw visitorckw force-pushed the optimize-keep-top-bit branch from 2eb9e24 to b62ecdb Compare September 17, 2024 16:00
@rhettinger rhettinger self-assigned this Sep 17, 2024
@rhettinger
Copy link
Contributor

rhettinger commented Sep 17, 2024

Thanks for the suggestion; however, the heapq module doesn't need keep_top_bit() to be optimized. It is called only once, outside of loop and only when the number of iterations exceeds 2500. You would never be able to detect this edit in a timing.

Also the code forkeep_top_bit() is already very fast, so there isn't much room for improvement with an intrinsic. I prefer to keep it as-is because it is easier for me to maintain.

@rhettinger rhettinger closed this Sep 17, 2024
@visitorckw
Copy link
Contributor Author

Sad to hear that.
Sorry for creating unnecessary noise, and thanks for your feedback!

@rhettinger
Copy link
Contributor

It wasn't "unnecessary noise". Your suggestions are always welcome.

This particular case wasn't wrong. It was just too microscopic to be worth having an external code path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants