HBASE-29995 Improve existing hash implementations by reading 4 bytes at once#7934
HBASE-29995 Improve existing hash implementations by reading 4 bytes at once#7934jinhyukify wants to merge 6 commits intoapache:masterfrom
Conversation
|
Please add a dummy change in pom.xml so we can run all the tests in HBase to see if there are any problems with the changes? once we confirm there is no problem, you can do a force push to remove the dummy commit and then merge the PR. And why not read 8 bytes at once? |
|
@Apache9 Thank you for checking!
The existing hashes (Jenkins, Murmur, Murmur3) are all 4-byte based, so they operate on 4-byte chunks. There is currently no place in the algorithms where 8-byte reads are used. In theory we could read 8 bytes and split them into two ints, but given the current structure it would not provide much benefit. If we add hash functions that operate on 8-byte words in the future, we could revisit this. |
| return (int) assembleCrossingLE(offset, Bytes.SIZEOF_INT); | ||
| } | ||
|
|
||
| private long assembleCrossingLE(int offset, int wordBytes) { |
There was a problem hiding this comment.
So we're adding a non-trivial amount of code here, and it seems to overlap with the existing get(int offset) implementation to some degree. Would it make sense to refactor get to call this new method instead, so we can reduce code duplication? We could consider that approach as long as the performance overhead is negligible.
There was a problem hiding this comment.
Thank you for checking.
I changed to use assembleCrossingLE method in get method.
There was a problem hiding this comment.
Pull request overview
This PR targets HBASE-29995 by improving hashing throughput (Bloom filter-related hash keys and hash functions) via little-endian 4-byte loads, reducing per-byte get() calls in the hot loops.
Changes:
- Add
HashKey#getIntLEand switch Murmur/Murmur3/Jenkins inner loops to load 4 bytes at a time. - Introduce
LittleEndianBytes(with optional Unsafe acceleration) and extendUnsafeAccesswith LE int helpers. - Update
RowBloomHashKey/RowColBloomHashKeyimplementations and add new unit tests for LE conversions and ROWCOL bloom key layout.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/HashKey.java | Adds getIntLE contract for faster 4-byte little-endian reads. |
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteArrayHashKey.java | Implements getIntLE for byte[]-backed keys. |
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/RowBloomHashKey.java | Implements getIntLE using row fast-path reads. |
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/RowColBloomHashKey.java | Adds getIntLE fast paths + crossing-boundary assembly; caches total length. |
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/LittleEndianBytes.java | New LE int read/write utility (Unsafe-accelerated when available). |
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/UnsafeAccess.java | Adds toIntLE/putIntLE helpers for byte[] and ByteBuffer. |
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/MurmurHash.java | Uses hashKey.getIntLE for 4-byte block loads. |
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/MurmurHash3.java | Uses hashKey.getIntLE for 4-byte block loads. |
| hbase-common/src/main/java/org/apache/hadoop/hbase/util/JenkinsHash.java | Uses hashKey.getIntLE for 4-byte block loads. |
| hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestLittleEndianBytesBase.java | New base tests covering LE int conversions and Cell row/qualifier reads. |
| hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestLittleEndianBytes.java | Tagged JUnit5 runner for the base tests (unsafe-enabled path). |
| hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestLittleEndianBytesWoUnsafe.java | Tagged JUnit5 runner that mocks unaligned=false (pure-Java path). |
| hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestRowColBloomHashKey.java | New tests validating ROWCOL bloom virtual layout and getIntLE consistency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Jira https://issues.apache.org/jira/browse/HBASE-29995