Skip to content

HBASE-29995 Improve existing hash implementations by reading 4 bytes at once#7934

Open
jinhyukify wants to merge 6 commits intoapache:masterfrom
jinhyukify:HBASE-29995
Open

HBASE-29995 Improve existing hash implementations by reading 4 bytes at once#7934
jinhyukify wants to merge 6 commits intoapache:masterfrom
jinhyukify:HBASE-29995

Conversation

@jinhyukify
Copy link
Contributor

@Apache9
Copy link
Contributor

Apache9 commented Mar 15, 2026

Please add a dummy change in pom.xml so we can run all the tests in HBase to see if there are any problems with the changes? once we confirm there is no problem, you can do a force push to remove the dummy commit and then merge the PR.

And why not read 8 bytes at once?

@jinhyukify
Copy link
Contributor Author

@Apache9 Thank you for checking!

And why not read 8 bytes at once?

The existing hashes (Jenkins, Murmur, Murmur3) are all 4-byte based, so they operate on 4-byte chunks. There is currently no place in the algorithms where 8-byte reads are used.

In theory we could read 8 bytes and split them into two ints, but given the current structure it would not provide much benefit. If we add hash functions that operate on 8-byte words in the future, we could revisit this.

return (int) assembleCrossingLE(offset, Bytes.SIZEOF_INT);
}

private long assembleCrossingLE(int offset, int wordBytes) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're adding a non-trivial amount of code here, and it seems to overlap with the existing get(int offset) implementation to some degree. Would it make sense to refactor get to call this new method instead, so we can reduce code duplication? We could consider that approach as long as the performance overhead is negligible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for checking.
I changed to use assembleCrossingLE method in get method.

@junegunn junegunn requested a review from Copilot March 20, 2026 00:55
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets HBASE-29995 by improving hashing throughput (Bloom filter-related hash keys and hash functions) via little-endian 4-byte loads, reducing per-byte get() calls in the hot loops.

Changes:

  • Add HashKey#getIntLE and switch Murmur/Murmur3/Jenkins inner loops to load 4 bytes at a time.
  • Introduce LittleEndianBytes (with optional Unsafe acceleration) and extend UnsafeAccess with LE int helpers.
  • Update RowBloomHashKey/RowColBloomHashKey implementations and add new unit tests for LE conversions and ROWCOL bloom key layout.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
hbase-common/src/main/java/org/apache/hadoop/hbase/util/HashKey.java Adds getIntLE contract for faster 4-byte little-endian reads.
hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteArrayHashKey.java Implements getIntLE for byte[]-backed keys.
hbase-common/src/main/java/org/apache/hadoop/hbase/util/RowBloomHashKey.java Implements getIntLE using row fast-path reads.
hbase-common/src/main/java/org/apache/hadoop/hbase/util/RowColBloomHashKey.java Adds getIntLE fast paths + crossing-boundary assembly; caches total length.
hbase-common/src/main/java/org/apache/hadoop/hbase/util/LittleEndianBytes.java New LE int read/write utility (Unsafe-accelerated when available).
hbase-common/src/main/java/org/apache/hadoop/hbase/util/UnsafeAccess.java Adds toIntLE/putIntLE helpers for byte[] and ByteBuffer.
hbase-common/src/main/java/org/apache/hadoop/hbase/util/MurmurHash.java Uses hashKey.getIntLE for 4-byte block loads.
hbase-common/src/main/java/org/apache/hadoop/hbase/util/MurmurHash3.java Uses hashKey.getIntLE for 4-byte block loads.
hbase-common/src/main/java/org/apache/hadoop/hbase/util/JenkinsHash.java Uses hashKey.getIntLE for 4-byte block loads.
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestLittleEndianBytesBase.java New base tests covering LE int conversions and Cell row/qualifier reads.
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestLittleEndianBytes.java Tagged JUnit5 runner for the base tests (unsafe-enabled path).
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestLittleEndianBytesWoUnsafe.java Tagged JUnit5 runner that mocks unaligned=false (pure-Java path).
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestRowColBloomHashKey.java New tests validating ROWCOL bloom virtual layout and getIntLE consistency.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants