KAFKA-17747: Add compute topic and group hash #49

base: trunk

Conversation
Signed-off-by: PoAn Yang <[email protected]>
CodeAnt AI is reviewing your PR.
Summary of Changes

Hello @visz11, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces new hashing utilities within the group-coordinator module's Group interface.
CodeAnt AI finished reviewing your PR.
Walkthrough

This pull request introduces Guava library integration into the group-coordinator module and adds two new static utility methods to the Group class for computing deterministic cryptographic hashes of topic and group metadata. Dependencies and import policies are updated accordingly.
Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes span multiple concerns: build configuration (straightforward), new public API methods with hashing logic (requires careful verification of algorithm correctness and determinism), and comprehensive test coverage. The heterogeneity of file types and the addition of cryptographic hashing operations to a public interface warrant careful review, though repetition and pattern consistency in tests reduce complexity.
KAFKA-17747: Add compute topic and group hash

TL;DR: Adds hash computation utilities for topics and groups using Guava's Murmur3 hashing with comprehensive test coverage.

Refacto PR Summary

Implements deterministic hash computation methods for Kafka topic metadata and group configurations using Google Guava's Murmur3 algorithm.
Sequence Diagram

```mermaid
sequenceDiagram
    participant GC as Group Coordinator
    participant G as Group Interface
    participant TI as Topic Image
    participant CI as Cluster Image
    participant H as Guava Hasher

    GC->>G: computeTopicHash(topicImage, clusterImage)
    G->>H: newHasher().putByte(0)
    G->>TI: getId(), getName(), getPartitions()
    G->>CI: getBrokerRacks()
    G->>H: putLong(topicId).putString(name).putInt(partitions)
    loop For each partition
        G->>H: putInt(partitionId).putString(sortedRacks)
    end
    H-->>G: hash().asLong()
    G-->>GC: Topic hash
    GC->>G: computeGroupHash(topicHashes)
    G->>G: Sort topic hashes by key
    G->>H: combineOrdered(hashCodes)
    H-->>G: Combined hash
    G-->>GC: Group hash
```
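The two utilities compose naturally: hash each subscribed topic first, then fold the per-topic hashes into a group hash. Below is a minimal sketch of that flow; the `hashForSubscription` helper and its parameters are hypothetical illustrations, not code from this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.coordinator.group.Group;
import org.apache.kafka.image.MetadataImage;
import org.apache.kafka.image.TopicImage;

public final class GroupHashFlowSketch {
    // Hypothetical caller illustrating the intended flow of the new utilities.
    static long hashForSubscription(Set<String> subscribedTopics, MetadataImage image) {
        Map<String, Long> topicHashes = new HashMap<>();
        for (String topicName : subscribedTopics) {
            TopicImage topic = image.topics().getTopic(topicName);
            if (topic != null) { // skip topics not present in the metadata image
                topicHashes.put(topicName, Group.computeTopicHash(topic, image.cluster()));
            }
        }
        // computeGroupHash sorts by topic name internally, so map iteration order is irrelevant
        return Group.computeGroupHash(topicHashes);
    }
}
```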
/refacto-visz
Refacto is reviewing this PR. Please wait for the review comments to be posted. |
Code Review
This pull request introduces utility methods to compute hashes for topics and groups, which will likely be used for consistency checks or routing. The implementation uses Guava's hashing library. My review focuses on improving the robustness of the hashing logic and code clarity. Specifically, the hashing of Uuids should use the full 128 bits to avoid potential collisions, and the stream processing logic can be simplified using more modern Java stream features. The changes are well-tested, but the tests will need updates to reflect the suggested change in the hashing logic. Overall, this is a good addition.
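One modern-stream simplification the summary may be alluding to (a sketch under that assumption, not code from the PR): on Java 9+, `.filter(Optional::isPresent).map(Optional::get)` collapses into a single `flatMap(Optional::stream)` step.

```java
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OptionalStreamSketch {
    public static void main(String[] args) {
        String racks = Stream.of(Optional.of("rack1"), Optional.<String>empty(), Optional.of("rack0"))
            .flatMap(Optional::stream) // Java 9+: replaces .filter(Optional::isPresent).map(Optional::get)
            .sorted()
            .collect(Collectors.joining(";"));
        System.out.println(racks); // rack0;rack1
    }
}
```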
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(topicImage.id().hashCode()) // topic Id
```
Using topicImage.id().hashCode() to hash the Uuid is not ideal as it truncates the 128-bit UUID into a 32-bit integer hash code before passing it to the Hasher as a long. This significantly increases the probability of hash collisions. To preserve all the information from the UUID, you should hash both the most and least significant bits of the UUID.
Note that you will also need to update the corresponding tests in GroupTest.java to reflect this change.
```diff
-    .putLong(topicImage.id().hashCode()) // topic Id
+    .putLong(topicImage.id().getMostSignificantBits()).putLong(topicImage.id().getLeastSignificantBits()) // topic Id
```
```java
    .filter(Optional::isPresent)
    .map(Optional::get)
```
Code Review: Group Hash Implementation

👍 Well Done

📁 Selected files for review (5)
```java
    .map(Optional::get)
    .sorted()
    .collect(Collectors.joining(";"));
topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
```
Missing Import Statement
Collectors is used here without a visible import statement, so the file will not compile. Add the missing import:

```java
import java.util.stream.Collectors;
```
Committable Suggestion

```java
import java.util.stream.Collectors;
```
Standards
- ISO-IEC-25010-Functional-Correctness-Appropriateness
- ISO-IEC-25010-Reliability-Maturity
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(topicImage.id().hashCode()) // topic Id
```
Null Pointer Risk
Calling hashCode() directly on topicImage.id() without a null check risks a NullPointerException: a null topic ID would abort the group hash calculation at runtime. Objects.hashCode handles null safely:

```java
.putLong(Objects.hashCode(topicImage.id())) // topic Id
```
Committable Suggestion

```diff
-    .putLong(topicImage.id().hashCode()) // topic Id
+    .putLong(Objects.hashCode(topicImage.id())) // topic Id
```
Standards
- ISO-IEC-25010-Reliability-Fault-Tolerance
- ISO-IEC-25010-Functional-Correctness-Appropriateness
- DbC-Preconditions
```java
static long computeGroupHash(Map<String, Long> topicHashes) {
    return Hashing.combineOrdered(
        topicHashes.entrySet()
            .stream()
            .sorted(Map.Entry.comparingByKey())
            .map(e -> HashCode.fromLong(e.getValue()))
            .toList()
    ).asLong();
}
```
```java
/**
 * Computes the hash of the topic id, name, number of partitions, and partition racks by Murmur3.
 *
 * @param topicImage   The topic image.
 * @param clusterImage The cluster image.
 * @return The hash of the topic.
 */
static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
```
Static Method Placement
Hash computation methods are placed in the Group interface but handle generic metadata operations. These utility methods have no direct relationship to Group behavior and violate the single-responsibility principle. Consider extracting them to a dedicated HashUtils class for better separation of concerns (see the sketch after the standards list below).
Standards
- SOLID-SRP
- Clean-Code-Class-Organization
- Refactoring-Extract-Class
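A minimal sketch of the extraction this comment proposes; the HashUtils name, package placement, and shape are assumptions, not part of the PR.

```java
package org.apache.kafka.coordinator.group;

import java.util.Map;

import org.apache.kafka.image.ClusterImage;
import org.apache.kafka.image.TopicImage;

// Hypothetical utility class: the method bodies would move here unchanged
// from Group, and Group (or its callers) would delegate to these.
final class HashUtils {
    private HashUtils() {
    }

    static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
        // body moved verbatim from Group.computeTopicHash
        throw new UnsupportedOperationException("sketch only");
    }

    static long computeGroupHash(Map<String, Long> topicHashes) {
        // body moved verbatim from Group.computeGroupHash
        throw new UnsupportedOperationException("sketch only");
    }
}
```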
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- build.gradle (1 hunks)
- checkstyle/import-control-group-coordinator.xml (1 hunks)
- gradle/dependencies.gradle (2 hunks)
- group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (2 hunks)
- group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (3)
- metadata/src/main/java/org/apache/kafka/image/ClusterImage.java (1): ClusterImage (36-108)
- metadata/src/main/java/org/apache/kafka/image/TopicImage.java (1): TopicImage (37-92)
- metadata/src/main/java/org/apache/kafka/metadata/BrokerRegistration.java (1): BrokerRegistration (44-398)

group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (2)
- clients/src/main/java/org/apache/kafka/common/Uuid.java (1): Uuid (32-196)
- metadata/src/main/java/org/apache/kafka/image/MetadataImage.java (1): MetadataImage (33-200)
🔇 Additional comments (4)
checkstyle/import-control-group-coordinator.xml (1)
80-81: Import policy update looks good. Allowing com.google.common.hash only within coordinator/group is appropriately scoped.
gradle/dependencies.gradle (2)
64-65: Guava version mapping added. Version pin and alias look fine.

151-152: Dependency entry added; remember the LICENSE-binary update. Add Guava to LICENSE-binary and verify dependency-check suppressions if needed.
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (1)
229-237: Deterministic ordered combination for group hash. Sorting by topic name before Hashing.combineOrdered achieves stable results. LGTM.
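A small self-contained check of why the sort matters (a sketch using only Guava, not code from this PR): Hashing.combineOrdered is order-sensitive, so feeding the same HashCodes in a different order yields a different result.

```java
import java.util.List;

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;

public class CombineOrderedSketch {
    public static void main(String[] args) {
        List<HashCode> sorted = List.of(HashCode.fromLong(1L), HashCode.fromLong(2L));
        List<HashCode> reversed = List.of(HashCode.fromLong(2L), HashCode.fromLong(1L));
        // Same inputs, different order: combineOrdered produces different results,
        // which is why computeGroupHash sorts entries by topic name first.
        System.out.println(Hashing.combineOrdered(sorted).asLong());
        System.out.println(Hashing.combineOrdered(reversed).asLong());
    }
}
```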
```groovy
implementation libs.hdrHistogram
implementation libs.re2j
implementation libs.slf4jApi
implementation libs.guava
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Scope Guava correctly; consider exposing metadata as api for the new public methods
- Guava usage is internal to hashing and not exposed in signatures — keeping it as implementation is correct. LGTM.
- New public static methods on Group use TopicImage and ClusterImage types. To avoid downstream compile issues, expose ':metadata' as an api dependency in group-coordinator.
Also ensure LICENSE-binary is updated for the new Guava dep (per dependencies.gradle note).
Proposed change (within project(':group-coordinator') dependencies):
```diff
-    implementation project(':metadata')
+    api project(':metadata')
```

I'll verify the review comment by examining the relevant code sections. Let me start by inspecting the build configuration and the Group class.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and inspect the build.gradle file around line 1423 for Guava dependency
fd -e gradle "build.gradle" | head -5
```

Length of output: 12
🏁 Script executed:
```shell
#!/bin/bash
# Search for the Group class and its public static methods
rg -n "class Group" --type=java -A 20 | head -50
```

Length of output: 6189
🏁 Script executed:
```shell
#!/bin/bash
# Check group-coordinator dependencies configuration
fd -e gradle | xargs rg -l "group-coordinator" | head -5
```

Length of output: 28
🏁 Script executed:
```shell
#!/bin/bash
# Look for TopicImage and ClusterImage in public method signatures
rg -nP "public\s+static.*(?:TopicImage|ClusterImage)" --type=java
```

Length of output: 384
🏁 Script executed:
```shell
#!/bin/bash
# Check for LICENSE-binary file and Guava references
fd "LICENSE" -e binary -o -e txt | grep -i license
```

Length of output: 116
```java
static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(topicImage.id().hashCode()) // topic Id
        .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
        .putInt(topicImage.partitions().size()); // number of partitions

    topicImage.partitions().entrySet().stream().sorted(Map.Entry.comparingByKey()).forEach(entry -> {
        topicHasher.putInt(entry.getKey()); // partition id
        String racks = Arrays.stream(entry.getValue().replicas)
            .mapToObj(clusterImage::broker)
            .filter(Objects::nonNull)
            .map(BrokerRegistration::rack)
            .filter(Optional::isPresent)
            .map(Optional::get)
            .sorted()
            .collect(Collectors.joining(";"));
        topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
    });
    return topicHasher.hash().asLong();
}
```
Use full 128-bit UUID, not Uuid.hashCode(), to avoid unnecessary collisions
The doc says “hash of the topic id…”. Using topicImage.id().hashCode() reduces 128-bit UUID to 32 bits, then writes it as a long, increasing collision risk and diverging from the stated intent.
Replace with both UUID halves; optionally avoid building an intermediate rack string.
Apply:

```diff
-        Hasher topicHasher = hf.newHasher()
-            .putByte((byte) 0) // magic byte
-            .putLong(topicImage.id().hashCode()) // topic Id
-            .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
-            .putInt(topicImage.partitions().size()); // number of partitions
+        Hasher topicHasher = hf.newHasher()
+            .putByte((byte) 0) // magic byte
+            // topic id (full 128 bits)
+            .putLong(topicImage.id().getMostSignificantBits())
+            .putLong(topicImage.id().getLeastSignificantBits())
+            .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
+            .putInt(topicImage.partitions().size()); // number of partitions
@@
-            String racks = Arrays.stream(entry.getValue().replicas)
-                .mapToObj(clusterImage::broker)
-                .filter(Objects::nonNull)
-                .map(BrokerRegistration::rack)
-                .filter(Optional::isPresent)
-                .map(Optional::get)
-                .sorted()
-                .collect(Collectors.joining(";"));
-            topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
+            Arrays.stream(entry.getValue().replicas)
+                .mapToObj(clusterImage::broker)
+                .filter(Objects::nonNull)
+                .map(BrokerRegistration::rack)
+                .filter(Optional::isPresent)
+                .map(Optional::get)
+                .sorted()
+                // optional: dedupe racks per partition to avoid double-counting identical rack strings
+                // .distinct()
+                .forEachOrdered(r -> {
+                    topicHasher.putString(r, StandardCharsets.UTF_8);
+                    topicHasher.putByte((byte) ';');
+                });
```

Note: Tests will need updates to write the full UUID (see test suggestions).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        // topic id (full 128 bits)
        .putLong(topicImage.id().getMostSignificantBits())
        .putLong(topicImage.id().getLeastSignificantBits())
        .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
        .putInt(topicImage.partitions().size()); // number of partitions
    topicImage.partitions().entrySet().stream().sorted(Map.Entry.comparingByKey()).forEach(entry -> {
        topicHasher.putInt(entry.getKey()); // partition id
        Arrays.stream(entry.getValue().replicas)
            .mapToObj(clusterImage::broker)
            .filter(Objects::nonNull)
            .map(BrokerRegistration::rack)
            .filter(Optional::isPresent)
            .map(Optional::get)
            .sorted()
            // optional: dedupe racks per partition to avoid double-counting identical rack strings
            // .distinct()
            .forEachOrdered(r -> {
                topicHasher.putString(r, StandardCharsets.UTF_8);
                topicHasher.putByte((byte) ';');
            });
    });
    return topicHasher.hash().asLong();
}
```
🤖 Prompt for AI Agents
In group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java
around lines 246 to 267, replace the use of topicImage.id().hashCode() (which
reduces the 128-bit UUID to 32 bits) with the full 128-bit UUID by writing both
halves: call topicImage.id().getMostSignificantBits() and
topicImage.id().getLeastSignificantBits() (two putLong calls) so the full UUID
is included in the hasher; also avoid building an intermediate joined rack
string by serializing the sorted racks directly (e.g., write an int count then
iterate sorted rack strings and putString each) or otherwise stream each rack
into the topicHasher to reduce allocations; update tests accordingly to expect
the full UUID serialization.
```java
@Test
void testComputeTopicHash() {
    long result = Group.computeTopicHash(FOO_METADATA_IMAGE.topics().getTopic(FOO_TOPIC_ID), FOO_METADATA_IMAGE.cluster());

    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        .putInt(0) // partition 0
        .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
        .putInt(1) // partition 1
        .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
    assertEquals(topicHasher.hash().asLong(), result);
}
```
Align expected hash with full 128-bit UUID (not hashCode)
After fixing computeTopicHash to hash the full UUID, adjust the tests:
```diff
     HashFunction hf = Hashing.murmur3_128();
     Hasher topicHasher = hf.newHasher()
         .putByte((byte) 0) // magic byte
-        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+        .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+        .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
         .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
         .putInt(FOO_NUM_PARTITIONS) // number of partitions
         .putInt(0) // partition 0
         .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
         .putInt(1) // partition 1
         .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
@Test
void testComputeTopicHash() {
    long result = Group.computeTopicHash(FOO_METADATA_IMAGE.topics().getTopic(FOO_TOPIC_ID), FOO_METADATA_IMAGE.cluster());

    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
        .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        .putInt(0) // partition 0
        .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
        .putInt(1) // partition 1
        .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
    assertEquals(topicHasher.hash().asLong(), result);
}
```
🤖 Prompt for AI Agents
In
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java
around lines 50 to 65, the test currently feeds only topicId.hashCode() into the
hasher but computeTopicHash now hashes the full 128-bit UUID; update the test to
feed the full UUID into the Hasher (replace the single
putLong(FOO_TOPIC_ID.hashCode()) with two putLong calls:
FOO_TOPIC_ID.getMostSignificantBits() and
FOO_TOPIC_ID.getLeastSignificantBits()) so the expected hasher input matches the
implementation and the asserted hash is correct.
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 1) // different magic byte
    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    .putInt(0) // partition 0
    .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
assertNotEquals(topicHasher.hash().asLong(), result);
```
🛠️ Refactor suggestion | 🟠 Major
Same UUID correction for different-magic test
```diff
 Hasher topicHasher = hf.newHasher()
     .putByte((byte) 1) // different magic byte
-    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+    .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+    .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
     .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
     .putInt(FOO_NUM_PARTITIONS) // number of partitions
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 1) // different magic byte
    .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
    .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    .putInt(0) // partition 0
    .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
assertNotEquals(topicHasher.hash().asLong(), result);
```
🤖 Prompt for AI Agents
In
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java
around lines 71 to 81, the test uses FOO_TOPIC_ID.hashCode() when feeding the
UUID into the hasher which can change the UUID representation and invalidate the
"different-magic" expectation; replace the single
putLong(FOO_TOPIC_ID.hashCode()) with the UUID's canonical two-long
representation by adding putLong(FOO_TOPIC_ID.getMostSignificantBits()) and
putLong(FOO_TOPIC_ID.getLeastSignificantBits()) so the same UUID bytes are used
across tests while keeping the different magic byte.
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    // different partition order
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8) // rack of partition 1
    .putInt(0) // partition 0
    .putString("rack0;rack1", StandardCharsets.UTF_8); // rack of partition 0
assertNotEquals(topicHasher.hash().asLong(), result);
```
🛠️ Refactor suggestion | 🟠 Major
Same UUID correction for different partition order
```diff
 Hasher topicHasher = hf.newHasher()
     .putByte((byte) 0) // magic byte
-    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+    .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+    .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
     .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
     .putInt(FOO_NUM_PARTITIONS) // number of partitions
```

🤖 Prompt for AI Agents
In
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java
around lines 88 to 99, the code puts only FOO_TOPIC_ID.hashCode() into the
hasher which loses UUID entropy and can produce the same hash for different
partition orders; replace that single putLong(FOO_TOPIC_ID.hashCode()) with two
putLong calls for the UUID’s full value
(putLong(FOO_TOPIC_ID.getMostSignificantBits()) and
putLong(FOO_TOPIC_ID.getLeastSignificantBits())) so the complete UUID is
included in the hash and different partition orders produce different results.
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    .putInt(0) // partition 0
    .putString("rack1;rack0", StandardCharsets.UTF_8) // different rack order of partition 0
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
assertNotEquals(topicHasher.hash().asLong(), result);
```
🛠️ Refactor suggestion | 🟠 Major
Same UUID correction for different rack order
```diff
 Hasher topicHasher = hf.newHasher()
     .putByte((byte) 0) // magic byte
-    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+    .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+    .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
     .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
     .putInt(FOO_NUM_PARTITIONS) // number of partitions
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
    .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    .putInt(0) // partition 0
    .putString("rack1;rack0", StandardCharsets.UTF_8) // different rack order of partition 0
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
assertNotEquals(topicHasher.hash().asLong(), result);
```
🤖 Prompt for AI Agents
In group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java around lines 106 to 116, the test currently feeds only FOO_TOPIC_ID.hashCode() into the hasher; replace that single putLong with the UUID's full two-long representation (putLong(FOO_TOPIC_ID.getMostSignificantBits()) and putLong(FOO_TOPIC_ID.getLeastSignificantBits())) so the complete UUID is included, while keeping the reversed rack order and the assertNotEquals, since computeTopicHash sorts racks and its result must not match this unsorted variant.
Summary by CodeRabbit

New Features
- Utility methods to compute deterministic Murmur3-based hashes for topic metadata and group subscriptions.

Chores
- Guava added as a group-coordinator dependency, with import-control rules updated accordingly.