-
Notifications
You must be signed in to change notification settings - Fork 0
Mirror upstream elastic/elasticsearch#133344 for AI review (snapshot of HEAD tree) #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0 HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac Branch=main
WalkthroughAdds a new TsidBuilder for constructing time series identifiers from named dimensions, extends MurmurHash3.Hash128 with two public constructors, and introduces unit tests validating hashing, ordering semantics, string handling, builder merging, exceptions, and TSID size bounds. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant TsidBuilder
participant MurmurHash3
participant TSID as BytesRef(TSID)
Client->>TsidBuilder: newBuilder()
loop Add dimensions
Client->>TsidBuilder: add*Dimension(path, value)
end
alt Build hash
TsidBuilder->>MurmurHash3: hash(dim names + values)
MurmurHash3-->>TsidBuilder: Hash128(h1,h2)
end
TsidBuilder->>TsidBuilder: sort dimensions, enforce limits
TsidBuilder->>TSID: buildTsid() with path/value hashes + final 16B hash
TSID-->>Client: BytesRef
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java(1 hunks)server/src/main/java/org/elasticsearch/common/hash/MurmurHash3.java(1 hunks)server/src/test/java/org/elasticsearch/cluster/routing/TsidBuilderTests.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java (1)
server/src/main/java/org/elasticsearch/common/hash/Murmur3Hasher.java (1)
Murmur3Hasher(19-120)
server/src/test/java/org/elasticsearch/cluster/routing/TsidBuilderTests.java (2)
libs/x-content/src/main/java/org/elasticsearch/xcontent/Text.java (1)
Text(19-173)server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java (1)
TsidBuilder(32-346)
| public TsidBuilder addAll(TsidBuilder other) { | ||
| if (other == null || other.dimensions.isEmpty()) { | ||
| return this; | ||
| } | ||
| dimensions.addAll(other.dimensions); | ||
| return this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix insertion-order loss in addAll for multi-valued fields.
addAll reuses the other builder’s Dimension instances without recomputing insertion order. For shared paths that appear multiple times (array dimensions), this causes the appended values to be reordered ahead of existing ones when we sort, producing a different TSID/hash than adding the same dimensions sequentially. Reproducing is easy: add "foo": ["a1","a2"] to builder A, "foo": ["b1"] to builder B, then newBuilder().addAll(A).addAll(B) yields order [a1,b1,a2] instead of [a1,a2,b1]. That’s a functional break for addAll.
Recreate each incoming dimension with a fresh insertion index (and clone the hashes) before appending:
public TsidBuilder addAll(TsidBuilder other) {
if (other == null || other.dimensions.isEmpty()) {
return this;
}
- dimensions.addAll(other.dimensions);
+ for (Dimension dimension : other.dimensions) {
+ dimensions.add(
+ new Dimension(
+ dimension.path(),
+ new MurmurHash3.Hash128(dimension.pathHash().h1, dimension.pathHash().h2),
+ new MurmurHash3.Hash128(dimension.valueHash().h1, dimension.valueHash().h2),
+ dimensions.size()
+ )
+ );
+ }
return this;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public TsidBuilder addAll(TsidBuilder other) { | |
| if (other == null || other.dimensions.isEmpty()) { | |
| return this; | |
| } | |
| dimensions.addAll(other.dimensions); | |
| return this; | |
| public TsidBuilder addAll(TsidBuilder other) { | |
| if (other == null || other.dimensions.isEmpty()) { | |
| return this; | |
| } | |
| for (Dimension dimension : other.dimensions) { | |
| dimensions.add( | |
| new Dimension( | |
| dimension.path(), | |
| new MurmurHash3.Hash128(dimension.pathHash().h1, dimension.pathHash().h2), | |
| new MurmurHash3.Hash128(dimension.valueHash().h1, dimension.valueHash().h2), | |
| dimensions.size() | |
| ) | |
| ); | |
| } | |
| return this; | |
| } |
🤖 Prompt for AI Agents
In server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java
around lines 181-186, addAll currently appends other.dimensions by reference
which preserves their original insertion indexes and causes insertion-order loss
for multi-valued (array) fields; to fix, iterate other.dimensions and for each
Dimension create a new Dimension instance (copying/cloning the hash/state but
assigning a fresh insertion index from this builder) and add those new instances
to this.dimensions so appended values keep correct sequential order.
Single commit with tree=be3c1d34ffb0c20f2517b1b81283f7bd311024ac^{tree}, parent=a31485f6e8f14869de0605e9f6b303b353b772a0. Exact snapshot of upstream PR head. No conflict resolution attempted.
Summary by CodeRabbit
New Features
Tests