@phananh1010 (Owner) commented Oct 2, 2025

Single commit with tree=be3c1d34ffb0c20f2517b1b81283f7bd311024ac^{tree} and parent=a31485f6e8f14869de0605e9f6b303b353b772a0. It is an exact snapshot of the upstream PR head; no conflict resolution was attempted.

Summary by CodeRabbit

  • New Features

    • Added a fluent builder to create deterministic time series identifiers from multiple typed dimensions (strings, numbers, booleans), with support for custom value funnels and merging builders.
    • Ensures stable IDs regardless of dimension order (for single-field values) and produces compact identifiers with defined minimum and maximum sizes.
    • Enhanced 128-bit hashing support used by identifier generation (new public constructors on MurmurHash3.Hash128).
  • Tests

    • Comprehensive unit tests covering dimension handling, ordering behavior, size bounds, merging, and error cases for empty inputs.

BASE=a31485f6e8f14869de0605e9f6b303b353b772a0
HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac
Branch=main

coderabbitai bot commented Oct 2, 2025

Walkthrough

Adds a new TsidBuilder for constructing time series identifiers from named dimensions, extends MurmurHash3.Hash128 with two public constructors, and introduces unit tests validating hashing, ordering semantics, string handling, builder merging, exceptions, and TSID size bounds.

Changes

Cohort / File(s) Summary
TSID construction
server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java
Introduces TsidBuilder with typed dimension adders, fluent API, merging, custom funnels, and generation of a 128-bit hash and TSID BytesRef. Enforces max value fields, sorts dimensions for hashing/build, and defines functional interfaces for funnels.
Hash support
server/src/main/java/org/elasticsearch/common/hash/MurmurHash3.java
Adds two public constructors to Hash128: a no-argument constructor and one that initializes h1 and h2. No other behavioral changes.
Unit tests
server/src/test/java/org/elasticsearch/cluster/routing/TsidBuilderTests.java
Adds tests covering multi-type dimensions, ordering rules, string input variants, addAll behavior, empty-dimension exceptions, and TSID size min/max.
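As a rough illustration of the order-independence described in the summary, the Java sketch below sorts dimensions by path with a stable sort before hashing, so independent fields can be added in any order while repeated values for one path keep their relative order. It is not the PR's TsidBuilder: the class name SortedDimensionHasher is hypothetical, only string values are handled, and SHA-256 truncated to 16 bytes stands in for MurmurHash3 purely to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical simplification, not the PR's TsidBuilder: string-valued dimensions only,
// and SHA-256 truncated to 16 bytes stands in for MurmurHash3's 128-bit hash.
final class SortedDimensionHasher {
    private record Dim(String path, String value) {}

    private final List<Dim> dims = new ArrayList<>();

    SortedDimensionHasher add(String path, String value) {
        dims.add(new Dim(path, value));
        return this;
    }

    byte[] hash128() {
        if (dims.isEmpty()) {
            throw new IllegalArgumentException("at least one dimension is required");
        }
        // Sort a copy by path; List.sort is stable, so repeated values for the
        // same path keep the order in which they were added.
        List<Dim> sorted = new ArrayList<>(dims);
        sorted.sort(Comparator.comparing(Dim::path));
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        for (Dim d : sorted) {
            updateWithLength(digest, d.path());
            updateWithLength(digest, d.value());
        }
        return Arrays.copyOf(digest.digest(), 16); // keep 128 bits
    }

    // Length-prefix each string so ("ab", "c") and ("a", "bc") hash differently.
    private static void updateWithLength(MessageDigest digest, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        digest.update(ByteBuffer.allocate(Integer.BYTES).putInt(bytes.length).array());
        digest.update(bytes);
    }
}
```

With this design, adding host then region gives the same 16 bytes as adding region then host, while adding two values for the same path in a different order changes the result, which matches the "for single-field values" caveat above.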

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant TsidBuilder
  participant MurmurHash3
  participant TSID as BytesRef(TSID)

  Client->>TsidBuilder: newBuilder()
  loop Add dimensions
    Client->>TsidBuilder: add*Dimension(path, value)
  end
  alt Build hash
    TsidBuilder->>MurmurHash3: hash(dim names + values)
    MurmurHash3-->>TsidBuilder: Hash128(h1,h2)
  end
  TsidBuilder->>TsidBuilder: sort dimensions, enforce limits
  TsidBuilder->>TSID: buildTsid() with path/value hashes + final 16B hash
  TSID-->>Client: BytesRef

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I nibble bytes with whiskered grace,
Sorting paths in tidy place.
Hashes twirl—h1, h2—
A TSID brewed like morning dew.
With paws that funnel every key,
I stamp the time: hop-hash, tee-hee! 🐇⌛️

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title “Mirror upstream elastic#133344 for AI review (snapshot of HEAD tree)” describes a mirroring action rather than the substantive code changes and does not convey the primary additions of the TsidBuilder class, MurmurHash3.Hash128 constructors, or the new tests. Rename the pull request to clearly reflect its content, for example “Add TsidBuilder with dimension hashing and associated tests,” so that it summarizes the main changes rather than the mirroring process.
Docstring Coverage ⚠️ Warning Docstring coverage is 35.14%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.


@phananh1010 phananh1010 closed this Oct 2, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a31485f and b5428e7.

📒 Files selected for processing (3)
  • server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java (1 hunks)
  • server/src/main/java/org/elasticsearch/common/hash/MurmurHash3.java (1 hunks)
  • server/src/test/java/org/elasticsearch/cluster/routing/TsidBuilderTests.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java (1)
server/src/main/java/org/elasticsearch/common/hash/Murmur3Hasher.java (1)
  • Murmur3Hasher (19-120)
server/src/test/java/org/elasticsearch/cluster/routing/TsidBuilderTests.java (2)
libs/x-content/src/main/java/org/elasticsearch/xcontent/Text.java (1)
  • Text (19-173)
server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java (1)
  • TsidBuilder (32-346)

Comment on lines +181 to +186
public TsidBuilder addAll(TsidBuilder other) {
if (other == null || other.dimensions.isEmpty()) {
return this;
}
dimensions.addAll(other.dimensions);
return this;

⚠️ Potential issue | 🔴 Critical

Fix insertion-order loss in addAll for multi-valued fields.

addAll reuses the other builder’s Dimension instances without recomputing their insertion order, so each appended dimension keeps the index it had in its original builder. For paths that appear multiple times (array dimensions), sorting can then interleave the appended values with, or place them ahead of, existing values, producing a different TSID/hash than adding the same dimensions sequentially. Reproducing it is easy: add "foo": ["a1","a2"] to builder A and "foo": ["b1"] to builder B; newBuilder().addAll(A).addAll(B) then yields the order [a1,b1,a2] instead of [a1,a2,b1]. That is a functional break for addAll.
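The reproduction can be modeled in a few lines. This is a hypothetical sketch, not the PR's code: it assumes, as the review implies, that each dimension stores the index it received in the builder that created it and that the final order is by path, then by that index. Because List.sort is stable, the tie between a1 and b1 (both index 0) is resolved by list position, which places b1 ahead of a2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the review's reproduction, not Elasticsearch code. Assumes each
// dimension carries the index it got in the builder that created it, and that the
// final order is by path, then by that index.
final class AddAllOrderRepro {
    record Dim(String path, String value, int insertionOrder) {}

    static final Comparator<Dim> ORDER =
        Comparator.comparing(Dim::path).thenComparingInt(Dim::insertionOrder);

    // Simulates one builder adding several values for the same path.
    static List<Dim> builder(String path, String... values) {
        List<Dim> dims = new ArrayList<>();
        for (String v : values) {
            dims.add(new Dim(path, v, dims.size())); // index is local to this builder
        }
        return dims;
    }

    // Like the original addAll: appends the other builders' instances unchanged.
    static List<String> mergedByReference() {
        List<Dim> merged = new ArrayList<>();
        merged.addAll(builder("foo", "a1", "a2")); // a1@0, a2@1
        merged.addAll(builder("foo", "b1"));       // b1@0 ties with a1@0
        merged.sort(ORDER); // stable sort: a1 stays before b1, and both precede a2@1
        return merged.stream().map(Dim::value).toList();
    }
}
```

Under these assumptions, mergedByReference() returns [a1, b1, a2], the order the review reports.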

Recreate each incoming dimension with a fresh insertion index (and clone the hashes) before appending:

     public TsidBuilder addAll(TsidBuilder other) {
         if (other == null || other.dimensions.isEmpty()) {
             return this;
         }
-        dimensions.addAll(other.dimensions);
+        for (Dimension dimension : other.dimensions) {
+            dimensions.add(
+                new Dimension(
+                    dimension.path(),
+                    new MurmurHash3.Hash128(dimension.pathHash().h1, dimension.pathHash().h2),
+                    new MurmurHash3.Hash128(dimension.valueHash().h1, dimension.valueHash().h2),
+                    dimensions.size()
+                )
+            );
+        }
         return this;
     }
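A minimal sketch of the suggested approach, using a hypothetical AddAllOrderFix class with string values in place of the real Dimension and Hash128 types: each incoming dimension is re-stamped with this builder's next index, so merging A and then B sorts to the same order as adding a1, a2 and b1 one at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical builder illustrating the suggested fix; string values replace the real
// Dimension/Hash128 types, and the class name is illustrative only.
final class AddAllOrderFix {
    record Dim(String path, String value, int insertionOrder) {}

    private final List<Dim> dims = new ArrayList<>();

    AddAllOrderFix add(String path, String value) {
        dims.add(new Dim(path, value, dims.size()));
        return this;
    }

    AddAllOrderFix addAll(AddAllOrderFix other) {
        if (other == null || other.dims.isEmpty()) {
            return this;
        }
        // Iterate a snapshot so addAll(this) does not modify the list being iterated.
        for (Dim d : new ArrayList<>(other.dims)) {
            // Re-stamp with this builder's next index, as the suggested change does.
            dims.add(new Dim(d.path(), d.value(), dims.size()));
        }
        return this;
    }

    // Orders values by path, then by insertion index within this builder.
    List<String> sortedValues() {
        List<Dim> copy = new ArrayList<>(dims);
        copy.sort(Comparator.comparing(Dim::path).thenComparingInt(Dim::insertionOrder));
        return copy.stream().map(Dim::value).toList();
    }
}
```

This sketch iterates over a copy of other.dims so that a builder can be merged into itself. Iterating other.dimensions directly while appending to the same list, as the suggested loop would in that case, throws ConcurrentModificationException if dimensions is an ArrayList; whether self-merging matters depends on how callers use addAll.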
🤖 Prompt for AI Agents
In server/src/main/java/org/elasticsearch/cluster/routing/TsidBuilder.java
around lines 181-186, addAll currently appends other.dimensions by reference
which preserves their original insertion indexes and causes insertion-order loss
for multi-valued (array) fields; to fix, iterate other.dimensions and for each
Dimension create a new Dimension instance (copying/cloning the hash/state but
assigning a fresh insertion index from this builder) and add those new instances
to this.dimensions so appended values keep correct sequential order.

@phananh1010 phananh1010 reopened this Oct 3, 2025
@phananh1010 phananh1010 closed this Oct 6, 2025