-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Add tsid builder #133344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tsid builder #133344
Conversation
This is a part of elastic#132566
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| * This is to cluster time series with similar values together, also helping with making encodings more effective. | ||
| * </li> | ||
| * <li> | ||
| * A hash of all names and values combined (16 bytes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still like to see a TSDB and TSDB-metricgen run where we just use 16 bytes, to see the impact to storage. Copying large tsids is an issue during querying, 16 bytes can be optimized much better.
Not a blocker, since this is not used yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll definitely do that in the context of the larger PR.
| murmur3Hasher.reset(); | ||
| for (int i = 0; i < dimensions.size(); i++) { | ||
| Dimension dim = dimensions.get(i); | ||
| addLongs(murmur3Hasher, dim.pathHash.h1, dim.pathHash.h2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use h1? Conflicts are not catastrophic here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, conflicts aren't too bad here. I guess by just using h1, it could speed up the hashing a bit. But either way, the resulting _tsid will be of the same size (4 bytes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to xor both parts and ad that as a single long, this seems like a good compromise.
| } | ||
| MurmurHash3.Hash128 valueHash = dim.valueHash(); | ||
| murmur3Hasher.reset(); | ||
| addLongs(murmur3Hasher, valueHash.h1, valueHash.h2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, we only use one by below either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one, h1 actually only contains the type - like 1 for integers and 2 for longs. But using h2 here could work. But that also won't impact the size of the _tsid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to xor both parts and ad that as a single long, this seems like a good compromise.
server/src/test/java/org/elasticsearch/cluster/routing/TsidBuilderTests.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/TsidBuilderTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/routing/TsidBuilderTests.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Kostas Krikellas <[email protected]>
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0 HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac Branch=main
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0 HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac Branch=main
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0 HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac Branch=main
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0 HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac Branch=main
This is a part of #132566