Conversation

@felixbarny
Member

This is a part of #132566

@felixbarny felixbarny requested a review from a team as a code owner August 21, 2025 16:39
@elasticsearchmachine elasticsearchmachine added the needs:triage, external-contributor, and v9.2.0 labels Aug 21, 2025
@felixbarny felixbarny requested a review from kkrik-es August 21, 2025 16:39
@elasticsearchmachine elasticsearchmachine added the Team:StorageEngine label and removed the needs:triage label Aug 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

* This is to cluster time series with similar values together, also helping with making encodings more effective.
* </li>
* <li>
* A hash of all names and values combined (16 bytes).
Contributor

I'd still like to see a TSDB and TSDB-metricgen run where we just use 16 bytes, to see the impact on storage. Copying large tsids is an issue during querying; 16 bytes can be optimized much better.

Not a blocker, since this is not used yet.

Member Author

Yes, I'll definitely do that in the context of the larger PR.

murmur3Hasher.reset();
for (int i = 0; i < dimensions.size(); i++) {
Dimension dim = dimensions.get(i);
addLongs(murmur3Hasher, dim.pathHash.h1, dim.pathHash.h2);
Contributor

Just use h1? Conflicts are not catastrophic here, right?

Member Author

No, conflicts aren't too bad here. I guess by just using h1, it could speed up the hashing a bit. But either way, the resulting _tsid will be of the same size (4 bytes).

Member Author

I changed it to XOR both parts and add that as a single long; this seems like a good compromise.
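The compromise described in this thread (folding a 128-bit hash into one 64-bit value before feeding it to the hasher) can be sketched as below. This is only an illustration, not the merged code: `Hash128` is a hypothetical stand-in for `MurmurHash3.Hash128`, and `fold` is a made-up helper name.

```java
final class XorFold {
    /** Hypothetical stand-in for MurmurHash3.Hash128: just the two 64-bit halves. */
    static final class Hash128 {
        final long h1;
        final long h2;

        Hash128(long h1, long h2) {
            this.h1 = h1;
            this.h2 = h2;
        }
    }

    /** Combines both halves into a single long so only one value has to be hashed. */
    static long fold(Hash128 hash) {
        return hash.h1 ^ hash.h2;
    }

    public static void main(String[] args) {
        Hash128 pathHash = new Hash128(0x0123456789ABCDEFL, 0x0FEDCBA987654321L);
        // One long goes into the downstream hasher instead of two.
        System.out.println(Long.toHexString(fold(pathHash)));
    }
}
```

Both halves still influence the result, while the downstream hasher processes half as many bytes per dimension.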

}
MurmurHash3.Hash128 valueHash = dim.valueHash();
murmur3Hasher.reset();
addLongs(murmur3Hasher, valueHash.h1, valueHash.h2);
Contributor

Ditto, we only use one byte below either way.

Member Author

For this one, h1 actually only contains the type, like 1 for integers and 2 for longs. Using h2 here could work, but that also won't impact the size of the _tsid.

Member Author

I changed it to XOR both parts and add that as a single long; this seems like a good compromise.
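To illustrate why h1 alone would be a poor choice for the value hash, here is a small sketch. It assumes, as stated in the thread, that h1 carries only a type tag (2 for longs) and h2 carries the hash of the value. The helper names and the h2 constants are made up for the example.

```java
final class TypeTagFold {
    /** Using only h1 discards the value hash entirely. */
    static long h1Only(long h1, long h2) {
        return h1;
    }

    /** XOR keeps the value hash and mixes the type tag into its low bits. */
    static long fold(long h1, long h2) {
        return h1 ^ h2;
    }

    public static void main(String[] args) {
        long longTypeTag = 2L;                  // per the thread: 2 for longs
        long hashOfValueA = 0x1111222233334444L; // made-up value hashes
        long hashOfValueB = 0x5555666677778888L;

        // Two different long values collide when only h1 is used...
        System.out.println(h1Only(longTypeTag, hashOfValueA) == h1Only(longTypeTag, hashOfValueB));
        // ...but remain distinct after XOR folding.
        System.out.println(fold(longTypeTag, hashOfValueA) == fold(longTypeTag, hashOfValueB));
    }
}
```

With h1 alone, every value of the same type would contribute the same bits, so the XOR (or h2 by itself) is what keeps distinct values distinguishable.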

@felixbarny felixbarny merged commit f790311 into elastic:main Aug 22, 2025
33 checks passed
@felixbarny felixbarny deleted the tsid-builder branch August 22, 2025 10:20
pabloem pushed a commit to pabloem/elasticsearch that referenced this pull request Aug 22, 2025
@felixbarny felixbarny self-assigned this Aug 25, 2025
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 2, 2025
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0
HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 8, 2025
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0
HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 17, 2025
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0
HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 23, 2025
BASE=a31485f6e8f14869de0605e9f6b303b353b772a0
HEAD=be3c1d34ffb0c20f2517b1b81283f7bd311024ac
Branch=main
Labels

external-contributor, >non-issue, :StorageEngine/TSDB, Team:StorageEngine, v9.2.0

3 participants