TSDB ingest performance: combine routing and tsdb hashing #132566

Open
wants to merge 24 commits into
base: main

felixbarny
Member

@felixbarny felixbarny commented Aug 8, 2025

Instead of hashing dimensions once during routing and again during document parsing, this combines the two steps. The tsid is created during routing and then used to derive the routing hash. The tsid is then sent to the data nodes, where its presence signals that the tsid no longer needs to be created during document parsing.
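To illustrate the flow described above, here is a minimal sketch, not the code in this PR. The class and method names (`TsidRoutingSketch`, `buildTsid`, `shardFor`, `resolveTsid`) are hypothetical, and SHA-256 plus `Arrays.hashCode` are stand-ins for the real tsid and routing hash functions. It only shows the shape of the idea: hash the dimensions once on the coordinating node, derive the shard from the resulting tsid, and let the data node reuse the tsid when it is present.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class TsidRoutingSketch {

    /** Hypothetical stand-in for tsid creation: hashes dimension names and values in sorted order. */
    static byte[] buildTsid(Map<String, String> dimensions) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Sort by dimension name so the field order in the document does not affect the tsid.
            for (Map.Entry<String, String> e : new TreeMap<>(dimensions).entrySet()) {
                digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between name and value
                digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Coordinating node: the shard is derived from the tsid, so dimensions are not hashed a second time. */
    static int shardFor(byte[] tsid, int numberOfShards) {
        // Assumption: any stable hash of the tsid bytes would do for this sketch.
        return Math.floorMod(Arrays.hashCode(tsid), numberOfShards);
    }

    /** Data node: a tsid sent along with the request signals that it must not be recomputed. */
    static byte[] resolveTsid(byte[] tsidFromCoordinator, Map<String, String> dimensions) {
        return tsidFromCoordinator != null ? tsidFromCoordinator : buildTsid(dimensions);
    }
}
```

In this sketch, a request that arrives without a tsid (for example from the old code path) still works, because `resolveTsid` falls back to building it from the dimensions.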

Instead of populating the index.routing_path setting, which can differ from the document dimensions, this now populates a new index.dimensions index setting containing all dimensions. This setting isn't meant to be user-configurable (todo: make it private). If users manually set index.routing_path, the new optimization doesn't kick in, so routing and tsid creation work as before. Additionally, if the dimension fields can't be expressed as a simple set of path matches (for example, when using a dynamic template with a match_mapping_type that sets time_series_dimension: true), it falls back to populating index.routing_path.
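The three cases above can be summarized as a small decision function. This is only a sketch of the rules as described in this PR description; `DimensionSettingSketch`, `Mode`, and `decide` are hypothetical names, and the real logic may take more inputs into account.

```java
import java.util.List;

public class DimensionSettingSketch {

    /** Which setting drives routing and tsid creation for an index (hypothetical enum). */
    enum Mode {
        USER_ROUTING_PATH,    // index.routing_path was set manually: behave as before
        DERIVED_ROUTING_PATH, // fallback: populate index.routing_path from the mappings
        DIMENSIONS            // new path: populate index.dimensions
    }

    /**
     * @param userRoutingPath values the user set for index.routing_path, empty if unset
     * @param dimensionsExpressibleAsPathMatches false when, for example, a dynamic template
     *        with match_mapping_type sets time_series_dimension: true
     */
    static Mode decide(List<String> userRoutingPath, boolean dimensionsExpressibleAsPathMatches) {
        if (!userRoutingPath.isEmpty()) {
            return Mode.USER_ROUTING_PATH;
        }
        if (!dimensionsExpressibleAsPathMatches) {
            return Mode.DERIVED_ROUTING_PATH;
        }
        return Mode.DIMENSIONS;
    }

    /** Only the index.dimensions path creates the tsid during routing. */
    static boolean createsTsidDuringRouting(Mode mode) {
        return mode == Mode.DIMENSIONS;
    }
}
```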

As an additional benefit, the new _tsids are shorter, which may help at query time. Despite being shorter, they retain the main properties: they cluster similar time series together (which helps compression) and make collisions very unlikely. More details are in the JavaDoc of TsidBuilder.

I've added a dependency on hash4j, which provides a nice API and an efficient way to hash strings without having to create a temporary UTF-8 byte array.

Remaining issues to work out:

  • Make index.dimensions a private setting.
  • Update index.dimensions when adding a new dimension field to the mappings.
  • Ensure that all dynamic templates are incorporated into index.dimensions so that the coordinating node always knows which paths will be considered dimensions.

@felixbarny felixbarny requested review from a team as code owners August 8, 2025 09:09
@felixbarny felixbarny added the :StorageEngine/TSDB You know, for Metrics label Aug 8, 2025
@felixbarny felixbarny marked this pull request as draft August 8, 2025 09:09
@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team v9.2.0 labels Aug 8, 2025
@felixbarny felixbarny marked this pull request as ready for review August 11, 2025 10:13
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@felixbarny felixbarny requested a review from kkrik-es August 11, 2025 10:13
@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Aug 12, 2025
Labels
external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue serverless-linked Added by automation, don't add manually :StorageEngine/TSDB You know, for Metrics Team:StorageEngine v9.2.0
2 participants