TSDB ingest performance: combine routing and tsdb hashing #132566
Open
felixbarny wants to merge 24 commits into elastic:main from felixbarny:tsdb-hash-once
+1,377
−263
Fall back to index.routing_path if the dimensions can't be identified by a simple path match
Pinging @elastic/es-storage-engine (Team:StorageEngine)
The new tsid behavior is conditional on the index.dimensions setting which will only be populated on new indices
…imensions setting
Labels
external-contributor
Pull request authored by a developer outside the Elasticsearch team
>non-issue
serverless-linked
Added by automation, don't add manually
:StorageEngine/TSDB
You know, for Metrics
Team:StorageEngine
v9.2.0
Instead of hashing dimensions during routing and then again during document parsing, this combines the two steps. The tsid is created during routing and then used to create a routing hash. The tsid is then sent to the data nodes, which signals that creating the tsid during document parsing is no longer required.
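To make the "hash once" flow concrete, here is a minimal sketch of the idea, not the code in this PR. All names (`HashOnceRouting`, `buildTsid`, `route`, `tsidForParsing`) are hypothetical, SHA-256 stands in for whatever `TsidBuilder` actually computes, and `Arrays.hashCode` stands in for the real routing hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; these are not Elasticsearch classes. */
final class HashOnceRouting {

    /** What the coordinating node produces: the tsid and the shard derived from it. */
    static final class Routed {
        final byte[] tsid;
        final int shard;

        Routed(byte[] tsid, int shard) {
            this.tsid = tsid;
            this.shard = shard;
        }
    }

    /** Hashes dimension names and values in sorted order (SHA-256 is an assumed stand-in). */
    static byte[] buildTsid(Map<String, String> dimensions) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Sort by name so the result does not depend on field order in the document.
            for (Map.Entry<String, String> e : new TreeMap<>(dimensions).entrySet()) {
                digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between name and value
                digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Coordinating node: build the tsid once, then derive the shard from that same tsid. */
    static Routed route(Map<String, String> dimensions, int numberOfShards) {
        byte[] tsid = buildTsid(dimensions);
        int shard = Math.floorMod(Arrays.hashCode(tsid), numberOfShards); // assumed routing hash
        return new Routed(tsid, shard);
    }

    /** Data node: a tsid supplied by the coordinator means parsing can skip recomputing it. */
    static byte[] tsidForParsing(byte[] tsidFromCoordinator, Map<String, String> dimensions) {
        return tsidFromCoordinator != null ? tsidFromCoordinator : buildTsid(dimensions);
    }
}
```

In this sketch, a `null` tsid on the data node models the legacy path, where the tsid is still created during document parsing.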
Instead of populating the `index.routing_path` setting, which can differ from the document dimensions, this now populates a new `index.dimensions` index setting containing all dimensions. This setting isn't user-configurable (todo). If users manually set `index.routing_path`, the new optimization doesn't kick in, so routing and tsid creation work as before. Additionally, if the dimension fields can't be expressed as a simple set of path matches (for example, when using a dynamic template with a `match_mapping_type` that sets `time_series_dimension: true`), it falls back to populating `index.routing_path`.

As an additional benefit, the new `_tsid`s are shorter, which may have benefits at query time. While they're shorter, they still retain the main properties: clustering similar time series together (which helps compression) and making collisions very unlikely. More details are in the JavaDoc of `TsidBuilder`.

I've added a dependency on hash4j, which provides an efficient way to hash strings without having to create a temporary UTF-8 byte array, as well as a nice API.
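To illustrate what avoiding the temporary UTF-8 byte array means, here is a small standard-library sketch. It does not use hash4j or its algorithms: FNV-1a is swapped in purely for brevity, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; FNV-1a (64-bit) is a stand-in, not the hash used in the PR. */
final class CharHashing {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** Hashes the UTF-16 code units directly, low byte then high byte, with no allocation. */
    static long hashChars(CharSequence s) {
        long h = FNV_OFFSET_BASIS;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            h = (h ^ (c & 0xff)) * FNV_PRIME;
            h = (h ^ (c >>> 8)) * FNV_PRIME;
        }
        return h;
    }

    /** For contrast: encodes to UTF-8 first, which allocates a temporary byte[] per call. */
    static long hashUtf8Bytes(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        long h = FNV_OFFSET_BASIS;
        for (byte b : bytes) {
            h = (h ^ (b & 0xff)) * FNV_PRIME;
        }
        return h;
    }
}
```

The two methods produce different values for the same string, so they are not interchangeable. The point is only that the first reads the characters in place, which matters on a hot ingest path.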
Remaining issues to work out:
- Make `index.dimensions` a private setting.
- Update `index.dimensions` when adding a new dimension field to the mappings.
- [...] `index.dimensions` so that the coordinating node always knows which paths will be considered dimensions.
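The rules in the description for choosing between `index.dimensions` and `index.routing_path` can be summarized as a small decision function. This is a sketch under assumptions: the class, enum, and parameter names are hypothetical, and the two boolean inputs stand in for checks whose real implementation is not shown here.

```java
/** Hypothetical sketch of the setting-selection rules; not code from the PR. */
final class DimensionSettingChooser {

    enum Mode {
        /** The user set index.routing_path manually: routing and tsid creation work as before. */
        USER_ROUTING_PATH,
        /** Dimensions are expressible as simple path matches: populate index.dimensions. */
        INDEX_DIMENSIONS,
        /** Dimensions are not expressible as simple path matches: populate index.routing_path. */
        FALLBACK_ROUTING_PATH
    }

    static Mode choose(boolean userSetRoutingPath, boolean dimensionsAreSimplePathMatches) {
        if (userSetRoutingPath) {
            return Mode.USER_ROUTING_PATH;
        }
        if (dimensionsAreSimplePathMatches) {
            return Mode.INDEX_DIMENSIONS;
        }
        return Mode.FALLBACK_ROUTING_PATH;
    }

    /** The new tsid behavior is conditional on index.dimensions being populated. */
    static boolean usesNewTsid(Mode mode) {
        return mode == Mode.INDEX_DIMENSIONS;
    }
}
```

For example, a dynamic template with a `match_mapping_type` that sets `time_series_dimension: true` would correspond to `choose(false, false)` in this sketch, which falls back to `index.routing_path`.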