TSDB ingest performance: combine routing and tsdb hashing #132566

felixbarny · 2025-08-08T09:09:21Z

Instead of hashing dimensions during routing and then again during document parsing, this combines the two steps. The tsid is created during routing and then used to create a routing hash. The tsid is then sent to the data nodes, where its presence signals that the tsid no longer needs to be created during document parsing.
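To make the flow concrete, here is a minimal sketch of the hash-once idea. It is not the code from this PR: the function names, the SHA-256 digest, and the 16-byte tsid length are all assumptions chosen for illustration, and the real tsid layout is described in the JavaDoc of TsidBuilder.

```python
import hashlib


def build_tsid(dimensions):
    """Hypothetical tsid: a 16-byte digest over the sorted dimension name/value pairs."""
    digest = hashlib.sha256()
    for name in sorted(dimensions):
        for part in (name, str(dimensions[name])):
            data = part.encode("utf-8")
            # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
            digest.update(len(data).to_bytes(4, "big"))
            digest.update(data)
    return digest.digest()[:16]


def routing_hash(tsid):
    """Derive the routing hash from the tsid instead of re-hashing the dimensions."""
    return int.from_bytes(hashlib.sha256(tsid).digest()[:4], "big")


def route_on_coordinator(dimensions, num_shards):
    """Coordinating node: hash the dimensions once, then route by the resulting tsid."""
    tsid = build_tsid(dimensions)
    return tsid, routing_hash(tsid) % num_shards


def tsid_on_data_node(dimensions, tsid_from_coordinator=None):
    """Data node: reuse the tsid when it was sent along, otherwise build it locally."""
    if tsid_from_coordinator is not None:
        return tsid_from_coordinator
    return build_tsid(dimensions)
```

In this sketch the dimensions are only walked once per document on the happy path; the data node falls back to building the tsid itself when the coordinating node did not provide one.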

Instead of populating the index.routing_path setting, which can differ from the document dimensions, this now populates a new index.dimensions index setting containing all dimensions. This setting isn't user-configurable (todo). If users manually set index.routing_path, the new optimization doesn't kick in, so routing and tsid creation work as before. Additionally, if the dimension fields can't be expressed as a simple set of path matches (for example when using a dynamic template with a match_mapping_type that sets time_series_dimension: true), it falls back to populating index.routing_path.
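The decision described above can be sketched roughly as follows. This is an illustration only: `choose_routing_settings` and its parameters are hypothetical names, and what exactly gets written into index.routing_path in the fallback case is an assumption here.

```python
def choose_routing_settings(dimension_paths, user_routing_path=None, simple_path_matches=True):
    """Return the setting to populate, following the rules in the PR description."""
    if user_routing_path:
        # The user set index.routing_path manually: keep the old behaviour.
        return {"index.routing_path": list(user_routing_path)}
    if not simple_path_matches:
        # Dimensions can't be expressed as simple path matches: fall back.
        # (Assumption: the known dimension paths are used as the routing path.)
        return {"index.routing_path": sorted(dimension_paths)}
    # Otherwise record all dimensions so routing and tsid creation can share one hash.
    return {"index.dimensions": sorted(dimension_paths)}
```

For example, `choose_routing_settings(["host.name", "pod"])` yields only index.dimensions, while passing `user_routing_path=["host.name"]` yields only index.routing_path.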

As an additional benefit, the new _tsids are shorter, which may help at query time. Despite being shorter, they retain the main properties: similar time series are clustered together (which helps compression) and collisions are very unlikely. See the JavaDoc of TsidBuilder for details. In my testing, compression was even slightly better after this change.

Remaining issues to work out:

Make index.dimensions a private setting
Update index.dimensions when adding a new dimension field to the mappings.
Ensure that all dynamic templates are incorporated into index.dimensions so that the coordinating node always knows which paths will be considered dimensions.

Sub-PRs

Fall back to index.routing_path if the dimensions can't be identified by a simple path match

elasticsearchmachine · 2025-08-11T10:13:44Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

server/src/main/java/org/elasticsearch/index/IndexMode.java

henningandersen

A few more comments, sorry for just missing the merge time.

server/src/main/java/org/elasticsearch/cluster/routing/IndexRouting.java

...ata-streams/src/main/java/org/elasticsearch/datastreams/DataStreamIndexSettingsProvider.java

server/src/main/java/org/elasticsearch/action/index/IndexRequest.java

server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java

server/src/main/java/org/elasticsearch/index/IndexMode.java

* upstream/main: (50 commits)
  Disable utf-8 parsing optimization (elastic#135172)
  rest-api-spec: fix master_timeout typo (elastic#135167)
  Fixes countDistinctWithConditions in csv-spec tests (elastic#135097)
  Fix test failure by checking for feature flag (elastic#135174)
  Fix deadlock in ThreadPoolMergeScheduler when a failing merge closes the IndexWriter (elastic#134656)
  Make SecureString comparisons constant time (elastic#135053)
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/160_exists_query/Test exists query on mapped geo_point field with no doc values} elastic#135164
  ESQL: Replace function count tests (elastic#134951)
  Mute org.elasticsearch.compute.aggregation.SampleBooleanAggregatorFunctionTests testSimpleWithCranky elastic#135163
  Mute org.elasticsearch.xpack.test.rest.XPackRestIT test {p0=analytics/nested_top_metrics_sort/terms order by top metrics numeric not null integer values} elastic#135162
  Mute org.elasticsearch.xpack.test.rest.XPackRestIT test {p0=analytics/nested_top_metrics_sort/terms order by top metrics numeric not null double values} elastic#135159
  TSDB ingest performance: combine routing and tsdb hashing (elastic#132566)
  Mute org.elasticsearch.compute.aggregation.SampleBytesRefAggregatorFunctionTests testSimpleWithCranky elastic#135157
  Mute org.elasticsearch.xpack.logsdb.qa.BulkStoredSourceChallengeRestIT testHistogramAggregation elastic#135156
  Mute org.elasticsearch.xpack.logsdb.qa.StandardVersusStandardReindexedIntoLogsDbChallengeRestIT testHistogramAggregation elastic#135155
  Mute org.elasticsearch.xpack.logsdb.qa.LogsDbVersusLogsDbReindexedIntoStandardModeChallengeRestIT testHistogramAggregation elastic#135154
  Mute org.elasticsearch.xpack.logsdb.qa.BulkChallengeRestIT testHistogramAggregation elastic#135153
  Mute org.elasticsearch.discovery.ClusterDisruptionIT testAckedIndexing elastic#117024
  Mute org.elasticsearch.lucene.RollingUpgradeSearchableSnapshotIndexCompatibilityIT testMountSearchableSnapshot {p0=[9.2.0, 9.2.0, 9.2.0]} elastic#135151
  Mute org.elasticsearch.lucene.RollingUpgradeSearchableSnapshotIndexCompatibilityIT testSearchableSnapshotUpgrade {p0=[9.2.0, 9.2.0, 9.2.0]} elastic#135150
  ...

In #133232, we've added the ability to provide index metadata with an IndexSettingProvider. It turned out that we don't need that functionality as we ended up using a private index setting in #132566. This also adds the `IndexVersion` as another parameter. This is in preparation for [this](#132566 (comment)) suggestion to conditionally set one or another setting, depending on the index version. `IndexSettingProvider`s are now disallowed from providing the `index.version.created` setting. Otherwise, they can't rely on the `IndexVersion` they receive to be the one that will be actually used for the created index as another provider may change it.
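The restriction on `index.version.created` can be pictured with a small sketch. This is not the Elasticsearch implementation: providers are modelled here as plain callables that take the index version and return a dict of settings, and `collect_provider_settings` is a hypothetical name.

```python
RESERVED_SETTING = "index.version.created"


def collect_provider_settings(providers, index_version):
    """Merge settings from all providers, rejecting attempts to set the index version."""
    merged = {}
    for provider in providers:
        provided = provider(index_version)
        if RESERVED_SETTING in provided:
            # If one provider could change the version, the version passed to the
            # other providers would no longer be the one used for the created index.
            raise ValueError(f"providers must not set {RESERVED_SETTING}")
        merged.update(provided)
    return merged
```

With this guard in place, every provider can branch on the `index_version` it receives, for example to set either one setting or another depending on the version.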

) With implementations IndexRouting.ExtractFromSource.ForRoutingPath and IndexRouting.ExtractFromSource.ForIndexDimensions. This addresses review comments from #132566. Also fixes cases where the tsid is not provided by the coordinating node, such as for translog operations.

Hash once to create routing hash and _tsid

37ae32a

felixbarny added the >non-issue label Aug 8, 2025

felixbarny requested review from a team as code owners August 8, 2025 09:09

felixbarny added the :StorageEngine/TSDB label Aug 8, 2025

felixbarny marked this pull request as draft August 8, 2025 09:09

elasticsearchmachine added the external-contributor and v9.2.0 labels Aug 8, 2025

felixbarny and others added 11 commits August 8, 2025 11:09

Merge branch 'main' into tsdb-hash-once

cffae3e

[CI] Auto commit changes from spotless

41d0e28

Merge branch 'main' into tsdb-hash-once

3cbc3f3

Add notice file for hash4j

877de11

Use optimized text

c71db07

Fix DataStreamIndexSettingsProviderTests

0222d93

Update index.dimensions on mapping updates

a1be48b

Fall back to index.routing_path if the dimensions can't be identified by a simple path match

Apply spotless suggestions

1734649

Make index.dimensions a private setting

d449b79

Adjust TSDBPassthroughIndexingIT

8831b1d

The index.dimensions setting should be replicated

a915bcf

felixbarny mentioned this pull request Aug 11, 2025

Optimize IP field parsing #132463

Merged

felixbarny added 6 commits August 11, 2025 11:23

Fix IndexRoutingTests

2ff61f6

Make downsampling aware of index.dimensions

002fd43

Merge remote-tracking branch 'origin/main' into tsdb-hash-once

2ccc3c2

Fix rest compatibility tests by excluding tests that make assertions on an exact tsid

6282cd4

Fix condition for system provided settings

fae55a5

Adjust TsdbDataStreamRestIT

b41a88d

felixbarny marked this pull request as ready for review August 11, 2025 10:13

elasticsearchmachine added the Team:StorageEngine label Aug 11, 2025

felixbarny requested a review from kkrik-es August 11, 2025 10:13

henningandersen reviewed Sep 17, 2025

server/src/main/java/org/elasticsearch/index/IndexMode.java

felixbarny disabled auto-merge September 17, 2025 08:56

felixbarny added 4 commits September 17, 2025 11:21

Avoid allocating a BytesRef if empty

add084c

Merge remote-tracking branch 'origin/main' into tsdb-hash-once

374301e

Merge remote-tracking branch 'refs/remotes/felixbarny/tsdb-hash-once' into tsdb-hash-once

506ca37

Merge remote-tracking branch 'origin/main' into tsdb-hash-once

407c897

felixbarny mentioned this pull request Sep 18, 2025

OTLP: optimize _tsid creation #134982

Merged

felixbarny added 4 commits September 19, 2025 15:40

Merge remote-tracking branch 'origin/main' into tsdb-hash-once

250f20e

Add feature flag

0d6ece8

Merge remote-tracking branch 'origin/main' into tsdb-hash-once

9ee2cfb

Add feature flag

128febb

felixbarny merged commit a3f5ea5 into elastic:main Sep 21, 2025
34 checks passed

felixbarny deleted the tsdb-hash-once branch September 21, 2025 09:03

henningandersen reviewed Sep 21, 2025

This was referenced Sep 22, 2025

Refactor IndexRouting.ExtractFromSource to be an abstract class #135206

Merged

Don't set index.dimensions if dimensions are added via dynamic templates #135212

Merged

felixbarny mentioned this pull request Sep 23, 2025

Cleanup of IndexSettingProvider #135251

Merged

felixbarny mentioned this pull request Sep 24, 2025

Set either index.dimensions or index.routing_path #135351

Merged

felixbarny mentioned this pull request Sep 25, 2025

Remove index.dimensions feature flag #135402

Merged

This was referenced Sep 25, 2025

[CI] ReindexDatastreamIndexTransportActionIT testTsdbStartEndSet failing #135366

Closed

Fixing ReindexDatastreamIndexTransportActionIT.testTsdbStartEndSet to always provide index.routing_path #135456

Merged

This was referenced Oct 23, 2025

Mirror upstream elastic/elasticsearch#135212 for AI review (snapshot of HEAD tree) phananh1010/elasticsearch#200

Closed

Mirror upstream elastic/elasticsearch#133344 for AI review (snapshot of HEAD tree) phananh1010/elasticsearch#210

Closed


7 participants
