TSDB ingest performance: combine routing and tsdb hashing #132566
Fall back to index.routing_path if the dimensions can't be identified by a simple path match
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Did a shallow initial pass. I'll need to reeducate myself on a few concepts to fully grasp everything though. But seems relatively isolated.
A few more comments, sorry for just missing the merge time.
* upstream/main: (50 commits)
  Disable utf-8 parsing optimization (elastic#135172)
  rest-api-spec: fix master_timeout typo (elastic#135167)
  Fixes countDistinctWithConditions in csv-spec tests (elastic#135097)
  Fix test failure by checking for feature flag (elastic#135174)
  Fix deadlock in ThreadPoolMergeScheduler when a failing merge closes the IndexWriter (elastic#134656)
  Make SecureString comparisons constant time (elastic#135053)
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/160_exists_query/Test exists query on mapped geo_point field with no doc values} elastic#135164
  ESQL: Replace function count tests (elastic#134951)
  Mute org.elasticsearch.compute.aggregation.SampleBooleanAggregatorFunctionTests testSimpleWithCranky elastic#135163
  Mute org.elasticsearch.xpack.test.rest.XPackRestIT test {p0=analytics/nested_top_metrics_sort/terms order by top metrics numeric not null integer values} elastic#135162
  Mute org.elasticsearch.xpack.test.rest.XPackRestIT test {p0=analytics/nested_top_metrics_sort/terms order by top metrics numeric not null double values} elastic#135159
  TSDB ingest performance: combine routing and tsdb hashing (elastic#132566)
  Mute org.elasticsearch.compute.aggregation.SampleBytesRefAggregatorFunctionTests testSimpleWithCranky elastic#135157
  Mute org.elasticsearch.xpack.logsdb.qa.BulkStoredSourceChallengeRestIT testHistogramAggregation elastic#135156
  Mute org.elasticsearch.xpack.logsdb.qa.StandardVersusStandardReindexedIntoLogsDbChallengeRestIT testHistogramAggregation elastic#135155
  Mute org.elasticsearch.xpack.logsdb.qa.LogsDbVersusLogsDbReindexedIntoStandardModeChallengeRestIT testHistogramAggregation elastic#135154
  Mute org.elasticsearch.xpack.logsdb.qa.BulkChallengeRestIT testHistogramAggregation elastic#135153
  Mute org.elasticsearch.discovery.ClusterDisruptionIT testAckedIndexing elastic#117024
  Mute org.elasticsearch.lucene.RollingUpgradeSearchableSnapshotIndexCompatibilityIT testMountSearchableSnapshot {p0=[9.2.0, 9.2.0, 9.2.0]} elastic#135151
  Mute org.elasticsearch.lucene.RollingUpgradeSearchableSnapshotIndexCompatibilityIT testSearchableSnapshotUpgrade {p0=[9.2.0, 9.2.0, 9.2.0]} elastic#135150
  ...
With implementations IndexRouting.ExtractFromSource.ForRoutingPath and IndexRouting.ExtractFromSource.ForIndexDimensions. This addresses review comments from elastic#132566.
…2566) Instead of hashing dimensions during routing and then again during document parsing, this combines the two steps. The tsid is created during routing and then used to create a routing hash. The tsid is then sent to the data nodes which acts as a signal that creating the tsid during document parsing isn't required anymore.
In elastic#133232, we've added the ability to provide index metadata with an IndexSettingProvider. It turned out that we don't need that functionality as we ended up using a private index setting in elastic#132566. This also adds the `IndexVersion` as another parameter. This is in preparation for [this](elastic#132566 (comment)) suggestion to conditionally set one or another setting, depending on the index version.
In #133232, we've added the ability to provide index metadata with an IndexSettingProvider. It turned out that we don't need that functionality as we ended up using a private index setting in #132566. This also adds the `IndexVersion` as another parameter. This is in preparation for [this](#132566 (comment)) suggestion to conditionally set one or another setting, depending on the index version. `IndexSettingProvider`s are now disallowed from providing the `index.version.created` setting. Otherwise, they can't rely on the `IndexVersion` they receive to be the one that will be actually used for the created index as another provider may change it.
) With implementations IndexRouting.ExtractFromSource.ForRoutingPath and IndexRouting.ExtractFromSource.ForIndexDimensions. This addresses review comments from #132566. Also fixes cases where the tsid is not provided by the coordinating node, such as for translog operations.
Instead of hashing dimensions during routing and then again during document parsing, this combines the two steps. The tsid is created during routing and then used to create a routing hash. The tsid is then sent to the data nodes which acts as a signal that creating the tsid during document parsing isn't required anymore.
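To make the idea concrete, here is a minimal sketch of the "hash once" flow described above. It is not the code from this PR: `TsidRoutingSketch`, `buildTsid`, `shardFor`, and `tsidOnDataNode` are hypothetical names, the tsid is stood in for by a SHA-256 digest over the sorted dimensions, and the routing hash is a plain `Arrays.hashCode`. The real layout of the tsid is described in the JavaDoc of `TsidBuilder`, and the real routing hash may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: all names here are hypothetical, not Elasticsearch APIs.
final class TsidRoutingSketch {

    // Coordinating node: derive a tsid stand-in from the dimensions.
    // Sorting by field name makes the result independent of field order.
    static byte[] buildTsid(Map<String, String> dimensions) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : new TreeMap<>(dimensions).entrySet()) {
                digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between name and value
                digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", ex);
        }
    }

    // The routing hash is derived from the tsid, so the dimensions are hashed only once.
    static int shardFor(byte[] tsid, int numberOfShards) {
        return Math.floorMod(Arrays.hashCode(tsid), numberOfShards);
    }

    // Data node: a tsid sent along with the request signals that it need not be rebuilt.
    // If none was sent (for example for translog operations), compute it as before.
    static byte[] tsidOnDataNode(byte[] tsidFromCoordinator, Map<String, String> dimensions) {
        return tsidFromCoordinator != null ? tsidFromCoordinator : buildTsid(dimensions);
    }

    public static void main(String[] args) {
        Map<String, String> dimensions = Map.of("host.name", "web-1", "region", "eu-west");
        byte[] tsid = buildTsid(dimensions);
        System.out.println("shard = " + shardFor(tsid, 4));
        System.out.println("reused on data node = " + (tsidOnDataNode(tsid, dimensions) == tsid));
    }
}
```

The point of the sketch is the data flow: the dimensions are read and hashed on the coordinating node, the shard is chosen from that result, and the data node only recomputes the tsid when it did not receive one.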
Instead of populating the `index.routing_path` setting that can differ from the document dimensions, this now populates a new `index.dimensions` index setting containing all dimensions. This setting isn't user-configurable (todo). In case users manually set `index.routing_path`, the new optimization doesn't kick in, so that routing and tsid creation work as before. Additionally, if the dimension fields can't be expressed as a simple set of path matches (for example when using a dynamic template with a `match_mapping_type` that sets `time_series_dimension: true`), it falls back to populating `index.routing_path`.
As an additional benefit, the new `_tsid`s are shorter, which may have benefits at query time. While they're shorter, they still retain the main properties: clustering similar time series together (which helps in compression) and making collisions very unlikely. More details in the JavaDoc of `TsidBuilder`. In fact, based on my testing, the compression is even a bit better after this change.
Remaining issues to work out:
- […] `index.dimensions` a private setting
- […] `index.dimensions` when adding a new dimension field to the mappings.
- […] `index.dimensions` so that the coordinating node always knows which paths will be considered dimensions.
Sub-PRs