
Conversation

felixbarny
Member

@felixbarny felixbarny commented Aug 8, 2025

Instead of hashing dimensions during routing and then again during document parsing, this combines the two steps: the tsid is created during routing and then used to compute the routing hash. The tsid is then sent to the data nodes, which signals that creating the tsid during document parsing is no longer required.
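
To illustrate, here's a hedged sketch (not the PR's actual code) of how routing can be derived from a tsid once the coordinating node has built it; murmur3 is assumed here purely for illustration:

    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.StringHelper;

    // Hedged sketch: derive the owning shard from the tsid bytes built on the
    // coordinating node. The concrete hash function and shard math may differ.
    static int shardFor(BytesRef tsid, int numberOfShards) {
        int routingHash = StringHelper.murmurhash3_x86_32(tsid, 0);
        return Math.floorMod(routingHash, numberOfShards);
    }

Every document carrying the same tsid then deterministically lands on the same shard.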

Instead of populating the index.routing_path setting, which can differ from the document dimensions, this now populates a new index.dimensions index setting containing all dimensions. This setting isn't user-configurable (todo). If users manually set index.routing_path, the new optimization doesn't kick in, so routing and tsid creation work as before. Additionally, if the dimension fields can't be expressed as a simple set of path matches (for example, when using a dynamic template with a match_mapping_type that sets time_series_dimension: true), it falls back to populating index.routing_path.
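
For illustration, the two resulting settings shapes might look like this (a hedged sketch; the concrete field names are made up):

    import org.elasticsearch.common.settings.Settings;

    // New mode: all dimension fields can be expressed as path matches.
    Settings dimensions = Settings.builder()
        .putList("index.dimensions", "host.name", "region", "labels.*")
        .build();

    // Fallback mode: e.g. a match_mapping_type-based dynamic template defines
    // dimensions, so only index.routing_path is populated, as before.
    Settings routingPath = Settings.builder()
        .putList("index.routing_path", "host.name", "region")
        .build();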

As an additional benefit, the new _tsids are shorter, which may have benefits at query time. While they're shorter, they still retain the main properties: clustering similar time series together (which helps in compression) and making collisions very unlikely. More details in the JavaDoc of TsidBuilder. In fact, based on my testing, the compression is even a bit better after this change.

I've added a dependency on hash4j, which provides an efficient way to hash strings without having to create a temporary UTF-8 byte array, as well as a nice API.
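
For example (a minimal sketch of the hash4j API; the choice of XXH3 is an assumption, not necessarily what TsidBuilder uses):

    import com.dynatrace.hash4j.hashing.Hashing;

    // hash4j hashes a CharSequence directly, avoiding a temporary UTF-8 byte[].
    long hash = Hashing.xxh3_64().hashCharsToLong("host.name=web-01");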

Remaining issues to work out:

  • Make index.dimensions a private setting.
  • Update index.dimensions when adding a new dimension field to the mappings.
  • Ensure that all dynamic templates are incorporated into index.dimensions so that the coordinating node always knows which paths will be considered dimensions.

Sub-PRs

@felixbarny felixbarny requested review from a team as code owners August 8, 2025 09:09
@felixbarny felixbarny added the :StorageEngine/TSDB label Aug 8, 2025
@felixbarny felixbarny marked this pull request as draft August 8, 2025 09:09
@elasticsearchmachine elasticsearchmachine added the external-contributor and v9.2.0 labels Aug 8, 2025
@felixbarny felixbarny marked this pull request as ready for review August 11, 2025 10:13
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@felixbarny felixbarny requested a review from kkrik-es August 11, 2025 10:13
@kkrik-es kkrik-es requested a review from martijnvg August 25, 2025 12:10
@kkrik-es
Contributor

I think @martijnvg should also take a look, in case I missed anything wrt routing and tsids.

@martijnvg martijnvg left a comment
Member

I spent some time with this PR; I wasn't able to do a full review yet. My main concern is around dimensions that get dynamically introduced and that are not part of passthrough object fields.

Before, if a new dimension field was introduced, we only routed based on routing fields and then computed a tsid at index time. That tsid would be different compared to time series without the new dimension field, but at least the new tsid would consistently land in the same shard. More critically, subsequent documents for the new tsid would also land in the same shard.

With this change, if a new dimension is dynamically introduced outside passthrough object fields, the first document for the updated time series is routed to a shard, and then at index time a new dimension is dynamically mapped. I think the problem we can now run into is that the next document for the updated time series can be routed to a different shard than the one the previous document was indexed into.

The fact that the tsid changes and that a time series becomes a different time series is not what I'm worried about; that is also what happens today. I'm worried about documents from the same tsid ending up in different shards. Please let me know if that indeed can happen or if my understanding is incorrect.

* <p>
* The _tsid can not be directly set by a user, it is set by the coordinating node.
*/
public IndexRequest tsid(BytesRef tsid) {
Member

I think this isn't too bad and perhaps preferred over routing. It can't be set via write APIs, and it's clear what the purpose of this property is.

@felixbarny
Member Author

felixbarny commented Aug 26, 2025

> I spent some time with this PR; I wasn't able to do a full review yet. My main concern is around dimensions that get dynamically introduced and that are not part of passthrough object fields.

At a high level, the idea is that we only enter the new time_series_dimensions.includes-based mode (as opposed to the index.routing_path mode) when we know it's safe, in the sense that no dimensions can be dynamically added that we didn't anticipate. Not anticipated means not matched by the dimension path matches. This is what the matchesAllDimensions flag is about.

> Before, if a new dimension field was introduced, we only routed based on routing fields and then computed a tsid at index time. That tsid would be different compared to time series without the new dimension field, but at least the new tsid would consistently land in the same shard. More critically, subsequent documents for the new tsid would also land in the same shard.
>
> With this change, if a new dimension is dynamically introduced outside passthrough object fields, the first document for the updated time series is routed to a shard, and then at index time a new dimension is dynamically mapped. I think the problem we can now run into is that the next document for the updated time series can be routed to a different shard than the one the previous document was indexed into.
>
> The fact that the tsid changes and that a time series becomes a different time series is not what I'm worried about; that is also what happens today. I'm worried about documents from the same tsid ending up in different shards. Please let me know if that indeed can happen or if my understanding is incorrect.

I'm not sure if I fully understand the scenario you've described. Here's my take based on my understanding of your question. Let me know if I'm off.

With this new change, the _tsid is created on the coordinating node and the routing happens based on the tsid. So for a given _tsid, all documents are still consistently routed to the same shard.

There can be instances where users manually mark an existing field as a dimension. After the next rollover, this will lead to the _tsid and the shard routing being different. But that should be fine because it's a new time series for all intents and purposes.

There can't be a situation where an existing field is dynamically mapped to a dimension. That's because dynamic mappings only apply to fields that aren't in the mapping yet.

If a new field is mapped based on a dynamic mapping, it's the first time we've encountered it, so it shouldn't affect existing time series. Also, we know that the new field has to be in the time_series_dimensions.includes list already. For example, there may be a dynamic template mapping labels.* to time_series_dimension keyword fields. In that case, we would have extracted that path (labels.*) from the dynamic template in DataStreamIndexSettingsProvider. If there's a dynamic template that doesn't use a simple path match (for example, one that uses match_mapping_type: string), the matchesAllDimensions boolean flag would have been false, and we would have fallen back to index.routing_path (the current logic). Because of that mechanism, and the guarantee that the time_series_dimensions.includes list of path matches will match any potential dimension field, the coordinating node always knows which fields are dimensions, even if a field isn't present in the mapping yet but is about to be added via a path_match-based dynamic mapping.
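
To make that concrete, here's a hedged sketch of the extraction described above (the types and names are simplified stand-ins, not the actual DataStreamIndexSettingsProvider code):

    import java.util.ArrayList;
    import java.util.List;

    // Simplified stand-in for a dynamic template that marks fields as dimensions.
    record DimensionTemplate(String pathMatch) {}

    // Collect path matches such as "labels.*". If any dimension-producing template
    // can't be expressed as a path match (e.g. it matches on mapping type), the
    // result is empty, signaling the fallback to index.routing_path.
    static List<String> extractDimensionPaths(List<DimensionTemplate> templates) {
        boolean matchesAllDimensions = true;
        List<String> paths = new ArrayList<>();
        for (DimensionTemplate template : templates) {
            if (template.pathMatch() != null) {
                paths.add(template.pathMatch());
            } else {
                matchesAllDimensions = false;
            }
        }
        return matchesAllDimensions ? paths : List.of();
    }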

@felixbarny
Member Author

There's an interesting test failure related to reindexing. The issue is that the custom time_series_dimensions.includes metadata doesn't get copied over when creating the target index:

var createIndexRequest = new CreateIndexRequest(request.destIndex()).settings(settings);
createIndexRequest.cause("create-index-from-source");
if (mergeMappings.isEmpty() == false) {
    createIndexRequest.mapping(mergeMappings);
}
createIndexRequest.setParentTask(new TaskId(clusterService.localNode().getId(), task.getId()));
client.admin().indices().create(createIndexRequest, listener.map(response -> response));

There's no way to provide the custom metadata from the source index to the target index, as the CreateIndexRequest doesn't allow for that.

In other places, like downsampling, this is solved by the action running on the master node and submitting a task that uses MetadataCreateIndexService#applyCreateIndexRequest to create the new index, which makes it possible to provide a consumer that copies custom metadata from the source to the target:

https://github.com/felixbarny/elasticsearch/blob/40e7ee79874fe8056113fc934e7e0640eab3a8fe/x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/TransportDownsampleAction.java#L962-L982
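
Roughly, that approach looks like this (a hedged sketch based on the description above; the exact applyCreateIndexRequest overload, the surrounding fields, and the metadata key are approximations, not copied from the linked source):

    // On the master node, create the index directly and pass a metadata transformer
    // that copies custom metadata from the source index onto the new index.
    // currentState, sourceIndexMetadata, etc. are assumed to be in scope.
    ClusterState updated = metadataCreateIndexService.applyCreateIndexRequest(
        currentState,
        createIndexClusterStateUpdateRequest,
        true, // silent
        (metadataBuilder, createdIndexMetadata) -> metadataBuilder.put(
            IndexMetadata.builder(createdIndexMetadata)
                .putCustom("time_series_dimensions", sourceIndexMetadata.getCustomData("time_series_dimensions"))
        )
    );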

Not sure how to best tackle this. A few options:

  • Revert back from using custom metadata to a private index setting. As we'd want the index setting to be private, we'd need to add the ability to mark specific settings as being system-provided to avoid errors when automatically setting or copying over the dimensions setting. As this is something users should never set or be exposed to, an index setting is probably not the right fit conceptually.
  • Allow create index requests to specify custom index metadata. Obviously, this should not be allowed for the public REST API, just for the internal transport action request.
  • Migrate CreateIndexFromSourceTransportAction to an AcknowledgedTransportMasterNodeAction and call MetadataCreateIndexService#applyCreateIndexRequest directly instead of creating a CreateIndexRequest, so that we can copy over the custom index metadata, similar to how it's done in TransportDownsampleAction.

@martijnvg
Member

> Migrate CreateIndexFromSourceTransportAction to an AcknowledgedTransportMasterNodeAction and call MetadataCreateIndexService#applyCreateIndexRequest directly instead of creating a CreateIndexRequest, so that we can copy over the custom index metadata, similar to how it's done in TransportDownsampleAction.

That wouldn't work for reindex?

> Allow create index requests to specify custom index metadata. Obviously, this should not be allowed for the public REST API, just for the internal transport action request.

I think we would need more use cases to justify adding this capability to the create index API.

> Revert back from using custom metadata to a private index setting. As we'd want the index setting to be private, we'd need to add the ability to mark specific settings as being system-provided to avoid errors when automatically setting or copying over the dimensions setting. As this is something users should never set or be exposed to, an index setting is probably not the right fit conceptually.

I prefer this approach. I do wonder whether we can set PrivateIndex-scoped settings via index settings providers. I'm not sure whether the create index API can distinguish between PrivateIndex-scoped settings added via the API / templates and those added via index settings providers. I think this is something we can fix if it turns out to be a problem.

@felixbarny
Member Author

felixbarny commented Aug 27, 2025

> That wouldn't work for reindex?

Why wouldn't this work for reindex?

> I prefer this approach. [using a private index setting instead of custom index metadata]

@dakrone would you be ok with that? You proposed using custom index metadata rather than a private index setting.

> I do wonder whether we can set PrivateIndex-scoped settings via index settings providers. I'm not sure whether the create index API can distinguish between PrivateIndex-scoped settings added via the API / templates and those added via index settings providers. I think this is something we can fix if it turns out to be a problem.

Not today, but this is doable. Probably needs reviews/buy-in from core infra, though. In fact, this is what I implemented initially. See also d449b79c and a409a95c.

@felixbarny
Member Author

@rjernst I'd like to get your perspective on this as well. You can start reading from this comment.

The TL;DR is that we want to store the time series dimensions somewhere in the index metadata, either as a private index setting or as custom metadata. If we go with the private index setting approach, we'd need functionality that allows us to set private settings when they're system-provided. This affects settings added via an IndexSettingsProvider and also settings copied over before creating a downsampling index. My previous comment links to commits that enable that. How would you feel about adding something like that to the codebase?

@gmarouli noted another potential issue with the approach of using custom index metadata: it would probably not work well with CCR. At the least, we would need to add functionality to copy custom index metadata into the follower index.

@rjernst
Member

rjernst commented Aug 28, 2025

> Allow create index requests to specify custom index metadata. Obviously, this should not be allowed for the public REST API, just for the internal transport action request.

Without having given this a ton of thought, this seems like the most straightforward solution (we've done such things in other places before).

@rjernst
Member

rjernst commented Aug 28, 2025

> we'd need to add the ability to mark specific settings as being system-provided

FWIW I think making the transport request allow private index settings would also work, but it would mean needing to move validation to the edge (where the REST request is parsed), or having a separate part of the request specifically for private index settings that can't be set by the REST request.

@felixbarny
Member Author

felixbarny commented Aug 28, 2025

Thanks for the timely feedback, Ryan!

> FWIW I think making the transport request allow private index settings would also work, but it would mean needing to move validation to the edge (where the REST request is parsed), or having a separate part of the request specifically for private index settings that can't be set by the REST request.

What I did in a409a95c was add a flag to CreateIndexClusterStateUpdateRequest to indicate that all settings in that request are system-provided. This obviously can't be toggled via the REST API. I didn't yet see the need for a way to mix user-provided and system-provided settings in a single CreateIndexClusterStateUpdateRequest.
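
Conceptually, it looks something like this (a hedged sketch; the setter name is hypothetical and the actual commit may differ):

    // Internal-only request: a flag marks every setting in it as system-provided, so
    // private-setting validation is skipped for them. REST handlers never set this.
    CreateIndexClusterStateUpdateRequest request =
        new CreateIndexClusterStateUpdateRequest("downsample", "target-index", "target-index")
            .settings(settings)
            .systemProvidedSettings(true); // hypothetical setter added in a409a95c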

The change in d449b79c makes it so that during index creation/validation, we only disallow private settings coming from index templates, while allowing an IndexSettingsProvider to provide private settings.

Once we all agree that this is the best path forward, I'll revert some of the changes made in #133232, which added the ability to provide custom index metadata via IndexSettingsProvider. The other things added in that PR (mainly the onMappingUpdate hook) are still valid.

Sounds like @martijnvg is on board, and @rjernst seems generally favorable after a first glance. Curious to hear from @dakrone.

@dakrone
Member

dakrone commented Aug 28, 2025

I think that sounds reasonable to me as well, as long as Ryan and Martijn are happy.

@felixbarny
Member Author

I've created a PR that allows system-provided private settings: #133789
