jordan-powers
Contributor

This patch removes the check that fails requests that attempt to use fields of type: nested within indices with mode time_series.

This patch also updates TimeSeriesIdFieldMapper#postParse to set the _id field on child documents once it's calculated.

Closes #120874

@jordan-powers added the >enhancement, auto-backport, :StorageEngine/Mapping, v8.19.0, and v9.1.0 labels on Feb 11, 2025
@jordan-powers self-assigned this on Feb 11, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Hi @jordan-powers, I've created a changelog YAML for you.

Member

@martijnvg left a comment

Looks good @jordan-powers. I didn't expect that a change in DocumentParserContext would be needed, but I understand why it is.

The responsibility for adding the _id field to nested documents now lives in another place, which isn't ideal. This is why I added two comments about asserts.

    // NOTE: we don't support nested fields in tsdb so it's safe to assume the standard id mapper.
    doc.add(new StringField(IdFieldMapper.NAME, idField.binaryValue(), Field.Store.NO));
} else if (indexSettings().getMode() == IndexMode.TIME_SERIES) {
    // For time series indices, the _id is generated from the _tsid, which in turn is generated from the values of the configured

Member

Maybe we should add an assert that getRoutingFields() doesn't return a reference to RoutingFields.Noop#INSTANCE? Just to make sure we are able to collect dimension values in order to generate _tsid / _id at a later stage?
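
A minimal sketch of the kind of identity check suggested here. All types below are hypothetical stand-ins, not the Elasticsearch classes: `RoutingFieldsSketch.NOOP` plays the role of `RoutingFields.Noop#INSTANCE`, and the check throws explicitly instead of using `assert` so that it also fires when assertions are disabled.

```java
// Hypothetical stand-ins for illustration; these are not the Elasticsearch classes.
import java.util.Map;
import java.util.TreeMap;

interface RoutingFieldsSketch {
    void addString(String name, String value);

    // Shared no-op instance that silently drops every value it is given.
    RoutingFieldsSketch NOOP = (name, value) -> {};
}

final class CollectingRoutingFields implements RoutingFieldsSketch {
    // Dimension values gathered during parsing, kept sorted by field name.
    final Map<String, String> dimensions = new TreeMap<>();

    @Override
    public void addString(String name, String value) {
        dimensions.put(name, value);
    }
}

final class RoutingFieldsCheck {
    // Rejects the no-op instance by identity, since it would collect nothing
    // and a later _tsid / _id computation would have no input.
    static RoutingFieldsSketch requireCollecting(RoutingFieldsSketch fields) {
        if (fields == RoutingFieldsSketch.NOOP) {
            throw new IllegalStateException("routing fields must collect dimension values");
        }
        return fields;
    }
}
```

In the real code this would presumably be a one-line `assert` on the value returned by getRoutingFields(); the wrapper method here only exists to keep the sketch self-contained.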

// for time-series indices the _id isn't available at that point.
assert context.id() != null;
for (LuceneDocument doc : context.nonRootDocuments()) {
    doc.add(new StringField(IdFieldMapper.NAME, Uid.encodeId(context.id()), Field.Store.NO));

Member

Maybe also assert that _id field hasn't been added yet to non root documents?
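
A rough sketch of what such a guard could look like, using a hypothetical `SketchDoc` (a plain field map) in place of LuceneDocument. The method fails if the _id is not yet known or if a non-root document already carries an _id, and otherwise adds it. Explicit exceptions are used here rather than `assert` so the behavior does not depend on JVM flags.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified document: field name -> value.
final class SketchDoc {
    final Map<String, String> fields = new LinkedHashMap<>();
}

final class NestedIdSketch {
    static final String ID_FIELD = "_id";

    // Adds the computed _id to every non-root document, refusing to add it twice.
    static void addIdToNonRootDocs(String id, List<SketchDoc> nonRootDocs) {
        if (id == null) {
            throw new IllegalStateException("_id has not been computed yet");
        }
        for (SketchDoc doc : nonRootDocs) {
            if (doc.fields.containsKey(ID_FIELD)) {
                throw new IllegalStateException("_id already present on a non-root document");
            }
            doc.fields.put(ID_FIELD, id);
        }
    }
}
```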

Contributor

Is it possible to have nested within nested? If it's problematic for TSDB, we can throw.

Contributor

++, I wonder what happens with multi-nested documents. I believe you may want to check the parent of the document here because they can differ.

Contributor Author

Looking at DocumentParserContext#createNestedContext, it seems that the child document's _id always inherits the parent's _id, which eventually inherits from the root document's _id. So even in multi-level nested documents, the _id is the same root-level _id.

IndexableField idField = doc.getParent().getField(IdFieldMapper.NAME);
if (idField != null) {
    doc.add(new StringField(IdFieldMapper.NAME, idField.binaryValue(), Field.Store.NO));
}
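
To make the inheritance argument concrete, here is a toy model of the snippet above. `ParentedDoc` and `createNested` are hypothetical simplifications (a field map plus a parent pointer), not the real LuceneDocument or DocumentParserContext#createNestedContext. It shows that copying from the direct parent is enough for every level to end up with the root's _id, and that nothing is copied when the root has no _id yet, which is the time-series case this PR handles later in postParse.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified document with a parent pointer.
final class ParentedDoc {
    final ParentedDoc parent;
    final Map<String, String> fields = new HashMap<>();

    ParentedDoc(ParentedDoc parent) {
        this.parent = parent;
    }
}

final class InheritIdSketch {
    // Mirrors the quoted logic: copy _id from the direct parent if it has one.
    static ParentedDoc createNested(ParentedDoc parent) {
        ParentedDoc child = new ParentedDoc(parent);
        String parentId = parent.fields.get("_id");
        if (parentId != null) {
            child.fields.put("_id", parentId);
        }
        return child;
    }
}
```

With a root whose _id is set, a child and a grandchild created in order both carry the same value. With a root that has no _id, neither nested level gets one.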

Contributor

Let's test it :)

body:
  size: 0
  query:
    bool:

Member

I think nested query is required if you intend to query at the courses level.

Contributor Author

Oops, you're totally right

courses.credits: 3

- match:
    hits.total.value: 0

Member

Maybe also do search that returns a hit?

Contributor

@lkts left a comment

This is likely documented somewhere, and that documentation needs to be adjusted. I think we are in the middle of a documentation migration though, so let's create a task for that.

time_series_dimension: true

---
nested fields:

Contributor

Do we need this test? I think it repeats tests above.

Contributor Author

I don't think it repeats any tests above since this is the only test in this file with a nested non-time_series_dimension field. But it is definitely redundant with the tests I added in 160_nested_fields.yml, so I'll take it out.

Contributor

I meant above in the PR, sorry.

// We need to add the uid or id to nested Lucene documents so that when a document gets deleted, the nested documents are
// also deleted. Usually this happens when the nested document is created (in DocumentParserContext#createNestedContext), but
// for time-series indices the _id isn't available at that point.
var binaryId = context.doc().getField(IdFieldMapper.NAME).binaryValue();

Contributor

getField is kind of expensive since it iterates over all fields. Let's do this only when there are non-root documents. Or maybe we can return the id from TsidExtractingIdFieldMapper above.
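
A small sketch of the first suggestion, under the assumption that the lookup is a linear scan. `ScannedDoc` is a hypothetical stand-in for the root document that counts its scans, and nested documents are modeled as plain maps. The guard returns before the lookup when there is nothing to update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical root document whose field lookup is a counted linear scan.
final class ScannedDoc {
    final List<String[]> fields = new ArrayList<>(); // {name, value} pairs
    int scans = 0;

    void add(String name, String value) {
        fields.add(new String[] { name, value });
    }

    // Walks every field until the name matches; returns null if absent.
    String getField(String name) {
        scans++;
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                return field[1];
            }
        }
        return null;
    }
}

final class GuardedIdCopy {
    // Copies the root _id to nested docs, skipping the scan when there are none.
    static void copyRootId(ScannedDoc root, List<Map<String, String>> nonRootDocs) {
        if (nonRootDocs.isEmpty()) {
            return;
        }
        String id = root.getField("_id");
        for (Map<String, String> doc : nonRootDocs) {
            doc.put("_id", id);
        }
    }
}
```

The second suggestion, returning the id from the mapper that computes it, would avoid the scan entirely; it is not sketched here because it depends on the real method signatures.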

@jordan-powers merged commit 5315088 into elastic:main on Feb 13, 2025
17 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 122224

@jordan-powers
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

@jordan-powers deleted the fix_120874 branch on February 13, 2025 at 17:44
elasticsearchmachine pushed a commit that referenced this pull request Feb 13, 2025
) (#122520)

(cherry picked from commit 5315088)

# Conflicts:
#	rest-api-spec/build.gradle
@NatElkins

Any idea when this enhancement will be usable for Elasticsearch Cloud customers?

Labels

auto-backport Automatically create backport pull requests when merged backport pending >enhancement :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.19.0 v9.1.0

Development

Successfully merging this pull request may close these issues.

Remove validations that prevents the use of nested field type with index.mode=time_series
