-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Enable the use of nested field type with index.mode=time_series #122224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Hi @jordan-powers, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @jordan-powers. I didn't expect that a change inDocumentParserContext
is needed, but I understand why it is needed.
The responsibility of adding the _id field to nested documents is now in another place, which isn't ideal. This is why added two comments about asserts.
// NOTE: we don't support nested fields in tsdb so it's safe to assume the standard id mapper. | ||
doc.add(new StringField(IdFieldMapper.NAME, idField.binaryValue(), Field.Store.NO)); | ||
} else if (indexSettings().getMode() == IndexMode.TIME_SERIES) { | ||
// For time series indices, the _id is generated from the _tsid, which in turn is generated from the values of the configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should add an assert that getRoutingFields()
doesn't return a reference to RoutingFields.Noop#INSTANCE
? Just to make sure we are able to collect dimension values in order to generate _tsid / _id at a later stage?
// for time-series indices the _id isn't available at that point. | ||
assert context.id() != null; | ||
for (LuceneDocument doc : context.nonRootDocuments()) { | ||
doc.add(new StringField(IdFieldMapper.NAME, Uid.encodeId(context.id()), Field.Store.NO)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also assert that _id
field hasn't been added yet to non root documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have nested within nested? If it's problematic for TSDB, we can throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++, i wonder what happens with multi-nested documents. I believe you may want to check the parent of the document here because they can differ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at DocumentParserContext#createNestedContext
, it seems that the child document's _id
always inherits the parent's _id
, which eventually inherits from the root document's _id
. So even in multi-level nested documents, the _id
is the same root-level _id
.
IndexableField idField = doc.getParent().getField(IdFieldMapper.NAME);
if (idField != null) {
doc.add(new StringField(IdFieldMapper.NAME, idField.binaryValue(), Field.Store.NO));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's test it :)
body: | ||
size: 0 | ||
query: | ||
bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think nested
query is required if you intend to query at the courses level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, you're totally right
courses.credits: 3 | ||
|
||
- match: | ||
hits.total.value: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also do search that returns a hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely documented somewhere and that documentation needs to be adjusted. I think we are in the middle of documentation migration though so let's create a task for that.
// for time-series indices the _id isn't available at that point. | ||
assert context.id() != null; | ||
for (LuceneDocument doc : context.nonRootDocuments()) { | ||
doc.add(new StringField(IdFieldMapper.NAME, Uid.encodeId(context.id()), Field.Store.NO)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++, i wonder what happens with multi-nested documents. I believe you may want to check the parent of the document here because they can differ.
Also, add a test that actually returns a document.
time_series_dimension: true | ||
|
||
--- | ||
nested fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this test? i think it repeats tests above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it repeats any tests above since this is the only test in this file with a nested non-time_series_dimension field. But it is definitely redundant with the tests I added in 160_nested_fields.yml, so I'll take it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant above in the PR, sorry.
// We need to add the uid or id to nested Lucene documents so that when a document gets deleted, the nested documents are | ||
// also deleted. Usually this happens when the nested document is created (in DocumentParserContext#createNestedContext), but | ||
// for time-series indices the _id isn't available at that point. | ||
var binaryId = context.doc().getField(IdFieldMapper.NAME).binaryValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getField
is kind of expensive since it iterates over all fields. Let's do this only when there are non root documents. Or maybe we can return the id from TsidExtractingIdFieldMapper
above.
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
) (#122520) This patch removes the check that fails requests that attempt to use fields of type: nested within indices with mode time_series. This patch also updates TimeSeriesIdFieldMapper#postParse to set the _id field on child documents once it's calculated. Closes #120874 (cherry picked from commit 5315088) # Conflicts: # rest-api-spec/build.gradle
Any idea when this enhancement will be usable for Elasticsearch Cloud customers? |
This patch removes the check that fails requests that attempt to use fields of
type: nested
within indices withmode time_series
.This patch also updates
TimeSeriesIdFieldMapper#postParse
to set the_id
field on child documents once it's calculated.Closes #120874