- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
Aggs: Fix significant terms not finding background documents for nested fields #128472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aggs: Fix significant terms not finding background documents for nested fields #128472
Conversation
| Hi @ivancea, I've created a changelog YAML for you. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help a bit with the review, these tests have:
- Setup: Index with a type ("normal" and "outlier"), and a value (1 and 2). That value is replicated 6 times (integer and keyword, and then as a nested and doubly nested field. Every "value" field has the same value in each document, so testing against each of them should render the same results.
- A first "Checking" test to ensure all data ingested is correct
- Test cases for the 3 kinds of values (normal, nested, doubly nested). Each of them do a sig_terms expecting the same results (except for the background_filter one)
| Pinging @elastic/es-analytical-engine (Team:Analytics) | 
| var nestedParentField = context.nestedLookup().getNestedParent(fieldType.name()); | ||
| if (nestedParentField != null) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the best way to detect a nested field, but it worked well, and I didn't find any edge case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martijnvg, do you know the most right way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if buildQuery should take the nested context into account? That sounds like a bigger change than I'd like to make to aggs at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been some time since I worked on nested related functionality. Assuming that fieldType.name() returns a path, then  I think this is the right way to figure out the parent field.
| Hi @ivancea, I've updated the changelog YAML for you. | 
…ed fields (elastic#128472) Closes elastic#101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
…ed fields (elastic#128472) Closes elastic#101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
…ed fields (elastic#128472) Closes elastic#101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
…ed fields (elastic#128472) Closes elastic#101163 Fixes the `significant_terms` aggregation not working correctly on nested fields.
Closes #101163
Fixes the
significant_termsaggregation not working correctly on nested fields.It was returning buckets with
bg_count: 0, meaning it wasn't detecting any background document.The filter it used to check for background documents was a plain TermsQuery, which wouldn't work on Nested fields.
The PR adds an extra NestedQuery wrapping the Terms in case the field is inside a Nested parent.
SignificantTextwas also checked, but it's explicitly documented that it doesn't work on text fields, as it needs to access the source.TBD: Backports