- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
ESQL: Fix FieldAttribute name usage in InferNonNullAggConstraint #128910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 10 commits
674c4b6
              5926191
              83fb557
              85858f8
              4921bc7
              be574fd
              be97463
              5e696aa
              6da3847
              8a57abc
              7edd899
              f8a5d1b
              f8435eb
              File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 128910 | ||
| summary: Fix `FieldAttribute` name usage in `InferNonNullAggConstraint` | ||
| area: ES|QL | ||
| type: bug | ||
| issues: [] | 
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -27,14 +27,24 @@ | |
|  | ||
| /** | ||
| * Attribute for an ES field. | ||
| * To differentiate between the different type of fields this class offers: | ||
| * - name - the fully qualified name (foo.bar.tar) | ||
| * - path - the path pointing to the field name (foo.bar) | ||
| * - parent - the immediate parent of the field; useful for figuring out the type of field (nested vs object) | ||
| * - nestedParent - if nested, what's the parent (which might not be the immediate one) | ||
| * This class offers: | ||
| * - name - the name of the attribute, but not necessarily of the field. | ||
| * - The raw EsField representing the field; for parent.child.grandchild this is just grandchild. | ||
| * - parentName - the full path to the immediate parent of the field, e.g. parent.child (without .grandchild) | ||
| * | ||
| * To adequately represent e.g. union types, the name of the attribute can be altered because we may have multiple synthetic field | ||
| * attributes that really belong to the same underlying field. For instance, if a multi-typed field is used both as {@code field::string} | ||
| * and {@code field::ip}, we'll generate 2 field attributes called {@code $$field$converted_to$string} and {@code $$field$converted_to$ip} | ||
| * but still referring to the same underlying field. | ||
| */ | ||
| public class FieldAttribute extends TypedAttribute { | ||
|  | ||
| /** | ||
| * A field name, as found in the mapping. Includes the whole path from the root of the document. | ||
| * Implemented as a wrapper around {@link String} to distinguish from the attribute name (which sometimes differs!) at compile time. | ||
| */ | ||
| public record FieldName(String string) {}; | ||
| There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not important, but I think I would name named the parameter  There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had this initially, but then you have  Do we know of a precedent where we used a wrapper record before? In Rust, it'd be standard to call the wrapped attribute  | ||
|  | ||
| static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry( | ||
| Attribute.class, | ||
| "FieldAttribute", | ||
|  | @@ -43,6 +53,7 @@ public class FieldAttribute extends TypedAttribute { | |
|  | ||
| private final String parentName; | ||
| private final EsField field; | ||
| protected FieldName lazyFieldName; | ||
|  | ||
| public FieldAttribute(Source source, String name, EsField field) { | ||
| this(source, null, name, field); | ||
|  | @@ -136,15 +147,19 @@ public String parentName() { | |
| /** | ||
| * The full name of the field in the index, including all parent fields. E.g. {@code parent.subfield.this_field}. | ||
| */ | ||
| public String fieldName() { | ||
| // Before 8.15, the field name was the same as the attribute's name. | ||
| // On later versions, the attribute can be renamed when creating synthetic attributes. | ||
| // Because until 8.15, we couldn't set `synthetic` to true due to a bug, in that version such FieldAttributes are marked by their | ||
| // name starting with `$$`. | ||
| if ((synthetic() || name().startsWith(SYNTHETIC_ATTRIBUTE_NAME_PREFIX)) == false) { | ||
| return name(); | ||
| public FieldName fieldName() { | ||
| if (lazyFieldName == null) { | ||
| // Before 8.15, the field name was the same as the attribute's name. | ||
| // On later versions, the attribute can be renamed when creating synthetic attributes. | ||
| // Because until 8.15, we couldn't set `synthetic` to true due to a bug, in that version such FieldAttributes are marked by | ||
| // their | ||
| // name starting with `$$`. | ||
| if ((synthetic() || name().startsWith(SYNTHETIC_ATTRIBUTE_NAME_PREFIX)) == false) { | ||
| lazyFieldName = new FieldName(name()); | ||
| } | ||
| lazyFieldName = new FieldName(Strings.hasText(parentName) ? parentName + "." + field.getName() : field.getName()); | ||
| } | ||
| return Strings.hasText(parentName) ? parentName + "." + field.getName() : field.getName(); | ||
| return lazyFieldName; | ||
| } | ||
|  | ||
| public EsField.Exact getExactInfo() { | ||
|  | ||
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -25,7 +25,7 @@ | |
|  | ||
| /** | ||
| * The vast majority of aggs ignore null entries - this rule adds a pushable filter, as it is cheap | ||
| * to execute, to filter this entries out to begin with. | ||
| * to execute, to filter these entries out to begin with. | ||
| * STATS x = min(a), y = sum(b) | ||
| * becomes | ||
| * | WHERE a IS NOT NULL OR b IS NOT NULL | ||
|  | @@ -55,7 +55,7 @@ protected LogicalPlan rule(Aggregate aggregate, LocalLogicalOptimizerContext con | |
| Expression field = af.field(); | ||
| // ignore literals (e.g. COUNT(1)) | ||
| // make sure the field exists at the source and is indexed (not runtime) | ||
| if (field.foldable() == false && field instanceof FieldAttribute fa && stats.isIndexed(fa.name())) { | ||
| if (field.foldable() == false && field instanceof FieldAttribute fa && stats.isIndexed(fa.fieldName())) { | ||
| There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup, this is the actual fix. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks so familiar. I'm sure I've done that in a few places before, probably during the union-types work. Pity I was not thorough! What I did was replace  There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Caution! 
 I hope this gets easier with the updated javadoc. :/ | ||
| nonNullAggFields.add(field); | ||
| } else { | ||
| // otherwise bail out since unless disjunction needs to cover _all_ fields, things get filtered out | ||
|  | ||
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -126,32 +126,36 @@ public boolean canUseEqualityOnSyntheticSourceDelegate(FieldAttribute attr, Stri | |
| }; | ||
|  | ||
| /** | ||
| * If we have access to SearchStats over a collection of shards, we can make more fine-grained decisions about what can be pushed down. | ||
| * This should open up more opportunities for lucene pushdown. | ||
| * If we have access to {@link SearchStats} over a collection of shards, we can make more fine-grained decisions about what can be | ||
| * pushed down. This should open up more opportunities for lucene pushdown. | ||
| */ | ||
| static LucenePushdownPredicates from(SearchStats stats) { | ||
| // TODO: use FieldAttribute#fieldName, otherwise this doesn't apply to field attributes used for union types. | ||
| // C.f. https://github.com/elastic/elasticsearch/issues/128905 | ||
| return new LucenePushdownPredicates() { | ||
| @Override | ||
| public boolean hasExactSubfield(FieldAttribute attr) { | ||
| return stats.hasExactSubfield(attr.name()); | ||
| return stats.hasExactSubfield(new FieldAttribute.FieldName(attr.name())); | ||
| There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why are we not calling attr.fieldName()? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This fails the union type csv tests - I didn't want to fix it in this PR but opened #128905 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Applies also to the instances below. | ||
| } | ||
|  | ||
| @Override | ||
| public boolean isIndexedAndHasDocValues(FieldAttribute attr) { | ||
| // We still consider the value of isAggregatable here, because some fields like ScriptFieldTypes are always aggregatable | ||
| // But this could hide issues with fields that are not indexed but are aggregatable | ||
| // This is the original behaviour for ES|QL, but is it correct? | ||
| return attr.field().isAggregatable() || stats.isIndexed(attr.name()) && stats.hasDocValues(attr.name()); | ||
| return attr.field().isAggregatable() | ||
| || stats.isIndexed(new FieldAttribute.FieldName(attr.name())) | ||
|         
                  alex-spies marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| && stats.hasDocValues(new FieldAttribute.FieldName(attr.name())); | ||
|         
                  alex-spies marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| } | ||
|  | ||
| @Override | ||
| public boolean isIndexed(FieldAttribute attr) { | ||
| return stats.isIndexed(attr.name()); | ||
| return stats.isIndexed(new FieldAttribute.FieldName(attr.name())); | ||
| } | ||
|  | ||
| @Override | ||
| public boolean canUseEqualityOnSyntheticSourceDelegate(FieldAttribute attr, String value) { | ||
| return stats.canUseEqualityOnSyntheticSourceDelegate(attr.field().getName(), value); | ||
| return stats.canUseEqualityOnSyntheticSourceDelegate(attr.fieldName(), value); | ||
| } | ||
| }; | ||
| } | ||
|  | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In union-types we do not have multiple names like this, since the name specifies the type we're converting to, not the type we're converting from. So if
fieldis IP and KEYWORD in different mappings, the converted field might be$$field$coverted_to$ipif the query usedfield::ipto resolve the mapping conflict. A different query might usefield::stringto resolve the conflict, and in that query the attribute gets name$$field$converted_to$string. So we only ever generate 1 new attribute, not 2, in the same query.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do generate more than 1; I just confirmed this locally with the example given in the javadoc: