Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/changelog/127225.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 127225
summary: Fix count optimization with pushable union types
area: ES|QL
type: bug
issues:
- 127200
Original file line number Diff line number Diff line change
Expand Up @@ -393,6 +393,30 @@ count:long | @timestamp:date
0 | 2023-10-23T13:50:00.000Z
;

multiIndexIpStatsPushableCount
required_capability: union_types
required_capability: fix_count_pushdown_for_union_types

FROM sample_data, sample_data_str
| STATS count=count(client_ip::ip)
;

count:long
14
;

multiIndexIpStatsPushableCountEval
required_capability: union_types
required_capability: fix_count_pushdown_for_union_types

FROM sample_data, sample_data_str
| EVAL client_ip = client_ip::ip
| STATS count=count(client_ip)
;

count:long
14
;

multiIndexIpStringStatsInline2
required_capability: union_types
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1038,7 +1038,13 @@ public enum Cap {
/**
* Support for the SAMPLE command
*/
SAMPLE(Build.current().isSnapshot());
SAMPLE(Build.current().isSnapshot()),

/**
* When pushing down {@code STATS count(field::type)} for a union type field, we wrongly used a synthetic attribute name in the
* query instead of the actual field name. This led to 0 counts instead of the correct result.
*/
FIX_COUNT_PUSHDOWN_FOR_UNION_TYPES;

private final boolean enabled;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,12 +66,12 @@ public LogicalPlan apply(LogicalPlan plan, LocalLogicalOptimizerContext localLog

private LogicalPlan missingToNull(LogicalPlan plan, Predicate<FieldAttribute> shouldBeRetained) {
if (plan instanceof EsRelation relation) {
// Remove missing fields from the EsRelation because this is not where we will obtain them from; replace them by an Eval right
// after, instead. This allows us to safely re-use the attribute ids of the corresponding FieldAttributes.
// For any missing field, place an Eval right after the EsRelation to assign null values to that attribute (using the same name
// id!), thus avoiding that InsertFieldExtrations inserts a field extraction later.
Comment on lines +69 to +70
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// This means that an EsRelation[field1, field2, field3] where field1 and field 3 are missing will be replaced by
// Project[field1, field2, field3] <- keeps the ordering intact
// \_Eval[field1 = null, field3 = null]
// \_EsRelation[field2]
// \_EsRelation[field1, field2, field3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the field is missing, would it be in the EsRelation at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Because we're in the local optimizer, the field does exist overall, and thus is put into the EsRelation after the field caps call on the coordinator. But on the local node it's missing! (Or in the search stats that this optimization run uses, to be more precise) This optimizer rule applies exclusively to such fields.

List<Attribute> relationOutput = relation.output();
Map<DataType, Alias> nullLiterals = Maps.newLinkedHashMapWithExpectedSize(DataType.types().size());
List<NamedExpression> newProjections = new ArrayList<>(relationOutput.size());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -94,9 +94,9 @@ private Tuple<List<Attribute>, List<EsStatsQueryExec.Stat>> pushableStats(
// check if regular field
else {
if (target instanceof FieldAttribute fa) {
var fName = fa.name();
var fName = fa.fieldName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Is this the fix? I imagine this could have impacts in many places, so could this fix other bugs we've not noticed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is the fix. Simple oversight to use the correct field name - in the past, the field attribute name and the field name were the same, but union types had to break with this pattern.

I do not see other situations that this may fix, too, because the specific stats-pushdown optimization only triggers in a narrow slice of queries, anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm wondering is if we could make this mistake again and if we could prevent it.
Also, should we insert filters here using #fieldName()?

if (field.foldable() == false && field instanceof FieldAttribute fa && stats.isIndexed(fa.name())) {

if (context.searchStats().isSingleValue(fName)) {
fieldName = fa.name();
fieldName = fName;
query = QueryBuilders.existsQuery(fieldName);
}
}
Expand Down
Loading