
Conversation

@idegtiarenko (Contributor):

[profiling screenshot]

When executing a query against many shards with many fields, SearchContextStats:exists accounts for 2.13% of all samples in the benchmark.

This change attempts to make it a bit cheaper. Please see inline comments.

@idegtiarenko idegtiarenko added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.1.0 labels Mar 31, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-analytical-engine (Team:Analytics)

public boolean exists(String field) {
    var stat = cache.get(field);
    return stat != null ? stat.config.exists : fastNoCacheFieldExists(field);
}
@idegtiarenko (Contributor, Author):

This takes the result from the cache if present, or otherwise resolves it while bypassing the cache.

The issue here is that the cache is limited to 32 entries, while the query touches 20 indices with 500 fields each, i.e. 10,000 distinct fields.
In such cases (when the exists check is performed in a loop for each field) the result is not persisted anyway, and it is also quite expensive to initialize.

fastNoCacheFieldExists instead only checks for field existence. This is supposed to be cheaper, since we only need to find the field in the first context that maps it, in contrast to scanning all contexts to initialize the rest of the flags.
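The actual implementation isn't shown in this thread, but the idea can be sketched with a simplified model (here each `SearchExecutionContext` is stood in for by a set of mapped field names; the names and data are made up for illustration):

```java
import java.util.List;
import java.util.Set;

class FastExistsSketch {
    // Each "context" is modeled as the set of field names it has mapped.
    static final List<Set<String>> contexts = List.of(
        Set.of("host", "message"),
        Set.of("host", "status"));

    // Returns on the first context that maps the field, unlike the cache
    // path, which must visit every context to compute the other flags.
    static boolean fastNoCacheFieldExists(String field) {
        for (Set<String> context : contexts) {
            if (context.contains(field)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(fastNoCacheFieldExists("host"));    // true
        System.out.println(fastNoCacheFieldExists("missing")); // false
    }
}
```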

@idegtiarenko (Contributor, Author):

Please let me know if you believe we should reconsider the cache size, or if you see another way around it.

Member:

I'm fine with this way. Was the time mostly spent messing around with the hash map? I feel like a bunch of hash lookups isn't usually worth caching.

Contributor:

The choice of cache size 32 dates back to Costin's original work two years ago in c351235. I see no problem increasing the cache size, but also think your optimization is fine too.

@idegtiarenko (Contributor, Author):

> Was the time mostly spent messing around with the hash map? I feel like a bunch of hash lookups isn't usually worth caching.

Here it was dominated by iterating (the blue bar on top of the pink one) in the profiling output.
To initialize the cache we need to loop over all contexts; to answer exists we optimistically only need to find the first matching one, or loop over all of them if the field does not exist (which should be rare, I assume):

private FieldConfig makeFieldConfig(String field) {
    boolean exists = false;
    boolean hasExactSubfield = true;
    boolean indexed = true;
    boolean hasDocValues = true;
    // even if there are deleted documents, check the existence of a field
    // since if it's missing, deleted documents won't change that
    for (SearchExecutionContext context : contexts) {
        if (context.isFieldMapped(field)) {
            exists = exists || true;
            MappedFieldType type = context.getFieldType(field);
            indexed = indexed && type.isIndexed();
            hasDocValues = hasDocValues && type.hasDocValues();
            if (type instanceof TextFieldMapper.TextFieldType t) {
                hasExactSubfield = hasExactSubfield && t.canUseSyntheticSourceDelegateForQuerying();
            } else {
                hasExactSubfield = false;
            }
        } else {
            indexed = false;
            hasDocValues = false;
            hasExactSubfield = false;
        }
    }
    if (exists == false) {
        // if it does not exist on any context, no other settings are valid
        return new FieldConfig(false, false, false, false);
    } else {
        return new FieldConfig(exists, hasExactSubfield, indexed, hasDocValues);
    }
}
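To see why full config initialization is costlier than an exists-only check, here is a toy model (the 20-context setup mirrors the benchmark described above; the field names are made up) that counts how many contexts each path visits:

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

class ScanCostSketch {
    // 20 "indices", each mapping the same field, mimicking the benchmark setup.
    static final List<Set<String>> contexts =
        Collections.nCopies(20, Set.of("field0"));

    // Full config initialization visits every context: flags like
    // indexed/hasDocValues are ANDed across all contexts, so the loop
    // cannot stop early.
    static int contextsVisitedForFullConfig() {
        return contexts.size();
    }

    // The exists-only check stops at the first context that maps the field.
    static int contextsVisitedForExists(String field) {
        int visited = 0;
        for (Set<String> context : contexts) {
            visited++;
            if (context.contains(field)) {
                return visited;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(contextsVisitedForFullConfig());      // 20
        System.out.println(contextsVisitedForExists("field0"));  // 1
        System.out.println(contextsVisitedForExists("missing")); // 20
    }
}
```

This also illustrates the point made below: the exists check short-circuits, but the other checks inherently cannot.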

Contributor:

I think that while exists can short-circuit the loop, the other checks cannot. So it makes sense to keep the loop and the cache for the other checks, but also to have a special case for exists. I wonder, however, how probable is it that the planner only calls for exists and not for the others?

@idegtiarenko (Contributor, Author):

We could merge this and check in a new profiling output whether the same cost shows up somewhere else.
If it does, then this change is not helpful and we would have to rethink the cache (maybe make it not expire entries and take the circuit breaker into account?).
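One possible reading of that suggestion, sketched with entirely hypothetical names and a made-up per-entry cost (the real Elasticsearch circuit-breaker API is not shown here): a cache that never evicts, but stops accepting new entries once a byte budget is exhausted.

```java
import java.util.HashMap;
import java.util.Map;

class BudgetedCacheSketch {
    static final long ENTRY_COST_BYTES = 64; // made-up per-entry estimate
    final long budgetBytes;                  // stand-in for a circuit breaker
    long usedBytes = 0;
    final Map<String, Boolean> entries = new HashMap<>();

    BudgetedCacheSketch(long budgetBytes) {
        this.budgetBytes = budgetBytes;
    }

    // Admits an entry only while the budget allows; never evicts.
    boolean put(String field, boolean exists) {
        if (entries.containsKey(field)) {
            return true;
        }
        if (usedBytes + ENTRY_COST_BYTES > budgetBytes) {
            return false; // budget exhausted, entry not cached
        }
        entries.put(field, exists);
        usedBytes += ENTRY_COST_BYTES;
        return true;
    }

    public static void main(String[] args) {
        BudgetedCacheSketch cache = new BudgetedCacheSketch(128);
        System.out.println(cache.put("a", true)); // true: fits
        System.out.println(cache.put("b", true)); // true: fits
        System.out.println(cache.put("c", true)); // false: over budget
    }
}
```

Unlike a fixed 32-entry LRU, this avoids the write thrashing mentioned later in the thread, at the cost of leaving late-arriving fields uncached.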

@idegtiarenko (Contributor, Author):

> I wonder, however, how probable is it that the planner only calls for exists and not for the others?

After thinking about it a little more:
in this case (benchmarks for from idx* | LIMIT something with 20 indices and 500 fields) we actually do not use SearchContextStats anywhere else, otherwise we would already see initialization showing up in other places. We likely do use it in other cases, so we might need to extend the set of queries we benchmark.

@craigtaverner (Contributor) left a comment:

LGTM


@costin (Member) left a comment:

LGTM - the cache size was chosen arbitrarily, which obviously doesn't handle all scenarios. One thing to consider is that bypassing the cache won't help invalidate old entries.
I can see how, when dealing with 10K fields, this creates write thrashing of the cache.

@idegtiarenko idegtiarenko merged commit 8bbd474 into elastic:main Apr 1, 2025
17 checks passed
@idegtiarenko idegtiarenko deleted the speedup_exists_check branch April 1, 2025 08:39

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
