Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -80,11 +80,6 @@ private SearchContextStats(List<SearchExecutionContext> contexts) {
assert contexts != null && contexts.isEmpty() == false;
}

public boolean exists(String field) {
var stat = cache.computeIfAbsent(field, this::makeFieldStats);
return stat.config.exists;
}

private FieldStats makeFieldStats(String field) {
var stat = new FieldStats();
stat.config = makeFieldConfig(field);
Expand Down Expand Up @@ -123,19 +118,30 @@ private FieldConfig makeFieldConfig(String field) {
}
}

private boolean fastNoCacheFieldExists(String field) {
for (SearchExecutionContext context : contexts) {
if (context.isFieldMapped(field)) {
return true;
}
}
return false;
}

public boolean exists(String field) {
var stat = cache.get(field);
return stat != null ? stat.config.exists : fastNoCacheFieldExists(field);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This takes result from the cache if present or otherwise resolves it bypassing the cache.

The issue here is that cache is only limited to 32 entries while the query touches 20 indices with 500 fields each.
In such cases (when the exists check is performed in the loop for each field) the result is not persisted anyways and is also quiet expensive to initialize.

fastNoCacheFieldExists instead only checks for field existence. This supposed to be cheaper as we need to find it in the first context in contrast to scanning all context to initialize the rest of the fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if you believe we should reconsider cache size or if you see another way around it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this way. Was the time mostly spent messing around with the hash map? I feel like a bunch of hash lookups isn't usually worth caching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of cache size 32 dates back to Costin's original work two years ago in c351235. I see no problem increasing the cache size, but also think your optimization is fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the time mostly spent messing around with the hash map? I feel like a bunch of hash lookups isn't usually worth caching.

Here it was dominated by iterating (the blue bar on top of pink) in the profiling output.
To initialize the chace we need to loop all contexts, to reply to exists we optimistically need to find the first one or loop all if the field does not exists (should be rare I assume):

private FieldConfig makeFieldConfig(String field) {
boolean exists = false;
boolean hasExactSubfield = true;
boolean indexed = true;
boolean hasDocValues = true;
// even if there are deleted documents, check the existence of a field
// since if it's missing, deleted documents won't change that
for (SearchExecutionContext context : contexts) {
if (context.isFieldMapped(field)) {
exists = exists || true;
MappedFieldType type = context.getFieldType(field);
indexed = indexed && type.isIndexed();
hasDocValues = hasDocValues && type.hasDocValues();
if (type instanceof TextFieldMapper.TextFieldType t) {
hasExactSubfield = hasExactSubfield && t.canUseSyntheticSourceDelegateForQuerying();
} else {
hasExactSubfield = false;
}
} else {
indexed = false;
hasDocValues = false;
hasExactSubfield = false;
}
}
if (exists == false) {
// if it does not exist on any context, no other settings are valid
return new FieldConfig(false, false, false, false);
} else {
return new FieldConfig(exists, hasExactSubfield, indexed, hasDocValues);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that while exists can short circuit the loop, the other checks cannot. So it makes sense to keep the loop and the cache for the other checks, but also to have a special case for exists. I wonder, however, how probable is it that the planner only calls for exists and not for the others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could merge this and see in a new profiling output if the same appears somewhere else.
If it does then this is not helpful and we would have to rethink the cache (maybe make it not expire entries and take into account circuit-breaker?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, however, how probable is it that the planner only calls for exists and not for the others?

After thinking about it a little more:
In this case (benchmarks for from idx* | LIMIT something with 20 indices and 500 fields) we actually do not use SearchContextStats anywhere else, otherwise we would see initialization in other places already. We are likely use it in other cases so we might need to extend set of queries we benchmark.


public boolean isIndexed(String field) {
var stat = cache.computeIfAbsent(field, this::makeFieldStats);
return stat.config.indexed;
return cache.computeIfAbsent(field, this::makeFieldStats).config.indexed;
}

public boolean hasDocValues(String field) {
var stat = cache.computeIfAbsent(field, this::makeFieldStats);
return stat.config.hasDocValues;
return cache.computeIfAbsent(field, this::makeFieldStats).config.hasDocValues;
}

public boolean hasExactSubfield(String field) {
var stat = cache.computeIfAbsent(field, this::makeFieldStats);
return stat.config.hasExactSubfield;
return cache.computeIfAbsent(field, this::makeFieldStats).config.hasExactSubfield;
}

public long count() {
Expand Down Expand Up @@ -218,7 +224,7 @@ public boolean isSingleValue(String field) {
var stat = cache.computeIfAbsent(field, this::makeFieldStats);
if (stat.singleValue == null) {
// there's no such field so no need to worry about multi-value fields
if (exists(field) == false) {
if (stat.config.exists == false) {
stat.singleValue = true;
} else {
// fields are MV per default
Expand Down