ESQL: Push `==` to `text` fields to lucene #130387

nik9000 · 2025-07-01T12:31:12Z

Speeds up ESQL's `==` on `text` fields by building a Lucene query to find candidate documents.

In ESQL, `a == "words cat words"` on a `text` field means "`a` has exactly the text 'words cat words'". We don't have a Lucene query that matches this exactly, but we can use one to find candidate documents that might match by running the query:

```
"match": { "a": "words cat words" }
```

That'll produce all the documents with the terms `words` and `cat`. We can then run those documents through our compute engine's normal code to check `==`: it'll load the text fields and run the real equality check.
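A minimal stdlib-only Java sketch of that candidate-then-verify flow. It assumes a toy analyzer (lowercase, split on whitespace) and scans a `List` instead of querying a Lucene index; none of these names come from the PR. The candidate step keeps documents containing every query term, and the verify step is plain string equality, standing in for the compute engine's `==`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CandidateThenVerify {
    // Toy analyzer (assumption): lowercase, then split on whitespace.
    // Real text fields use a configurable Lucene analyzer.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Candidate phase: keep a document if it contains every query term
    // (a conjunction of terms). This can over-match, but with the same
    // analyzer on both sides it never drops a document equal to the query.
    static boolean isCandidate(String docText, String query) {
        Set<String> docTerms = new HashSet<>(analyze(docText));
        return docTerms.containsAll(analyze(query));
    }

    // Verify phase: only candidates get the exact (case-sensitive) check.
    static List<Integer> equalsMatches(List<String> docs, String query) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            String doc = docs.get(i);
            if (isCandidate(doc, query) && doc.equals(query)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

For example, `equalsMatches(List.of("words cat words", "cat words", "words dog words"), "words cat words")` returns `[0]`: `"cat words"` is a candidate that the exact check rejects, and `"words dog words"` never reaches the check.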

The other option, the thing we do without this PR, is to push nothing to Lucene for this `==`. If there are other filters we'll push them. And, sometimes, those are plenty! But in many cases we get a pretty big boost out of this candidates strategy.

Quick and dirty best case testing yields numbers like:

    text eq {"took": 17,"documents_found":   1000}
not text eq {"took":777,"documents_found":1000000}

The `took` values are in milliseconds. The queries are `FROM test | WHERE text == "words words 1" | STATS COUNT(*)` (optimized by this PR) and `FROM test | WHERE NOT text == "words words 1" | STATS COUNT(*)` (unchanged), both without a time filter. `documents_found` shows that the `==` comparison only needs to load 1000 documents, but `NOT ==` has to scan all of the million documents I put in the index.

Relates to #128529

NOTE: This won't work for all text fields. There's an unbelievable amount of configuration possible on these fields, and it's quite possible for the `"match": { "a": "words cat words" }` query not to generate candidates for `==`. The easiest way to break it is to configure the `index_analyzer` and `search_analyzer` in incompatible ways. But I'm sure there are tons of ways!

```
    text eq {"took":  13,"documents_found":    1000}
not text eq {"took":4482,"documents_found":10000000}
```

elasticsearchmachine · 2025-07-01T12:31:37Z

Hi @nik9000, I've created a changelog YAML for you.

nik9000 · 2025-07-01T12:46:28Z

server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java

```diff
+            return index.getValue() == Boolean.TRUE
+                && analyzers.indexAnalyzer.isConfigured() == false
+                && analyzers.searchAnalyzer.isConfigured() == false;
+        }
```


==SEARCH FOLKS LOOK HERE==

Is this paranoid enough? Like, could you break the match query with other settings I'm not checking? I'd love to default this to only working if there's nothing configured on the text field and then opt in configurations.

This is paranoid indeed. Just checking whether the analyzers are the same is not enough, since we have morphological analyzers that can tokenize differently depending on the surrounding text (Kuromoji, Nori, ...).
We could have a list of allowed analyzers; that sounds like a good start and would handle 99% of the use cases.
I can't think of another setting that would change this. Are we matching the input as a phrase query or as a conjunction of terms?
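One way to read the allow-list suggestion, as a stdlib-only sketch: extend the PR's "nothing configured" check to accept analyzers from an allow-list, as long as index and search time use the same one. The allow-list contents and the `canPushEquality` name are hypothetical; the defaults encoded here (`analyzer` falls back to `standard`, `search_analyzer` falls back to `analyzer`) follow Elasticsearch's documented `text` field behavior.

```java
import java.util.Set;

class PushdownEligibility {
    // Hypothetical allow-list; these names are examples, not a vetted list.
    static final Set<String> ALLOWED_ANALYZERS = Set.of("standard", "simple", "whitespace");

    // A null analyzer means "not configured". For text fields, `analyzer`
    // defaults to "standard" and `search_analyzer` defaults to `analyzer`.
    static boolean canPushEquality(boolean indexed, String indexAnalyzer, String searchAnalyzer) {
        if (!indexed) {
            return false; // no terms in Lucene to build a candidate query from
        }
        String effectiveIndex = indexAnalyzer == null ? "standard" : indexAnalyzer;
        String effectiveSearch = searchAnalyzer == null ? effectiveIndex : searchAnalyzer;
        // Same analyzer on both sides, and that analyzer must be allow-listed.
        return effectiveIndex.equals(effectiveSearch) && ALLOWED_ANALYZERS.contains(effectiveIndex);
    }
}
```

Keeping analyzers like Kuromoji off the list is conservative rather than required; the follow-up below points out that `==` compares whole strings, so the same analyzer on the same text should still produce the same terms.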

Conjunction of terms right now. To be honest, I probably should use a phrase query.
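The choice matters for how many candidates come back. A simplified stdlib-only sketch over token lists (it ignores Lucene's position increments, slop, and stop-word gaps): for the value `words cat words`, a conjunction accepts a document tokenized as `cat words words`, while a phrase check rejects it, leaving fewer documents for the `==` recheck.

```java
import java.util.HashSet;
import java.util.List;

class ConjunctionVsPhrase {
    // Conjunction of terms: every query token appears somewhere in the document.
    static boolean conjunction(List<String> docTokens, List<String> queryTokens) {
        return new HashSet<>(docTokens).containsAll(queryTokens);
    }

    // Phrase: the query tokens appear consecutively and in order.
    static boolean phrase(List<String> docTokens, List<String> queryTokens) {
        if (queryTokens.isEmpty()) {
            return true;
        }
        int n = queryTokens.size();
        for (int start = 0; start + n <= docTokens.size(); start++) {
            if (docTokens.subList(start, start + n).equals(queryTokens)) {
                return true;
            }
        }
        return false;
    }
}
```

Both remain candidate generators: a phrase match on `a words cat words b` still fails the exact `==` check, so the recheck stays necessary either way.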

Do you think it's worth building a field setting that overrides our choices? A true/false/null thing - true means "I know match is fine, push", false means "I know match isn't fine, never push" and null means "you figure it out".
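A minimal sketch of that tri-state setting, using a `Boolean` where `null` means "decide automatically". The setting and the `shouldPush` name are hypothetical; nothing like it exists in the PR.

```java
class PushdownOverride {
    // override: TRUE = "I know match is fine, push",
    //           FALSE = "I know match isn't fine, never push",
    //           null = "you figure it out" (fall back to the heuristic).
    static boolean shouldPush(Boolean override, boolean heuristicSaysSafe) {
        if (override != null) {
            return override;
        }
        return heuristicSaysSafe;
    }
}
```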

> since we have morphological analyzer that can tokenize differently depending on the surrounding text (Kuromoji, Nori, ..)

Can you give an example? So long as the same text tokenizes the same way, that should be fine. We're not pushing "contains tokens" here; we're pushing exact string `==`.

> Can you give an example? So long as the same text tokenizes the same way that should be fine. We're not pushing "contains tokens" here - we're pushing exact string ==.

Ah right, sorry: the input must be exactly the same. So another test, which would let us remove the analyzer restriction, would be to use a memory index and verify that the query built with the search analyzer finds the input indexed with the index analyzer?
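A stdlib-only simulation of that self-check. In Lucene this would presumably index the value into a `MemoryIndex` (via `addField` with the index analyzer) and run the search-analyzer query against it with `MemoryIndex#search`; here both analyzers are modeled as plain functions, and the toy analyzers and method name are assumptions. The check is per value: a document whose text equals the value is indexed into exactly the same terms, so if the value can't find itself, pushing the query could drop real matches.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

class SelfMatchCheck {
    // Toy analyzers standing in for a field's configured analyzers (assumptions).
    static final Function<String, List<String>> LOWERCASE_WHITESPACE =
        s -> Arrays.asList(s.toLowerCase(Locale.ROOT).split("\\s+"));
    static final Function<String, List<String>> WHITESPACE =
        s -> Arrays.asList(s.split("\\s+"));

    // True if a document whose text is exactly `value` would be returned by
    // the conjunction-of-terms candidate query built from `value`.
    static boolean valueMatchesItself(String value,
                                      Function<String, List<String>> indexAnalyzer,
                                      Function<String, List<String>> searchAnalyzer) {
        // Stand-in for indexing `value` into a one-document in-memory index.
        Set<String> indexedTerms = new HashSet<>(indexAnalyzer.apply(value));
        // Stand-in for running the search-analyzer query against that index.
        return indexedTerms.containsAll(searchAnalyzer.apply(value));
    }
}
```

With a lowercasing index analyzer and a case-preserving search analyzer, `"words cat"` passes but `"Words Cat"` fails, so only the second comparison would fall back to no pushdown.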

> Do you think it's worth building a field setting that overrides our choices?

I wonder if that should be pushed down to the mappings directly. We have `termQuery` and `phraseQuery` defined by the field mappers now, so we could delegate the decision there with an `exactQuery`? That would help fields like `match_only_text` avoid verifying the content twice.
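A hypothetical shape for such an `exactQuery` hook, sketched in stdlib Java only. `ExactQueryProvider`, `ExactQuery`, and the string-described queries are illustrative stand-ins, not Elasticsearch's `MappedFieldType` API. The point is the extra bit of information: whether the returned query is already exact or only produces candidates that ESQL still has to recheck.

```java
class ExactQuerySketch {
    // What a field type could return for `field == value`. Queries are
    // described as strings here instead of real Lucene Query objects.
    static final class ExactQuery {
        final String query;
        final boolean needsVerification; // true: the compute engine must recheck ==

        ExactQuery(String query, boolean needsVerification) {
            this.query = query;
            this.needsVerification = needsVerification;
        }
    }

    // Hypothetical hook; not an existing MappedFieldType method.
    interface ExactQueryProvider {
        // Returns null when the field cannot build any useful query.
        ExactQuery exactQuery(String field, String value);
    }

    // keyword field, assuming no normalizer and no ignore_above truncation:
    // a term query is already exact.
    static final ExactQueryProvider KEYWORD =
        (field, value) -> new ExactQuery("term " + field + ":" + value, false);

    // text field: a phrase query only finds candidates, so == is rechecked.
    static final ExactQueryProvider TEXT =
        (field, value) -> new ExactQuery("phrase " + field + ":\"" + value + "\"", true);

    // Nothing usable: ESQL evaluates == without pushing anything down.
    static final ExactQueryProvider UNSUPPORTED = (field, value) -> null;
}
```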

cc @javanna

I think ESQL might indeed deserve some more query methods on MappedFieldType. I've been adding methods to, like, TextFieldMapper that help me decide what query to build. But I think it'd be cleaner to put these into the MappedFieldType. Also, it'd make it easier for, well, more fields to link into ESQL.

Hey, I follow at a high level. Sounds like it would be cleaner to add the query generation to MappedFieldType. I am not quite sure what the default generated query would be, but that's an implementation detail. I am also not clear on how that same method would be used in match_only_text, since I'm not familiar enough with it, but it all sounds like a good idea.

nik9000 · 2025-07-01T12:49:52Z

.../main/java/org/elasticsearch/xpack/esql/expression/predicate/operator/comparison/Equals.java

```diff
+        String name = handler.nameOf(lhs);
+        if (pushdownPredicates.canUseEqualityOnSyntheticSourceDelegate(lhs, rhs)) {
+            return new SingleValueQuery(new EqualsSyntheticSourceDelegate(source(), name, rhs), name, true);
+        }
```


==SEARCH FOLKS LOOK HERE==

The `if` statement below builds that candidate-matching query.
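A sketch of the overall decision order this implies, assuming (as the `canUseEqualityOnSyntheticSourceDelegate` name suggests) that the synthetic source delegate is an exact `keyword` sub-field of the text field. The enum and method are hypothetical; the real code returns `Query` objects such as the `SingleValueQuery` in the diff.

```java
class EqualsPushdownSketch {
    enum Plan { EXACT_QUERY_ON_KEYWORD_DELEGATE, CANDIDATE_QUERY_THEN_VERIFY, NO_PUSHDOWN }

    static Plan plan(boolean canUseKeywordDelegate, boolean candidateQueryIsSafe) {
        if (canUseKeywordDelegate) {
            // Exact query on the keyword sub-field; no recheck of == needed.
            return Plan.EXACT_QUERY_ON_KEYWORD_DELEGATE;
        }
        if (candidateQueryIsSafe) {
            // Match-style query on the text field; the engine rechecks ==.
            return Plan.CANDIDATE_QUERY_THEN_VERIFY;
        }
        // Push nothing for this ==; other filters may still be pushed.
        return Plan.NO_PUSHDOWN;
    }
}
```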


elasticsearchmachine · 2025-07-17T12:35:02Z

Pinging @elastic/es-analytical-engine (Team:Analytics)


nik9000 added 11 commits June 11, 2025 12:26

WIP

fcf243a


Merge branch 'main' into esql_push_bare_text

450d81f

Name

0c0f77e

More

3e3e5e6

Merge branch 'main' into esql_push_bare_text

b9b7105

Compile

e769c32

Merge branch 'main' into esql_push_bare_text

4d6bba5

Merge branch 'main' into esql_push_bare_text

a522cb9

Explain

07f92d7

Fixup

8959080

Merge branch 'main' into esql_push_bare_text

f432dd3

nik9000 requested a review from javanna July 1, 2025 12:31

nik9000 added >enhancement :Analytics/ES|QL AKA ESQL v9.2.0 v9.1.1 labels Jul 1, 2025

Update docs/changelog/130387.yaml

6e519cd

github-actions bot deployed to docs-preview July 1, 2025 12:32

[CI] Auto commit changes from spotless

0a24912

github-actions bot deployed to docs-preview July 1, 2025 12:40

nik9000 commented Jul 1, 2025


nik9000 removed the v9.1.1 label Jul 1, 2025

nik9000 added 3 commits July 15, 2025 13:45

Merge branch 'main' into esql_push_bare_text

d3ea0f0

Fixup

b63a626

Merge remote-tracking branch 'nik9000/esql_push_bare_text' into esql_push_bare_text

5603252

github-actions bot deployed to docs-preview July 15, 2025 19:03

Merge branch 'main' into esql_push_bare_text

ad4f407

github-actions bot deployed to docs-preview July 16, 2025 20:38

nik9000 marked this pull request as ready for review July 17, 2025 12:34

elasticsearchmachine added the Team:Analytics label Jul 17, 2025

nik9000 added 3 commits July 17, 2025 08:35

Merge branch 'main' into esql_push_bare_text

b5e91c8

Fixup

0c1efa5

Merge remote-tracking branch 'nik9000/esql_push_bare_text' into esql_push_bare_text

4b30b5b

github-actions bot deployed to docs-preview July 17, 2025 14:37

Off by one fun

96060ec

github-actions bot deployed to docs-preview July 17, 2025 18:11

Merge branch 'main' into esql_push_bare_text

57702c6

github-actions bot deployed to docs-preview July 17, 2025 19:20

elasticsearchmachine added v9.3.0 and removed v9.2.0 labels Oct 2, 2025

Comment authors: nik9000 (Jul 1 and Jul 2, 2025), jimczi (Jul 1, 2025), javanna (Jul 22, 2025). 4 participants.
