Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/changelog/130387.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 130387
summary: Push `==` to `text` fields to lucene
area: ES|QL
type: enhancement
issues: []
Original file line number Diff line number Diff line change
Expand Up @@ -492,7 +492,7 @@ private AnnotatedTextFieldType(
KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate,
Map<String, String> meta
) {
super(name, true, store, tsi, isSyntheticSource, syntheticSourceDelegate, meta, false, false);
super(name, true, store, tsi, isSyntheticSource, syntheticSourceDelegate, meta, false, false, false);
}

public AnnotatedTextFieldType(String name, Map<String, String> meta) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@
import org.elasticsearch.index.fielddata.SourceValueFetcherSortedBinaryIndexFieldData;
import org.elasticsearch.index.fielddata.StoredFieldSortedBinaryIndexFieldData;
import org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.index.similarity.SimilarityProvider;
import org.elasticsearch.script.field.DelegateDocValuesField;
Expand Down Expand Up @@ -413,7 +414,8 @@ private TextFieldType buildFieldType(
SyntheticSourceHelper.syntheticSourceDelegate(fieldType, multiFields),
meta.getValue(),
eagerGlobalOrdinals.getValue(),
indexPhrases.getValue()
indexPhrases.getValue(),
matchQueryYieldsCandidateMatchesForEquality()
);
if (fieldData.getValue()) {
ft.setFielddata(true, freqFilter.getValue());
Expand All @@ -422,6 +424,25 @@ private TextFieldType buildFieldType(
return ft;
}

/**
* Does a `match` query generate all valid candidates for `==`? Meaning,
* if I do a match query for any string, say `foo bar baz`, then that
* query will find all documents that indexed the same string.
* <p>
* This should be true for most sanely configured text fields. That's
* just how we use them for search. But it's quite possible to make
* the index analyzer not agree with the search analyzer, for example.
* </p>
* <p>
* So this implementation is ultra-paranoid.
* </p>
*/
private boolean matchQueryYieldsCandidateMatchesForEquality() {
return index.getValue() == Boolean.TRUE
&& analyzers.indexAnalyzer.isConfigured() == false
&& analyzers.searchAnalyzer.isConfigured() == false;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

==SEARCH FOLKS LOOK HERE==

Is this paranoid enough? Like, could you break the match query with other settings I'm not checking? I'd love to default this to only working if there's nothing configured on the text field and then opt in configurations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is paranoid indeed. Just checking if the analyzer are the same is not enough since we have morphological analyzer that can tokenize differently depending on the surrounding text (Kuromoji, Nori, ..).
We could have a list of allowed analyzers but that sounds like a good start that will handle 99% of the use cases.
I can't think of another setting that would change this, are we matching the input as phrase query or as a conjunction of term?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conjunction of terms right now. I probably should do phrase to be honest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's worth building a field setting that overrides our choices? A true/false/null thing - true means "I know match is fine, push", false means "I know match isn't fine, never push" and null means "you figure it out".

since we have morphological analyzer that can tokenize differently depending on the surrounding text (Kuromoji, Nori, ..)

Can you give an example? So long as the same text tokenizes the same way that should be fine. We're not pushing "contains tokens" here - we're pushing exat string ==.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example? So long as the same text tokenizes the same way that should be fine. We're not pushing "contains tokens" here - we're pushing exat string ==.

Ah right sorry, the input must be exactly the same. So then another test, to remove the analyzer restriction, would be to use a memory index and verify that the search analyzer query can find the input indexed with the index analyzer?

Do you think it's worth building a field setting that overrides our choices?

I wonder if that should be pushed down to mappings directly. We have termQuery, phraseQuery defined by the field mappers now so we could delegate the decision there with an exactQuery? That would help with fields like match_only_text to avoid doing two verifications on the content?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ESQL might indeed deserve some more query methods on MappedFieldType. I've been adding methods to, like, TextFieldMapper that help me decide what query to build. But I think it'd be cleaner to put these into the MappedFieldType. Also, it'd be more possible for, well, more fields to link into ESQL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, I follow high-level. Sounds like it would be cleaner to add the query generation to MappedFieldType. I am not quite sure what the default generated query would be though, but that's an implementation detail. I am also not clear on how that same method may be used in match only text, not familiar enough with it, but it all sounds like a good idea.


private SubFieldInfo buildPrefixInfo(MapperBuilderContext context, FieldType fieldType, TextFieldType tft) {
if (indexPrefixes.get() == null) {
return null;
Expand Down Expand Up @@ -694,6 +715,12 @@ public static class TextFieldType extends StringFieldType {
*/
private final KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate;

/**
* Does a {@link MatchQueryBuilder} produce <strong>all</strong> documents
* that <strong>might</strong> have equal text to the query's value.
*/
private final boolean matchQueryYieldsCandidateMatchesForEquality;

public TextFieldType(
String name,
boolean indexed,
Expand All @@ -703,7 +730,8 @@ public TextFieldType(
KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate,
Map<String, String> meta,
boolean eagerGlobalOrdinals,
boolean indexPhrases
boolean indexPhrases,
boolean matchQueryYieldsCandidateMatchesForEquality
) {
super(name, indexed, stored, false, tsi, meta);
fielddata = false;
Expand All @@ -712,6 +740,7 @@ public TextFieldType(
this.syntheticSourceDelegate = syntheticSourceDelegate;
this.eagerGlobalOrdinals = eagerGlobalOrdinals;
this.indexPhrases = indexPhrases;
this.matchQueryYieldsCandidateMatchesForEquality = matchQueryYieldsCandidateMatchesForEquality;
}

public TextFieldType(String name, boolean indexed, boolean stored, Map<String, String> meta) {
Expand All @@ -728,6 +757,7 @@ public TextFieldType(String name, boolean indexed, boolean stored, Map<String, S
syntheticSourceDelegate = null;
eagerGlobalOrdinals = false;
indexPhrases = false;
matchQueryYieldsCandidateMatchesForEquality = true;
}

public TextFieldType(String name, boolean isSyntheticSource) {
Expand All @@ -740,7 +770,8 @@ public TextFieldType(String name, boolean isSyntheticSource) {
null,
Collections.emptyMap(),
false,
false
false,
true
);
}

Expand Down Expand Up @@ -1030,6 +1061,10 @@ public boolean canUseSyntheticSourceDelegateForQueryingEquality(String str) {
return str.length() <= syntheticSourceDelegate.ignoreAbove();
}

public boolean matchQueryYieldsCandidateMatchesForEquality() {
return matchQueryYieldsCandidateMatchesForEquality;
}

@Override
public BlockLoader blockLoader(BlockLoaderContext blContext) {
if (canUseSyntheticSourceDelegateForLoading()) {
Expand Down Expand Up @@ -1219,7 +1254,7 @@ public KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate() {
public static class ConstantScoreTextFieldType extends TextFieldType {

public ConstantScoreTextFieldType(String name, boolean indexed, boolean stored, TextSearchInfo tsi, Map<String, String> meta) {
super(name, indexed, stored, tsi, false, null, meta, false, false);
super(name, indexed, stored, tsi, false, null, meta, false, false, /* unused */ false);
}

public ConstantScoreTextFieldType(String name) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1538,6 +1538,7 @@ private TextFieldMapper.TextFieldType storedTextField(String name) {
null,
Map.of(),
false,
false,
false
);
}
Expand All @@ -1552,6 +1553,7 @@ private TextFieldMapper.TextFieldType textFieldWithDelegate(String name, Keyword
delegate,
Map.of(),
false,
false,
false
);
}
Expand Down
Loading
Loading