Commit b424ad0

Enable Mapped Field Types to Override Default Highlighter (elastic#121176)
This commit introduces `MappedFieldType#getDefaultHighlighter`, which lets a field type choose the highlighter used when a highlight request does not specify a `type`. The `semantic_text` field mapper overrides it to make the `semantic` highlighter the default; all other fields continue to use the `unified` highlighter by default. An explicit `type` in the highlight request still takes precedence.
1 parent 732267f commit b424ad0

9 files changed (+127, -27 lines)

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 29 additions & 3 deletions
@@ -133,14 +133,13 @@ You can extract the most relevant fragments from a semantic text field by using
 POST test-index/_search
 {
   "query": {
-    "semantic": {
-      "field": "my_semantic_field"
+    "match": {
+      "my_semantic_field": "Which country is Paris in?"
     }
   },
   "highlight": {
     "fields": {
       "my_semantic_field": {
-        "type": "semantic",
         "number_of_fragments": 2, <1>
         "order": "score" <2>
       }
@@ -152,6 +151,33 @@ POST test-index/_search
 <1> Specifies the maximum number of fragments to return.
 <2> Sorts highlighted fragments by score when set to `score`. By default, fragments will be output in the order they appear in the field (order: none).

+Highlighting is supported on fields other than semantic_text.
+However, if you want to restrict highlighting to the semantic highlighter and return no fragments when the field is not of type semantic_text,
+you can explicitly enforce the `semantic` highlighter in the query:
+
+[source,console]
+------------------------------------------------------------
+POST test-index/_search
+{
+  "query": {
+    "match": {
+      "my_field": "Which country is Paris in?"
+    }
+  },
+  "highlight": {
+    "fields": {
+      "my_field": {
+        "type": "semantic", <1>
+        "number_of_fragments": 2,
+        "order": "score"
+      }
+    }
+  }
+}
+------------------------------------------------------------
+// TEST[skip:Requires inference endpoint]
+<1> Ensures that highlighting is applied exclusively to semantic_text fields.
+
 [discrete]
 [[custom-indexing]]
 ==== Customizing `semantic_text` indexing

docs/reference/search/search-your-data/highlighting.asciidoc

Lines changed: 15 additions & 3 deletions
@@ -37,8 +37,8 @@ GET /_search
 // TEST[setup:my_index]

 {es} supports three highlighters: `unified`, `plain`, and `fvh` (fast vector
-highlighter). You can specify the highlighter `type` you want to use
-for each field.
+highlighter) for `text` and `keyword` fields and the `semantic` highlighter for `semantic_text` fields.
+You can specify the highlighter `type` you want to use for each field or rely on the field type's default highlighter.

 [discrete]
 [[unified-highlighter]]
@@ -48,7 +48,19 @@ highlighter breaks the text into sentences and uses the BM25 algorithm to score
 individual sentences as if they were documents in the corpus. It also supports
 accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. The `unified`
 highlighter can combine matches from multiple fields into one result (see
-`matched_fields`). This is the default highlighter.
+`matched_fields`).
+
+This is the default highlighter for all `text` and `keyword` fields.
+
+[discrete]
+[[semantic-highlighter]]
+==== Semantic Highlighter
+
+The `semantic` highlighter is specifically designed for use with the <<semantic-text, `semantic_text`>> field.
+It identifies and extracts the most relevant fragments from the field based on semantic
+similarity between the query and each fragment.
+
+By default, <<semantic-text, `semantic_text`>> fields use the semantic highlighter.

 [discrete]
 [[plain-highlighter]]

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

Lines changed: 8 additions & 0 deletions
@@ -45,6 +45,7 @@
 import org.elasticsearch.index.query.SearchExecutionContext;
 import org.elasticsearch.search.DocValueFormat;
 import org.elasticsearch.search.fetch.subphase.FetchFieldsPhase;
+import org.elasticsearch.search.fetch.subphase.highlight.DefaultHighlighter;
 import org.elasticsearch.search.lookup.SearchLookup;

 import java.io.IOException;
@@ -221,6 +222,13 @@ public TimeSeriesParams.MetricType getMetricType() {
         return null;
     }

+    /**
+     * Returns the default highlighter type to use when highlighting the field.
+     */
+    public String getDefaultHighlighter() {
+        return DefaultHighlighter.NAME;
+    }
+
     /** Generates a query that will only match documents that contain the given value.
      * The default implementation returns a {@link TermQuery} over the value bytes
      * @throws IllegalArgumentException if {@code value} cannot be converted to the expected data type or if the field is not searchable

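The new hook is an ordinary overridable method: `MappedFieldType` returns `DefaultHighlighter.NAME` (`"unified"`), and the `semantic_text` field type overrides it to return `SemanticTextHighlighter.NAME`. The following is a minimal, self-contained Java sketch of that pattern. `DefaultHighlighterSketch`, `FieldType` and `SemanticTextFieldType` are hypothetical stand-ins, not the Elasticsearch classes, and the sketch assumes `SemanticTextHighlighter.NAME` is `"semantic"` (the `type` value used in the documentation examples).

```java
// Hypothetical, simplified stand-ins for MappedFieldType and the semantic_text field type;
// only the default-highlighter hook introduced by this commit is modeled.
public class DefaultHighlighterSketch {

    /** Simplified base type: every field defaults to the unified highlighter. */
    static class FieldType {
        static final String UNIFIED = "unified"; // mirrors DefaultHighlighter.NAME

        String getDefaultHighlighter() {
            return UNIFIED;
        }
    }

    /** Simplified semantic_text type: overrides the hook to select the semantic highlighter. */
    static class SemanticTextFieldType extends FieldType {
        static final String SEMANTIC = "semantic"; // assumed value of SemanticTextHighlighter.NAME

        @Override
        String getDefaultHighlighter() {
            return SEMANTIC;
        }
    }

    public static void main(String[] args) {
        FieldType title = new FieldType();            // e.g. a text field
        FieldType body = new SemanticTextFieldType(); // e.g. a semantic_text field
        System.out.println(title.getDefaultHighlighter()); // unified
        System.out.println(body.getDefaultHighlighter());  // semantic
    }
}
```

Because the default lives on the field type, a field type with a preferred highlighter only needs to override one method; callers resolve it through dynamic dispatch without knowing about specific field types.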
server/src/main/java/org/elasticsearch/search/SearchModule.java

Lines changed: 1 addition & 1 deletion
@@ -936,7 +936,7 @@ private static Map<String, Highlighter> setupHighlighters(Settings settings, Lis
         NamedRegistry<Highlighter> highlighters = new NamedRegistry<>("highlighter");
         highlighters.register("fvh", new FastVectorHighlighter(settings));
         highlighters.register("plain", new PlainHighlighter());
-        highlighters.register("unified", new DefaultHighlighter());
+        highlighters.register(DefaultHighlighter.NAME, new DefaultHighlighter());
         highlighters.extractAndRegister(plugins, SearchPlugin::getHighlighters);

         return unmodifiableMap(highlighters.getRegistry());

server/src/main/java/org/elasticsearch/search/fetch/subphase/highlight/DefaultHighlighter.java

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@
 import static org.elasticsearch.lucene.search.uhighlight.CustomUnifiedHighlighter.MULTIVAL_SEP_CHAR;

 public class DefaultHighlighter implements Highlighter {
-
+    public static final String NAME = "unified";
     public static final NodeFeature UNIFIED_HIGHLIGHTER_MATCHED_FIELDS = new NodeFeature("unified_highlighter_matched_fields", true);

     @Override

server/src/main/java/org/elasticsearch/search/fetch/subphase/highlight/HighlightPhase.java

Lines changed: 4 additions & 5 deletions
@@ -66,7 +66,7 @@ public void process(HitContext hitContext) throws IOException {
                 Map<String, Function<HitContext, FieldHighlightContext>> contextBuilders = fieldContext.builders;
                 for (String field : contextBuilders.keySet()) {
                     FieldHighlightContext fieldContext = contextBuilders.get(field).apply(hitContext);
-                    Highlighter highlighter = getHighlighter(fieldContext.field);
+                    Highlighter highlighter = getHighlighter(fieldContext.field, fieldContext.fieldType);
                     HighlightField highlightField = highlighter.highlight(fieldContext);
                     if (highlightField != null) {
                         // Note that we make sure to use the original field name in the response. This is because the
@@ -80,10 +80,10 @@ public void process(HitContext hitContext) throws IOException {
         };
     }

-    private Highlighter getHighlighter(SearchHighlightContext.Field field) {
+    private Highlighter getHighlighter(SearchHighlightContext.Field field, MappedFieldType fieldType) {
         String highlighterType = field.fieldOptions().highlighterType();
         if (highlighterType == null) {
-            highlighterType = "unified";
+            highlighterType = fieldType.getDefaultHighlighter();
         }
         Highlighter highlighter = highlighters.get(highlighterType);
         if (highlighter == null) {
@@ -103,15 +103,14 @@ private FieldContext contextBuilders(
         Map<String, Function<HitContext, FieldHighlightContext>> builders = new LinkedHashMap<>();
         StoredFieldsSpec storedFieldsSpec = StoredFieldsSpec.NO_REQUIREMENTS;
         for (SearchHighlightContext.Field field : highlightContext.fields()) {
-            Highlighter highlighter = getHighlighter(field);
-
             Collection<String> fieldNamesToHighlight = context.getSearchExecutionContext().getMatchingFieldNames(field.field());

             boolean fieldNameContainsWildcards = field.field().contains("*");
             Set<String> storedFields = new HashSet<>();
             boolean sourceRequired = false;
             for (String fieldName : fieldNamesToHighlight) {
                 MappedFieldType fieldType = context.getSearchExecutionContext().getFieldType(fieldName);
+                Highlighter highlighter = getHighlighter(field, fieldType);

                 // We should prevent highlighting if a field is anything but a text, match_only_text,
                 // or keyword field.

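With these changes, `getHighlighter` receives the `MappedFieldType` and is called inside the loop over concrete field names rather than once per requested field pattern. The sketch below models that resolution order under simplifying assumptions: an explicit `type` wins, otherwise the field type's default is used, and an unknown name is rejected. `HighlighterResolutionSketch`, the string-returning `Highlighter` interface, the field-to-default map and the error message are all hypothetical; the real code works with `SearchHighlightContext.Field`, `MappedFieldType` and `FieldHighlightContext`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of highlighter resolution after this commit (not the Elasticsearch API).
public class HighlighterResolutionSketch {

    /** Simplified highlighter: tags the field name with the highlighter that handled it. */
    interface Highlighter {
        String highlight(String fieldName);
    }

    /** Stand-in for the name -> highlighter registry built in SearchModule#setupHighlighters. */
    static final Map<String, Highlighter> HIGHLIGHTERS = new HashMap<>();

    /** Hypothetical mapping: concrete field name -> that field type's default highlighter. */
    static final Map<String, String> FIELD_DEFAULTS = new HashMap<>();

    static {
        HIGHLIGHTERS.put("unified", fieldName -> "unified:" + fieldName);
        HIGHLIGHTERS.put("semantic", fieldName -> "semantic:" + fieldName);
        FIELD_DEFAULTS.put("title", "unified"); // a text field
        FIELD_DEFAULTS.put("body", "semantic"); // a semantic_text field
    }

    /** An explicit type wins; otherwise fall back to the field type's default. */
    static Highlighter resolve(String requestedType, String fieldName) {
        String type = requestedType != null ? requestedType : FIELD_DEFAULTS.get(fieldName);
        Highlighter highlighter = HIGHLIGHTERS.get(type);
        if (highlighter == null) {
            throw new IllegalArgumentException("unknown highlighter type [" + type + "] for field [" + fieldName + "]");
        }
        return highlighter;
    }

    /** Resolves per concrete field name, as the updated contextBuilders loop does. */
    static Map<String, String> highlightAll(String requestedType, List<String> matchingFieldNames) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String fieldName : matchingFieldNames) {
            results.put(fieldName, resolve(requestedType, fieldName).highlight(fieldName));
        }
        return results;
    }

    public static void main(String[] args) {
        // A pattern such as "*" expanding to [title, body], with no explicit type
        System.out.println(highlightAll(null, List.of("title", "body")));
        // {title=unified:title, body=semantic:body}

        // An explicit "type": "unified" overrides both defaults
        System.out.println(highlightAll("unified", List.of("title", "body")));
        // {title=unified:title, body=unified:body}
    }
}
```

In this model, a wildcard that expands to both a `text` field and a `semantic_text` field gets a different default highlighter for each. That was not possible while the highlighter was resolved once, before the field names were expanded, with `"unified"` hard-coded as the fallback.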
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java

Lines changed: 3 additions & 1 deletion
@@ -37,6 +37,7 @@ public Set<NodeFeature> getFeatures() {
     }

     private static final NodeFeature SEMANTIC_TEXT_HIGHLIGHTER = new NodeFeature("semantic_text.highlighter");
+    private static final NodeFeature SEMANTIC_TEXT_HIGHLIGHTER_DEFAULT = new NodeFeature("semantic_text.highlighter.default");

     @Override
     public Set<NodeFeature> getTestFeatures() {
@@ -52,7 +53,8 @@ public Set<NodeFeature> getTestFeatures() {
             SemanticInferenceMetadataFieldsMapper.EXPLICIT_NULL_FIXES,
             SEMANTIC_KNN_VECTOR_QUERY_REWRITE_INTERCEPTION_SUPPORTED,
             TextSimilarityRankRetrieverBuilder.TEXT_SIMILARITY_RERANKER_ALIAS_HANDLING_FIX,
-            SemanticInferenceMetadataFieldsMapper.INFERENCE_METADATA_FIELDS_ENABLED_BY_DEFAULT
+            SemanticInferenceMetadataFieldsMapper.INFERENCE_METADATA_FIELDS_ENABLED_BY_DEFAULT,
+            SEMANTIC_TEXT_HIGHLIGHTER_DEFAULT
         );
     }
 }

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java

Lines changed: 6 additions & 0 deletions
@@ -73,6 +73,7 @@
 import org.elasticsearch.xpack.core.ml.inference.results.MlTextEmbeddingResults;
 import org.elasticsearch.xpack.core.ml.inference.results.TextExpansionResults;
 import org.elasticsearch.xpack.core.ml.search.SparseVectorQueryBuilder;
+import org.elasticsearch.xpack.inference.highlight.SemanticTextHighlighter;

 import java.io.IOException;
 import java.io.UncheckedIOException;
@@ -582,6 +583,11 @@ public String familyTypeName() {
             return TextFieldMapper.CONTENT_TYPE;
         }

+        @Override
+        public String getDefaultHighlighter() {
+            return SemanticTextHighlighter.NAME;
+        }
+
         public String getInferenceId() {
             return inferenceId;
         }

x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter.yml

Lines changed: 60 additions & 13 deletions
@@ -55,22 +55,32 @@ setup:
             index.mapping.semantic_text.use_legacy_format: false
           mappings:
             properties:
+              title:
+                type: text
               body:
                 type: semantic_text
                 inference_id: dense-inference-id

----
-"Highlighting using a sparse embedding model":
   - do:
       index:
         index: test-sparse-index
         id: doc_1
         body:
+          title: "Elasticsearch"
           body: ["ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides.", "You Know, for Search!"]
         refresh: true

-  - match: { result: created }
+  - do:
+      index:
+        index: test-dense-index
+        id: doc_1
+        body:
+          title: "Elasticsearch"
+          body: [ "ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides.", "You Know, for Search!" ]
+        refresh: true

+---
+"Highlighting using a sparse embedding model":
   - do:
       search:
         index: test-sparse-index
@@ -153,16 +163,6 @@ setup:

 ---
 "Highlighting using a dense embedding model":
-  - do:
-      index:
-        index: test-dense-index
-        id: doc_1
-        body:
-          body: ["ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides.", "You Know, for Search!"]
-        refresh: true
-
-  - match: { result: created }
-
   - do:
       search:
         index: test-dense-index
@@ -243,4 +243,51 @@ setup:
   - match: { hits.hits.0.highlight.body.0: "You Know, for Search!" }
   - match: { hits.hits.0.highlight.body.1: "ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides." }

+---
+"Default highlighter for fields":
+  - requires:
+      cluster_features: "semantic_text.highlighter.default"
+      reason: semantic text field defaults to the semantic highlighter
+
+  - do:
+      search:
+        index: test-dense-index
+        body:
+          query:
+            match:
+              body: "What is Elasticsearch?"
+          highlight:
+            fields:
+              body:
+                order: "score"
+                number_of_fragments: 2
+
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "doc_1" }
+  - length: { hits.hits.0.highlight.body: 2 }
+  - match: { hits.hits.0.highlight.body.0: "You Know, for Search!" }
+  - match: { hits.hits.0.highlight.body.1: "ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides." }
+
+---
+"semantic highlighter ignores non-inference fields":
+  - requires:
+      cluster_features: "semantic_text.highlighter.default"
+      reason: semantic text field defaults to the semantic highlighter
+
+  - do:
+      search:
+        index: test-dense-index
+        body:
+          query:
+            match:
+              title: "Elasticsearch"
+          highlight:
+            fields:
+              title:
+                type: semantic
+                number_of_fragments: 2
+
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "doc_1" }
+  - not_exists: hits.hits.0.highlight.title

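The last YAML test expects no `highlight.title` entry when the `semantic` highlighter is forced onto a `text` field. A minimal Java sketch of that contract follows. It assumes the semantic highlighter returns `null` for fields that are not `semantic_text`, and it mirrors the `if (highlightField != null)` check in `HighlightPhase#process`. `NullHighlightSketch`, `Field`, `semanticHighlight` and `highlightHit` are hypothetical names, and returning the whole field value is a simplification of the real fragment selection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: fields the highlighter cannot handle produce no highlight entry at all.
public class NullHighlightSketch {

    /** Hypothetical, simplified field: a name, a value, and whether it is mapped as semantic_text. */
    static final class Field {
        final String name;
        final String value;
        final boolean semanticText;

        Field(String name, String value, boolean semanticText) {
            this.name = name;
            this.value = value;
            this.semanticText = semanticText;
        }
    }

    /** Stand-in for the semantic highlighter: assumed to return null for non-semantic_text fields. */
    static String semanticHighlight(Field field) {
        if (!field.semanticText) {
            return null;
        }
        return field.value; // simplification: the real highlighter returns the most relevant chunks
    }

    /** Mirrors the `if (highlightField != null)` check: null results are skipped, not reported. */
    static Map<String, String> highlightHit(Field... fields) {
        Map<String, String> highlight = new LinkedHashMap<>();
        for (Field field : fields) {
            String fragment = semanticHighlight(field);
            if (fragment != null) {
                highlight.put(field.name, fragment);
            }
        }
        return highlight;
    }

    public static void main(String[] args) {
        Field title = new Field("title", "Elasticsearch", false);
        Field body = new Field("body", "You Know, for Search!", true);
        System.out.println(highlightHit(title, body)); // {body=You Know, for Search!}
    }
}
```

Returning `null` rather than an empty result keeps the response shape clean: the hit simply has no key for the field, which is what `not_exists: hits.hits.0.highlight.title` asserts.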