Commit dbeb55c

Enable Mapped Field Types to Override Default Highlighter (#121176)
This commit introduces the `MappedFieldType#getDefaultHighlighter`, allowing a specific highlighter to be enforced for a field. The semantic field mapper utilizes this new functionality to set the `semantic` highlighter as the default. All other fields will continue to use the `unified` highlighter by default.
1 parent 6486299 commit dbeb55c

9 files changed: +128 -26 lines changed

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 29 additions & 3 deletions
@@ -133,14 +133,13 @@ You can extract the most relevant fragments from a semantic text field by using
 POST test-index/_search
 {
     "query": {
-        "semantic": {
-            "field": "my_semantic_field"
+        "match": {
+            "my_semantic_field": "Which country is Paris in?"
         }
     },
     "highlight": {
         "fields": {
             "my_semantic_field": {
-                "type": "semantic",
                 "number_of_fragments": 2, <1>
                 "order": "score" <2>
             }
@@ -152,6 +151,33 @@ POST test-index/_search
 <1> Specifies the maximum number of fragments to return.
 <2> Sorts highlighted fragments by score when set to `score`. By default, fragments will be output in the order they appear in the field (order: none).

+Highlighting is supported on fields other than semantic_text.
+However, if you want to restrict highlighting to the semantic highlighter and return no fragments when the field is not of type semantic_text,
+you can explicitly enforce the `semantic` highlighter in the query:
+
+[source,console]
+------------------------------------------------------------
+PUT test-index
+{
+    "query": {
+        "match": {
+            "my_field": "Which country is Paris in?"
+        }
+    },
+    "highlight": {
+        "fields": {
+            "my_field": {
+                "type": "semantic", <1>
+                "number_of_fragments": 2,
+                "order": "score"
+            }
+        }
+    }
+}
+------------------------------------------------------------
+// TEST[skip:Requires inference endpoint]
+<1> Ensures that highlighting is applied exclusively to semantic_text fields.
+
 [discrete]
 [[custom-indexing]]
 ==== Customizing `semantic_text` indexing

docs/reference/search/search-your-data/highlighting.asciidoc

Lines changed: 15 additions & 3 deletions
@@ -37,8 +37,8 @@ GET /_search
 // TEST[setup:my_index]

 {es} supports three highlighters: `unified`, `plain`, and `fvh` (fast vector
-highlighter). You can specify the highlighter `type` you want to use
-for each field.
+highlighter) for `text` and `keyword` fields and the `semantic` highlighter for `semantic_text` fields.
+You can specify the highlighter `type` you want to use for each field or rely on the field type's default highlighter.

 [discrete]
 [[unified-highlighter]]
@@ -48,7 +48,19 @@ highlighter breaks the text into sentences and uses the BM25 algorithm to score
 individual sentences as if they were documents in the corpus. It also supports
 accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. The `unified`
 highlighter can combine matches from multiple fields into one result (see
-`matched_fields`). This is the default highlighter.
+`matched_fields`).
+
+This is the default highlighter for all `text` and `keyword` fields.
+
+[discrete]
+[[semantic-highlighter]]
+==== Semantic Highlighter
+
+The `semantic` highlighter is specifically designed for use with the <<semantic-text, `semantic_text`>> field.
+It identifies and extracts the most relevant fragments from the field based on semantic
+similarity between the query and each fragment.
+
+By default, <<semantic-text, `semantic_text`>> fields use the semantic highlighter.

 [discrete]
 [[plain-highlighter]]

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

Lines changed: 8 additions & 0 deletions
@@ -41,6 +41,7 @@
 import org.elasticsearch.index.query.SearchExecutionContext;
 import org.elasticsearch.search.DocValueFormat;
 import org.elasticsearch.search.fetch.subphase.FetchFieldsPhase;
+import org.elasticsearch.search.fetch.subphase.highlight.DefaultHighlighter;
 import org.elasticsearch.search.lookup.SearchLookup;

 import java.io.IOException;
@@ -217,6 +218,13 @@ public TimeSeriesParams.MetricType getMetricType() {
         return null;
     }

+    /**
+     * Returns the default highlighter type to use when highlighting the field.
+     */
+    public String getDefaultHighlighter() {
+        return DefaultHighlighter.NAME;
+    }
+
     /** Generates a query that will only match documents that contain the given value.
      *  The default implementation returns a {@link TermQuery} over the value bytes
      *  @throws IllegalArgumentException if {@code value} cannot be converted to the expected data type or if the field is not searchable

server/src/main/java/org/elasticsearch/search/SearchModule.java

Lines changed: 1 addition & 1 deletion
@@ -913,7 +913,7 @@ private static Map<String, Highlighter> setupHighlighters(Settings settings, Lis
         NamedRegistry<Highlighter> highlighters = new NamedRegistry<>("highlighter");
         highlighters.register("fvh", new FastVectorHighlighter(settings));
         highlighters.register("plain", new PlainHighlighter());
-        highlighters.register("unified", new DefaultHighlighter());
+        highlighters.register(DefaultHighlighter.NAME, new DefaultHighlighter());
         highlighters.extractAndRegister(plugins, SearchPlugin::getHighlighters);

         return unmodifiableMap(highlighters.getRegistry());

server/src/main/java/org/elasticsearch/search/fetch/subphase/highlight/DefaultHighlighter.java

Lines changed: 2 additions & 0 deletions
@@ -50,6 +50,8 @@

 public class DefaultHighlighter implements Highlighter {

+    public static final String NAME = "unified";
+
     @Override
     public boolean canHighlight(MappedFieldType fieldType) {
         return true;

server/src/main/java/org/elasticsearch/search/fetch/subphase/highlight/HighlightPhase.java

Lines changed: 4 additions & 5 deletions
@@ -66,7 +66,7 @@ public void process(HitContext hitContext) throws IOException {
                 Map<String, Function<HitContext, FieldHighlightContext>> contextBuilders = fieldContext.builders;
                 for (String field : contextBuilders.keySet()) {
                     FieldHighlightContext fieldContext = contextBuilders.get(field).apply(hitContext);
-                    Highlighter highlighter = getHighlighter(fieldContext.field);
+                    Highlighter highlighter = getHighlighter(fieldContext.field, fieldContext.fieldType);
                     HighlightField highlightField = highlighter.highlight(fieldContext);
                     if (highlightField != null) {
                         // Note that we make sure to use the original field name in the response. This is because the
@@ -80,10 +80,10 @@ public void process(HitContext hitContext) throws IOException {
         };
     }

-    private Highlighter getHighlighter(SearchHighlightContext.Field field) {
+    private Highlighter getHighlighter(SearchHighlightContext.Field field, MappedFieldType fieldType) {
         String highlighterType = field.fieldOptions().highlighterType();
         if (highlighterType == null) {
-            highlighterType = "unified";
+            highlighterType = fieldType.getDefaultHighlighter();
         }
         Highlighter highlighter = highlighters.get(highlighterType);
         if (highlighter == null) {
@@ -103,15 +103,14 @@ private FieldContext contextBuilders(
         Map<String, Function<HitContext, FieldHighlightContext>> builders = new LinkedHashMap<>();
         StoredFieldsSpec storedFieldsSpec = StoredFieldsSpec.NO_REQUIREMENTS;
         for (SearchHighlightContext.Field field : highlightContext.fields()) {
-            Highlighter highlighter = getHighlighter(field);
-
             Collection<String> fieldNamesToHighlight = context.getSearchExecutionContext().getMatchingFieldNames(field.field());

             boolean fieldNameContainsWildcards = field.field().contains("*");
             Set<String> storedFields = new HashSet<>();
             boolean sourceRequired = false;
             for (String fieldName : fieldNamesToHighlight) {
                 MappedFieldType fieldType = context.getSearchExecutionContext().getFieldType(fieldName);
+                Highlighter highlighter = getHighlighter(field, fieldType);

                 // We should prevent highlighting if a field is anything but a text, match_only_text,
                 // or keyword field.
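
The selection logic changed in `getHighlighter` above can be sketched in isolation. This is a minimal, self-contained Java model and not the Elasticsearch implementation: `FieldType`, `SemanticFieldType`, `resolve`, and the error message are hypothetical stand-ins, and the value `semantic` for `SemanticTextHighlighter.NAME` is assumed from the documentation changes in this commit. Only `getDefaultHighlighter` and the `unified` default are taken directly from the diff.

```java
import java.util.Set;

final class HighlighterResolutionSketch {

    /** Stand-in for MappedFieldType: every field type defaults to "unified". */
    static class FieldType {
        String getDefaultHighlighter() {
            return "unified";
        }
    }

    /** Stand-in for the semantic_text field type, which overrides the default. */
    static class SemanticFieldType extends FieldType {
        @Override
        String getDefaultHighlighter() {
            return "semantic"; // assumed value of SemanticTextHighlighter.NAME
        }
    }

    /**
     * Picks the highlighter name for a field.
     * requestedType is the "type" given in the highlight request, or null when omitted.
     */
    static String resolve(String requestedType, FieldType fieldType, Set<String> registered) {
        // An explicit type always wins; otherwise the field type decides.
        String type = requestedType != null ? requestedType : fieldType.getDefaultHighlighter();
        if (!registered.contains(type)) {
            // Hypothetical error handling; the real message is not shown in this diff.
            throw new IllegalArgumentException("unknown highlighter type [" + type + "]");
        }
        return type;
    }

    public static void main(String[] args) {
        Set<String> registered = Set.of("unified", "plain", "fvh", "semantic");
        System.out.println(resolve(null, new FieldType(), registered));            // unified
        System.out.println(resolve(null, new SemanticFieldType(), registered));    // semantic
        System.out.println(resolve("plain", new SemanticFieldType(), registered)); // plain
    }
}
```

Because the default now depends on the field type, the lookup has to happen per resolved field name rather than once per highlight request, which is presumably why the diff moves the `getHighlighter` call inside the `fieldNamesToHighlight` loop.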

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java

Lines changed: 3 additions & 1 deletion
@@ -25,6 +25,7 @@
 public class InferenceFeatures implements FeatureSpecification {

     private static final NodeFeature SEMANTIC_TEXT_HIGHLIGHTER = new NodeFeature("semantic_text.highlighter");
+    private static final NodeFeature SEMANTIC_TEXT_HIGHLIGHTER_DEFAULT = new NodeFeature("semantic_text.highlighter.default");

     @Override
     public Set<NodeFeature> getTestFeatures() {
@@ -40,7 +41,8 @@ public Set<NodeFeature> getTestFeatures() {
             SemanticInferenceMetadataFieldsMapper.EXPLICIT_NULL_FIXES,
             SEMANTIC_KNN_VECTOR_QUERY_REWRITE_INTERCEPTION_SUPPORTED,
             TextSimilarityRankRetrieverBuilder.TEXT_SIMILARITY_RERANKER_ALIAS_HANDLING_FIX,
-            SemanticInferenceMetadataFieldsMapper.INFERENCE_METADATA_FIELDS_ENABLED_BY_DEFAULT
+            SemanticInferenceMetadataFieldsMapper.INFERENCE_METADATA_FIELDS_ENABLED_BY_DEFAULT,
+            SEMANTIC_TEXT_HIGHLIGHTER_DEFAULT
         );
     }
 }

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java

Lines changed: 6 additions & 0 deletions
@@ -73,6 +73,7 @@
 import org.elasticsearch.xpack.core.ml.inference.results.MlTextEmbeddingResults;
 import org.elasticsearch.xpack.core.ml.inference.results.TextExpansionResults;
 import org.elasticsearch.xpack.core.ml.search.SparseVectorQueryBuilder;
+import org.elasticsearch.xpack.inference.highlight.SemanticTextHighlighter;

 import java.io.IOException;
 import java.io.UncheckedIOException;
@@ -580,6 +581,11 @@ public String familyTypeName() {
             return TextFieldMapper.CONTENT_TYPE;
         }

+        @Override
+        public String getDefaultHighlighter() {
+            return SemanticTextHighlighter.NAME;
+        }
+
         public String getInferenceId() {
             return inferenceId;
         }

x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter.yml

Lines changed: 60 additions & 13 deletions
@@ -55,22 +55,32 @@ setup:
             index.mapping.semantic_text.use_legacy_format: false
           mappings:
             properties:
+              title:
+                type: text
               body:
                 type: semantic_text
                 inference_id: dense-inference-id

----
-"Highlighting using a sparse embedding model":
   - do:
       index:
         index: test-sparse-index
         id: doc_1
         body:
+          title: "Elasticsearch"
           body: ["ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides.", "You Know, for Search!"]
         refresh: true

-  - match: { result: created }
+  - do:
+      index:
+        index: test-dense-index
+        id: doc_1
+        body:
+          title: "Elasticsearch"
+          body: [ "ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides.", "You Know, for Search!" ]
+        refresh: true

+---
+"Highlighting using a sparse embedding model":
   - do:
       search:
         index: test-sparse-index
@@ -153,16 +163,6 @@ setup:

 ---
 "Highlighting using a dense embedding model":
-  - do:
-      index:
-        index: test-dense-index
-        id: doc_1
-        body:
-          body: ["ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides.", "You Know, for Search!"]
-        refresh: true
-
-  - match: { result: created }
-
   - do:
       search:
         index: test-dense-index
@@ -243,4 +243,51 @@ setup:
   - match: { hits.hits.0.highlight.body.0: "You Know, for Search!" }
   - match: { hits.hits.0.highlight.body.1: "ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides." }

+---
+"Default highlighter for fields":
+  - requires:
+      cluster_features: "semantic_text.highlighter.default"
+      reason: semantic text field defaults to the semantic highlighter
+
+  - do:
+      search:
+        index: test-dense-index
+        body:
+          query:
+            match:
+              body: "What is Elasticsearch?"
+          highlight:
+            fields:
+              body:
+                order: "score"
+                number_of_fragments: 2
+
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "doc_1" }
+  - length: { hits.hits.0.highlight.body: 2 }
+  - match: { hits.hits.0.highlight.body.0: "You Know, for Search!" }
+  - match: { hits.hits.0.highlight.body.1: "ElasticSearch is an open source, distributed, RESTful, search engine which is built on top of Lucene internally and enjoys all the features it provides." }
+
+---
+"semantic highlighter ignores non-inference fields":
+  - requires:
+      cluster_features: "semantic_text.highlighter.default"
+      reason: semantic text field defaults to the semantic highlighter
+
+  - do:
+      search:
+        index: test-dense-index
+        body:
+          query:
+            match:
+              title: "Elasticsearch"
+          highlight:
+            fields:
+              title:
+                type: semantic
+                number_of_fragments: 2
+
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "doc_1" }
+  - not_exists: hits.hits.0.highlight.title
