Commit 17faa89

jimczi and Mikep86 authored
[8.18] Refactor semantic text field to align with text field behaviour (elastic#119339)
* Refactor semantic text field to align with text field behaviour (elastic#119183)

Co-authored-by: Mike Pellegrini <[email protected]>

* fix compil after backport
* fix compil after backport (bis)

---------

Co-authored-by: Mike Pellegrini <[email protected]>
1 parent bba5e69 commit 17faa89

55 files changed: +9129 -1555 lines changed


docs/reference/query-dsl/semantic-query.asciidoc

Lines changed: 0 additions & 76 deletions
@@ -117,79 +117,3 @@ GET my-index/_search
 }
 ------------------------------------------------------------
 // TEST[skip: Requires inference endpoints]
-
-
-[discrete]
-[[advanced-search]]
-==== Advanced search on `semantic_text` fields
-
-The `semantic` query uses default settings for searching on `semantic_text` fields for ease of use.
-If you want to fine-tune a search on a `semantic_text` field, you need to know the task type used by the `inference_id` configured in `semantic_text`.
-You can find the task type using the <<get-inference-api>>, and check the `task_type` associated with the {infer} service.
-Depending on the `task_type`, use either the <<query-dsl-sparse-vector-query,`sparse_vector`>> or the <<query-dsl-knn-query,`knn`>> query for greater flexibility and customization.
-
-NOTE: While it is possible to use the `sparse_vector` query or the `knn` query
-on a `semantic_text` field, it is not supported to use the `semantic_query` on a
-`sparse_vector` or `dense_vector` field type.
-
-
-[discrete]
-[[search-sparse-inference]]
-===== Search with `sparse_embedding` inference
-
-When the {infer} endpoint uses a `sparse_embedding` model, you can use a <<query-dsl-sparse-vector-query,`sparse_vector` query>> on a <<semantic-text,`semantic_text`>> field in the following way:
-
-[source,console]
-------------------------------------------------------------
-GET test-index/_search
-{
-  "query": {
-    "nested": {
-      "path": "inference_field.inference.chunks",
-      "query": {
-        "sparse_vector": {
-          "field": "inference_field.inference.chunks.embeddings",
-          "inference_id": "my-inference-id",
-          "query": "mountain lake"
-        }
-      }
-    }
-  }
-}
-------------------------------------------------------------
-// TEST[skip: Requires inference endpoints]
-
-You can customize the `sparse_vector` query to include specific settings, like <<sparse-vector-query-with-pruning-config-and-rescore-example,pruning configuration>>.
-
-
-[discrete]
-[[search-text-inferece]]
-===== Search with `text_embedding` inference
-
-When the {infer} endpoint uses a `text_embedding` model, you can use a <<query-dsl-knn-query,`knn` query>> on a `semantic_text` field in the following way:
-
-[source,console]
-------------------------------------------------------------
-GET test-index/_search
-{
-  "query": {
-    "nested": {
-      "path": "inference_field.inference.chunks",
-      "query": {
-        "knn": {
-          "field": "inference_field.inference.chunks.embeddings",
-          "query_vector_builder": {
-            "text_embedding": {
-              "model_id": "my_inference_id",
-              "model_text": "mountain lake"
-            }
-          }
-        }
-      }
-    }
-  }
-}
-------------------------------------------------------------
-// TEST[skip: Requires inference endpoints]
-
-You can customize the `knn` query to include specific settings, like `num_candidates` and `k`.

docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc

Lines changed: 1 addition & 83 deletions
@@ -151,89 +151,7 @@ GET semantic-embeddings/_search
 <2> The query text.

 As a result, you receive the top 10 documents that are closest in meaning to the
-query from the `semantic-embedding` index:
-
-[source,console-result]
-------------------------------------------------------------
-"hits": [
-  {
-    "_index": "semantic-embeddings",
-    "_id": "Jy5065EBBFPLbFsdh_f9",
-    "_score": 21.487484,
-    "_source": {
-      "id": 8836652,
-      "content": {
-        "text": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement.",
-        "inference": {
-          "inference_id": "my-elser-endpoint",
-          "model_settings": {
-            "task_type": "sparse_embedding"
-          },
-          "chunks": [
-            {
-              "text": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement.",
-              "embeddings": {
-                (...)
-              }
-            }
-          ]
-        }
-      }
-    }
-  },
-  {
-    "_index": "semantic-embeddings",
-    "_id": "Ji5065EBBFPLbFsdh_f9",
-    "_score": 18.211695,
-    "_source": {
-      "id": 8836651,
-      "content": {
-        "text": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum.",
-        "inference": {
-          "inference_id": "my-elser-endpoint",
-          "model_settings": {
-            "task_type": "sparse_embedding"
-          },
-          "chunks": [
-            {
-              "text": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum.",
-              "embeddings": {
-                (...)
-              }
-            }
-          ]
-        }
-      }
-    }
-  },
-  {
-    "_index": "semantic-embeddings",
-    "_id": "Wi5065EBBFPLbFsdh_b9",
-    "_score": 13.089405,
-    "_source": {
-      "id": 8800197,
-      "content": {
-        "text": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore.",
-        "inference": {
-          "inference_id": "my-elser-endpoint",
-          "model_settings": {
-            "task_type": "sparse_embedding"
-          },
-          "chunks": [
-            {
-              "text": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore.",
-              "embeddings": {
-                (...)
-              }
-            }
-          ]
-        }
-      }
-    }
-  }
-]
-------------------------------------------------------------
-// NOTCONSOLE
+query from the `semantic-embedding` index.

 [discrete]
 [[semantic-text-further-examples]]

muted-tests.yml

Lines changed: 3 additions & 0 deletions
@@ -302,6 +302,9 @@ tests:
 - class: org.elasticsearch.xpack.inference.InferenceRestIT
   method: test {p0=inference/30_semantic_text_inference/Calculates embeddings using the default ELSER 2 endpoint}
   issue: https://github.com/elastic/elasticsearch/issues/117349
+- class: org.elasticsearch.xpack.inference.InferenceRestIT
+  method: test {p0=inference/30_semantic_text_inference_bwc/Calculates embeddings using the default ELSER 2 endpoint}
+  issue: https://github.com/elastic/elasticsearch/issues/117349
 - class: org.elasticsearch.search.basic.SearchWithRandomDisconnectsIT
   method: testSearchWithRandomDisconnects
   issue: https://github.com/elastic/elasticsearch/issues/116175

server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

Lines changed: 14 additions & 1 deletion
@@ -48,9 +48,11 @@
 import org.elasticsearch.index.engine.VersionConflictEngineException;
 import org.elasticsearch.index.get.GetResult;
 import org.elasticsearch.index.mapper.DocumentMapper;
+import org.elasticsearch.index.mapper.InferenceMetadataFieldsMapper;
 import org.elasticsearch.index.mapper.MapperException;
 import org.elasticsearch.index.mapper.MapperService;
 import org.elasticsearch.index.mapper.MappingLookup;
+import org.elasticsearch.index.mapper.RoutingFieldMapper;
 import org.elasticsearch.index.mapper.SourceToParse;
 import org.elasticsearch.index.seqno.SequenceNumbers;
 import org.elasticsearch.index.shard.IndexShard;
@@ -326,7 +328,8 @@ static boolean executeBulkItemRequest(
         if (opType == DocWriteRequest.OpType.UPDATE) {
             final UpdateRequest updateRequest = (UpdateRequest) context.getCurrent();
             try {
-                updateResult = updateHelper.prepare(updateRequest, context.getPrimary(), nowInMillisSupplier);
+                var gFields = getStoredFieldsSpec(context.getPrimary());
+                updateResult = updateHelper.prepare(updateRequest, context.getPrimary(), nowInMillisSupplier, gFields);
             } catch (Exception failure) {
                 // we may fail translating a update to index or delete operation
                 // we use index result to communicate failure while translating update request
@@ -401,6 +404,16 @@ static boolean executeBulkItemRequest(
         return true;
     }

+    private static String[] getStoredFieldsSpec(IndexShard indexShard) {
+        if (InferenceMetadataFieldsMapper.isEnabled(indexShard.mapperService().mappingLookup())) {
+            if (indexShard.mapperService().mappingLookup().inferenceFields().size() > 0) {
+                // Retrieves the inference metadata field containing the inference results for all semantic fields defined in the mapping.
+                return new String[] { RoutingFieldMapper.NAME, InferenceMetadataFieldsMapper.NAME };
+            }
+        }
+        return new String[] { RoutingFieldMapper.NAME };
+    }
+
     private static boolean handleMappingUpdateRequired(
         BulkPrimaryExecutionContext context,
         MappingUpdatePerformer mappingUpdater,
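
The field selection added in `getStoredFieldsSpec` can be read in isolation from the shard and mapper plumbing. The following is a minimal sketch of that decision only, not the Elasticsearch code: the class `StoredFieldsSpecSketch`, its two parameters, and the literal values `_routing` and `_inference_fields` are assumptions made for illustration (the diff refers only to the `NAME` constants).

```java
import java.util.Arrays;

public class StoredFieldsSpecSketch {
    // Assumed literal values; the diff only references RoutingFieldMapper.NAME
    // and InferenceMetadataFieldsMapper.NAME.
    static final String ROUTING = "_routing";
    static final String INFERENCE_FIELDS = "_inference_fields";

    // Hypothetical stand-in for getStoredFieldsSpec: the two inputs replace the
    // isEnabled(mappingLookup) check and the inferenceFields().size() lookup.
    static String[] storedFieldsSpec(boolean inferenceMetadataEnabled, int inferenceFieldCount) {
        if (inferenceMetadataEnabled && inferenceFieldCount > 0) {
            // Also fetch the metadata field that holds the inference results.
            return new String[] { ROUTING, INFERENCE_FIELDS };
        }
        return new String[] { ROUTING };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(storedFieldsSpec(true, 2)));
        System.out.println(Arrays.toString(storedFieldsSpec(true, 0)));
        System.out.println(Arrays.toString(storedFieldsSpec(false, 2)));
    }
}
```

Only the first call requests the extra field. An index without semantic fields, or one where the new format is not enabled, keeps fetching routing alone, which matches the previous behaviour.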

server/src/main/java/org/elasticsearch/action/update/TransportUpdateAction.java

Lines changed: 24 additions & 3 deletions
@@ -44,6 +44,7 @@
 import org.elasticsearch.index.IndexService;
 import org.elasticsearch.index.engine.VersionConflictEngineException;
 import org.elasticsearch.index.mapper.InferenceFieldMapper;
+import org.elasticsearch.index.mapper.InferenceMetadataFieldsMapper;
 import org.elasticsearch.index.mapper.Mapper;
 import org.elasticsearch.index.mapper.MappingLookup;
 import org.elasticsearch.index.shard.IndexShard;
@@ -374,7 +375,7 @@ private static UpdateHelper.Result deleteInferenceResults(
         IndexMetadata indexMetadata,
         MappingLookup mappingLookup
     ) {
-        if (result.getResponseResult() != DocWriteResponse.Result.UPDATED) {
+        if (result.getResponseResult() != DocWriteResponse.Result.UPDATED || InferenceMetadataFieldsMapper.isEnabled(mappingLookup)) {
             return result;
         }

@@ -403,7 +404,7 @@ private static UpdateHelper.Result deleteInferenceResults(
             String inferenceFieldName = entry.getKey();
             Mapper mapper = mappingLookup.getMapper(inferenceFieldName);

-            if (mapper instanceof InferenceFieldMapper inferenceFieldMapper) {
+            if (mapper instanceof InferenceFieldMapper) {
                 String[] sourceFields = entry.getValue().getSourceFields();
                 for (String sourceField : sourceFields) {
                     if (sourceField.equals(inferenceFieldName) == false
@@ -412,7 +413,7 @@ private static UpdateHelper.Result deleteInferenceResults(
                         // This has two important side effects:
                         // - The inference field value will remain parsable by its mapper
                         // - The inference results will be removed, forcing them to be re-generated downstream
-                        updatedSource.put(inferenceFieldName, inferenceFieldMapper.getOriginalValue(updatedSource));
+                        updatedSource.put(inferenceFieldName, getOriginalValueLegacy(inferenceFieldName, updatedSource));
                         updatedSourceModified = true;
                         break;
                     }
@@ -435,4 +436,24 @@ private static UpdateHelper.Result deleteInferenceResults(

         return returnedResult;
     }
+
+    /**
+     * Get the field's original value (i.e. the value the user specified) from the provided source.
+     *
+     * @param sourceAsMap The source as a map
+     * @return The field's original value, or {@code null} if none was provided
+     */
+    private static Object getOriginalValueLegacy(String fullPath, Map<String, Object> sourceAsMap) {
+        // TODO: Fix bug here when semantic text field is in an object
+        Object fieldValue = sourceAsMap.get(fullPath);
+        if (fieldValue == null) {
+            return null;
+        } else if (fieldValue instanceof Map<?, ?> == false) {
+            // Don't try to further validate the non-map value, that will be handled when the source is fully parsed
+            return fieldValue;
+        }
+
+        Map<String, Object> fieldValueMap = XContentMapValues.nodeMapValue(fieldValue, "Field [" + fullPath + "]");
+        return XContentMapValues.extractValue("text", fieldValueMap);
+    }
 }
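
To make the behaviour of `getOriginalValueLegacy` concrete, here is a minimal sketch that mirrors its three branches with plain `java.util` maps. It is an illustration under assumptions, not the Elasticsearch implementation: `OriginalValueLegacySketch` and the sample documents are hypothetical, and a direct `get("text")` stands in for `XContentMapValues.extractValue`. The last sample shows what the flat `sourceAsMap.get(fullPath)` lookup does with a dotted path, which appears to be the case the TODO refers to.

```java
import java.util.Map;

public class OriginalValueLegacySketch {
    // Hypothetical stand-in for getOriginalValueLegacy, using plain maps.
    static Object originalValueLegacy(String fullPath, Map<String, Object> sourceAsMap) {
        // Flat lookup: a dotted path is treated as one literal key.
        Object fieldValue = sourceAsMap.get(fullPath);
        if (fieldValue == null) {
            return null;
        }
        if (!(fieldValue instanceof Map)) {
            // Non-map values (for example a plain string) are returned unchanged.
            return fieldValue;
        }
        // Legacy format: the user-supplied value sits under the "text" key.
        return ((Map<?, ?>) fieldValue).get("text");
    }

    public static void main(String[] args) {
        Map<String, Object> legacy = Map.of("content", Map.of("text", "mountain lake", "inference", Map.of()));
        Map<String, Object> plain = Map.of("content", "mountain lake");
        Map<String, Object> nested = Map.of("obj", Map.of("content", Map.of("text", "mountain lake")));

        System.out.println(originalValueLegacy("content", legacy));     // mountain lake
        System.out.println(originalValueLegacy("content", plain));      // mountain lake
        System.out.println(originalValueLegacy("obj.content", nested)); // null
    }
}
```

In the third call the source has no top-level key named `obj.content`, so the lookup returns `null` even though the value exists one level down.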

server/src/main/java/org/elasticsearch/action/update/UpdateHelper.java

Lines changed: 9 additions & 1 deletion
@@ -63,7 +63,15 @@ public UpdateHelper(ScriptService scriptService, DocumentParsingProvider documen
      * Prepares an update request by converting it into an index or delete request or an update response (no action).
      */
     public Result prepare(UpdateRequest request, IndexShard indexShard, LongSupplier nowInMillis) throws IOException {
-        final GetResult getResult = indexShard.getService().getForUpdate(request.id(), request.ifSeqNo(), request.ifPrimaryTerm());
+        // TODO: Don't hard-code gFields
+        return prepare(request, indexShard, nowInMillis, new String[] { RoutingFieldMapper.NAME });
+    }
+
+    /**
+     * Prepares an update request by converting it into an index or delete request or an update response (no action).
+     */
+    public Result prepare(UpdateRequest request, IndexShard indexShard, LongSupplier nowInMillis, String[] gFields) throws IOException {
+        final GetResult getResult = indexShard.getService().getForUpdate(request.id(), request.ifSeqNo(), request.ifPrimaryTerm(), gFields);
         return prepare(indexShard, request, getResult, nowInMillis);
     }
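
The change to `prepare` keeps the existing three-argument signature and delegates to a new overload that takes the stored fields explicitly. A minimal sketch of that delegation pattern follows. It is hypothetical throughout: `PrepareOverloadSketch`, the string return value, and the `_routing` and `_inference_fields` literals are assumptions chosen so the example runs on its own.

```java
import java.util.Arrays;

public class PrepareOverloadSketch {
    // Assumed default, standing in for new String[] { RoutingFieldMapper.NAME }.
    static final String[] DEFAULT_G_FIELDS = { "_routing" };

    // Existing entry point: callers that do not care about stored fields keep working.
    static String prepare(String docId) {
        return prepare(docId, DEFAULT_G_FIELDS);
    }

    // New overload: the caller decides which stored fields the get-for-update should load.
    static String prepare(String docId, String[] gFields) {
        return docId + " -> " + Arrays.toString(gFields);
    }

    public static void main(String[] args) {
        System.out.println(prepare("doc-1"));
        System.out.println(prepare("doc-1", new String[] { "_routing", "_inference_fields" }));
    }
}
```

Existing callers see no change in behaviour, while the bulk path can pass a wider field list when inference metadata needs to be loaded.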

server/src/main/java/org/elasticsearch/common/settings/IndexScopedSettings.java

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@
 import org.elasticsearch.index.fielddata.IndexFieldDataService;
 import org.elasticsearch.index.mapper.FieldMapper;
 import org.elasticsearch.index.mapper.IgnoredSourceFieldMapper;
+import org.elasticsearch.index.mapper.InferenceMetadataFieldsMapper;
 import org.elasticsearch.index.mapper.MapperService;
 import org.elasticsearch.index.mapper.SourceFieldMapper;
 import org.elasticsearch.index.similarity.SimilarityService;
@@ -191,6 +192,7 @@ public final class IndexScopedSettings extends AbstractScopedSettings {
             IgnoredSourceFieldMapper.SKIP_IGNORED_SOURCE_READ_SETTING,
             SourceFieldMapper.INDEX_MAPPER_SOURCE_MODE_SETTING,
             IndexSettings.RECOVERY_USE_SYNTHETIC_SOURCE_SETTING,
+            InferenceMetadataFieldsMapper.USE_LEGACY_SEMANTIC_TEXT_FORMAT,

             // validate that built-in similarities don't get redefined
             Setting.groupSetting("index.similarity.", (s) -> {

server/src/main/java/org/elasticsearch/index/IndexVersions.java

Lines changed: 1 addition & 0 deletions
@@ -124,6 +124,7 @@ private static IndexVersion def(int id, Version luceneVersion) {
     public static final IndexVersion DEPRECATE_SOURCE_MODE_MAPPER = def(8_521_00_0, Version.LUCENE_9_12_0);
     public static final IndexVersion USE_SYNTHETIC_SOURCE_FOR_RECOVERY_BACKPORT = def(8_522_00_0, Version.LUCENE_9_12_0);
     public static final IndexVersion UPGRADE_TO_LUCENE_9_12_1 = def(8_523_00_0, Version.LUCENE_9_12_1);
+    public static final IndexVersion INFERENCE_METADATA_FIELDS_BACKPORT = def(8_524_00_0, Version.LUCENE_9_12_1);
     /*
      * STOP! READ THIS FIRST! No, really,
      * ____ _____ ___ ____ _ ____ _____ _ ____ _____ _ _ ___ ____ _____ ___ ____ ____ _____ _

server/src/main/java/org/elasticsearch/index/engine/TranslogDirectoryReader.java

Lines changed: 1 addition & 1 deletion
@@ -441,7 +441,7 @@ private void readStoredFieldsDirectly(StoredFieldVisitor visitor) throws IOExcep
                 SourceFieldMapper mapper = mappingLookup.getMapping().getMetadataMapperByClass(SourceFieldMapper.class);
                 if (mapper != null) {
                     try {
-                        sourceBytes = mapper.applyFilters(sourceBytes, null);
+                        sourceBytes = mapper.applyFilters(mappingLookup, sourceBytes, null);
                     } catch (IOException e) {
                         throw new IOException("Failed to reapply filters after reading from translog", e);
                     }
