Commit 3cc450a

Add Multi-Field Support for Semantic Text Fields (#120128)
Semantic text fields now support multi-fields, either as part of a multi-field structure or containing multi-fields internally. This enhancement aligns with the semantic text field's current behavior as a standard text field. Note: Multi-field support is only available for the new index format. Attempting to set a multi-field on an index created with the older format will still result in a failure.
1 parent c5a623a commit 3cc450a

10 files changed, +347 -135 lines changed
docs/changelog/120128.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 120128
+summary: Add Multi-Field Support for Semantic Text Fields
+area: Relevance
+type: feature
+issues: []

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 26 additions & 22 deletions
@@ -182,16 +182,11 @@ Even if the script targets non-`semantic_text` fields, the update will fail when
 
 [discrete]
 [[copy-to-support]]
-==== `copy_to` support
+==== `copy_to` and multi-fields support
 
-The `semantic_text` field type can be the target of
-<<copy-to,`copy_to` fields>>. This means you can use a single `semantic_text`
-field to collect the values of other fields for semantic search. Each value has
-its embeddings calculated separately; each field value is a separate set of chunk(s) in
-the resulting embeddings.
-
-This imposes a restriction on bulk requests and ingestion pipelines that update documents with `semantic_text` fields.
-In these cases, all fields that are copied to a `semantic_text` field, including the `semantic_text` field value, must have a value to ensure every embedding is calculated correctly.
+The semantic_text field type can serve as the target of <<copy-to,copy_to fields>>,
+be part of a <<multi-fields,multi-field>> structure, or contain <<multi-fields,multi-fields>> internally.
+This means you can use a single field to collect the values of other fields for semantic search.
 
 For example, the following mapping:
 
@@ -201,39 +196,48 @@ PUT test-index
 {
     "mappings": {
         "properties": {
-            "infer_field": {
-                "type": "semantic_text",
-                "inference_id": ".elser-2-elasticsearch"
-            },
             "source_field": {
                 "type": "text",
                 "copy_to": "infer_field"
+            },
+            "infer_field": {
+                "type": "semantic_text",
+                "inference_id": ".elser-2-elasticsearch"
             }
         }
     }
 }
 ------------------------------------------------------------
 // TEST[skip:TBD]
 
-Will need the following bulk update request to ensure that `infer_field` is updated correctly:
+can also be declared as multi-fields:
 
 [source,console]
 ------------------------------------------------------------
-PUT test-index/_bulk
-{"update": {"_id": "1"}}
-{"doc": {"infer_field": "updated inference field", "source_field": "updated source field"}}
+PUT test-index
+{
+    "mappings": {
+        "properties": {
+            "source_field": {
+                "type": "text",
+                "fields": {
+                    "infer_field": {
+                        "type": "semantic_text",
+                        "inference_id": ".elser-2-elasticsearch"
+                    }
+                }
+            }
+        }
+    }
+}
 ------------------------------------------------------------
 // TEST[skip:TBD]
 
-Notice that both the `semantic_text` field and the source field are updated in the bulk request.
-
-
 [discrete]
 [[limitations]]
 ==== Limitations
 
 `semantic_text` field types have the following limitations:
 
 * `semantic_text` fields are not currently supported as elements of <<nested,nested fields>>.
-* `semantic_text` fields can't currently be set as part of <<dynamic-templates>>.
-* `semantic_text` fields can't be defined as <<multi-fields,multi-fields>> of another field, nor can they contain other fields as multi-fields.
+* `semantic_text` fields can't currently be set as part of <<dynamic-templates>>.

server/src/main/java/org/elasticsearch/index/mapper/ContentPath.java

Lines changed: 4 additions & 2 deletions
@@ -43,8 +43,10 @@ private void expand() {
         path = newPath;
     }
 
-    public void remove() {
-        path[--index] = null;
+    public String remove() {
+        var ret = path[--index];
+        path[index] = null;
+        return ret;
     }
 
     public void setWithinLeafObject(boolean withinLeafObject) {
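
Returning the removed element from `remove()` lets a caller step one level up the path and then restore exactly what it popped in a `finally` block, which is how `addDynamicUpdate` in `SemanticTextFieldMapper` uses it later in this commit. The following is a minimal, self-contained sketch of that pattern. `PathStack` and `PathStackDemo` are hypothetical stand-ins written for illustration, not the real `ContentPath` class.

```java
import java.util.Arrays;

// Hypothetical stand-in for ContentPath: a growable stack of path elements.
final class PathStack {
    private String[] path = new String[4];
    private int index = 0;

    void add(String name) {
        if (index == path.length) {
            path = Arrays.copyOf(path, path.length * 2);
        }
        path[index++] = name;
    }

    // Mirrors the changed method: pop the last element and return it to the caller.
    String remove() {
        String removed = path[--index];
        path[index] = null;
        return removed;
    }

    String pathAsText() {
        return String.join(".", Arrays.copyOf(path, index));
    }
}

final class PathStackDemo {
    // Steps one level up, reads the parent path, then restores the popped element.
    static String parentPath(PathStack stack) {
        String removed = stack.remove();
        try {
            return stack.pathAsText();
        } finally {
            stack.add(removed);
        }
    }

    public static void main(String[] args) {
        PathStack stack = new PathStack();
        stack.add("source_field");
        stack.add("infer_field");
        System.out.println(parentPath(stack));  // path seen while one level up
        System.out.println(stack.pathAsText()); // path after restoration
    }
}
```

With the previous `void` signature the caller had to know the popped name in advance (for example via `leafName()`); returning it removes that assumption for the nested multi-field case.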

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

Lines changed: 5 additions & 0 deletions
@@ -1386,6 +1386,11 @@ public Builder init(FieldMapper initializer) {
             return this;
         }
 
+        public Builder addMultiField(FieldMapper.Builder builder) {
+            this.multiFieldsBuilder.add(builder);
+            return this;
+        }
+
         protected BuilderParams builderParams(Mapper.Builder mainFieldBuilder, MapperBuilderContext context) {
             return new BuilderParams(multiFieldsBuilder.build(mainFieldBuilder, context), copyTo, sourceKeepMode, hasScript, onScriptError);
         }

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticInferenceMetadataFieldsMapper.java

Lines changed: 1 addition & 7 deletions
@@ -143,13 +143,7 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
                 // directly. We can safely split on all "." chars because semantic text fields cannot be used when subobjects == false.
                 String[] fieldNameParts = fieldName.split("\\.");
                 setPath(context.path(), fieldNameParts);
-
-                var parent = context.parent().findParentMapper(fieldName);
-                if (parent == null) {
-                    throw new IllegalArgumentException("Field [" + fieldName + "] does not have a parent mapper");
-                }
-                String suffix = parent != context.parent() ? fieldName.substring(parent.fullPath().length() + 1) : fieldName;
-                var mapper = parent.getMapper(suffix);
+                var mapper = context.mappingLookup().getMapper(fieldName);
                 if (mapper instanceof SemanticTextFieldMapper fieldMapper) {
                     XContentLocation xContentLocation = context.parser().getTokenLocation();
                     var input = fieldMapper.parseSemanticTextField(context);

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java

Lines changed: 69 additions & 28 deletions
@@ -43,6 +43,7 @@
 import org.elasticsearch.index.mapper.MapperBuilderContext;
 import org.elasticsearch.index.mapper.MapperMergeContext;
 import org.elasticsearch.index.mapper.MappingLookup;
+import org.elasticsearch.index.mapper.MappingParserContext;
 import org.elasticsearch.index.mapper.NestedObjectMapper;
 import org.elasticsearch.index.mapper.ObjectMapper;
 import org.elasticsearch.index.mapper.SimpleMappedFieldType;
@@ -83,6 +84,7 @@
 import java.util.Objects;
 import java.util.Optional;
 import java.util.Set;
+import java.util.function.BiConsumer;
 import java.util.function.Function;
 
 import static org.elasticsearch.search.SearchService.DEFAULT_SIZE;
@@ -119,12 +121,20 @@ public class SemanticTextFieldMapper extends FieldMapper implements InferenceFie
 
     public static final TypeParser PARSER = new TypeParser(
         (n, c) -> new Builder(n, c::bitSetProducer, c.getIndexSettings()),
-        List.of(notInMultiFields(CONTENT_TYPE), notFromDynamicTemplates(CONTENT_TYPE))
+        List.of(validateParserContext(CONTENT_TYPE))
     );
 
+    public static BiConsumer<String, MappingParserContext> validateParserContext(String type) {
+        return (n, c) -> {
+            if (InferenceMetadataFieldsMapper.isEnabled(c.getIndexSettings().getSettings()) == false) {
+                notInMultiFields(type).accept(n, c);
+            }
+            notFromDynamicTemplates(type).accept(n, c);
+        };
+    }
+
     public static class Builder extends FieldMapper.Builder {
         private final boolean useLegacyFormat;
-        private final IndexVersion indexVersionCreated;
 
         private final Parameter<String> inferenceId = Parameter.stringParam(
             INFERENCE_ID_FIELD,
@@ -178,7 +188,6 @@ public static Builder from(SemanticTextFieldMapper mapper) {
 
         public Builder(String name, Function<Query, BitSetProducer> bitSetProducer, IndexSettings indexSettings) {
             super(name);
-            this.indexVersionCreated = indexSettings.getIndexVersionCreated();
             this.useLegacyFormat = InferenceMetadataFieldsMapper.isEnabled(indexSettings.getSettings()) == false;
             this.inferenceFieldBuilder = c -> createInferenceField(
                 c,
@@ -225,10 +234,10 @@ protected void merge(FieldMapper mergeWith, Conflicts conflicts, MapperMergeCont
 
         @Override
         public SemanticTextFieldMapper build(MapperBuilderContext context) {
-            if (copyTo.copyToFields().isEmpty() == false) {
+            if (useLegacyFormat && copyTo.copyToFields().isEmpty() == false) {
                 throw new IllegalArgumentException(CONTENT_TYPE + " field [" + leafName() + "] does not support [copy_to]");
             }
-            if (multiFieldsBuilder.hasMultiFields()) {
+            if (useLegacyFormat && multiFieldsBuilder.hasMultiFields()) {
                 throw new IllegalArgumentException(CONTENT_TYPE + " field [" + leafName() + "] does not support multi-fields");
             }
             final String fullName = context.buildFullName(leafName());
@@ -247,7 +256,6 @@ public SemanticTextFieldMapper build(MapperBuilderContext context) {
                     searchInferenceId.getValue(),
                     modelSettings.getValue(),
                     inferenceField,
-                    indexVersionCreated,
                     useLegacyFormat,
                     meta.getValue()
                 ),
@@ -277,13 +285,33 @@ private SemanticTextFieldMapper copySettings(SemanticTextFieldMapper mapper, Map
 
     private SemanticTextFieldMapper(String simpleName, MappedFieldType mappedFieldType, BuilderParams builderParams) {
         super(simpleName, mappedFieldType, builderParams);
+        ensureMultiFields(builderParams.multiFields().iterator());
+    }
+
+    private void ensureMultiFields(Iterator<FieldMapper> mappers) {
+        while (mappers.hasNext()) {
+            var mapper = mappers.next();
+            if (mapper.leafName().equals(INFERENCE_FIELD)) {
+                throw new IllegalArgumentException(
+                    "Field ["
+                        + mapper.fullPath()
+                        + "] is already used by another field ["
+                        + fullPath()
+                        + "] internally. Please choose a different name."
+                );
+            }
+        }
     }
 
     @Override
     public Iterator<Mapper> iterator() {
-        List<Mapper> subIterators = new ArrayList<>();
-        subIterators.add(fieldType().getInferenceField());
-        return subIterators.iterator();
+        List<Mapper> mappers = new ArrayList<>();
+        Iterator<Mapper> m = super.iterator();
+        while (m.hasNext()) {
+            mappers.add(m.next());
+        }
+        mappers.add(fieldType().getInferenceField());
+        return mappers.iterator();
     }
 
     @Override
@@ -352,20 +380,7 @@ void parseCreateFieldFromContext(DocumentParserContext context, SemanticTextFiel
 
         final SemanticTextFieldMapper mapper;
         if (fieldType().getModelSettings() == null) {
-            context.path().remove();
-            Builder builder = (Builder) new Builder(
-                leafName(),
-                fieldType().getChunksField().bitsetProducer(),
-                fieldType().getChunksField().indexSettings()
-            ).init(this);
-            try {
-                mapper = builder.setModelSettings(field.inference().modelSettings())
-                    .setInferenceId(field.inference().inferenceId())
-                    .build(context.createDynamicMapperBuilderContext());
-                context.addDynamicMapper(mapper);
-            } finally {
-                context.path().add(leafName());
-            }
+            mapper = addDynamicUpdate(context, field);
         } else {
             Conflicts conflicts = new Conflicts(fullFieldName);
             canMergeModelSettings(fieldType().getModelSettings(), field.inference().modelSettings(), conflicts);
@@ -440,6 +455,32 @@ void parseCreateFieldFromContext(DocumentParserContext context, SemanticTextFiel
         }
     }
 
+    private SemanticTextFieldMapper addDynamicUpdate(DocumentParserContext context, SemanticTextField field) {
+        Builder builder = (Builder) getMergeBuilder();
+        context.path().remove();
+        try {
+            builder.setModelSettings(field.inference().modelSettings()).setInferenceId(field.inference().inferenceId());
+            if (context.mappingLookup().isMultiField(fullPath())) {
+                // The field is part of a multi-field, so the parent field must also be updated accordingly.
+                var fieldName = context.path().remove();
+                try {
+                    var parentMapper = ((FieldMapper) context.mappingLookup().getMapper(context.mappingLookup().parentField(fullPath())))
+                        .getMergeBuilder();
+                    context.addDynamicMapper(parentMapper.addMultiField(builder).build(context.createDynamicMapperBuilderContext()));
+                    return builder.build(context.createDynamicMapperBuilderContext());
+                } finally {
+                    context.path().add(fieldName);
+                }
+            } else {
+                var mapper = builder.build(context.createDynamicMapperBuilderContext());
+                context.addDynamicMapper(mapper);
+                return mapper;
+            }
+        } finally {
+            context.path().add(leafName());
+        }
+    }
+
     @Override
     protected String contentType() {
         return CONTENT_TYPE;
@@ -460,11 +501,14 @@ public InferenceFieldMetadata getMetadata(Set<String> sourcePaths) {
 
     @Override
     protected void doValidate(MappingLookup mappers) {
-        int parentPathIndex = fullPath().lastIndexOf(leafName());
+        String fullPath = mappers.isMultiField(fullPath()) ? mappers.parentField(fullPath()) : fullPath();
+        String leafName = mappers.getMapper(fullPath).leafName();
+        int parentPathIndex = fullPath.lastIndexOf(leafName);
         if (parentPathIndex > 0) {
+            String parentName = fullPath.substring(0, parentPathIndex - 1);
             // Check that the parent object field allows subobjects.
             // Subtract one from the parent path index to omit the trailing dot delimiter.
-            ObjectMapper parentMapper = mappers.objectMappers().get(fullPath().substring(0, parentPathIndex - 1));
+            ObjectMapper parentMapper = mappers.objectMappers().get(parentName);
             if (parentMapper == null) {
                 throw new IllegalStateException(CONTENT_TYPE + " field [" + fullPath() + "] does not have a parent object mapper");
             }
@@ -482,7 +526,6 @@ public static class SemanticTextFieldType extends SimpleMappedFieldType {
         private final String searchInferenceId;
         private final SemanticTextField.ModelSettings modelSettings;
         private final ObjectMapper inferenceField;
-        private final IndexVersion indexVersionCreated;
         private final boolean useLegacyFormat;
 
         public SemanticTextFieldType(
@@ -491,7 +534,6 @@ public SemanticTextFieldType(
             String searchInferenceId,
             SemanticTextField.ModelSettings modelSettings,
             ObjectMapper inferenceField,
-            IndexVersion indexVersionCreated,
             boolean useLegacyFormat,
             Map<String, String> meta
         ) {
@@ -500,7 +542,6 @@ public SemanticTextFieldType(
             this.searchInferenceId = searchInferenceId;
             this.modelSettings = modelSettings;
             this.inferenceField = inferenceField;
-            this.indexVersionCreated = indexVersionCreated;
             this.useLegacyFormat = useLegacyFormat;
         }
 
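
The gating introduced by `validateParserContext` in this file can be illustrated in isolation: the multi-field restriction is applied only when the index uses the legacy format, while the dynamic-template restriction always applies. The sketch below is a simplified model under stated assumptions. `ParserContext`, `SemanticTextValidation`, `isAccepted`, and the error messages are hypothetical names invented for illustration; the real code works on `MappingParserContext` and the existing `notInMultiFields` and `notFromDynamicTemplates` helpers.

```java
import java.util.function.BiConsumer;

// Hypothetical, simplified context holding only the facts the validators inspect.
final class ParserContext {
    final boolean newIndexFormat;
    final boolean insideMultiField;
    final boolean fromDynamicTemplate;

    ParserContext(boolean newIndexFormat, boolean insideMultiField, boolean fromDynamicTemplate) {
        this.newIndexFormat = newIndexFormat;
        this.insideMultiField = insideMultiField;
        this.fromDynamicTemplate = fromDynamicTemplate;
    }
}

final class SemanticTextValidation {
    static BiConsumer<String, ParserContext> notInMultiFields(String type) {
        return (name, ctx) -> {
            if (ctx.insideMultiField) {
                // Illustrative message, not the real Elasticsearch error text.
                throw new IllegalArgumentException("Field [" + name + "] of type [" + type + "] is not allowed in multi-fields");
            }
        };
    }

    static BiConsumer<String, ParserContext> notFromDynamicTemplates(String type) {
        return (name, ctx) -> {
            if (ctx.fromDynamicTemplate) {
                // Illustrative message, not the real Elasticsearch error text.
                throw new IllegalArgumentException("Field [" + name + "] of type [" + type + "] is not allowed in dynamic templates");
            }
        };
    }

    // Mirrors the shape of the new validator: the multi-field check runs only for the legacy format.
    static BiConsumer<String, ParserContext> validateParserContext(String type) {
        return (name, ctx) -> {
            if (ctx.newIndexFormat == false) {
                notInMultiFields(type).accept(name, ctx);
            }
            notFromDynamicTemplates(type).accept(name, ctx);
        };
    }

    // Convenience wrapper for the demo: true if validation passes without throwing.
    static boolean isAccepted(String name, ParserContext ctx) {
        try {
            validateParserContext("semantic_text").accept(name, ctx);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String field = "source_field.infer_field";
        System.out.println(isAccepted(field, new ParserContext(true, true, false)));  // new format, multi-field
        System.out.println(isAccepted(field, new ParserContext(false, true, false))); // legacy format, multi-field
        System.out.println(isAccepted(field, new ParserContext(true, false, true)));  // dynamic template
    }
}
```

Composing the two checks inside one `BiConsumer` keeps the `TypeParser` registration to a single validator while letting the index format decide which restrictions apply.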