Commit 1db194d

Add Multi-Field Support for Semantic Text Fields (#120128)
Semantic text fields now support multi-fields, either as part of a multi-field structure or containing multi-fields internally. This enhancement aligns with the semantic text field's current behavior as a standard text field. Note: Multi-field support is only available for the new index format. Attempting to set a multi-field on an index created with the older format will still result in a failure.
1 parent 0488b03 commit 1db194d
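
The format-dependent restriction in the note above can be sketched in isolation. This is a minimal illustration, not the Elasticsearch implementation: `MultiFieldValidationSketch`, `ParserContext`, and its two boolean flags are hypothetical stand-ins for the real parser context and index settings, and the error message is illustrative.

```java
import java.util.function.BiConsumer;

public class MultiFieldValidationSketch {

    /** Hypothetical stand-in for the mapping parser context: just the two facts the check needs. */
    static final class ParserContext {
        final boolean newIndexFormat;   // true when the index was created with the new format
        final boolean insideMultiField; // true when the field is declared under another field's "fields"

        ParserContext(boolean newIndexFormat, boolean insideMultiField) {
            this.newIndexFormat = newIndexFormat;
            this.insideMultiField = insideMultiField;
        }
    }

    /** Returns a validator that rejects multi-field usage only on legacy-format indices. */
    static BiConsumer<String, ParserContext> validateParserContext(String type) {
        return (name, ctx) -> {
            if (!ctx.newIndexFormat && ctx.insideMultiField) {
                // Illustrative message, not the exact text Elasticsearch emits.
                throw new IllegalArgumentException(
                    "Field [" + name + "] of type [" + type + "] can't be used in multi-fields on a legacy-format index"
                );
            }
        };
    }

    public static void main(String[] args) {
        BiConsumer<String, ParserContext> check = validateParserContext("semantic_text");
        // New format: declaring the field as a multi-field passes validation.
        check.accept("source_field.infer_field", new ParserContext(true, true));
        try {
            // Legacy format: the same declaration is rejected.
            check.accept("source_field.infer_field", new ParserContext(false, true));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Composing the check as a single `BiConsumer` keeps the legacy-only rule next to the unconditional ones, which is the same shape the commit uses for its parser-context validation.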

10 files changed, +347 -135 lines changed

docs/changelog/120128.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 120128
+summary: Add Multi-Field Support for Semantic Text Fields
+area: Relevance
+type: feature
+issues: []

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 26 additions & 22 deletions
@@ -182,16 +182,11 @@ Even if the script targets non-`semantic_text` fields, the update will fail when

 [discrete]
 [[copy-to-support]]
-==== `copy_to` support
+==== `copy_to` and multi-fields support

-The `semantic_text` field type can be the target of
-<<copy-to,`copy_to` fields>>. This means you can use a single `semantic_text`
-field to collect the values of other fields for semantic search. Each value has
-its embeddings calculated separately; each field value is a separate set of chunk(s) in
-the resulting embeddings.
-
-This imposes a restriction on bulk requests and ingestion pipelines that update documents with `semantic_text` fields.
-In these cases, all fields that are copied to a `semantic_text` field, including the `semantic_text` field value, must have a value to ensure every embedding is calculated correctly.
+The semantic_text field type can serve as the target of <<copy-to,copy_to fields>>,
+be part of a <<multi-fields,multi-field>> structure, or contain <<multi-fields,multi-fields>> internally.
+This means you can use a single field to collect the values of other fields for semantic search.

 For example, the following mapping:

@@ -201,39 +196,48 @@ PUT test-index
 {
     "mappings": {
         "properties": {
-            "infer_field": {
-                "type": "semantic_text",
-                "inference_id": ".elser-2-elasticsearch"
-            },
             "source_field": {
                 "type": "text",
                 "copy_to": "infer_field"
+            },
+            "infer_field": {
+                "type": "semantic_text",
+                "inference_id": ".elser-2-elasticsearch"
             }
         }
     }
 }
 ------------------------------------------------------------
 // TEST[skip:TBD]

-Will need the following bulk update request to ensure that `infer_field` is updated correctly:
+can also be declared as multi-fields:

 [source,console]
 ------------------------------------------------------------
-PUT test-index/_bulk
-{"update": {"_id": "1"}}
-{"doc": {"infer_field": "updated inference field", "source_field": "updated source field"}}
+PUT test-index
+{
+    "mappings": {
+        "properties": {
+            "source_field": {
+                "type": "text",
+                "fields": {
+                    "infer_field": {
+                        "type": "semantic_text",
+                        "inference_id": ".elser-2-elasticsearch"
+                    }
+                }
+            }
+        }
+    }
+}
 ------------------------------------------------------------
 // TEST[skip:TBD]

-Notice that both the `semantic_text` field and the source field are updated in the bulk request.
-
-
 [discrete]
 [[limitations]]
 ==== Limitations

 `semantic_text` field types have the following limitations:

 * `semantic_text` fields are not currently supported as elements of <<nested,nested fields>>.
-* `semantic_text` fields can't currently be set as part of <<dynamic-templates>>.
-* `semantic_text` fields can't be defined as <<multi-fields,multi-fields>> of another field, nor can they contain other fields as multi-fields.
+* `semantic_text` fields can't currently be set as part of <<dynamic-templates>>.

server/src/main/java/org/elasticsearch/index/mapper/ContentPath.java

Lines changed: 4 additions & 2 deletions
@@ -43,8 +43,10 @@ private void expand() {
         path = newPath;
     }

-    public void remove() {
-        path[--index] = null;
+    public String remove() {
+        var ret = path[--index];
+        path[index] = null;
+        return ret;
     }

     public void setWithinLeafObject(boolean withinLeafObject) {
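
Returning the removed element lets a caller pop a path segment, work against the parent path, and then restore exactly what was removed. The following is a minimal sketch of that pop-and-restore pattern under stated assumptions: `PathStackSketch`, `pathAsText`, and `withParentPath` are hypothetical names for illustration and are not part of `ContentPath`, which is array-backed rather than list-backed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class PathStackSketch {
    private final List<String> parts = new ArrayList<>();

    /** Pushes a path segment. */
    public void add(String name) {
        parts.add(name);
    }

    /** Removes and returns the last path segment, mirroring the new remove() contract. */
    public String remove() {
        return parts.remove(parts.size() - 1);
    }

    /** Joins the current segments with dots. */
    public String pathAsText() {
        return String.join(".", parts);
    }

    /** Runs an action with the last segment temporarily popped, then restores it even on failure. */
    public <T> T withParentPath(Supplier<T> action) {
        String popped = remove();
        try {
            return action.get();
        } finally {
            add(popped);
        }
    }

    public static void main(String[] args) {
        PathStackSketch path = new PathStackSketch();
        path.add("source_field");
        path.add("infer_field");
        // Inside the action the path is the parent's; afterwards it is restored.
        System.out.println(path.withParentPath(path::pathAsText));
        System.out.println(path.pathAsText());
    }
}
```

The `try`/`finally` restore matches how `addDynamicUpdate` in this commit re-adds the removed segment after building the parent mapper.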

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

Lines changed: 5 additions & 0 deletions
@@ -1386,6 +1386,11 @@ public Builder init(FieldMapper initializer) {
             return this;
         }

+        public Builder addMultiField(FieldMapper.Builder builder) {
+            this.multiFieldsBuilder.add(builder);
+            return this;
+        }
+
         protected BuilderParams builderParams(Mapper.Builder mainFieldBuilder, MapperBuilderContext context) {
             return new BuilderParams(multiFieldsBuilder.build(mainFieldBuilder, context), copyTo, sourceKeepMode, hasScript, onScriptError);
         }

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticInferenceMetadataFieldsMapper.java

Lines changed: 1 addition & 7 deletions
@@ -143,13 +143,7 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
                 // directly. We can safely split on all "." chars because semantic text fields cannot be used when subobjects == false.
                 String[] fieldNameParts = fieldName.split("\\.");
                 setPath(context.path(), fieldNameParts);
-
-                var parent = context.parent().findParentMapper(fieldName);
-                if (parent == null) {
-                    throw new IllegalArgumentException("Field [" + fieldName + "] does not have a parent mapper");
-                }
-                String suffix = parent != context.parent() ? fieldName.substring(parent.fullPath().length() + 1) : fieldName;
-                var mapper = parent.getMapper(suffix);
+                var mapper = context.mappingLookup().getMapper(fieldName);
                 if (mapper instanceof SemanticTextFieldMapper fieldMapper) {
                     XContentLocation xContentLocation = context.parser().getTokenLocation();
                     var input = fieldMapper.parseSemanticTextField(context);

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java

Lines changed: 69 additions & 28 deletions
@@ -43,6 +43,7 @@
 import org.elasticsearch.index.mapper.MapperBuilderContext;
 import org.elasticsearch.index.mapper.MapperMergeContext;
 import org.elasticsearch.index.mapper.MappingLookup;
+import org.elasticsearch.index.mapper.MappingParserContext;
 import org.elasticsearch.index.mapper.NestedObjectMapper;
 import org.elasticsearch.index.mapper.ObjectMapper;
 import org.elasticsearch.index.mapper.SimpleMappedFieldType;
@@ -83,6 +84,7 @@
 import java.util.Objects;
 import java.util.Optional;
 import java.util.Set;
+import java.util.function.BiConsumer;
 import java.util.function.Function;

 import static org.elasticsearch.search.SearchService.DEFAULT_SIZE;
@@ -117,12 +119,20 @@ public class SemanticTextFieldMapper extends FieldMapper implements InferenceFie

     public static final TypeParser PARSER = new TypeParser(
         (n, c) -> new Builder(n, c::bitSetProducer, c.getIndexSettings()),
-        List.of(notInMultiFields(CONTENT_TYPE), notFromDynamicTemplates(CONTENT_TYPE))
+        List.of(validateParserContext(CONTENT_TYPE))
     );

+    public static BiConsumer<String, MappingParserContext> validateParserContext(String type) {
+        return (n, c) -> {
+            if (InferenceMetadataFieldsMapper.isEnabled(c.getIndexSettings().getSettings()) == false) {
+                notInMultiFields(type).accept(n, c);
+            }
+            notFromDynamicTemplates(type).accept(n, c);
+        };
+    }
+
     public static class Builder extends FieldMapper.Builder {
         private final boolean useLegacyFormat;
-        private final IndexVersion indexVersionCreated;

         private final Parameter<String> inferenceId = Parameter.stringParam(
             INFERENCE_ID_FIELD,
@@ -176,7 +186,6 @@ public static Builder from(SemanticTextFieldMapper mapper) {

         public Builder(String name, Function<Query, BitSetProducer> bitSetProducer, IndexSettings indexSettings) {
             super(name);
-            this.indexVersionCreated = indexSettings.getIndexVersionCreated();
             this.useLegacyFormat = InferenceMetadataFieldsMapper.isEnabled(indexSettings.getSettings()) == false;
             this.inferenceFieldBuilder = c -> createInferenceField(
                 c,
@@ -223,10 +232,10 @@ protected void merge(FieldMapper mergeWith, Conflicts conflicts, MapperMergeCont

         @Override
         public SemanticTextFieldMapper build(MapperBuilderContext context) {
-            if (copyTo.copyToFields().isEmpty() == false) {
+            if (useLegacyFormat && copyTo.copyToFields().isEmpty() == false) {
                 throw new IllegalArgumentException(CONTENT_TYPE + " field [" + leafName() + "] does not support [copy_to]");
             }
-            if (multiFieldsBuilder.hasMultiFields()) {
+            if (useLegacyFormat && multiFieldsBuilder.hasMultiFields()) {
                 throw new IllegalArgumentException(CONTENT_TYPE + " field [" + leafName() + "] does not support multi-fields");
             }
             final String fullName = context.buildFullName(leafName());
@@ -245,7 +254,6 @@ public SemanticTextFieldMapper build(MapperBuilderContext context) {
                     searchInferenceId.getValue(),
                     modelSettings.getValue(),
                     inferenceField,
-                    indexVersionCreated,
                     useLegacyFormat,
                     meta.getValue()
                 ),
@@ -275,13 +283,33 @@ private SemanticTextFieldMapper copySettings(SemanticTextFieldMapper mapper, Map

     private SemanticTextFieldMapper(String simpleName, MappedFieldType mappedFieldType, BuilderParams builderParams) {
         super(simpleName, mappedFieldType, builderParams);
+        ensureMultiFields(builderParams.multiFields().iterator());
+    }
+
+    private void ensureMultiFields(Iterator<FieldMapper> mappers) {
+        while (mappers.hasNext()) {
+            var mapper = mappers.next();
+            if (mapper.leafName().equals(INFERENCE_FIELD)) {
+                throw new IllegalArgumentException(
+                    "Field ["
+                        + mapper.fullPath()
+                        + "] is already used by another field ["
+                        + fullPath()
+                        + "] internally. Please choose a different name."
+                );
+            }
+        }
     }

     @Override
     public Iterator<Mapper> iterator() {
-        List<Mapper> subIterators = new ArrayList<>();
-        subIterators.add(fieldType().getInferenceField());
-        return subIterators.iterator();
+        List<Mapper> mappers = new ArrayList<>();
+        Iterator<Mapper> m = super.iterator();
+        while (m.hasNext()) {
+            mappers.add(m.next());
+        }
+        mappers.add(fieldType().getInferenceField());
+        return mappers.iterator();
     }

     @Override
@@ -350,20 +378,7 @@ void parseCreateFieldFromContext(DocumentParserContext context, SemanticTextFiel

         final SemanticTextFieldMapper mapper;
         if (fieldType().getModelSettings() == null) {
-            context.path().remove();
-            Builder builder = (Builder) new Builder(
-                leafName(),
-                fieldType().getChunksField().bitsetProducer(),
-                fieldType().getChunksField().indexSettings()
-            ).init(this);
-            try {
-                mapper = builder.setModelSettings(field.inference().modelSettings())
-                    .setInferenceId(field.inference().inferenceId())
-                    .build(context.createDynamicMapperBuilderContext());
-                context.addDynamicMapper(mapper);
-            } finally {
-                context.path().add(leafName());
-            }
+            mapper = addDynamicUpdate(context, field);
         } else {
             Conflicts conflicts = new Conflicts(fullFieldName);
             canMergeModelSettings(fieldType().getModelSettings(), field.inference().modelSettings(), conflicts);
@@ -438,6 +453,32 @@ void parseCreateFieldFromContext(DocumentParserContext context, SemanticTextFiel
         }
     }

+    private SemanticTextFieldMapper addDynamicUpdate(DocumentParserContext context, SemanticTextField field) {
+        Builder builder = (Builder) getMergeBuilder();
+        context.path().remove();
+        try {
+            builder.setModelSettings(field.inference().modelSettings()).setInferenceId(field.inference().inferenceId());
+            if (context.mappingLookup().isMultiField(fullPath())) {
+                // The field is part of a multi-field, so the parent field must also be updated accordingly.
+                var fieldName = context.path().remove();
+                try {
+                    var parentMapper = ((FieldMapper) context.mappingLookup().getMapper(context.mappingLookup().parentField(fullPath())))
+                        .getMergeBuilder();
+                    context.addDynamicMapper(parentMapper.addMultiField(builder).build(context.createDynamicMapperBuilderContext()));
+                    return builder.build(context.createDynamicMapperBuilderContext());
+                } finally {
+                    context.path().add(fieldName);
+                }
+            } else {
+                var mapper = builder.build(context.createDynamicMapperBuilderContext());
+                context.addDynamicMapper(mapper);
+                return mapper;
+            }
+        } finally {
+            context.path().add(leafName());
+        }
+    }
+
     @Override
     protected String contentType() {
         return CONTENT_TYPE;
@@ -458,11 +499,14 @@ public InferenceFieldMetadata getMetadata(Set<String> sourcePaths) {

     @Override
     protected void doValidate(MappingLookup mappers) {
-        int parentPathIndex = fullPath().lastIndexOf(leafName());
+        String fullPath = mappers.isMultiField(fullPath()) ? mappers.parentField(fullPath()) : fullPath();
+        String leafName = mappers.getMapper(fullPath).leafName();
+        int parentPathIndex = fullPath.lastIndexOf(leafName);
         if (parentPathIndex > 0) {
+            String parentName = fullPath.substring(0, parentPathIndex - 1);
             // Check that the parent object field allows subobjects.
             // Subtract one from the parent path index to omit the trailing dot delimiter.
-            ObjectMapper parentMapper = mappers.objectMappers().get(fullPath().substring(0, parentPathIndex - 1));
+            ObjectMapper parentMapper = mappers.objectMappers().get(parentName);
             if (parentMapper == null) {
                 throw new IllegalStateException(CONTENT_TYPE + " field [" + fullPath() + "] does not have a parent object mapper");
             }
@@ -480,7 +524,6 @@ public static class SemanticTextFieldType extends SimpleMappedFieldType {
         private final String searchInferenceId;
         private final SemanticTextField.ModelSettings modelSettings;
         private final ObjectMapper inferenceField;
-        private final IndexVersion indexVersionCreated;
         private final boolean useLegacyFormat;

         public SemanticTextFieldType(
@@ -489,7 +532,6 @@ public SemanticTextFieldType(
             String searchInferenceId,
             SemanticTextField.ModelSettings modelSettings,
             ObjectMapper inferenceField,
-            IndexVersion indexVersionCreated,
             boolean useLegacyFormat,
             Map<String, String> meta
         ) {
@@ -498,7 +540,6 @@ public SemanticTextFieldType(
             this.searchInferenceId = searchInferenceId;
             this.modelSettings = modelSettings;
             this.inferenceField = inferenceField;
-            this.indexVersionCreated = indexVersionCreated;
             this.useLegacyFormat = useLegacyFormat;
         }
