5 changes: 5 additions & 0 deletions docs/changelog/120128.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 120128
summary: Add Multi-Field Support for Semantic Text Fields
area: Relevance
type: feature
issues: []
48 changes: 26 additions & 22 deletions docs/reference/mapping/types/semantic-text.asciidoc
Original file line number Diff line number Diff line change
@@ -182,16 +182,11 @@ Even if the script targets non-`semantic_text` fields, the update will fail when

[discrete]
[[copy-to-support]]
==== `copy_to` support
==== `copy_to` and multi-fields support

The `semantic_text` field type can be the target of
<<copy-to,`copy_to` fields>>. This means you can use a single `semantic_text`
field to collect the values of other fields for semantic search. Each value has
its embeddings calculated separately; each field value is a separate set of chunk(s) in
the resulting embeddings.

This imposes a restriction on bulk requests and ingestion pipelines that update documents with `semantic_text` fields.
In these cases, all fields that are copied to a `semantic_text` field, including the `semantic_text` field value, must have a value to ensure every embedding is calculated correctly.
The semantic_text field type can serve as the target of <<copy-to,copy_to fields>>,
be part of a <<multi-fields,multi-field>> structure, or contain <<multi-fields,multi-fields>> internally.
This means you can use a single field to collect the values of other fields for semantic search.
Review comment (Contributor) on lines +187 to +189: Can we update the docs to indicate that `semantic_text` can now be a `copy_to` source?


For example, the following mapping:

@@ -201,39 +196,48 @@ PUT test-index
{
"mappings": {
"properties": {
"infer_field": {
"type": "semantic_text",
"inference_id": ".elser-2-elasticsearch"
},
"source_field": {
"type": "text",
"copy_to": "infer_field"
},
"infer_field": {
"type": "semantic_text",
"inference_id": ".elser-2-elasticsearch"
}
}
}
}
------------------------------------------------------------
// TEST[skip:TBD]

Will need the following bulk update request to ensure that `infer_field` is updated correctly:
can also be declared as multi-fields:

[source,console]
------------------------------------------------------------
PUT test-index/_bulk
{"update": {"_id": "1"}}
{"doc": {"infer_field": "updated inference field", "source_field": "updated source field"}}
PUT test-index
{
"mappings": {
"properties": {
"source_field": {
"type": "text",
"fields": {
"infer_field": {
"type": "semantic_text",
"inference_id": ".elser-2-elasticsearch"
}
}
}
}
}
}
------------------------------------------------------------
// TEST[skip:TBD]

Notice that both the `semantic_text` field and the source field are updated in the bulk request.


[discrete]
[[limitations]]
==== Limitations

`semantic_text` field types have the following limitations:

* `semantic_text` fields are not currently supported as elements of <<nested,nested fields>>.
* `semantic_text` fields can't currently be set as part of <<dynamic-templates>>.
* `semantic_text` fields can't be defined as <<multi-fields,multi-fields>> of another field, nor can they contain other fields as multi-fields.
* `semantic_text` fields can't currently be set as part of <<dynamic-templates>>.
Original file line number Diff line number Diff line change
@@ -43,8 +43,10 @@ private void expand() {
path = newPath;
}

public void remove() {
path[--index] = null;
public String remove() {
var ret = path[--index];
path[index] = null;
return ret;
}

public void setWithinLeafObject(boolean withinLeafObject) {
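
Reviewer note: the `remove()` change above makes the pop return the segment it removed, which lets a caller step out of a path segment temporarily and restore it in a `finally` block, as `addDynamicUpdate` does later in this PR. A minimal sketch of that pattern follows. `PathStack`, `pathAsText` and `parentOf` are hypothetical stand-ins written for illustration, not the real Elasticsearch class.

```java
import java.util.Arrays;

// Hypothetical stand-in for the path class touched above; not the real Elasticsearch class.
final class PathStack {
    private String[] path = new String[4];
    private int index = 0;

    void add(String name) {
        if (index == path.length) {
            path = Arrays.copyOf(path, path.length * 2); // grow when full
        }
        path[index++] = name;
    }

    // Pops the last segment and returns it so the caller can restore it later.
    // Like the original, this assumes the stack is not empty.
    String remove() {
        String ret = path[--index];
        path[index] = null; // clear the slot so the segment can be garbage collected
        return ret;
    }

    String pathAsText() {
        return String.join(".", Arrays.copyOf(path, index));
    }

    // Computes the parent path by popping the leaf, then restores the leaf.
    static String parentOf(PathStack p) {
        String leaf = p.remove();
        try {
            return p.pathAsText();
        } finally {
            p.add(leaf); // always restore the segment, even if the body throws
        }
    }
}
```

Returning the removed segment avoids having to recompute it from the mapper, which matters once the segment to restore is the parent field of a multi-field rather than the mapper's own leaf name.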
Original file line number Diff line number Diff line change
@@ -1386,6 +1386,11 @@ public Builder init(FieldMapper initializer) {
return this;
}

public Builder addMultiField(FieldMapper.Builder builder) {
this.multiFieldsBuilder.add(builder);
return this;
}

protected BuilderParams builderParams(Mapper.Builder mainFieldBuilder, MapperBuilderContext context) {
return new BuilderParams(multiFieldsBuilder.build(mainFieldBuilder, context), copyTo, sourceKeepMode, hasScript, onScriptError);
}
Original file line number Diff line number Diff line change
@@ -143,13 +143,7 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
// directly. We can safely split on all "." chars because semantic text fields cannot be used when subobjects == false.
String[] fieldNameParts = fieldName.split("\\.");
setPath(context.path(), fieldNameParts);

var parent = context.parent().findParentMapper(fieldName);
if (parent == null) {
throw new IllegalArgumentException("Field [" + fieldName + "] does not have a parent mapper");
}
String suffix = parent != context.parent() ? fieldName.substring(parent.fullPath().length() + 1) : fieldName;
var mapper = parent.getMapper(suffix);
var mapper = context.mappingLookup().getMapper(fieldName);
if (mapper instanceof SemanticTextFieldMapper fieldMapper) {
XContentLocation xContentLocation = context.parser().getTokenLocation();
var input = fieldMapper.parseSemanticTextField(context);
Original file line number Diff line number Diff line change
@@ -43,6 +43,7 @@
import org.elasticsearch.index.mapper.MapperBuilderContext;
import org.elasticsearch.index.mapper.MapperMergeContext;
import org.elasticsearch.index.mapper.MappingLookup;
import org.elasticsearch.index.mapper.MappingParserContext;
import org.elasticsearch.index.mapper.NestedObjectMapper;
import org.elasticsearch.index.mapper.ObjectMapper;
import org.elasticsearch.index.mapper.SimpleMappedFieldType;
@@ -83,6 +84,7 @@
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;

import static org.elasticsearch.search.SearchService.DEFAULT_SIZE;
@@ -117,12 +119,20 @@ public class SemanticTextFieldMapper extends FieldMapper implements InferenceFie

public static final TypeParser PARSER = new TypeParser(
(n, c) -> new Builder(n, c::bitSetProducer, c.getIndexSettings()),
List.of(notInMultiFields(CONTENT_TYPE), notFromDynamicTemplates(CONTENT_TYPE))
List.of(validateParserContext(CONTENT_TYPE))
);

public static BiConsumer<String, MappingParserContext> validateParserContext(String type) {
return (n, c) -> {
if (InferenceMetadataFieldsMapper.isEnabled(c.getIndexSettings().getSettings()) == false) {
notInMultiFields(type).accept(n, c);
}
notFromDynamicTemplates(type).accept(n, c);
};
}

public static class Builder extends FieldMapper.Builder {
private final boolean useLegacyFormat;
private final IndexVersion indexVersionCreated;

private final Parameter<String> inferenceId = Parameter.stringParam(
INFERENCE_ID_FIELD,
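
Reviewer note: `validateParserContext` in the hunk above applies the multi-field restriction only when the new inference metadata format is disabled (the legacy format), while the dynamic-template restriction always applies. The sketch below mirrors that composition with plain `BiConsumer`s. `Ctx` and its flags are hypothetical stand-ins for `MappingParserContext`, and the error messages are illustrative rather than the ones Elasticsearch emits.

```java
import java.util.function.BiConsumer;

final class ValidatorSketch {
    // Hypothetical context object; stands in for MappingParserContext in this sketch.
    static final class Ctx {
        final boolean legacyFormat;
        final boolean inMultiField;
        final boolean fromDynamicTemplate;

        Ctx(boolean legacyFormat, boolean inMultiField, boolean fromDynamicTemplate) {
            this.legacyFormat = legacyFormat;
            this.inMultiField = inMultiField;
            this.fromDynamicTemplate = fromDynamicTemplate;
        }
    }

    static BiConsumer<String, Ctx> notInMultiFields(String type) {
        return (name, ctx) -> {
            if (ctx.inMultiField) {
                throw new IllegalArgumentException("Field [" + name + "] of type [" + type + "] can't be a multi-field");
            }
        };
    }

    static BiConsumer<String, Ctx> notFromDynamicTemplates(String type) {
        return (name, ctx) -> {
            if (ctx.fromDynamicTemplate) {
                throw new IllegalArgumentException("Field [" + name + "] of type [" + type + "] can't come from a dynamic template");
            }
        };
    }

    // Composes the two checks: the multi-field check runs only for the legacy format.
    static BiConsumer<String, Ctx> validateParserContext(String type) {
        return (name, ctx) -> {
            if (ctx.legacyFormat) {
                notInMultiFields(type).accept(name, ctx);
            }
            notFromDynamicTemplates(type).accept(name, ctx);
        };
    }
}
```

Wrapping the existing validators in one lambda keeps the `TypeParser` signature unchanged, since the parser still receives a single-element list of `BiConsumer`s.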
@@ -176,7 +186,6 @@ public static Builder from(SemanticTextFieldMapper mapper) {

public Builder(String name, Function<Query, BitSetProducer> bitSetProducer, IndexSettings indexSettings) {
super(name);
this.indexVersionCreated = indexSettings.getIndexVersionCreated();
this.useLegacyFormat = InferenceMetadataFieldsMapper.isEnabled(indexSettings.getSettings()) == false;
this.inferenceFieldBuilder = c -> createInferenceField(
c,
@@ -223,10 +232,10 @@ protected void merge(FieldMapper mergeWith, Conflicts conflicts, MapperMergeCont

@Override
public SemanticTextFieldMapper build(MapperBuilderContext context) {
if (copyTo.copyToFields().isEmpty() == false) {
if (useLegacyFormat && copyTo.copyToFields().isEmpty() == false) {
throw new IllegalArgumentException(CONTENT_TYPE + " field [" + leafName() + "] does not support [copy_to]");
}
if (multiFieldsBuilder.hasMultiFields()) {
if (useLegacyFormat && multiFieldsBuilder.hasMultiFields()) {
throw new IllegalArgumentException(CONTENT_TYPE + " field [" + leafName() + "] does not support multi-fields");
}
final String fullName = context.buildFullName(leafName());
@@ -245,7 +254,6 @@ public SemanticTextFieldMapper build(MapperBuilderContext context) {
searchInferenceId.getValue(),
modelSettings.getValue(),
inferenceField,
indexVersionCreated,
useLegacyFormat,
meta.getValue()
),
@@ -275,13 +283,33 @@ private SemanticTextFieldMapper copySettings(SemanticTextFieldMapper mapper, Map

private SemanticTextFieldMapper(String simpleName, MappedFieldType mappedFieldType, BuilderParams builderParams) {
super(simpleName, mappedFieldType, builderParams);
ensureMultiFields(builderParams.multiFields().iterator());
}

private void ensureMultiFields(Iterator<FieldMapper> mappers) {
while (mappers.hasNext()) {
var mapper = mappers.next();
if (mapper.leafName().equals(INFERENCE_FIELD)) {
throw new IllegalArgumentException(
"Field ["
+ mapper.fullPath()
+ "] is already used by another field ["
+ fullPath()
+ "] internally. Please choose a different name."
);
}
}
}

@Override
public Iterator<Mapper> iterator() {
List<Mapper> subIterators = new ArrayList<>();
subIterators.add(fieldType().getInferenceField());
return subIterators.iterator();
List<Mapper> mappers = new ArrayList<>();
Iterator<Mapper> m = super.iterator();
while (m.hasNext()) {
mappers.add(m.next());
}
mappers.add(fieldType().getInferenceField());
return mappers.iterator();
}

@Override
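
Reviewer note: the constructor and `iterator()` changes above do two things. They reject a multi-field whose leaf name collides with the internal inference sub-field, and they expose the multi-fields alongside that sub-field when iterating. A simplified sketch follows, using strings in place of mappers. The class names and the reserved name `inference` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical base class: iterates over the leaf names of its multi-fields.
class BaseFieldSketch implements Iterable<String> {
    protected final String fullPath;
    protected final List<String> multiFields;

    BaseFieldSketch(String fullPath, List<String> multiFields) {
        this.fullPath = fullPath;
        this.multiFields = List.copyOf(multiFields);
    }

    @Override
    public Iterator<String> iterator() {
        return multiFields.iterator();
    }
}

final class SemanticFieldSketch extends BaseFieldSketch {
    // Assumed name of the internal sub-field; the real constant may differ.
    static final String RESERVED = "inference";

    SemanticFieldSketch(String fullPath, List<String> multiFields) {
        super(fullPath, multiFields);
        // Reject multi-fields that would shadow the internal sub-field.
        for (String leaf : multiFields) {
            if (leaf.equals(RESERVED)) {
                throw new IllegalArgumentException(
                    "Field [" + fullPath + "." + leaf + "] is already used by another field [" + fullPath + "] internally"
                );
            }
        }
    }

    @Override
    public Iterator<String> iterator() {
        // Start from the parent's children (the multi-fields), then append the internal sub-field.
        List<String> all = new ArrayList<>();
        super.iterator().forEachRemaining(all::add);
        all.add(RESERVED);
        return all.iterator();
    }
}
```

Before this PR the iterator returned only the inference sub-field, so any multi-fields declared under a `semantic_text` field would not have been visited by code that walks the mapper tree.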
@@ -350,20 +378,7 @@ void parseCreateFieldFromContext(DocumentParserContext context, SemanticTextFiel

final SemanticTextFieldMapper mapper;
if (fieldType().getModelSettings() == null) {
context.path().remove();
Builder builder = (Builder) new Builder(
leafName(),
fieldType().getChunksField().bitsetProducer(),
fieldType().getChunksField().indexSettings()
).init(this);
try {
mapper = builder.setModelSettings(field.inference().modelSettings())
.setInferenceId(field.inference().inferenceId())
.build(context.createDynamicMapperBuilderContext());
context.addDynamicMapper(mapper);
} finally {
context.path().add(leafName());
}
mapper = addDynamicUpdate(context, field);
} else {
Conflicts conflicts = new Conflicts(fullFieldName);
canMergeModelSettings(fieldType().getModelSettings(), field.inference().modelSettings(), conflicts);
@@ -438,6 +453,32 @@ void parseCreateFieldFromContext(DocumentParserContext context, SemanticTextFiel
}
}

private SemanticTextFieldMapper addDynamicUpdate(DocumentParserContext context, SemanticTextField field) {
Builder builder = (Builder) getMergeBuilder();
context.path().remove();
try {
builder.setModelSettings(field.inference().modelSettings()).setInferenceId(field.inference().inferenceId());
if (context.mappingLookup().isMultiField(fullPath())) {
// The field is part of a multi-field, so the parent field must also be updated accordingly.
var fieldName = context.path().remove();
try {
var parentMapper = ((FieldMapper) context.mappingLookup().getMapper(context.mappingLookup().parentField(fullPath())))
.getMergeBuilder();
context.addDynamicMapper(parentMapper.addMultiField(builder).build(context.createDynamicMapperBuilderContext()));
return builder.build(context.createDynamicMapperBuilderContext());
} finally {
context.path().add(fieldName);
}
} else {
var mapper = builder.build(context.createDynamicMapperBuilderContext());
context.addDynamicMapper(mapper);
return mapper;
}
} finally {
context.path().add(leafName());
}
}

@Override
protected String contentType() {
return CONTENT_TYPE;
@@ -458,11 +499,14 @@ public InferenceFieldMetadata getMetadata(Set<String> sourcePaths) {

@Override
protected void doValidate(MappingLookup mappers) {
int parentPathIndex = fullPath().lastIndexOf(leafName());
String fullPath = mappers.isMultiField(fullPath()) ? mappers.parentField(fullPath()) : fullPath();
String leafName = mappers.getMapper(fullPath).leafName();
int parentPathIndex = fullPath.lastIndexOf(leafName);
if (parentPathIndex > 0) {
String parentName = fullPath.substring(0, parentPathIndex - 1);
// Check that the parent object field allows subobjects.
// Subtract one from the parent path index to omit the trailing dot delimiter.
ObjectMapper parentMapper = mappers.objectMappers().get(fullPath().substring(0, parentPathIndex - 1));
ObjectMapper parentMapper = mappers.objectMappers().get(parentName);
if (parentMapper == null) {
throw new IllegalStateException(CONTENT_TYPE + " field [" + fullPath() + "] does not have a parent object mapper");
}
@@ -480,7 +524,6 @@ public static class SemanticTextFieldType extends SimpleMappedFieldType {
private final String searchInferenceId;
private final SemanticTextField.ModelSettings modelSettings;
private final ObjectMapper inferenceField;
private final IndexVersion indexVersionCreated;
private final boolean useLegacyFormat;

public SemanticTextFieldType(
@@ -489,7 +532,6 @@ public SemanticTextFieldType(
String searchInferenceId,
SemanticTextField.ModelSettings modelSettings,
ObjectMapper inferenceField,
IndexVersion indexVersionCreated,
boolean useLegacyFormat,
Map<String, String> meta
) {
@@ -498,7 +540,6 @@ public SemanticTextFieldType(
this.searchInferenceId = searchInferenceId;
this.modelSettings = modelSettings;
this.inferenceField = inferenceField;
this.indexVersionCreated = indexVersionCreated;
this.useLegacyFormat = useLegacyFormat;
}
