Commit b50cefe

markjhoy and elasticsearchmachine authored
Add Sparse Vector Index Options Settings to Semantic Text Field (#131058)
* add sparse vector index options to semantic text
* [CI] Auto commit changes from spotless
* current tests - not 100% working yet
* sparse_vector index options/createEmbeddingsField
* set default index options if we don't have any
* [CI] Auto commit changes from spotless
* remove redundant code; set defaults
* fix tests
* add validation test
* [CI] Auto commit changes from spotless
* add additional tests
* [CI] Auto commit changes from spotless
* fix tests
* [CI] Auto commit changes from spotless
* ... and fix tests...
* [CI] Auto commit changes from spotless
* fill in test specific sparse vector index options
* remove unused node feature
* [CI] Auto commit changes from spotless
* Update docs/changelog/131058.yaml
* update changelog
* some cleanups; still needs a few more tests
* [CI] Auto commit changes from spotless
* fix additional tests
* [CI] Auto commit changes from spotless
* and fix more tests
* ... annnnd... fix more tests
* [CI] Auto commit changes from spotless
* clean tests; add YAML Rest tests
* [CI] Auto commit changes from spotless
* fix failing tests
* [CI] Auto commit changes from spotless
* fix tests due to multiple random index versioning
* [CI] Auto commit changes from spotless
* fix tests; fix yaml tests;
* [CI] Auto commit changes from spotless
* fix more tests due to random index versioning
* [CI] Auto commit changes from spotless
* ... and more test cleaning
* [CI] Auto commit changes from spotless
* add link to sparse_vector index_options for docs
* fix docs
* fix tests; remove old code
* correct tests; simplify mocking/spy ModelRegistry
* add test for dense vector w/ sparse index options
* [CI] Auto commit changes from spotless
* collapse multiple "@before" methods

---------

Co-authored-by: elasticsearchmachine <[email protected]>
1 parent afc2051 commit b50cefe

12 files changed, +1489 -155 lines changed

docs/changelog/131058.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 131058
+summary: Adds sparse vector index options settings to semantic_text field
+area: Search
+type: enhancement
+issues: []

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 6 additions & 4 deletions
@@ -156,9 +156,11 @@ to create the endpoint. If not specified, the {{infer}} endpoint defined by

 `index_options` {applies_to}`stack: ga 9.1`
 : (Optional, object) Specifies the index options to override default values
-for the field. Currently, `dense_vector` index options are supported.
-For text embeddings, `index_options` may match any allowed
-[dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
+for the field. Currently, `dense_vector` and `sparse_vector` index options are supported.
+For text embeddings, `index_options` may match any allowed:
+
+* [dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
+* [sparse_vector index options](/reference/elasticsearch/mapping-reference/sparse-vector.md#sparse-vectors-params). {applies_to}`stack: ga 9.2`

 `chunking_settings` {applies_to}`stack: ga 9.1`
 : (Optional, object) Settings for chunking text into smaller passages.
@@ -410,7 +412,7 @@ stack: ga 9.0
 In case you want to customize data indexing, use the
 [`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
 or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
-field types and create an ingest pipeline with an
+field types and create an ingest pipeline with an
 [{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
 generate the embeddings.
 [This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
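
For orientation, a `semantic_text` mapping that uses the new option might nest the sparse vector settings under `index_options`. The sketch below builds such a mapping body as plain Java maps. The `prune` and `pruning_config` keys appear in the mapper change in this commit; the `sparse_vector` wrapper key, the two threshold key names (taken from the linked `sparse_vector` reference), the endpoint ID `my-elser-endpoint`, and the threshold values are assumptions for illustration and are not shown in this diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: builds the body of a semantic_text field mapping as nested maps.
final class SemanticTextMappingSketch {

    static Map<String, Object> semanticTextField(String inferenceId, float freqRatio, float weightThreshold) {
        Map<String, Object> pruningConfig = new LinkedHashMap<>();
        pruningConfig.put("tokens_freq_ratio_threshold", freqRatio);   // assumed key name
        pruningConfig.put("tokens_weight_threshold", weightThreshold); // assumed key name

        Map<String, Object> sparseVector = new LinkedHashMap<>();
        sparseVector.put("prune", true); // a pruning_config is only accepted when prune is true
        sparseVector.put("pruning_config", pruningConfig);

        Map<String, Object> indexOptions = new LinkedHashMap<>();
        indexOptions.put("sparse_vector", sparseVector); // assumed wrapper key

        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "semantic_text");
        field.put("inference_id", inferenceId); // hypothetical endpoint ID supplied by the caller
        field.put("index_options", indexOptions);
        return field;
    }

    public static void main(String[] args) {
        // Prints the nested map; a real client would serialize this as the JSON mapping body.
        System.out.println(semanticTextField("my-elser-endpoint", 10.0f, 0.5f));
    }
}
```

Insertion-ordered maps are used only so that the printed structure reads in the same order as a hand-written mapping would.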

server/src/main/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapper.java

Lines changed: 52 additions & 26 deletions
@@ -48,7 +48,6 @@
 import org.elasticsearch.xcontent.DeprecationHandler;
 import org.elasticsearch.xcontent.NamedXContentRegistry;
 import org.elasticsearch.xcontent.ParseField;
-import org.elasticsearch.xcontent.ToXContent;
 import org.elasticsearch.xcontent.XContentBuilder;
 import org.elasticsearch.xcontent.XContentParser;
 import org.elasticsearch.xcontent.XContentParser.Token;
@@ -98,7 +97,7 @@ public static class Builder extends FieldMapper.Builder {

         private final Parameter<Boolean> stored = Parameter.storeParam(m -> toType(m).fieldType().isStored(), false);
         private final Parameter<Map<String, String>> meta = Parameter.metaParam();
-        private final Parameter<IndexOptions> indexOptions = new Parameter<>(
+        private final Parameter<SparseVectorIndexOptions> indexOptions = new Parameter<>(
             SPARSE_VECTOR_INDEX_OPTIONS,
             true,
             () -> null,
@@ -128,9 +127,9 @@ protected Parameter<?>[] getParameters() {

         @Override
         public SparseVectorFieldMapper build(MapperBuilderContext context) {
-            IndexOptions builderIndexOptions = indexOptions.getValue();
+            SparseVectorIndexOptions builderIndexOptions = indexOptions.getValue();
             if (builderIndexOptions == null) {
-                builderIndexOptions = getDefaultIndexOptions(indexVersionCreated);
+                builderIndexOptions = SparseVectorIndexOptions.getDefaultIndexOptions(indexVersionCreated);
             }

             final boolean syntheticVectorFinal = context.isSourceSynthetic() == false && isSyntheticVector;
@@ -149,33 +148,34 @@ public SparseVectorFieldMapper build(MapperBuilderContext context) {
             );
         }

-        private IndexOptions getDefaultIndexOptions(IndexVersion indexVersion) {
-            return (indexVersion.onOrAfter(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION)
-                || indexVersion.between(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION_8_X, IndexVersions.UPGRADE_TO_LUCENE_10_0_0))
-                    ? IndexOptions.DEFAULT_PRUNING_INDEX_OPTIONS
-                    : null;
+        private boolean indexOptionsSerializerCheck(boolean includeDefaults, boolean isConfigured, SparseVectorIndexOptions value) {
+            return includeDefaults || (SparseVectorIndexOptions.isDefaultOptions(value, indexVersionCreated) == false);
         }

-        private boolean indexOptionsSerializerCheck(boolean includeDefaults, boolean isConfigured, IndexOptions value) {
-            return includeDefaults || (IndexOptions.isDefaultOptions(value, indexVersionCreated) == false);
+        public void setIndexOptions(SparseVectorIndexOptions sparseVectorIndexOptions) {
+            indexOptions.setValue(sparseVectorIndexOptions);
         }
     }

-    public IndexOptions getIndexOptions() {
+    public SparseVectorIndexOptions getIndexOptions() {
         return fieldType().getIndexOptions();
     }

-    private static final ConstructingObjectParser<IndexOptions, Void> INDEX_OPTIONS_PARSER = new ConstructingObjectParser<>(
+    private static final ConstructingObjectParser<SparseVectorIndexOptions, Void> INDEX_OPTIONS_PARSER = new ConstructingObjectParser<>(
         SPARSE_VECTOR_INDEX_OPTIONS,
-        args -> new IndexOptions((Boolean) args[0], (TokenPruningConfig) args[1])
+        args -> new SparseVectorIndexOptions((Boolean) args[0], (TokenPruningConfig) args[1])
     );

     static {
-        INDEX_OPTIONS_PARSER.declareBoolean(optionalConstructorArg(), IndexOptions.PRUNE_FIELD_NAME);
-        INDEX_OPTIONS_PARSER.declareObject(optionalConstructorArg(), TokenPruningConfig.PARSER, IndexOptions.PRUNING_CONFIG_FIELD_NAME);
+        INDEX_OPTIONS_PARSER.declareBoolean(optionalConstructorArg(), SparseVectorIndexOptions.PRUNE_FIELD_NAME);
+        INDEX_OPTIONS_PARSER.declareObject(
+            optionalConstructorArg(),
+            TokenPruningConfig.PARSER,
+            SparseVectorIndexOptions.PRUNING_CONFIG_FIELD_NAME
+        );
     }

-    private static SparseVectorFieldMapper.IndexOptions parseIndexOptions(MappingParserContext context, Object propNode) {
+    private static SparseVectorIndexOptions parseIndexOptions(MappingParserContext context, Object propNode) {
         if (propNode == null) {
             return null;
         }
@@ -212,7 +212,7 @@ private static SparseVectorFieldMapper.IndexOptions parseIndexOptions(MappingPar

     public static final class SparseVectorFieldType extends MappedFieldType {
         private final IndexVersion indexVersionCreated;
-        private final IndexOptions indexOptions;
+        private final SparseVectorIndexOptions indexOptions;

         public SparseVectorFieldType(IndexVersion indexVersionCreated, String name, boolean isStored, Map<String, String> meta) {
             this(indexVersionCreated, name, isStored, meta, null);
@@ -223,14 +223,14 @@ public SparseVectorFieldType(
             String name,
             boolean isStored,
             Map<String, String> meta,
-            @Nullable SparseVectorFieldMapper.IndexOptions indexOptions
+            @Nullable SparseVectorIndexOptions indexOptions
         ) {
             super(name, true, isStored, false, TextSearchInfo.SIMPLE_MATCH_ONLY, meta);
             this.indexVersionCreated = indexVersionCreated;
             this.indexOptions = indexOptions;
         }

-        public IndexOptions getIndexOptions() {
+        public SparseVectorIndexOptions getIndexOptions() {
             return indexOptions;
         }

@@ -560,15 +560,18 @@ public void reset() {
         }
     }

-    public static class IndexOptions implements ToXContent {
+    public static class SparseVectorIndexOptions implements IndexOptions {
         public static final ParseField PRUNE_FIELD_NAME = new ParseField("prune");
         public static final ParseField PRUNING_CONFIG_FIELD_NAME = new ParseField("pruning_config");
-        public static final IndexOptions DEFAULT_PRUNING_INDEX_OPTIONS = new IndexOptions(true, new TokenPruningConfig());
+        public static final SparseVectorIndexOptions DEFAULT_PRUNING_INDEX_OPTIONS = new SparseVectorIndexOptions(
+            true,
+            new TokenPruningConfig()
+        );

         final Boolean prune;
         final TokenPruningConfig pruningConfig;

-        IndexOptions(@Nullable Boolean prune, @Nullable TokenPruningConfig pruningConfig) {
+        public SparseVectorIndexOptions(@Nullable Boolean prune, @Nullable TokenPruningConfig pruningConfig) {
             if (pruningConfig != null && (prune == null || prune == false)) {
                 throw new IllegalArgumentException(
                     "["
@@ -585,14 +588,37 @@ public static class IndexOptions implements ToXContent {
             this.pruningConfig = pruningConfig;
         }

-        public static boolean isDefaultOptions(IndexOptions indexOptions, IndexVersion indexVersion) {
-            IndexOptions defaultIndexOptions = indexVersionSupportsDefaultPruningConfig(indexVersion)
+        public static boolean isDefaultOptions(SparseVectorIndexOptions indexOptions, IndexVersion indexVersion) {
+            SparseVectorIndexOptions defaultIndexOptions = indexVersionSupportsDefaultPruningConfig(indexVersion)
                 ? DEFAULT_PRUNING_INDEX_OPTIONS
                 : null;

             return Objects.equals(indexOptions, defaultIndexOptions);
         }

+        public static SparseVectorIndexOptions getDefaultIndexOptions(IndexVersion indexVersion) {
+            return indexVersionSupportsDefaultPruningConfig(indexVersion) ? DEFAULT_PRUNING_INDEX_OPTIONS : null;
+        }
+
+        public static SparseVectorIndexOptions parseFromMap(Map<String, Object> map) {
+            if (map == null) {
+                return null;
+            }
+
+            try {
+                XContentParser parser = new MapXContentParser(
+                    NamedXContentRegistry.EMPTY,
+                    DeprecationHandler.IGNORE_DEPRECATIONS,
+                    map,
+                    XContentType.JSON
+                );
+
+                return INDEX_OPTIONS_PARSER.parse(parser, null);
+            } catch (IOException ioEx) {
+                throw new UncheckedIOException(ioEx);
+            }
+        }
+
         public Boolean getPrune() {
             return prune;
         }
@@ -626,7 +652,7 @@ public final boolean equals(Object other) {
                 return false;
             }

-            IndexOptions otherAsIndexOptions = (IndexOptions) other;
+            SparseVectorIndexOptions otherAsIndexOptions = (SparseVectorIndexOptions) other;
             return Objects.equals(prune, otherAsIndexOptions.prune) && Objects.equals(pruningConfig, otherAsIndexOptions.pruningConfig);
         }
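
The rules that `SparseVectorIndexOptions` enforces in this file can be restated as a small self-contained sketch. `IndexOptionsSketch` and `PruningConfigSketch` are hypothetical stand-ins, not the Elasticsearch classes. The boolean `versionSupportsPruningDefaults` stands in for `indexVersionSupportsDefaultPruningConfig(indexVersion)`, and the threshold values and the exception message are placeholders, since the diff does not show them in full.

```java
import java.util.Objects;

// Hypothetical stand-in for TokenPruningConfig; the values used below are placeholders.
record PruningConfigSketch(float tokensFreqRatioThreshold, float tokensWeightThreshold) {}

// Hypothetical stand-in for SparseVectorIndexOptions.
record IndexOptionsSketch(Boolean prune, PruningConfigSketch pruningConfig) {

    static final IndexOptionsSketch DEFAULT_PRUNING = new IndexOptionsSketch(true, new PruningConfigSketch(5.0f, 0.4f));

    // Compact constructor: a pruning config is rejected unless prune is explicitly true.
    IndexOptionsSketch {
        if (pruningConfig != null && (prune == null || prune == false)) {
            throw new IllegalArgumentException("pruning_config requires prune to be true"); // placeholder message
        }
    }

    // Mirrors getDefaultIndexOptions: defaults exist only for index versions that support them.
    static IndexOptionsSketch defaultsFor(boolean versionSupportsPruningDefaults) {
        return versionSupportsPruningDefaults ? DEFAULT_PRUNING : null;
    }

    // Mirrors isDefaultOptions: null counts as the default on versions without pruning defaults.
    static boolean isDefault(IndexOptionsSketch options, boolean versionSupportsPruningDefaults) {
        return Objects.equals(options, defaultsFor(versionSupportsPruningDefaults));
    }

    // Mirrors indexOptionsSerializerCheck: emit the options when defaults are requested or when they differ.
    static boolean shouldSerialize(boolean includeDefaults, IndexOptionsSketch options, boolean versionSupportsPruningDefaults) {
        return includeDefaults || isDefault(options, versionSupportsPruningDefaults) == false;
    }
}
```

Under these assumptions, options equal to the version's defaults are omitted from the serialized mapping unless defaults are explicitly requested, which matches the `includeDefaults ||` short-circuit in the diff.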

server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapperTests.java

Lines changed: 4 additions & 0 deletions
@@ -906,4 +906,8 @@ private Map<String, Float> toFloats(Map<String, ?> value) {
         }
         return result;
     }
+
+    public static IndexVersion getIndexOptionsCompatibleIndexVersion() {
+        return IndexVersionUtils.randomVersionBetween(random(), SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT, IndexVersion.current());
+    }
 }

server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldTypeTests.java

Lines changed: 24 additions & 0 deletions
@@ -40,4 +40,28 @@ public void testIsNotAggregatable() {
         MappedFieldType fieldType = new SparseVectorFieldMapper.SparseVectorFieldType(indexVersion, "field", false, Collections.emptyMap());
         assertFalse(fieldType.isAggregatable());
     }
+
+    public static SparseVectorFieldMapper.SparseVectorIndexOptions randomSparseVectorIndexOptions() {
+        return randomSparseVectorIndexOptions(true);
+    }
+
+    public static SparseVectorFieldMapper.SparseVectorIndexOptions randomSparseVectorIndexOptions(boolean includeNull) {
+        if (includeNull && randomBoolean()) {
+            return null;
+        }
+
+        Boolean prune = randomBoolean() ? null : randomBoolean();
+        if (prune == null) {
+            return new SparseVectorFieldMapper.SparseVectorIndexOptions(null, null);
+        }
+
+        if (prune == Boolean.FALSE) {
+            return new SparseVectorFieldMapper.SparseVectorIndexOptions(false, null);
+        }
+
+        return new SparseVectorFieldMapper.SparseVectorIndexOptions(
+            true,
+            new TokenPruningConfig(randomFloatBetween(1.0f, 100.0f, true), randomFloatBetween(0.0f, 1.0f, true), randomBoolean())
+        );
+    }
 }
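
The random helper in this test file appears intended to produce three shapes of options: both fields unset, pruning explicitly disabled, and pruning enabled with a random config. The standalone sketch below shows that three-way choice with `java.util.Random` in place of the test framework's `randomBoolean()` and `randomFloatBetween(...)`. `OptionsSketch` and `RandomOptionsSketch` are hypothetical names, and the flattened threshold fields are a simplification of `TokenPruningConfig`. Each branch has to return its value: a bare `new ...` expression statement would be discarded and control would fall through to the final `return`.

```java
import java.util.Random;

// Hypothetical stand-in: null thresholds mean "no pruning config".
record OptionsSketch(Boolean prune, Float tokensFreqRatioThreshold, Float tokensWeightThreshold) {}

final class RandomOptionsSketch {

    static OptionsSketch randomOptions(Random random, boolean includeNull) {
        // Optionally model "no index options configured at all".
        if (includeNull && random.nextBoolean()) {
            return null;
        }

        // Three-valued flag: unset (null), explicitly false, or explicitly true.
        Boolean prune = random.nextBoolean() ? null : random.nextBoolean();
        if (prune == null) {
            return new OptionsSketch(null, null, null);
        }
        if (prune == false) {
            return new OptionsSketch(false, null, null);
        }

        // Only the prune == true shape may carry a pruning config.
        float freqRatio = 1.0f + random.nextFloat() * 99.0f; // roughly within [1, 100]
        float weight = random.nextFloat();                   // within [0, 1)
        return new OptionsSketch(true, freqRatio, weight);
    }
}
```

Passing the `Random` in as a parameter keeps the sketch reproducible with a fixed seed, which the real test framework handles through its own seeded randomness.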

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java

Lines changed: 3 additions & 1 deletion
@@ -19,6 +19,7 @@
 import static org.elasticsearch.xpack.inference.mapper.SemanticTextFieldMapper.SEMANTIC_TEXT_EXCLUDE_SUB_FIELDS_FROM_FIELD_CAPS;
 import static org.elasticsearch.xpack.inference.mapper.SemanticTextFieldMapper.SEMANTIC_TEXT_INDEX_OPTIONS;
 import static org.elasticsearch.xpack.inference.mapper.SemanticTextFieldMapper.SEMANTIC_TEXT_INDEX_OPTIONS_WITH_DEFAULTS;
+import static org.elasticsearch.xpack.inference.mapper.SemanticTextFieldMapper.SEMANTIC_TEXT_SPARSE_VECTOR_INDEX_OPTIONS;
 import static org.elasticsearch.xpack.inference.mapper.SemanticTextFieldMapper.SEMANTIC_TEXT_SUPPORT_CHUNKING_CONFIG;
 import static org.elasticsearch.xpack.inference.queries.SemanticKnnVectorQueryRewriteInterceptor.SEMANTIC_KNN_FILTER_FIX;
 import static org.elasticsearch.xpack.inference.queries.SemanticKnnVectorQueryRewriteInterceptor.SEMANTIC_KNN_VECTOR_QUERY_REWRITE_INTERCEPTION_SUPPORTED;
@@ -78,7 +79,8 @@ public Set<NodeFeature> getTestFeatures() {
                 COHERE_V2_API,
                 SEMANTIC_TEXT_INDEX_OPTIONS_WITH_DEFAULTS,
                 SEMANTIC_QUERY_REWRITE_INTERCEPTORS_PROPAGATE_BOOST_AND_QUERY_NAME_FIX,
-                SEMANTIC_TEXT_HIGHLIGHTING_FLAT
+                SEMANTIC_TEXT_HIGHLIGHTING_FLAT,
+                SEMANTIC_TEXT_SPARSE_VECTOR_INDEX_OPTIONS
             )
         );
         if (RERANK_SNIPPETS.isEnabled()) {
