Merged
56 commits
6439422
Register dense vector field type
carlosdelest Apr 7, 2025
c7e48a0
Add first version of BlockDocValuesReader for dense_vector
carlosdelest Apr 7, 2025
9e40cd4
Fixed BlockDocValuesReader to work with all dense_vector types
carlosdelest Apr 7, 2025
03f6a92
Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve…
carlosdelest Apr 7, 2025
d84c8dd
Small test fixes
carlosdelest Apr 7, 2025
9ff5ed8
Fix casting
carlosdelest Apr 7, 2025
2754697
Improve testing to add random indexing, and fix similarity
carlosdelest Apr 7, 2025
d196be0
Add index = false support
carlosdelest Apr 8, 2025
975b4db
Spotless
carlosdelest Apr 8, 2025
ae9fa5f
Synthetic source testing
carlosdelest Apr 8, 2025
9896adf
Add CSV tests and necessary infra for dense_vector field type
carlosdelest Apr 8, 2025
dfa420e
Make CSV test loader to use numbers when there are multivalued numeri…
carlosdelest Apr 8, 2025
d983495
Fixed test mapping to avoid parsing errors when numbers are used for …
carlosdelest Apr 8, 2025
5707e25
Avoid non-float dense vector field types for now
carlosdelest Apr 8, 2025
a8c8a6a
[CI] Auto commit changes from spotless
Apr 8, 2025
d827c6d
Fix test error when checking block loaders
carlosdelest Apr 8, 2025
d8e139d
Fix value generation for dense vectors
carlosdelest Apr 8, 2025
0539aff
Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_…
carlosdelest Apr 8, 2025
7bbe7ee
Fix testing for dense vectors and block loaders
carlosdelest Apr 8, 2025
7d1f8b7
Remove dense_vector from CaseTests
carlosdelest Apr 8, 2025
8870fef
Provide support in tests for creating constant blocks from collection…
carlosdelest Apr 8, 2025
ad84f13
Fix test error
carlosdelest Apr 9, 2025
2619fe6
Fix unsupported types test
carlosdelest Apr 9, 2025
b23d5a9
Support synthetic source with non indexed vectors
carlosdelest Apr 9, 2025
6bb8e62
Remove unneeded code
carlosdelest Apr 9, 2025
8013e47
Add support for dense_vector in PositionToXContent and EsqlQueryRespo…
carlosdelest Apr 9, 2025
e0e34f4
Fix unsupported types test
carlosdelest Apr 9, 2025
c30f5d9
Fix unsupported types test
carlosdelest Apr 10, 2025
a2fbb13
Fix block loader builder
carlosdelest Apr 10, 2025
4833ce5
Ensure ordering when creating double blocks
carlosdelest Apr 10, 2025
e8878a0
Changed randomDouble() for randomFloat() to generate test data
carlosdelest Apr 10, 2025
93d45fc
Additional CSV tests
carlosdelest Apr 11, 2025
a6b0f6c
Fix class casting
carlosdelest Apr 11, 2025
cd462b8
Use Float as block type for dense_vector
carlosdelest Apr 15, 2025
0675a42
Fix tests
carlosdelest Apr 15, 2025
7f95d6a
[CI] Auto commit changes from spotless
Apr 15, 2025
2cff4e7
Fix test
carlosdelest Apr 15, 2025
efc3fb6
Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_…
carlosdelest Apr 15, 2025
f8741ca
Floats are not ordered for dense_vector values
carlosdelest Apr 21, 2025
c951ee7
Use dense vector specific methods for creating builders, and check di…
carlosdelest Apr 21, 2025
ba0a6b9
Avoid doing checks on dense vectors as we can't cast to BlockBuilder
carlosdelest Apr 21, 2025
f5889b2
Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve…
carlosdelest Apr 21, 2025
4b2126e
Add CSV tests with unordered data, and fix CSV data retrieval for tha…
carlosdelest Apr 30, 2025
afe79f7
Change source readers name() method
carlosdelest Apr 30, 2025
8049a58
Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve…
carlosdelest Apr 30, 2025
217ce84
[CI] Auto commit changes from spotless
Apr 30, 2025
74fe676
Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve…
carlosdelest May 23, 2025
af965a1
Spotless
carlosdelest May 23, 2025
53ec29c
Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_…
carlosdelest May 23, 2025
60da0fe
Fix tests
carlosdelest May 23, 2025
06cbea9
Fix test suggested cast for joins
carlosdelest May 26, 2025
8f07e08
Merge branch 'main' into feature/esql_dense_vector_support
carlosdelest May 26, 2025
53bcbc7
Merge branch 'main' into feature/esql_dense_vector_support
carlosdelest May 26, 2025
fe82007
[CI] Auto commit changes from spotless
May 26, 2025
a63ca2c
Fix LookupJoinTypesIT
carlosdelest May 26, 2025
c5d6ddc
Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_…
carlosdelest May 26, 2025
@@ -11,6 +11,8 @@

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.index.KnnVectorValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedDocValues;
@@ -504,6 +506,80 @@ public String toString() {
}
}

public static class DenseVectorBlockLoader extends DocValuesBlockLoader {
Member Author:
Added a BlockLoader for dense vectors that uses FloatVectorValues to retrieve indexed vector data.

private final String fieldName;

public DenseVectorBlockLoader(String fieldName) {
this.fieldName = fieldName;
}

@Override
public Builder builder(BlockFactory factory, int expectedCount) {
return factory.doubles(expectedCount);
}

@Override
public AllReader reader(LeafReaderContext context) throws IOException {
FloatVectorValues floatVectorValues = context.reader().getFloatVectorValues(fieldName);
if (floatVectorValues != null) {
return new FloatVectorValuesBlockReader(floatVectorValues);
}
return new ConstantNullsReader();
}
}

private static class FloatVectorValuesBlockReader extends BlockDocValuesReader {
private final FloatVectorValues floatVectorValues;
private final KnnVectorValues.DocIndexIterator iterator;

FloatVectorValuesBlockReader(FloatVectorValues floatVectorValues) {
this.floatVectorValues = floatVectorValues;
iterator = floatVectorValues.iterator();
}

@Override
public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
try (BlockLoader.DoubleBuilder builder = factory.doubles(docs.count())) {
for (int i = 0; i < docs.count(); i++) {
int doc = docs.get(i);
if (doc < iterator.docID()) {
throw new IllegalStateException("docs within same block must be in order");
}
read(doc, builder);
}
return builder.build();
}
}

@Override
public void read(int docId, BlockLoader.StoredFields storedFields, Builder builder) throws IOException {
read(docId, (DoubleBuilder) builder);
}

private void read(int doc, DoubleBuilder builder) throws IOException {
if (iterator.advance(doc) == doc) {
builder.beginPositionEntry();
float[] floats = floatVectorValues.vectorValue(iterator.index());
for (float aFloat : floats) {
builder.appendDouble(aFloat);
}
builder.endPositionEntry();
} else {
builder.appendNull();
}
}

@Override
public int docId() {
return iterator.docID();
}

@Override
public String toString() {
return "BlockDocValuesReader.FloatVectorValuesBlockReader";
}
}
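The reader above emits one row per requested document: a multivalue "position entry" holding every vector dimension as a double, or a null when the document has no vector. A minimal standalone sketch of that encoding, using a toy builder that stands in for the real `BlockLoader.DoubleBuilder` (the class and method names below are illustrative, not the Elasticsearch API):

```java
import java.util.ArrayList;
import java.util.List;

public class PositionEntrySketch {
    // Hypothetical stand-in for BlockLoader.DoubleBuilder: each position
    // entry becomes one row (a list of doubles), a null row marks a doc
    // without a vector.
    static class DoubleRowsBuilder {
        final List<List<Double>> rows = new ArrayList<>();
        List<Double> current;

        void beginPositionEntry() { current = new ArrayList<>(); }
        void appendDouble(double d) { current.add(d); }
        void endPositionEntry() { rows.add(current); current = null; }
        void appendNull() { rows.add(null); }
    }

    // Mirrors read(int doc, DoubleBuilder builder): docs with a vector get
    // all dimensions appended in order, docs without one get a null row.
    static List<List<Double>> readAll(float[][] vectorsByDoc) {
        DoubleRowsBuilder builder = new DoubleRowsBuilder();
        for (float[] vector : vectorsByDoc) {
            if (vector != null) {
                builder.beginPositionEntry();
                for (float f : vector) {
                    builder.appendDouble(f); // float widened to double
                }
                builder.endPositionEntry();
            } else {
                builder.appendNull();
            }
        }
        return builder.rows;
    }

    public static void main(String[] args) {
        List<List<Double>> rows = readAll(new float[][] { { 1.0f, 2.0f, 3.0f }, null });
        System.out.println(rows); // [[1.0, 2.0, 3.0], null]
    }
}
```

This also shows why in-block doc ordering matters: the real reader advances a single forward-only `DocIndexIterator`, so out-of-order doc IDs within a block would be unreadable, hence the `IllegalStateException` check.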

public static class BytesRefsFromOrdsBlockLoader extends DocValuesBlockLoader {
private final String fieldName;

@@ -51,15 +51,20 @@
import org.elasticsearch.index.fielddata.FieldDataContext;
import org.elasticsearch.index.fielddata.IndexFieldData;
import org.elasticsearch.index.mapper.ArraySourceValueFetcher;
import org.elasticsearch.index.mapper.BlockDocValuesReader;
import org.elasticsearch.index.mapper.BlockLoader;
import org.elasticsearch.index.mapper.BlockSourceReader;
import org.elasticsearch.index.mapper.DocumentParserContext;
import org.elasticsearch.index.mapper.FieldMapper;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.Mapper;
import org.elasticsearch.index.mapper.MapperBuilderContext;
import org.elasticsearch.index.mapper.MapperParsingException;
import org.elasticsearch.index.mapper.MappingParser;
import org.elasticsearch.index.mapper.NumberFieldMapper;
import org.elasticsearch.index.mapper.SimpleMappedFieldType;
import org.elasticsearch.index.mapper.SourceLoader;
import org.elasticsearch.index.mapper.SourceValueFetcher;
import org.elasticsearch.index.mapper.TextSearchInfo;
import org.elasticsearch.index.mapper.ValueFetcher;
import org.elasticsearch.index.query.SearchExecutionContext;
@@ -89,6 +94,7 @@
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;
@@ -2077,6 +2083,18 @@ protected Object parseSourceValue(Object value) {
};
}

private SourceValueFetcher sourceValueFetcher(Set<String> sourcePaths) {
return new SourceValueFetcher(sourcePaths, null) {
@Override
protected Object parseSourceValue(Object value) {
if (value.equals("")) {
return null;
}
return NumberFieldMapper.NumberType.FLOAT.parse(value, false);
}
};
}
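The fetcher above maps empty source values to null and parses everything else as a float. A tiny self-contained sketch of that behavior, with `Float.parseFloat` standing in for `NumberFieldMapper.NumberType.FLOAT.parse` (an assumption for illustration, not the actual mapper code path):

```java
public class SourceValueSketch {
    // Mirrors parseSourceValue above: empty string means "no value" and
    // becomes null; anything else is parsed as a float dimension value.
    static Object parseSourceValue(Object value) {
        if (value.equals("")) {
            return null;
        }
        return Float.parseFloat(value.toString());
    }

    public static void main(String[] args) {
        System.out.println(parseSourceValue(""));    // null
        System.out.println(parseSourceValue("1.5")); // 1.5
    }
}
```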

@Override
public DocValueFormat docValueFormat(String format, ZoneId timeZone) {
return DocValueFormat.DENSE_VECTOR;
@@ -2311,6 +2329,20 @@ int getVectorDimensions() {
ElementType getElementType() {
return elementType;
}

@Override
public BlockLoader blockLoader(MappedFieldType.BlockLoaderContext blContext) {
if (elementType != ElementType.FLOAT) {
throw new UnsupportedOperationException("Only float dense vectors are supported for now");
Member Author:
We can work on this next, creating specific BlockLoaders for Byte and Bit field types.

}

if (indexed) {
return new BlockDocValuesReader.DenseVectorBlockLoader(name());
}

BlockSourceReader.LeafIteratorLookup lookup = BlockSourceReader.lookupMatchingAll();
return new BlockSourceReader.DoublesBlockLoader(sourceValueFetcher(blContext.sourcePaths(name())), lookup);
}
}

private final IndexOptions indexOptions;
@@ -14,4 +14,5 @@
public class EsqlCorePlugin extends Plugin implements ExtensiblePlugin {

public static final FeatureFlag AGGREGATE_METRIC_DOUBLE_FEATURE_FLAG = new FeatureFlag("esql_aggregate_metric_double");
public static final FeatureFlag DENSE_VECTOR_FEATURE_FLAG = new FeatureFlag("esql_dense_vector");
}
@@ -302,7 +302,13 @@ public enum DataType {
*/
PARTIAL_AGG(builder().esType("partial_agg").unknownSize()),

AGGREGATE_METRIC_DOUBLE(builder().esType("aggregate_metric_double").estimatedSize(Double.BYTES * 3 + Integer.BYTES));
AGGREGATE_METRIC_DOUBLE(builder().esType("aggregate_metric_double").estimatedSize(Double.BYTES * 3 + Integer.BYTES)),

/**
* Fields with this type are dense vectors, represented as an array of double values.
*/
DENSE_VECTOR(builder().esType("dense_vector").unknownSize());


/**
* Types that are actively being built. These types are not returned
@@ -311,7 +317,8 @@
* check that sending them to a function produces a sane error message.
*/
public static final Map<DataType, FeatureFlag> UNDER_CONSTRUCTION = Map.ofEntries(
Map.entry(AGGREGATE_METRIC_DOUBLE, EsqlCorePlugin.AGGREGATE_METRIC_DOUBLE_FEATURE_FLAG)
Map.entry(AGGREGATE_METRIC_DOUBLE, EsqlCorePlugin.AGGREGATE_METRIC_DOUBLE_FEATURE_FLAG),
Map.entry(DENSE_VECTOR, EsqlCorePlugin.DENSE_VECTOR_FEATURE_FLAG)
);

private final String typeName;
@@ -35,6 +35,7 @@
import static org.elasticsearch.common.logging.LoggerMessageFormat.format;
import static org.elasticsearch.xpack.esql.CsvTestUtils.ExpectedResults;
import static org.elasticsearch.xpack.esql.CsvTestUtils.Type;
import static org.elasticsearch.xpack.esql.CsvTestUtils.Type.DENSE_VECTOR;
import static org.elasticsearch.xpack.esql.CsvTestUtils.Type.UNSIGNED_LONG;
import static org.elasticsearch.xpack.esql.CsvTestUtils.logMetaData;
import static org.elasticsearch.xpack.esql.core.util.DateUtils.UTC_DATE_TIME_FORMATTER;
@@ -145,6 +146,10 @@ private static void assertMetadata(
// Type.asType translates all bytes references into keywords
continue;
}
if (blockType == Type.DOUBLE && expectedType == DENSE_VECTOR) {
// DENSE_VECTOR is internally represented as a double block
Member Author:
This could potentially change when we support byte and bit element types - we could create the appropriate blocks.

continue;
}
if (blockType == Type.NULL) {
// Null pages don't have any real type information beyond "it's all null, man"
continue;
@@ -486,6 +486,7 @@ public enum Type {
x -> x == null ? null : stringToAggregateMetricDoubleLiteral(x),
AggregateMetricDoubleBlockBuilder.AggregateMetricDoubleLiteral.class
),
DENSE_VECTOR(Double::parseDouble, Double.class),
UNSUPPORTED(Type::convertUnsupported, Void.class);

private static Void convertUnsupported(String s) {
@@ -528,6 +529,8 @@ private static Void convertUnsupported(String s) {
LOOKUP.put("DATE", DATETIME);
LOOKUP.put("DT", DATETIME);
LOOKUP.put("V", VERSION);

LOOKUP.put("DENSE_VECTOR", DENSE_VECTOR);
}

private final Function<String, Object> converter;
@@ -133,6 +133,7 @@ public class CsvTestsDataLoader {
private static final TestDataset ADDRESSES = new TestDataset("addresses");
private static final TestDataset BOOKS = new TestDataset("books").withSetting("books-settings.json");
private static final TestDataset SEMANTIC_TEXT = new TestDataset("semantic_text").withInferenceEndpoint(true);
private static final TestDataset DENSE_VECTOR = new TestDataset("dense_vector");

public static final Map<String, TestDataset> CSV_DATASET_MAP = Map.ofEntries(
Map.entry(EMPLOYEES.indexName, EMPLOYEES),
@@ -182,7 +183,8 @@ public class CsvTestsDataLoader {
Map.entry(DISTANCES.indexName, DISTANCES),
Map.entry(ADDRESSES.indexName, ADDRESSES),
Map.entry(BOOKS.indexName, BOOKS),
Map.entry(SEMANTIC_TEXT.indexName, SEMANTIC_TEXT)
Map.entry(SEMANTIC_TEXT.indexName, SEMANTIC_TEXT),
Map.entry(DENSE_VECTOR.indexName, DENSE_VECTOR)
);

private static final EnrichConfig LANGUAGES_ENRICH = new EnrichConfig("languages_policy", "enrich-policy-languages.json");
@@ -215,6 +217,7 @@ public class CsvTestsDataLoader {
CITY_BOUNDARIES_ENRICH,
CITY_AIRPORTS_ENRICH
);
public static final String NUMERIC_REGEX = "-?\\d+(\\.\\d+)?";

/**
* <p>
@@ -637,7 +640,8 @@ private static void loadCsvData(RestClient client, String indexName, URL resourc

private static String quoteIfNecessary(String value) {
boolean isQuoted = (value.startsWith("\"") && value.endsWith("\"")) || (value.startsWith("{") && value.endsWith("}"));
return isQuoted ? value : "\"" + value + "\"";
boolean isNumeric = value.matches(NUMERIC_REGEX);
return isQuoted || isNumeric ? value : "\"" + value + "\"";
}
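The loader change above leaves plain numeric CSV values unquoted so they are indexed as JSON numbers, which is what makes dense_vector arrays parseable by the field mapper. A self-contained sketch of exactly that logic, lifted from the diff (class name is illustrative):

```java
public class QuoteSketch {
    // Same regex the loader adds: optional sign, digits, optional decimal part.
    static final String NUMERIC_REGEX = "-?\\d+(\\.\\d+)?";

    // Mirrors quoteIfNecessary in the diff: already-quoted values, JSON
    // objects, and plain numbers pass through unchanged; everything else is
    // wrapped in quotes so it is sent as a JSON string in the bulk request.
    static String quoteIfNecessary(String value) {
        boolean isQuoted = (value.startsWith("\"") && value.endsWith("\"")) || (value.startsWith("{") && value.endsWith("}"));
        boolean isNumeric = value.matches(NUMERIC_REGEX);
        return isQuoted || isNumeric ? value : "\"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNecessary("1.5")); // 1.5 (numeric, stays unquoted)
        System.out.println(quoteIfNecessary("foo")); // "foo" (wrapped in quotes)
    }
}
```

Note the regex rejects exponent notation (`1e-3`), so such values would still be quoted; that appears to be acceptable for the CSV fixtures in this PR.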

private static void sendBulkRequest(String indexName, StringBuilder builder, RestClient client, Logger logger, List<String> failures)
@@ -125,6 +125,7 @@
import static org.elasticsearch.test.ESTestCase.assertEquals;
import static org.elasticsearch.test.ESTestCase.between;
import static org.elasticsearch.test.ESTestCase.randomAlphaOfLength;
import static org.elasticsearch.test.ESTestCase.randomArray;
import static org.elasticsearch.test.ESTestCase.randomBoolean;
import static org.elasticsearch.test.ESTestCase.randomByte;
import static org.elasticsearch.test.ESTestCase.randomDouble;
@@ -827,6 +828,8 @@ public static Literal randomLiteral(DataType type) {
throw new UncheckedIOException(e);
}
}
// TODO Need to get the dimensions
case DENSE_VECTOR -> randomArray(10, 10, i -> new Float[10], ESTestCase::randomFloat);
case UNSUPPORTED, OBJECT, DOC_DATA_TYPE, TSID_DATA_TYPE, PARTIAL_AGG -> throw new IllegalArgumentException(
"can't make random values for [" + type.typeName() + "]"
);
@@ -0,0 +1,3 @@
id:l, vector:dense_vector
0, [1.0, 2.0, 3.0]
1, [4.0, 5.0, 6.0]
Contributor:
this might be very nitpicky - but can we add another dense_vector value that does not have ordered values?
this might be why we did not catch #126456 (comment) during tests

Member Author:
mvOrdered does not impact retrieval as it's used as an optimization in some cases. But adding unordered data uncovered a small fix that needed to be done to support multivalued style fields: 4b2126e

@@ -0,0 +1,13 @@

retrieveDenseVectorData
required_capability: dense_vector_field_type

FROM dense_vector
Contributor:
Can we add more csv tests here with commands we know should be supported 🙈 ?
We have KEEP already, but I'm thinking DROP, RENAME and simple EVALs (EVAL a = dense_vector_field) might be supported already.
I know it should all just work - but for our own peace of mind it would be good to cover them.
Can be a single test that uses a combination of commands we know should be supported already.

Member Author:
I've added one in 93d45fc. I'm not sure what this would catch, as the inner representation of dense_vector is a DoubleBlock and that is extensively tested for other fields.

I'm sure we will keep adding tests once we include arithmetic operators, conversion, etc to dense_vector.

Member @fang-xing-esql, Apr 12, 2025:
Do we expect that the items in a dense_vector have a fixed order? The reason I'm asking is that a dense_vector looks very much like a multi-valued double field: it is hard to tell whether it is a dense_vector field or a double field with MV from how its value looks, and the order of the items in an MV is not guaranteed. I wonder what the relationship is between a double field with MV and a dense_vector.

Do we expect the functions/commands that take multi-valued fields apply to dense_vector? Like those mv_xxx and to_xxx functions, mv_expand, stats by mv_fields etc.?

If I understand it right, dense_vector does not support sort or aggregation, does dense_vector support comparison, does it make sense to dense_vector fields?

Member Author:
Do we expect that the items in a dense_vector have a fixed order?

Yes. It's crucial that the vector dimensions match between different vectors, so they can be compared.

a dense_vector looks very much like a multi-valued double field, it is hard to tell whether it is a dense_vector field or a double field with MV from how its value looks

I've used MV as this seemed a supported way of internally representing a double array. From a user perspective, it's a different data type - it's a dense_vector, which will not necessarily be supported by the same functions, and it will always have the same number of dimensions / ordering for a specific mapping.

I wonder what is the relationship between a double field with MV and a dense_vector.

We were thinking of adding a TO_DENSE_VECTOR cast function so users can specify a dense_vector using the MV double syntax:

WHERE knn(field, TO_DENSE_VECTOR([0.1, 0.2, 0.3, ... , 1.0]))

Besides that, there should be no relation between the two. They are different data types that have the same representation (an array of elements).

Do we expect the functions/commands that take multi-valued fields apply to dense_vector? Like those mv_xxx and to_xxx functions, mv_expand, stats by mv_fields etc.?

It would probably help to differentiate the two data types if dense_vector fields do not support MV functions.

We can provide support for multivalued functions, but most of them will not make sense in the context of a dense_vector (MV_APPEND, MV_CONCAT, MV_DEDUPE, MV_SORT, MV_SUM). Others can be useful even though they are not necessarily vector related (MV_COUNT, MV_FIRST, MV_MEDIAN, MV_MAX, MV_MIN), but supporting those could confuse ESQL users.

dense_vector does not support sort or aggregation,

Correct.

does dense_vector support comparison, does it make sense to dense_vector fields?

We could support equality. Binary comparisons like greater / less than make no sense for dense_vector.

Contributor:
Thanks for adding the tests - this is addressed from my POV.
we can follow up on supporting more functions/operators - e.g. equality makes sense to me.

| KEEP id, vector
| SORT id
;

id:l | vector:dense_vector
0 | [1.0, 2.0, 3.0]
1 | [4.0, 5.0, 6.0]
;
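The thread above suggests equality may eventually make sense for dense_vector while ordering comparisons do not. A hypothetical sketch of such element-wise equality (illustrative only, not the ESQL implementation):

```java
import java.util.Arrays;

public class DenseVectorEquality {
    // Hypothetical: two dense vectors are equal only when they have the
    // same number of dimensions and identical values at every position.
    // Ordering comparisons (>, <) are deliberately not defined.
    static boolean vectorsEqual(float[] left, float[] right) {
        if (left.length != right.length) {
            return false; // dimension mismatch can never be equal
        }
        return Arrays.equals(left, right);
    }

    public static void main(String[] args) {
        System.out.println(vectorsEqual(new float[] { 1f, 2f, 3f }, new float[] { 1f, 2f, 3f })); // true
        System.out.println(vectorsEqual(new float[] { 1f, 2f }, new float[] { 1f, 2f, 3f }));     // false
    }
}
```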
@@ -63,18 +63,7 @@
"type" : "keyword"
},
"salary_change": {
"type": "float",
Member Author:
Changing CSV loading made this a problem, as there were parsing exceptions when trying to index float numeric data into integers. I didn't see a convenient way out of this and decided to remove it, as this field is not being tested.

Member:
If we can keep this unchanged, that will be great, this is a good example of nested fields.

Member Author:
It is just changed for the mapping-default-incompatible mapping, which was created to test some incompatible field mappings that did not include subfields. I'll try to give this another shot but it will require changes to the CSV loader or the dataset 😢

Member:
It is just changed for the mapping-default-incompatible mapping, which was created to test some incompatible field mappings that did not include subfields. I'll try to give this another shot but it will require changes to the CSV loader or the dataset 😢

Is it easier if we make another copy of the employees schema and data for dense_vector related tests?

Member Author:
Is it easier if we make another copy of employees's schema and data for dense_vector related tests?

The problem is that changing how the CSV tests load data impacted this dataset. Before this change, multivalues were being uploaded as arrays of strings, which is something we don't want to do for dense_vectors as that is not a format supported by the DenseVectorFieldMapper.

It seemed like too much work to change the actual dataset when that particular field is not actually used in the tests.

I'm open to other solutions here!

Contributor:
@fang-xing-esql are these fields used in other tests somehow?
I see this was actually added by @carlosdelest a while back #117555.
The employees_incompatible index that is set with this mapping is only used in match function/operator tests.
We don't modify any of those tests here, so this looks like a safe change to me.

"fields": {
"int": {
"type": "integer"
},
"long": {
"type": "long"
},
"keyword": {
"type" : "keyword"
}
}
"type": "float"
}
}
}
@@ -0,0 +1,11 @@
{
"properties": {
"id": {
"type": "long"
},
"vector": {
"type": "dense_vector",
"similarity": "l2_norm"
}
}
}