Commit 3e66623

Add proper enums for some DenseVectorProperty fields

1 parent bbb0349

File tree

6 files changed: +238, -60 lines

compiler/package-lock.json

Lines changed: 3 additions & 7 deletions
Some generated files are not rendered by default.

output/schema/schema.json

Lines changed: 97 additions & 15 deletions
Some generated files are not rendered by default.

specification/_types/mapping/DenseVectorIndexOptions.ts

Lines changed: 0 additions & 27 deletions
This file was deleted.

Lines changed: 137 additions & 0 deletions (new file)
@@ -0,0 +1,137 @@
/*
 * Licensed to Elasticsearch B.V. under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch B.V. licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

import { float, integer } from '@_types/Numeric'
import { PropertyBase } from './Property'

export class DenseVectorProperty extends PropertyBase {
  type: 'dense_vector'
  element_type?: DenseVectorElementType
  dims?: integer
  similarity?: DenseVectorSimilarity
  index?: boolean
  index_options?: DenseVectorIndexOptions
}

export enum DenseVectorElementType {
  /**
   * Indexes a single bit per dimension. Useful for very high-dimensional vectors or models that specifically support
   * bit vectors.
   *
   * NOTE: when using `bit`, the number of dimensions must be a multiple of `8` and must represent the number of bits.
   */
  bit,
  /**
   * Indexes a 1-byte integer value per dimension.
   */
  byte,
  /**
   * Indexes a 4-byte floating-point value per dimension.
   */
  float
}

export enum DenseVectorSimilarity {
  /**
   * Computes the cosine similarity. During indexing, Elasticsearch automatically normalizes vectors with `cosine`
   * similarity to unit length. This allows `dot_product` to be used internally for computing similarity, which is
   * more efficient. The original, un-normalized vectors can still be accessed through scripts.
   *
   * The document `_score` is computed as `(1 + cosine(query, vector)) / 2`.
   *
   * The `cosine` similarity does not allow vectors with zero magnitude, since cosine is not defined in this case.
   */
  cosine,
  /**
   * Computes the dot product of two unit vectors. This option provides an optimized way to perform cosine similarity.
   * The constraints and computed score are defined by `element_type`.
   *
   * When `element_type` is `float`, all vectors must be unit length, including both document and query vectors.
   *
   * The document `_score` is computed as `(1 + dot_product(query, vector)) / 2`.
   *
   * When `element_type` is `byte`, all vectors must have the same length, including both document and query vectors,
   * or results will be inaccurate.
   *
   * The document `_score` is computed as `0.5 + (dot_product(query, vector) / (32768 * dims))` where `dims` is the
   * number of dimensions per vector.
   */
  dot_product,
  /**
   * Computes similarity based on the `L2` distance (also known as Euclidean distance) between the vectors.
   *
   * The document `_score` is computed as `1 / (1 + l2_norm(query, vector)^2)`.
   *
   * For `bit` vectors, instead of using `l2_norm`, the `hamming` distance between the vectors is used.
   *
   * The `_score` transformation is `(numBits - hamming(a, b)) / numBits`.
   */
  l2_norm,
  /**
   * Computes the maximum inner product of two vectors. This is similar to `dot_product`, but doesn't require vectors
   * to be normalized. This means that each vector's magnitude can significantly affect the score.
   *
   * The document `_score` is adjusted to prevent negative values. For `max_inner_product` values `< 0`, the `_score`
   * is `1 / (1 + -1 * max_inner_product(query, vector))`. For non-negative `max_inner_product` results, the `_score`
   * is calculated as `max_inner_product(query, vector) + 1`.
   */
  max_inner_product
}

export class DenseVectorIndexOptions {
  type: DenseVectorIndexOptionsType
  m?: integer
  ef_construction?: integer
  confidence_interval?: float
}

export enum DenseVectorIndexOptionsType {
  /**
   * This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
   */
  flat,
  /**
   * This utilizes the HNSW algorithm for scalable approximate kNN search. This supports all `element_type` values.
   */
  hnsw,
  /**
   * This utilizes a brute-force search algorithm in addition to automatic half-byte scalar quantization. Only
   * supports `element_type` of `float`.
   */
  int4_flat,
  /**
   * This utilizes the HNSW algorithm in addition to automatic half-byte scalar quantization for scalable approximate
   * kNN search with `element_type` of `float`.
   *
   * This can reduce the memory footprint by 8x at the cost of some accuracy.
   */
  int4_hnsw,
  /**
   * This utilizes a brute-force search algorithm in addition to automatic scalar quantization. Only supports
   * `element_type` of `float`.
   */
  int8_flat,
  /**
   * The default index type for `float` vectors. This utilizes the HNSW algorithm in addition to automatic scalar
   * quantization for scalable approximate kNN search with `element_type` of `float`.
   *
   * This can reduce the memory footprint by 4x at the cost of some accuracy.
   */
  int8_hnsw
}
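
As a usage illustration (not part of the commit): a `dense_vector` mapping expressed against the new types might look as follows. This is a minimal sketch that assumes the spec classes are consumed as ordinary TypeScript; the import path, field name, and parameter values are placeholders, since in practice these definitions drive code generation rather than being imported directly.

// Hypothetical import path; the new file's name is not shown in this diff view.
import {
  DenseVectorElementType,
  DenseVectorIndexOptionsType,
  DenseVectorProperty,
  DenseVectorSimilarity
} from './DenseVectorProperty'

// Illustrative mapping for a 384-dimensional float vector field indexed with
// int8 HNSW quantization and cosine similarity.
const embedding: DenseVectorProperty = {
  type: 'dense_vector',
  element_type: DenseVectorElementType.float,
  dims: 384,
  similarity: DenseVectorSimilarity.cosine,
  index: true,
  index_options: {
    type: DenseVectorIndexOptionsType.int8_hnsw,
    m: 16,                // HNSW graph connectivity
    ef_construction: 100  // HNSW build-time candidate list size
  }
}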
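
The `_score` formulas quoted in the doc comments above translate directly into arithmetic. A minimal sketch, assuming plain `number[]` vectors; the helper names are made up for illustration and are not part of the specification:

// Raw similarity primitives referenced by the doc comments.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0)

const l2Norm = (a: number[], b: number[]): number =>
  Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0))

// _score for `cosine` (unit-normalized at index time) and float `dot_product`:
// (1 + dot_product(query, vector)) / 2
const dotProductScore = (query: number[], vector: number[]): number =>
  (1 + dot(query, vector)) / 2

// _score for `l2_norm`: 1 / (1 + l2_norm(query, vector)^2)
const l2NormScore = (query: number[], vector: number[]): number =>
  1 / (1 + l2Norm(query, vector) ** 2)

// _score for `max_inner_product`, adjusted to stay non-negative.
// The inner product of the two vectors is their dot product.
const maxInnerProductScore = (query: number[], vector: number[]): number => {
  const mip = dot(query, vector)
  return mip < 0 ? 1 / (1 + -1 * mip) : mip + 1
}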

0 commit comments