CNDB-12937: Update jvector to 4.0.0-beta.2; add new graph construction parameters to index config (#1676)
### What is the issue
Fixes: riptano/cndb#12937
### What does this PR fix and why was it fixed
This pull request upgrades jvector from **4.0.0-beta.1** to
**4.0.0-beta.2** and introduces three new configuration options to
influence graph construction:
1. `neighborhood_overflow`
2. `alpha`
3. `enable_hierarchy`
The defaults for these hyperparameters differ between in-memory and
on-disk graphs, but when they are set explicitly, the configured values
apply uniformly to graphs built by a memtable and by compaction.
**Details**
- **jvector 4.0.0-beta.2**
  - Minor changes to the graph index architecture, including some code now
    under `...disk.feature...` packages.
  - Removed the old `CachingGraphIndex` code in Cassandra, which is no
    longer used.
  - New constructor arguments for controlling `neighborhood_overflow`,
    `alpha`, and hierarchical levels in the HNSW graph.
- **New configuration options**
  1. **`neighborhood_overflow`**: A `float` >= 1.0 controlling how
     aggressively the graph tries to insert extra neighbors on each HNSW
     layer.
  2. **`alpha`**: A `float` > 0 used in the neighbor selection phase for
     HNSW.
  3. **`enable_hierarchy`**: A `boolean` indicating whether HNSW should
     allow multiple layers (true) or a single-layer approximate graph
     (false). Defaults to false.
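The constraints listed above can be sketched as a small standalone validator. This is only an illustration, not the code in this PR: the class name `VectorOptionsSketch` and its methods are hypothetical, the option keys are assumed to match the lowercase names in the description, and `IllegalArgumentException` stands in for Cassandra's `InvalidRequestException` to keep the example self-contained. Returning `null` for an absent float option is also an assumption, chosen so the caller can apply the in-memory or on-disk default.

```java
import java.util.Map;

// Hypothetical helper illustrating the documented constraints; not part of the PR.
public final class VectorOptionsSketch
{
    // Option keys assumed to match the names used in the PR description.
    public static final String NEIGHBORHOOD_OVERFLOW = "neighborhood_overflow";
    public static final String ALPHA = "alpha";
    public static final String ENABLE_HIERARCHY = "enable_hierarchy";

    /** Returns the configured overflow (>= 1.0), or null if unset so the caller picks a default. */
    public static Float parseNeighborhoodOverflow(Map<String, String> options)
    {
        String raw = options.get(NEIGHBORHOOD_OVERFLOW);
        if (raw == null)
            return null;
        float value = parseFiniteFloat(NEIGHBORHOOD_OVERFLOW, raw);
        if (value < 1.0f)
            throw new IllegalArgumentException(NEIGHBORHOOD_OVERFLOW + " must be >= 1.0, got " + raw);
        return value;
    }

    /** Returns the configured alpha (> 0), or null if unset so the caller picks a default. */
    public static Float parseAlpha(Map<String, String> options)
    {
        String raw = options.get(ALPHA);
        if (raw == null)
            return null;
        float value = parseFiniteFloat(ALPHA, raw);
        if (value <= 0.0f)
            throw new IllegalArgumentException(ALPHA + " must be > 0, got " + raw);
        return value;
    }

    /** Returns the configured flag; false when unset, matching the documented default. */
    public static boolean parseEnableHierarchy(Map<String, String> options)
    {
        String raw = options.get(ENABLE_HIERARCHY);
        if (raw == null || raw.equalsIgnoreCase("false"))
            return false;
        if (raw.equalsIgnoreCase("true"))
            return true;
        throw new IllegalArgumentException(ENABLE_HIERARCHY + " must be true or false, got " + raw);
    }

    // Parses a float and rejects NaN and infinities, which would slip past range checks.
    private static float parseFiniteFloat(String name, String raw)
    {
        float value;
        try
        {
            value = Float.parseFloat(raw);
        }
        catch (NumberFormatException e)
        {
            throw new IllegalArgumentException(name + " must be a number, got " + raw);
        }
        if (!Float.isFinite(value))
            throw new IllegalArgumentException(name + " must be finite, got " + raw);
        return value;
    }
}
```

Returning `null` rather than a concrete default keeps the sketch independent of the actual default values, which the description says differ between in-memory and on-disk graphs and which are not stated here.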
I manually verified that the jvector upgrade is backwards compatible,
meaning that when we build using the new jvector version, we can read
with an old jvector version, so I did not create a new SAI on disk file
format version.
         if (options.get(POSTING_LIST_LVL_MIN_LEAVES) != null || options.get(POSTING_LIST_LVL_SKIP_OPTION) != null)
         {
             if (TypeUtil.isLiteral(type))
@@ -213,16 +258,16 @@ else if (options.get(MAXIMUM_NODE_CONNECTIONS) != null ||
                  options.get(CONSTRUCTION_BEAM_WIDTH) != null ||
                  options.get(OPTIMIZE_FOR) != null ||
                  options.get(SIMILARITY_FUNCTION) != null ||
-                 options.get(SOURCE_MODEL) != null)
+                 options.get(SOURCE_MODEL) != null ||
+                 options.get(NEIGHBORHOOD_OVERFLOW) != null ||
+                 options.get(ALPHA) != null ||
+                 options.get(ENABLE_HIERARCHY) != null)
         {
             if (!type.isVector())
                 throw new InvalidRequestException(String.format("CQL type %s cannot have vector options", type.asCQL3Type()));

             if (options.containsKey(MAXIMUM_NODE_CONNECTIONS))
             {
-                if (!CassandraRelevantProperties.SAI_HNSW_ALLOW_CUSTOM_PARAMETERS.getBoolean())
-                    throw new InvalidRequestException(String.format("Maximum node connections cannot be set without enabling %s", CassandraRelevantProperties.SAI_HNSW_ALLOW_CUSTOM_PARAMETERS.name()));
@@ -237,9 +282,6 @@ else if (options.get(MAXIMUM_NODE_CONNECTIONS) != null ||
             }
             if (options.containsKey(CONSTRUCTION_BEAM_WIDTH))
             {
-                if (!CassandraRelevantProperties.SAI_HNSW_ALLOW_CUSTOM_PARAMETERS.getBoolean())
-                    throw new InvalidRequestException(String.format("Construction beam width cannot be set without enabling %s", CassandraRelevantProperties.SAI_HNSW_ALLOW_CUSTOM_PARAMETERS.name()));