Bug: DROP_VECTOR_INDEX leaves corrupted metadata preventing property updates #6040
Description
Bug Report: DROP_VECTOR_INDEX leaves corrupted metadata preventing property updates
Environment
- Kuzu Version: 0.11.2
- Storage Version: 39
- API: REST API (kuzudb/api-server:latest)
- Vector Extension: Loaded via LOAD EXTENSION VECTOR
Summary
After creating and dropping a vector index on a property, that property becomes permanently un-updatable via SET or MERGE operations. The error persists even after:
- Dropping the vector index (confirmed via SHOW_INDEXES())
- Restarting the database
- No indexes remaining in the catalog
Expected Behavior
After DROP_VECTOR_INDEX successfully removes a vector index, the property should be updatable again via SET or MERGE operations.
Actual Behavior
The property remains permanently un-updatable with error:
Runtime exception: Cannot set property vec in table embeddings because it is used in one or more indexes. Try delete and then insert.
Note: No table named "embeddings" exists in the catalog, suggesting this is an internal metadata structure.
Impact
This bug makes vector indexes incompatible with mutable data:
- Cannot update entity properties that have embeddings
- Breaks MERGE-based upsert workflows
- Makes incremental data updates impractical
Root Cause Analysis
After deeper investigation into the source code, I suspect the bug is in NodeTable::dropIndex():
File: src/storage/store/node_table.cpp
void NodeTable::addIndex(std::unique_ptr<Index> index) {
    // ...
    indexes.push_back(IndexHolder{std::move(index)});
    hasChanges = true; // ← Marks table for checkpoint
}
void NodeTable::dropIndex(const std::string& name) {
    KU_ASSERT(getIndex(name) != nullptr);
    for (auto it = indexes.begin(); it != indexes.end(); ++it) {
        if (StringUtils::caseInsensitiveEquals(it->getName(), name)) {
            KU_ASSERT(it->isLoaded());
            indexes.erase(it);
            return; // ← BUG: Missing hasChanges = true!
        }
    }
}
What happens:
- DROP_VECTOR_INDEX calls Catalog::dropIndex() → removes from catalog ✓
- DROP_VECTOR_INDEX calls NodeTable::dropIndex() → removes from in-memory indexes vector ✓
- BUT NodeTable::dropIndex() doesn't set hasChanges = true
- On checkpoint, the table's serialization includes the OLD indexes (from disk)
- On restart/reload, NodeTable::deserialize() loads the persisted indexes vector which still contains the dropped index
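To illustrate the suspected sequence, here is a minimal self-contained model. It is a sketch of my hypothesis, not Kuzu code: ToyTable, ToyDisk, indexesAfterRestart() and the markDirtyOnDrop switch are hypothetical names, and the rule that a checkpoint skips any table whose hasChanges flag is unset is an assumption about how the flag is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the on-disk image of a table: only the persisted index names.
struct ToyDisk {
    std::vector<std::string> persistedIndexes;
};

// Toy stand-in for NodeTable. Member names mirror the issue; this is NOT Kuzu code.
class ToyTable {
public:
    explicit ToyTable(bool markDirty) : markDirtyOnDrop{markDirty} {}

    void addIndex(const std::string& name) {
        indexes.push_back(name);
        hasChanges = true; // same as the real addIndex() quoted above
    }

    void dropIndex(const std::string& name) {
        auto it = std::find(indexes.begin(), indexes.end(), name);
        assert(it != indexes.end()); // plays the role of KU_ASSERT
        indexes.erase(it);
        if (markDirtyOnDrop) {
            hasChanges = true; // variant that flags the table after a drop
        }
    }

    // ASSUMED checkpoint policy: only tables flagged as changed are rewritten.
    void checkpoint(ToyDisk& disk) {
        if (!hasChanges) {
            return; // disk keeps whatever was written last time
        }
        disk.persistedIndexes = indexes;
        hasChanges = false;
    }

    // Simulates a restart: in-memory state is rebuilt from the disk image.
    void reload(const ToyDisk& disk) {
        indexes = disk.persistedIndexes;
        hasChanges = false;
    }

    std::size_t numIndexes() const { return indexes.size(); }

private:
    std::vector<std::string> indexes;
    bool hasChanges = false;
    bool markDirtyOnDrop;
};

// create -> checkpoint -> drop -> checkpoint -> restart; returns how many
// indexes the table believes it has afterwards.
std::size_t indexesAfterRestart(bool markDirtyOnDrop) {
    ToyDisk disk;
    ToyTable table{markDirtyOnDrop};
    table.addIndex("person_embeddings_idx");
    table.checkpoint(disk);
    table.dropIndex("person_embeddings_idx");
    table.checkpoint(disk); // skipped when the drop did not flag the table
    table.reload(disk);
    return table.numIndexes();
}
```

In this model, indexesAfterRestart(false) leaves one stale index after the restart, while indexesAfterRestart(true) leaves none. That matches the behavior I would expect if the missing flag is the real cause.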
File: src/storage/store/node_table.cpp
void NodeTable::serialize(Serializer& serializer) const {
    nodeGroups->serialize(serializer);
    serializer.write<uint64_t>(indexes.size()); // ← Serializes indexes vector
    for (auto i = 0u; i < indexes.size(); ++i) {
        indexes[i].serialize(serializer);
    }
}
void NodeTable::deserialize(...) {
    // ...
    indexes.clear();
    indexes.reserve(indexInfos.size());
    for (auto i = 0u; i < indexInfos.size(); ++i) {
        indexes.push_back(IndexHolder(...)); // ← Reloads old indexes from disk
    }
}
Why updates fail:
When updating a property, NodeTable::initUpdateState() iterates through the indexes vector and calls each index's initUpdateState():
void NodeTable::initUpdateState(...) {
    for (auto i = 0u; i < indexes.size(); i++) {
        auto& indexHolder = indexes[i];
        auto index = indexHolder.getIndex();
        if (index->isBuiltOnColumn(nodeUpdateState.columnID)) {
            // Calls HNSWIndex::initUpdateState() which throws error
            nodeUpdateState.indexUpdateState[i] = index->initUpdateState(...);
        }
    }
}
The HNSW index's initUpdateState() is hardcoded to throw:
File: extension/vector/src/include/index/hnsw_index.h
std::unique_ptr<UpdateState> initUpdateState(...) override {
    throw common::RuntimeException{
        "Cannot set property vec in table embeddings because it is "
        "used in one or more indexes. Try delete and then insert."
    };
}
The Fix
Add hasChanges = true in NodeTable::dropIndex():
void NodeTable::dropIndex(const std::string& name) {
    KU_ASSERT(getIndex(name) != nullptr);
    for (auto it = indexes.begin(); it != indexes.end(); ++it) {
        if (StringUtils::caseInsensitiveEquals(it->getName(), name)) {
            KU_ASSERT(it->isLoaded());
            indexes.erase(it);
            hasChanges = true; // ← Add this line
            return;
        }
    }
}
This ensures the NodeTable gets checkpointed with the updated indexes vector, and on restart/reload, the dropped index won't be in the collection.
Are there known steps to reproduce?
Minimal Reproducible Example
-- Step 1: Create a node table with vector property
CREATE NODE TABLE Person(id STRING, name STRING, embeddings FLOAT[128], PRIMARY KEY(id));
-- Step 2: Insert a test node
CREATE (p:Person {id: "person-1", name: "Alice", embeddings: [0.1, 0.2, 0.3, ..., 0.128]});
-- Step 3: Create vector index on embeddings property
CALL CREATE_VECTOR_INDEX(
'Person',
'person_embeddings_idx',
'embeddings',
mu := 30,
ml := 60,
pu := 0.05,
metric := 'cosine',
efc := 200,
cache_embeddings := true
);
-- Step 4: Verify index was created
CALL SHOW_INDEXES() RETURN *;
-- Output: Shows person_embeddings_idx on Person.embeddings
-- Step 5: Drop the vector index
CALL DROP_VECTOR_INDEX('Person', 'person_embeddings_idx');
-- Output: "Table _X_person_embeddings_idx_LOWER has been dropped."
-- Step 6: Verify index was removed
CALL SHOW_INDEXES() RETURN *;
-- Output: Empty result (no indexes remain)
-- Step 7: Try to update the embeddings property
MATCH (p:Person {id: "person-1"})
SET p.embeddings = [0.2, 0.3, 0.4, ..., 0.129]
RETURN p.id;
-- ERROR: Runtime exception: Cannot set property vec in table embeddings because it is used in one or more indexes. Try delete and then insert.
Additional Observations
The problem is table-specific
-- Create another table that never had a vector index
CREATE NODE TABLE Organization(id STRING, name STRING, embeddings FLOAT[128], PRIMARY KEY(id));
CREATE (o:Organization {id: "org-1", name: "Company", embeddings: [0.1, 0.2, ...]});
-- This works fine (no index was ever created)
MATCH (o:Organization {id: "org-1"})
SET o.embeddings = [0.2, 0.3, ...]
RETURN o.id;
-- Success!
-- But Person table (which had an index) remains broken
MATCH (p:Person {id: "person-1"})
SET p.embeddings = [0.2, 0.3, ...]
RETURN p.id;
-- Still fails with same error
Other properties are unaffected
-- Non-indexed properties on the same table work fine
MATCH (p:Person {id: "person-1"})
SET p.name = "Updated Name"
RETURN p.name;
-- Success!
The issue persists across database restarts
After restarting the Kuzu server, the same error occurs, indicating the corrupted metadata is persisted to disk.
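These observations are consistent with the per-column check in NodeTable::initUpdateState(): a leftover index entry should only block updates to the column it was built on. The sketch below models that check in isolation. ToyVectorIndex, tryUpdate() and the column_id_t alias are hypothetical names for illustration, and the exception type is a plain std::runtime_error rather than Kuzu's own.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

using column_id_t = std::uint32_t; // hypothetical alias for this sketch

// Toy index that, like the HNSW index quoted in this report, rejects
// in-place updates to the column it was built on.
class ToyVectorIndex {
public:
    explicit ToyVectorIndex(column_id_t col) : columnID{col} {}

    bool isBuiltOnColumn(column_id_t other) const { return other == columnID; }

    void initUpdateState() const {
        throw std::runtime_error{"property is used in one or more indexes"};
    }

private:
    column_id_t columnID;
};

// Mirrors the loop in initUpdateState(): only indexes built on the updated
// column are consulted. Returns false if any of them rejects the update.
bool tryUpdate(const std::vector<ToyVectorIndex>& indexes, column_id_t columnID) {
    try {
        for (const auto& index : indexes) {
            if (index.isBuiltOnColumn(columnID)) {
                index.initUpdateState();
            }
        }
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

With a single stale entry on the embeddings column, tryUpdate() rejects updates to that column but accepts updates to any other column. With an empty index list it accepts everything. This is the same pattern as the Person and Organization results above.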