Description
When calling collection.add_items() with metadata containing Python dictionaries, the operation fails with a sqlite3.ProgrammingError because SQLite does not support storing dict objects.
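The underlying failure can be reproduced with the standard-library sqlite3 module alone, without hnsqlite or SQLAlchemy. This is a minimal sketch: the one-column table is a hypothetical stand-in for the meta column of dbembedding, not the library's real schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column stand-in for the meta column of dbembedding.
conn.execute("CREATE TABLE demo (meta TEXT)")

try:
    # The sqlite3 module has no adapter for dict, so binding one fails.
    conn.execute("INSERT INTO demo (meta) VALUES (?)", ({"key": 1},))
    dict_rejected = False
except (sqlite3.ProgrammingError, sqlite3.InterfaceError) as exc:
    # Recent Python versions raise ProgrammingError, as in the error message
    # in this report; some older versions raised InterfaceError instead.
    dict_rejected = True
    print(f"rejected: {exc}")

# A JSON string is plain TEXT, so the same insert succeeds.
conn.execute("INSERT INTO demo (meta) VALUES (?)", (json.dumps({"key": 1}),))
```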
Environment
- Python version: 3.13.11
- hnsqlite version: 0.2.4
- Operating System: Linux (Arch Linux)
Steps to Reproduce
```python
import numpy as np
from hnsqlite import Collection

collection = Collection(collection_name="test", sqlite_filename="test.db", dimension=300)
vecs = [np.random.rand(300) for _ in range(2)]
texts = [f"Text {i}" for i in range(2)]

# This fails with sqlite3.ProgrammingError
collection.add_items(
    vectors=vecs,
    texts=texts,
    metadata=[{"key": 1}, {"key_two": 2}],
)
```
Error Message
```
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) Error binding parameter 5: type 'dict' is not supported
[SQL: INSERT INTO dbembedding (collection_id, doc_id, vector, text, meta, created_at) VALUES (?, ?, ?, ?, ?, ?) RETURNING id]
[parameters: (1, None, <memory at 0x7fc4fd7e0dc0>, 'Text 0', {'key': 1}, 1768741570.4978547)]
```
Expected Behavior
The metadata dictionaries should be automatically serialized to JSON strings before being stored in SQLite, as documented in the dbEmbedding.meta field description: "An optional json dictionary of metadata associated with the text. Can be sent in as a dictionary and will be converted to json for storage."
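The expected round trip can be sketched with the standard library: serialize each dict with json.dumps before binding, and parse it back with json.loads after reading. The table and column names here are hypothetical and only mirror the text and meta columns from the error message; this is not hnsqlite's own storage code.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the text and meta columns of dbembedding.
conn.execute("CREATE TABLE demo (text TEXT, meta TEXT)")

texts = ["Text 0", "Text 1"]
metadata = [{"key": 1}, {"key_two": 2}]

# Serialize each dict to a JSON string, which sqlite3 binds as TEXT.
rows = [(t, json.dumps(m)) for t, m in zip(texts, metadata)]
conn.executemany("INSERT INTO demo (text, meta) VALUES (?, ?)", rows)

# Parse the stored strings back into dicts when reading.
restored = [
    json.loads(meta)
    for (meta,) in conn.execute("SELECT meta FROM demo ORDER BY rowid")
]
print(restored)  # [{'key': 1}, {'key_two': 2}]
```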
Root Cause
The dbEmbedding class has a @root_validator (lines 103-111) that should convert dict metadata to JSON strings. However, this validator is never triggered when the add_items() method (line 433) builds dbEmbedding instances: SQLModel does not run Pydantic validation when a table model (table=True) is instantiated directly, so the raw dict is passed unchanged through session.add_all() to the SQLite driver.
Suggested Fix
Manually convert metadata dictionaries to JSON strings in the add_items() method before creating dbEmbedding instances:
```python
from json import dumps

# Serialize each metadata dict to JSON; None entries stay None (stored as NULL).
metadata_json = [dumps(m) if m is not None else None for m in metadata]
embeddings = [dbEmbedding(vector=v, text=t, doc_id=d, meta=m, collection_id=cid)
              for v, t, d, m in zip(vector_bytes, texts, doc_ids, metadata_json)]
```
This fix is pretty lazy, since it doesn't address the core of the issue, but it ensures no breaking changes.
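The conversion in the suggested fix can also be pulled out into a standalone function so it can be tested without a database. This is a sketch under stated assumptions: the name serialize_metadata is hypothetical and not part of hnsqlite, a metadata value of None is assumed to mean "no metadata for any item", and string entries are assumed to already be JSON and are passed through unchanged so callers that pre-serialize keep working.

```python
import json
from typing import Any, Optional, Sequence


def serialize_metadata(
    metadata: Optional[Sequence[Any]], count: int
) -> list[Optional[str]]:
    """Hypothetical helper: convert per-item metadata to JSON strings for SQLite."""
    if metadata is None:
        # Assumption: no metadata list means no metadata for any item.
        return [None] * count
    if len(metadata) != count:
        raise ValueError(f"expected {count} metadata entries, got {len(metadata)}")
    out: list[Optional[str]] = []
    for m in metadata:
        if m is None or isinstance(m, str):
            # None stays NULL; strings are assumed to already be JSON.
            out.append(m)
        else:
            # Dicts (and other JSON-serializable values) are encoded here.
            out.append(json.dumps(m))
    return out


print(serialize_metadata([{"key": 1}, None, '{"a": 2}'], 3))
# ['{"key": 1}', None, '{"a": 2}']
```

Checking the length against the number of vectors up front turns a silent truncation by zip() into an explicit error.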
Written by a human, but formatted by Claude Opus 4.5.