[Bug] SQLite Error When Adding Items with Metadata Dictionaries #13

@kn0kh

Description

When calling collection.add_items() with metadata containing Python dictionaries, the operation fails with a sqlite3.ProgrammingError because the sqlite3 driver cannot bind a dict object as a query parameter.

Environment

  • Python version: 3.13.11
  • hnsqlite version: 0.2.4
  • Operating System: Arch Linux

Steps to Reproduce

import numpy as np
from hnsqlite import Collection

collection = Collection(collection_name="test", sqlite_filename="test.db", dimension=300)

# two random 300-dimensional vectors with matching texts
vecs = [np.random.rand(300) for _ in range(2)]
texts = [f"Text {i}" for i in range(2)]

# this fails: each metadata entry is a plain dict
collection.add_items(
    vectors=vecs,
    texts=texts,
    metadata=[{"key": 1}, {"key_two": 2}]
)

Error Message

sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) Error binding parameter 5: type 'dict' is not supported
[SQL: INSERT INTO dbembedding (collection_id, doc_id, vector, text, meta, created_at) VALUES (?, ?, ?, ?, ?, ?) RETURNING id]
[parameters: (1, None, <memory at 0x7fc4fd7e0dc0>, 'Text 0', {'key': 1}, 1768741570.4978547)]
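
The binding failure can be reproduced without hnsqlite at all. The following is a minimal sketch using only the standard-library sqlite3 module; the table name `demo` and the helper `try_insert` are made up for illustration and are not part of hnsqlite. The exact exception subclass and message differ between Python versions, so the sketch catches the base class sqlite3.Error.

```python
import sqlite3

# Hypothetical stand-in table; only the `meta` column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, meta TEXT)")


def try_insert(value):
    """Return None on success, or the driver's exception on failure."""
    try:
        conn.execute("INSERT INTO demo (meta) VALUES (?)", (value,))
        return None
    except sqlite3.Error as exc:  # exact subclass varies across Python versions
        return exc


dict_error = try_insert({"key": 1})    # a raw dict cannot be bound
str_error = try_insert('{"key": 1}')   # a JSON string binds without error

print(type(dict_error).__name__, dict_error)
print(str_error)
```

Only the second insert reaches the table, which matches the behavior reported above: the dict is rejected at parameter-binding time, before any SQL runs.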

Expected Behavior

The metadata dictionaries should be automatically serialized to JSON strings before being stored in SQLite, as documented in the dbEmbedding.meta field description: "An optional json dictionary of metadata associated with the text. Can be sent in as a dictionary and will be converted to json for storage."

Root Cause

The dbEmbedding class has a @root_validator (lines 103-111) that should convert dict metadata to JSON strings. However, this validator is not triggered during SQLModel ORM operations in the add_items() method (line 433). SQLModel/SQLAlchemy bypasses Pydantic validators when creating instances for direct database insertion via session.add_all().

Suggested Fix

Manually convert metadata dictionaries to JSON strings in the add_items() method before creating dbEmbedding instances:

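from json import dumps  # if not already imported in the module

# serialize each metadata dict to a JSON string; keep missing entries as None
metadata_json = [dumps(m) if m is not None else None for m in metadata]
embeddings = [dbEmbedding(vector=v, text=t, doc_id=d, meta=m, collection_id=cid)
              for v, t, d, m in zip(vector_bytes, texts, doc_ids, metadata_json)]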

This fix is pretty lazy, since it doesn't address the core of the issue, but it ensures no breaking changes.
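
As a rough illustration of the serialization step on its own, here is a standalone sketch that converts metadata before binding and reads it back. The helper `meta_to_json` and the `demo_meta` table are hypothetical names for this example, not hnsqlite code, and the sketch assumes that any string passed in is already valid JSON.

```python
import json
import sqlite3
from typing import Any, Optional


def meta_to_json(meta: Optional[Any]) -> Optional[str]:
    """Hypothetical helper: make a metadata value safe to bind in SQLite."""
    if meta is None or isinstance(meta, str):
        return meta  # assumption: strings passed in are already JSON
    return json.dumps(meta)


metadata = [{"key": 1}, None, {"key_two": 2}]
metadata_json = [meta_to_json(m) for m in metadata]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_meta (id INTEGER PRIMARY KEY, meta TEXT)")
conn.executemany(
    "INSERT INTO demo_meta (meta) VALUES (?)",
    [(m,) for m in metadata_json],
)

# Read the rows back and decode the JSON strings into dicts again.
rows = [r[0] for r in conn.execute("SELECT meta FROM demo_meta ORDER BY id")]
restored = [json.loads(r) if r is not None else None for r in rows]
print(restored)
```

Passing strings through unchanged keeps callers that already send JSON working, which fits the goal of avoiding breaking changes.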

Written by a human, but formatted by Claude Opus 4.5
