feature: support batched embeddings #52

@planetf1

Description

We found that loading documents into maestro knowledge (Milvus backend) took around 20-25 s locally (custom_embed: ollama, nomic-text-embed) when we issued a single create_document call with 7 documents (resulting in 420 chunks).

This is acceptable, though for progress reporting and to avoid very long API calls, the load will be broken down per document (a few seconds each).

However, when testing the same code against a remote model server in the cloud (same model), we found that calls to the embedding model took highly variable times, from fairly quick to around 9 seconds:

[screenshot: embedding request timings]

This resulted in overall processing taking more than 14 minutes, even for a small document.

The main suspected factor is the latency incurred by issuing many small requests. We should therefore batch embedding generation, specifically for the Milvus custom-embedding case, as sketched below.
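A minimal sketch of the idea, assuming the custom embedding endpoint accepts a list of inputs (as Ollama's /api/embed does); the function name, defaults, and batch size here are illustrative, not the actual maestro code:

```python
import requests

def embed_chunks_batched(chunks: list[str],
                         url: str = "http://localhost:11434/api/embed",
                         model: str = "nomic-embed-text",
                         batch_size: int = 64) -> list[list[float]]:
    """Embed all chunks in a few batched requests instead of one request per chunk."""
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        # One HTTP round trip per batch; Ollama's /api/embed accepts a
        # string or a list of strings as "input" and returns "embeddings".
        resp = requests.post(url, json={"model": model, "input": batch}, timeout=120)
        resp.raise_for_status()
        vectors.extend(resp.json()["embeddings"])
    return vectors
```

With the 420 chunks above and a batch size of 64, this collapses 420 round trips into 7, which should remove most of the per-request latency seen against the remote server.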

Additionally, making the APIs async more generally would benefit this scenario, though an initial change for batched embedding would be a good start. See also #48
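If the batched calls were also issued asynchronously, batches could overlap in flight rather than run sequentially; a rough sketch using aiohttp, again with hypothetical names and the same assumed endpoint:

```python
import asyncio
import aiohttp

async def embed_chunks_async(chunks: list[str],
                             url: str = "http://localhost:11434/api/embed",
                             model: str = "nomic-embed-text",
                             batch_size: int = 64) -> list[list[float]]:
    """Issue batched embedding requests concurrently, preserving chunk order."""
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
    async with aiohttp.ClientSession() as session:
        async def embed(batch: list[str]) -> list[list[float]]:
            async with session.post(url, json={"model": model, "input": batch}) as resp:
                resp.raise_for_status()
                return (await resp.json())["embeddings"]
        # gather() returns results in submission order, so chunk order is kept.
        results = await asyncio.gather(*(embed(b) for b in batches))
    return [vec for batch in results for vec in batch]
```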
