Description
We found that loading docs into maestro knowledge (Milvus backend) took around 20-25 s locally (custom_embed: Ollama, nomic-embed-text) when we issued a single create_document call with 7 documents (resulting in 420 chunks).
This is acceptable, though for progress reporting and to avoid very long API calls it will be broken down into one call per document (a few seconds each).
However, when testing the same code against a 'remote' model server in the cloud (same model), we found that calls to the embedding model took highly variable times, from fairly quick to 9 seconds.
This resulted in the overall processing time exceeding 14 minutes even for a small document.
The big factor is suspected to be per-request latency across these many small requests: 420 sequential requests averaging roughly 2 s each would by itself account for the ~14 minutes observed. We should therefore aim to batch embedding generation, specifically for the Milvus, custom embedding case (see the sketch below).
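
A minimal sketch of what batching could look like, assuming the model server exposes an OpenAI-compatible `/v1/embeddings` endpoint that accepts a list of inputs (Ollama does). The URL, function names, and batch size below are illustrative, not the actual maestro-knowledge code:

```python
# Hypothetical sketch: embed all chunks in a few large requests instead of
# one request per chunk, so per-request network latency is paid roughly
# len(chunks) / batch_size times rather than len(chunks) times.
from typing import Iterator

import requests


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive fixed-size batches of chunk texts."""
    for i in range(0, len(items), size):
        yield items[i : i + size]


def embed_chunks(
    chunks: list[str],
    url: str = "http://localhost:11434/v1/embeddings",  # assumed endpoint
    model: str = "nomic-embed-text",
    batch_size: int = 64,
) -> list[list[float]]:
    vectors: list[list[float]] = []
    for group in batches(chunks, batch_size):
        # OpenAI-style embeddings APIs accept a list of inputs and return
        # one embedding per input, in order.
        resp = requests.post(url, json={"model": model, "input": group})
        resp.raise_for_status()
        vectors.extend(item["embedding"] for item in resp.json()["data"])
    return vectors
```

With 420 chunks and a batch size of 64, this is 7 round trips instead of 420, so the variable per-request latency is amortized across many chunks.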
Additionally, making the APIs async would benefit this scenario more generally (rough sketch below), though an initial change for batched embedding would be a good start. See also #48
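
Equally hypothetical, a sketch of overlapping several batched requests with asyncio, so a slow response from the remote model server does not serialize the whole ingest. It assumes the same OpenAI-compatible endpoint as above and uses `httpx` purely for illustration:

```python
# Hypothetical sketch: issue a bounded number of embedding requests
# concurrently instead of strictly one after another.
import asyncio

import httpx


async def embed_batch(client: httpx.AsyncClient, group: list[str]) -> list[list[float]]:
    resp = await client.post(
        "http://localhost:11434/v1/embeddings",  # assumed endpoint
        json={"model": "nomic-embed-text", "input": group},
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]


async def embed_all(groups: list[list[str]], concurrency: int = 4) -> list[list[float]]:
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests
    async with httpx.AsyncClient(timeout=60.0) as client:

        async def guarded(group: list[str]) -> list[list[float]]:
            async with sem:
                return await embed_batch(client, group)

        results = await asyncio.gather(*(guarded(g) for g in groups))
    # Flatten per-batch results back into one ordered list of vectors.
    return [vec for group in results for vec in group]


# Usage: vectors = asyncio.run(embed_all(list(batches(chunks, 64))))
```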