Description
We found that loading docs into maestro knowledge (Milvus backend) took around 20-25 s locally (custom_embed: Ollama, nomic-embed-text) when we issued a single create_document call with 7 documents (resulting in 420 chunks).
This is acceptable, though for progress reporting and to avoid very long API calls it will be broken down into one call per document (a few seconds each).
However, when testing the same code against a 'remote' model server in the cloud (same model), we found that calls to the embedding model took highly variable times, from fairly quick to 9 seconds.
This resulted in the overall processing time exceeding 14 minutes even for a small document.
The big factor is suspected to be per-request latency across these many small requests: 420 sequential requests averaging roughly 2 s each would by itself account for the ~14 minutes observed. We should therefore aim to batch embedding generation, specifically for the Milvus, custom embedding case (see the sketch below).
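
A minimal sketch of what batching could look like, assuming the model server exposes an OpenAI-compatible `/v1/embeddings` endpoint that accepts a list of inputs (Ollama does). The URL, function names, and batch size below are illustrative, not the actual maestro-knowledge code:

```python
# Hypothetical sketch: embed all chunks in a few large requests instead of
# one request per chunk, so per-request network latency is paid roughly
# len(chunks) / batch_size times rather than len(chunks) times.
from typing import Iterator

import requests


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive fixed-size batches of chunk texts."""
    for i in range(0, len(items), size):
        yield items[i : i + size]


def embed_chunks(
    chunks: list[str],
    url: str = "http://localhost:11434/v1/embeddings",  # assumed endpoint
    model: str = "nomic-embed-text",
    batch_size: int = 64,
) -> list[list[float]]:
    vectors: list[list[float]] = []
    for group in batches(chunks, batch_size):
        # OpenAI-style embeddings APIs accept a list of inputs and return
        # one embedding per input, in order.
        resp = requests.post(url, json={"model": model, "input": group})
        resp.raise_for_status()
        vectors.extend(item["embedding"] for item in resp.json()["data"])
    return vectors
```

With 420 chunks and a batch size of 64, this is 7 round trips instead of 420, so the variable per-request latency is amortized across many chunks.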
Additionally, making the APIs async would benefit this scenario more generally (rough sketch below), though an initial change for batched embedding would be a good start. See also #48
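
Equally hypothetical, a sketch of overlapping several batched requests with asyncio, so a slow response from the remote model server does not serialize the whole ingest. It assumes the same OpenAI-compatible endpoint as above and uses `httpx` purely for illustration:

```python
# Hypothetical sketch: issue a bounded number of embedding requests
# concurrently instead of strictly one after another.
import asyncio

import httpx


async def embed_batch(client: httpx.AsyncClient, group: list[str]) -> list[list[float]]:
    resp = await client.post(
        "http://localhost:11434/v1/embeddings",  # assumed endpoint
        json={"model": "nomic-embed-text", "input": group},
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]


async def embed_all(groups: list[list[str]], concurrency: int = 4) -> list[list[float]]:
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests
    async with httpx.AsyncClient(timeout=60.0) as client:

        async def guarded(group: list[str]) -> list[list[float]]:
            async with sem:
                return await embed_batch(client, group)

        results = await asyncio.gather(*(guarded(g) for g in groups))
    # Flatten per-batch results back into one ordered list of vectors.
    return [vec for group in results for vec in group]


# Usage: vectors = asyncio.run(embed_all(list(batches(chunks, 64))))
```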