CacheBackedEmbeddings should update the store periodically (by computing embeddings in batches)
#18026
Closed
chrispy-snps announced in Ideas
Replies: 1 comment
I submitted #18070 to implement this.
Feature request
Currently, CacheBackedEmbeddings computes vectors for all uncached documents before updating the store. I propose that the store should be periodically updated by computing embeddings in batches.
Motivation
I noticed this when I was trying this feature out on our 30k document set and the cache directory hadn't appeared on disk after 30 minutes.
The motivation is to minimize compute/data loss when problems occur:
Proposal (If applicable)
Compute and store the embedding vectors in batches. Provide a new batch_size configuration parameter to CacheBackedEmbeddings instances.
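To make the proposal concrete, here is a minimal sketch of the batching idea in plain Python. It is not LangChain's implementation: embed_with_cache, the dict-based store, and the embed_fn callable are hypothetical stand-ins for CacheBackedEmbeddings, its byte store, and the underlying embedder. Only the batch_size name comes from the proposal.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence

Vector = List[float]


def _batched(items: Sequence[str], size: Optional[int]) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items (one slice if size is unset)."""
    step = size or max(len(items), 1)
    for start in range(0, len(items), step):
        yield items[start:start + step]


def embed_with_cache(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],  # hypothetical embedder
    store: Dict[str, Vector],                       # hypothetical stand-in for the cache store
    batch_size: Optional[int] = None,
) -> List[Vector]:
    """Embed `texts`, writing each batch of new vectors to `store` as soon as it is computed."""
    # Only uncached texts need computing; dict.fromkeys de-duplicates while keeping order.
    missing = list(dict.fromkeys(t for t in texts if t not in store))
    for batch in _batched(missing, batch_size):
        vectors = embed_fn(list(batch))
        # Persist immediately, so a failure in a later batch loses at most one batch of work.
        store.update(zip(batch, vectors))
    return [store[t] for t in texts]
```

With batch_size left unset, the sketch falls back to the behavior described above (one store update after all uncached documents are embedded). With a batch size set, a failed run can be restarted and only the batches that were not yet stored are recomputed.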