-
Notifications
You must be signed in to change notification settings - Fork 9
Open
Description
There are three parts to our runtime:
- Model downloads — can’t really optimize this; models need to be local before we can compute embeddings.
- Embedding computation — this is where most time is spent. Might be worth trying quantization to speed up inference. If ranking quality holds up, it could be a useful addition.
- Transferability metrics — already fast (just a few seconds per model), so no action needed.
Could be worth to try quantization for faster embedding time.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels