Speedup idea: quantization?

There are three parts to our runtime:

- Model downloads — can’t really optimize this; models need to be local before we can compute embeddings.
- Embedding computation — this is where most time is spent. Might be worth trying quantization to speed up inference. If ranking quality holds up, it could be a useful addition.
- Transferability metrics — already fast (just a few seconds per model), so no action needed.
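One way to check whether ranking quality holds up is to compute the transferability scores twice, once from full-precision embeddings and once from quantized ones, and then compare the two model rankings with a rank correlation. The minimal sketch below uses Kendall's tau (tau-a). The function name, the choice of this metric, and the example scores are assumptions for illustration. The issue does not specify any of them, and the numbers are made up, not results.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Kendall's tau-a between two score lists for the same models.

    Returns 1.0 if both lists rank the models identically and -1.0 if the
    rankings are exactly reversed. Tied pairs count as neither concordant
    nor discordant (tau-a).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same models")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two models to compare rankings")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: pair (i, j) is ordered the same way in both rankings.
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical transferability scores for four candidate models.
scores_fp32 = [0.71, 0.64, 0.58, 0.42]  # from full-precision embeddings
scores_int8 = [0.70, 0.65, 0.55, 0.43]  # from quantized embeddings
print(kendall_tau(scores_fp32, scores_int8))  # → 1.0
```

Spearman's rho would serve equally well here. Tau is used only because it is easy to compute without dependencies. What counts as "holding up" (for example, an exact match of the top-ranked model versus a tau threshold) would still need to be decided.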

Could be worth trying quantization to cut embedding time.
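To make the idea concrete, int8 quantization stores each float as an 8-bit integer plus a shared scale factor. The minimal sketch below shows symmetric per-tensor quantization of a weight vector and the round-trip error it introduces. It is illustration only and assumes plain Python lists. Any real speedup would come from running the matrix multiplies in int8 inside the inference framework (for example PyTorch's dynamic quantization or ONNX Runtime's quantization tooling), which this pure-Python code does not do.

```python
from typing import Sequence


def quantize_int8(values: Sequence[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization.

    scale = max|x| / 127, q = round(x / scale) clamped to [-127, 127].
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero for an all-zero (or empty) tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: Sequence[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]


weights = [0.5, -1.0, 0.25, 0.031]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
# Rounding bounds the per-element error by scale / 2 (up to float error).
print(q, max_err <= scale / 2 + 1e-12)
```

The per-element error is small, but it accumulates across layers, so a ranking comparison between full-precision and quantized embeddings is still needed before adopting it.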
