Speedup idea: quantization? #15

@lukasgarbas

Description

There are three parts to our runtime:

  • Model downloads — can’t really optimize this; models need to be local before we can compute embeddings.
  • Embedding computation — this is where most time is spent. Might be worth trying quantization to speed up inference. If ranking quality holds up, it could be a useful addition.
  • Transferability metrics — already fast (just a few seconds per model), so no action needed.

It could be worth trying quantization to reduce embedding time.
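A minimal sketch of what this could look like with PyTorch's dynamic int8 quantization (the two-layer model below is just a stand-in for an actual embedding model, and the layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Stand-in for an embedding model; the real models would be loaded from disk.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
model.eval()

# Dynamically quantize all Linear layers to int8 weights;
# activations stay float and are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    out_fp32 = model(x)
    out_int8 = quantized(x)

# The embeddings should be close; whether ranking quality holds up
# would need to be checked on the actual transferability metrics.
max_diff = (out_fp32 - out_int8).abs().max().item()
```

Dynamic quantization is the lowest-effort option since it needs no calibration data; if the quality drop is too large, static quantization with a small calibration set would be the next thing to try.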
