Serve UAE embed model in OpenAI compatible server? #1045
Replies: 3 comments
-
|  | 
Beta Was this translation helpful? Give feedback.
-
| ok looks like  So... what are y'all using for embeddings? You just pick a random 7b? Bert models top the MTEB chart. | 
Beta Was this translation helpful? Give feedback.
-
| I found ggml-org/llama.cpp#2872 which linked me to https://github.com/xyzhang626/embeddings.cpp Maybe I am pitching embeddings.cpp bindings in this project at least until llama.cpp gets official bert support from that first thread. | 
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
I was excited to see we can now serve multiple models from a single instance of the OpenAI compatible endpoint! I was excited to try to serve a dedicated embedding model so I could keep my embeddings consistent while swapping out for an arbitrary completion model.
How can I serve this model? I tried to convert it to gguf but got an error I'll share later. Maybe this is a llama CPP question...
https://huggingface.co/WhereIsAI/UAE-Large-V1
Beta Was this translation helpful? Give feedback.
All reactions