Issue Description
Some embedding models require task prefixes on their input, notably Nomic (which is one of the models on the embedding model list in aichat). For other models, prefixes are only recommended. However, when Nomic is set as the embedder, no such prefixes are sent, at least according to the --loglevel debug output. It seems that only the raw chunks are sent when the embeddings are created, and only the raw query is sent at retrieval time, both without a prefix.
Nomic expects:
"search_document:" as prefix during the creation of embedding vectors, and
"search_query:" during the retrieval process
It also supports "clustering" and "classification" as prefixes.
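To make the expected behavior concrete, here is a minimal sketch of how the prefixes could be attached before the text is sent to the embedder. This is not aichat's code. The helper names (`with_task_prefix`, `prepare_chunks`, `prepare_query`) are hypothetical, and the "prefix, colon, space" format is an assumption taken from the examples on the Nomic model card.

```python
# Hypothetical helpers illustrating Nomic-style task prefixes.
# The prefix strings follow the Nomic model card; the trailing space
# after the colon is an assumption based on the examples shown there.
TASK_PREFIXES = {
    "search_document": "search_document: ",
    "search_query": "search_query: ",
    "clustering": "clustering: ",
    "classification": "classification: ",
}


def with_task_prefix(text: str, task: str) -> str:
    """Return text with the prefix for the given task prepended."""
    prefix = TASK_PREFIXES[task]  # raises KeyError for an unknown task
    # Skip text that already carries the prefix, to avoid doubling it.
    if text.startswith(prefix):
        return text
    return prefix + text


def prepare_chunks(chunks: list[str]) -> list[str]:
    """Prefix document chunks before they are embedded for the index."""
    return [with_task_prefix(chunk, "search_document") for chunk in chunks]


def prepare_query(query: str) -> str:
    """Prefix a user query before it is embedded for retrieval."""
    return with_task_prefix(query, "search_query")
```

With something like this in place, the debug log would show "search_document: ..." for each chunk and "search_query: ..." for each query instead of the bare text.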
Not sure how critical this is, but the documentation on Hugging Face (https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) says:
Important: the text prompt must include a task instruction prefix, instructing the model which task is being performed.
For example, if you are implementing a RAG application, you embed your documents as search_document: and embed your user queries as search_query: .
I classified this as a bug based on that description, but it is certainly not a critically breaking one...