@@ -7,7 +7,7 @@ base_model:
 Recommended way to run this model:
 
 ```sh
-llama-server -hf {namespace}/{model_name}-GGUF --embedding --pooling none
+llama-server -hf {namespace}/{model_name}-GGUF
 ```
 
 Then the endpoint can be accessed at http://localhost:8080/embedding, for
@@ -16,21 +16,33 @@ example using `curl`:
 curl --request POST \
     --url http://localhost:8080/embedding \
     --header "Content-Type: application/json" \
-    --data '{{"input": "Hello embeddings", "embd_normalize": -1 }}' \
+    --data '{{"input": "Hello embeddings"}}' \
     --silent
 ```
 
-Alternatively, the `llama-embedding`command line tool can be used:
+Alternatively, the `llama-embedding` command line tool can be used:
 ```sh
-llama-embedding -hf {namespace}/{model_name}-GGUF --pooling none --embd-normalize 2 --verbose-prompt -p "Hello embeddings"
+llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
 ```
 
 #### embd_normalize
-When a pooling method is specified the normalization can be controlled by the
-`embd_normalize` parameter. The default value is `2` which means that the
-embeddings are normalized using the Euclidean norm (L2). Other options are:
+When a model uses pooling, or the pooling method is specified using `--pooling`,
+the normalization can be controlled by the `embd_normalize` parameter.
+
+The default value is `2`, which means that the embeddings are normalized using
+the Euclidean norm (L2). Other options are:
 * -1 No normalization
 * 0 Max absolute
 * 1 Taxicab
 * 2 Euclidean/L2
 * \>2 P-Norm
+
+This can be passed in the request body to `llama-server`, for example:
+```sh
+--data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
+```
+
+And for `llama-embedding`, by passing `--embd-normalize <value>`, for example:
+```sh
+llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
+```
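The `embd_normalize` values listed in this change can be sketched in Python. This is an illustrative reimplementation of the documented options, not the actual llama.cpp code:

```python
import math

def normalize(embd, embd_normalize=2):
    """Illustrative sketch of the documented embd_normalize options:
    -1 = none, 0 = max absolute, 1 = taxicab (L1),
    2 = Euclidean (L2, the default), >2 = p-norm."""
    if embd_normalize == -1:
        return list(embd)  # no normalization
    if embd_normalize == 0:
        norm = max(abs(x) for x in embd)  # max absolute
    elif embd_normalize == 1:
        norm = sum(abs(x) for x in embd)  # taxicab / L1
    else:
        # Euclidean (p=2) or general p-norm for p > 2
        p = embd_normalize
        norm = sum(abs(x) ** p for x in embd) ** (1.0 / p)
    return [x / norm for x in embd]

v = [3.0, 4.0]
print(normalize(v))      # L2 (default): [0.6, 0.8]
print(normalize(v, 1))   # L1: components sum to 1
print(normalize(v, -1))  # unchanged: [3.0, 4.0]
```

The default L2 normalization is what makes cosine similarity between two embeddings reduce to a plain dot product, which is why it is the usual choice for retrieval.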