Cargo.toml (1 addition, 1 deletion)
@@ -26,7 +26,7 @@ default-members = [
 resolver = "2"
 
 [workspace.package]
-version = "1.6.1"
+version = "1.7.0"
 edition = "2021"
 authors = ["Olivier Dehaene", "Nicolas Patry", "Alvaro Bartolome"]
 homepage = "https://github.com/huggingface/text-embeddings-inference"
docs/openapi.json (1 addition, 1 deletion)
@@ -10,7 +10,7 @@
"name": "Apache 2.0",
"url": "https://www.apache.org/licenses/LICENSE-2.0"
},
"version": "1.6.0"
"version": "1.7.0"
},
"paths": {
"/decode": {
docs/source/en/private_models.md (1 addition, 1 deletion)
@@ -37,5 +37,5 @@ model=<your private model>
 volume=$PWD/data
 token=<your cli Hugging Face Hub token>
 
-docker run --gpus all -e HF_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model
+docker run --gpus all -e HF_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
docs/source/en/quick_tour.md (4 additions, 4 deletions)
@@ -33,7 +33,7 @@ Finally, deploy your model. Let's say you want to use `BAAI/bge-large-en-v1.5`.
 model=BAAI/bge-large-en-v1.5
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
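A quick smoke test is worthwhile once the container is up. A minimal sketch, assuming TEI's default `/embed` route on the mapped port 8080 and an `inputs` JSON field; neither is shown in this diff:

```shell
# A minimal smoke test; assumes the container above is running and port 8080 is mapped
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs": "What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```

The response should be a JSON array with one embedding vector per input.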

 <Tip>
@@ -66,7 +66,7 @@ Let's say you want to use `BAAI/bge-reranker-large`:
 model=BAAI/bge-reranker-large
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
 
 Once you have deployed a model, you can use the `rerank` endpoint to rank the similarity between a query and a list
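The context line above is cut off where the diff collapses; the source document continues into a `rerank` request example. A minimal sketch of such a call, assuming the reranker container above is listening on 127.0.0.1:8080 and that the route accepts a `query` plus a `texts` array (the payload shape is an assumption, not shown in this diff):

```shell
# A minimal sketch; payload shape (query + texts) is assumed for the /rerank route
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep learning is a subfield of machine learning.", "Cheese is made from milk."]}' \
    -H 'Content-Type: application/json'
```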
@@ -87,7 +87,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
 model=SamLowe/roberta-base-go_emotions
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
 
 Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
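The example promised by that colon sits in collapsed lines. A minimal sketch of what it would look like, assuming the `/predict` route takes the same `inputs` field as the embedding route:

```shell
# A minimal sketch; assumes /predict accepts an "inputs" field like the embed route
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs": "I like you."}' \
    -H 'Content-Type: application/json'
```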
@@ -139,5 +139,5 @@ git clone https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
 volume=$PWD
 
 # Mount the models directory inside the container with a volume and set the model ID
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id /data/gte-base-en-v1.5
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id /data/gte-base-en-v1.5
 ```
docs/source/en/supported_models.md (6 additions, 6 deletions)
@@ -66,13 +66,13 @@ Find the appropriate Docker image for your hardware in the following table:

 | Architecture                        | Image                                                                    |
 |-------------------------------------|--------------------------------------------------------------------------|
-| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.6                    |
+| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.7                    |
 | Volta                               | NOT SUPPORTED                                                            |
-| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.6 (experimental)  |
-| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.6                        |
-| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.6                     |
-| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.6                     |
-| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.6 (experimental)  |
+| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.7 (experimental)  |
+| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.7                        |
+| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.7                     |
+| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.7                     |
+| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.7 (experimental)  |
 
 **Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
 You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
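For reference, a minimal sketch of passing that variable through Docker's `-e` flag with the Turing image from the table above; the model and volume here are placeholders:

```shell
# A minimal sketch; USE_FLASH_ATTENTION=True re-enables Flash Attention v1 on Turing GPUs
model=BAAI/bge-large-en-v1.5
volume=$PWD/data

docker run --gpus all -e USE_FLASH_ATTENTION=True -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:turing-1.7 --model-id $model
```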