
Commit b514093

Update quick tour
1 parent 72dac20 commit b514093

File tree

1 file changed: +83 -19 lines changed


docs/source/en/quick_tour.md

Lines changed: 83 additions & 19 deletions
@@ -1,4 +1,4 @@
-<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.

 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
@@ -16,18 +16,18 @@ rendered properly in your Markdown viewer.

 # Quick Tour

-## Text Embeddings
+## Set up

 The easiest way to get started with TEI is to use one of the official Docker containers
 (see [Supported models and hardware](supported_models) to choose the right container).

-After making sure that your hardware is supported, install the
-[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) if you
-plan on utilizing GPUs. NVIDIA drivers on your device need to be compatible with CUDA version 12.2 or higher.
+Hence, you first need to install Docker following their [installation instructions](https://docs.docker.com/get-docker/).

-Next, install Docker following their [installation instructions](https://docs.docker.com/get-docker/).
+TEI supports inference both on GPU and CPU. If you plan on using a GPU, make sure to check that your hardware is supported. Next, install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). NVIDIA drivers on your device need to be compatible with CUDA version 12.2 or higher.

-Finally, deploy your model. Let's say you want to use `BAAI/bge-large-en-v1.5`. Here's how you can do this:
+## Deploy
+
+Next, it's time to deploy your model. Let's say you want to use [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5). Here's how you can do this:

 ```shell
 model=BAAI/bge-large-en-v1.5
@@ -42,7 +42,13 @@ We also recommend sharing a volume with the Docker container (`volume=$PWD/data`

 </Tip>

-Once you have deployed a model, you can use the `embed` endpoint by sending requests:
+## Inference
+
+Inference can be performed in 3 ways: using cURL, or via the `InferenceClient` or `OpenAI` Python SDKs.
+
+### cURL
+
+To send a POST request to the TEI endpoint using cURL, you can run the following command:

 ```bash
 curl 127.0.0.1:8080/embed \
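The hunk cuts off before the request body of this curl command. For an equivalent call from Python without any SDK, here is a minimal sketch using the `requests` library; the `{"inputs": ...}` payload and the list-of-vectors response shape are assumptions about TEI's `/embed` route rather than content shown in this diff:

```python
import requests

# Minimal sketch: POST a single sentence to the /embed route.
# Assumption: the route accepts {"inputs": <str or list[str]>} and returns
# a JSON list of embedding vectors.
response = requests.post(
    "http://127.0.0.1:8080/embed",
    json={"inputs": "What is deep learning?"},
)
response.raise_for_status()

embeddings = response.json()
print(len(embeddings[0]))  # embedding dimensionality, e.g. 1024 for bge-large-en-v1.5
```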
@@ -51,16 +57,53 @@ curl 127.0.0.1:8080/embed \
 -H 'Content-Type: application/json'
 ```

-## Re-rankers
+### Python
+
+To run inference using Python, you can either use the [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/en/index) Python SDK (recommended) or the `openai` Python SDK.
+
+#### huggingface_hub
+
+You can install it via pip as `pip install --upgrade --quiet huggingface_hub`, and then run:
+
+```python
+from huggingface_hub import InferenceClient
+
+client = InferenceClient()
+
+embedding = client.feature_extraction("What is deep learning?",
+                                      model="http://localhost:8080/embed")
+print(len(embedding[0]))
+```
+
+#### OpenAI
+
+You can install it via pip as `pip install --upgrade openai`, and then run:
+
+```python
+import os
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8080/embed")

-Re-rankers models are Sequence Classification cross-encoders models with a single class that scores the similarity
-between a query and a text.
+response = client.embeddings.create(
+    model="tei",
+    input="What is deep learning?"
+)

-See [this blogpost](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83) by
+print(response)
+```
+
+## Re-rankers and sequence classification
+
+TEI also supports re-ranker and classic sequence classification models.
+
+### Re-rankers
+
+Rerankers, also called cross-encoders, are sequence classification models with a single class that scores the similarity between a query and a text. See [this blogpost](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83) by
 the LlamaIndex team to understand how you can use re-rankers models in your RAG pipeline to improve
 downstream performance.

-Let's say you want to use `BAAI/bge-reranker-large`:
+Let's say you want to use [`BAAI/bge-reranker-large`](https://huggingface.co/BAAI/bge-reranker-large). First, you can deploy it like so:

 ```shell
 model=BAAI/bge-reranker-large
@@ -69,8 +112,7 @@ volume=$PWD/data
 docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```

-Once you have deployed a model, you can use the `rerank` endpoint to rank the similarity between a query and a list
-of texts:
+Once you have deployed a model, you can use the `rerank` endpoint to rank the similarity between a query and a list of texts. With cURL, this can be done like so:

 ```bash
 curl 127.0.0.1:8080/rerank \
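The body of the rerank request is likewise elided by the hunk boundary. As a hedged illustration, a `requests`-based sketch of the same call, assuming the `/rerank` route takes a `query` plus a list of `texts` and returns per-text similarity scores:

```python
import requests

# Minimal sketch: rank a list of texts against a query via the /rerank route.
# Assumption: the route accepts {"query": ..., "texts": [...]} and returns
# a list of {"index": ..., "score": ...} entries.
response = requests.post(
    "http://127.0.0.1:8080/rerank",
    json={
        "query": "What is deep learning?",
        "texts": ["Deep learning is a subfield of machine learning.", "The sky is blue."],
    },
)
response.raise_for_status()
print(response.json())
```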
@@ -79,9 +121,20 @@ curl 127.0.0.1:8080/rerank \
 -H 'Content-Type: application/json'
 ```

-## Sequence Classification
+Alternatively, one can perform inference using the `huggingface_hub` Python SDK. You can install it via pip as `pip install --upgrade --quiet huggingface_hub`, and then run:
+
+```python
+from huggingface_hub import InferenceClient
+
+client = InferenceClient()
+embedding = client.feature_extraction("What is deep learning?",
+                                      model="http://localhost:8080/rerank")
+print(len(embedding[0]))
+```
+
+### Sequence classification models

-You can also use classic Sequence Classification models like `SamLowe/roberta-base-go_emotions`:
+You can also use classic Sequence Classification models like [`SamLowe/roberta-base-go_emotions`](https://huggingface.co/SamLowe/roberta-base-go_emotions):

 ```shell
 model=SamLowe/roberta-base-go_emotions
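For the `predict` route used by this model, an analogous sketch with `requests`; the single-string `inputs` payload and the label/score response are assumptions about TEI's sequence-classification API, not content taken from this commit:

```python
import requests

# Minimal sketch: classify one sentence via the /predict route.
# Assumption: the route accepts {"inputs": ...} and returns a list of
# {"score": ..., "label": ...} entries (go_emotions labels for this model).
response = requests.post(
    "http://127.0.0.1:8080/predict",
    json={"inputs": "I like you. I love you."},
)
response.raise_for_status()
print(response.json())
```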
@@ -99,9 +152,20 @@ curl 127.0.0.1:8080/predict \
 -H 'Content-Type: application/json'
 ```

+Alternatively, one can perform inference using the `huggingface_hub` Python SDK. You can install it via pip as `pip install --upgrade --quiet huggingface_hub`, and then run:
+
+```python
+from huggingface_hub import InferenceClient
+
+client = InferenceClient()
+embedding = client.feature_extraction("What is deep learning?",
+                                      model="http://localhost:8080/predict")
+print(len(embedding[0]))
+```
+
 ## Batching

-You can send multiple inputs in a batch. For example, for embeddings
+You can send multiple inputs in a batch. For example, for embeddings:

 ```bash
 curl 127.0.0.1:8080/embed \
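The batch payload is also cut off by the hunk. A hedged `requests` sketch, assuming `/embed` accepts a list of strings under `inputs` and returns one vector per input:

```python
import requests

# Minimal sketch: embed several inputs in one request to the /embed route.
# Assumption: passing a list under "inputs" returns one embedding per item.
response = requests.post(
    "http://127.0.0.1:8080/embed",
    json={"inputs": ["Today is a nice day", "I like you"]},
)
response.raise_for_status()

embeddings = response.json()
print(len(embeddings))  # 2: one embedding per input
```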
@@ -140,4 +204,4 @@ volume=$PWD

 # Mount the models directory inside the container with a volume and set the model ID
 docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id /data/gte-base-en-v1.5
-```
+```
