
Commit 58a5560

Merge pull request #503 from VipulMascarenhas/main
Add TEI deployment example for embeddings
2 parents 30613ab + af59317

File tree

3 files changed (+308 -0 lines)

LLM/embedding/README.md

Lines changed: 5 additions & 0 deletions

# Deploy Embedding Models in OCI Data Science

OCI Data Science can be used to deploy embedding models. This page curates links to common use cases for such models.

[Deployment of embeddings using Text Embedding Inference](deploy-embedding-model-tei.md)

cosine_similarity_embedding_with_tei.png: 17.7 KB (binary image, not rendered)

LLM/embedding/deploy-embedding-model-tei.md

Lines changed: 303 additions & 0 deletions

# Deploying embedding models with Text Embedding Inference on OCI Model Deployment

Hugging Face's [text-embeddings-inference](https://github.com/huggingface/text-embeddings-inference) (TEI) lets us serve embedding models and supports the OpenAI spec for inference. TEI supports Nomic, BERT, CamemBERT, and XLM-RoBERTa models with absolute positions, the JinaBERT model with ALiBi positions, and Mistral, Alibaba GTE, and Qwen2 models with RoPE positions. TEI also supports sequence classification and re-ranking models, but the scope of this example is limited to deploying embedding models.

TEI can serve more than 10k embedding models available on the Hugging Face Hub, with support for the most popular ones. Currently, users can deploy any TEI-supported embedding model on the OCI Data Science platform. While TEI offers the `/embed` endpoint as the default method to get embeddings, the following example uses the OpenAI-compatible route, i.e. `/v1/embeddings`. For more details, check the list of available endpoints [here](https://huggingface.github.io/text-embeddings-inference/#/).
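
To make the route concrete: the OpenAI-compatible endpoint accepts a JSON body with an `input` field (a string or list of strings) and returns an `object`/`data` payload of embedding vectors. Below is a minimal sketch of that request/response shape, assuming a TEI container is already running and reachable on localhost:8080 (for example via a local `docker run` with the port mapped); the deployed-endpoint version is shown in the Inference section later.

```
# Minimal local sketch of the OpenAI-compatible route (assumes a TEI container on localhost:8080).
import requests

resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": ["What is Deep Learning?"]},
)
# The response follows the OpenAI embeddings schema: {"object": "list", "data": [{"embedding": [...]}, ...]}
print(len(resp.json()["data"][0]["embedding"]))
```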

## Overview

This guide demonstrates how to deploy and perform inference on embedding models with the Oracle Data Science Service through a Bring Your Own Container (BYOC) approach. In this example, we use `BAAI/bge-base-en-v1.5`, a model downloaded from Hugging Face, and a container powered by Text Embedding Inference (TEI).

## Pre-Requisites

To run the example on this page, ensure you have access to an Oracle Data Science notebook session in your tenancy.

### Required IAM Policies

Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#required-iam-policies) to grant access to OCI services.

### Install Desktop Container Management

This example requires a desktop tool to build, run, and push containers. We support:

* [Docker Desktop](https://docs.docker.com/get-docker)
* [Rancher Desktop](https://rancherdesktop.io/)

## Prepare Inference Container

TEI ships with multiple Docker images that we can use to deploy an embedding model on the OCI Data Science platform. For more details on the images, visit the official GitHub repository section [here](https://github.com/huggingface/text-embeddings-inference/tree/main?tab=readme-ov-file#docker-images). Here, we show an example with one of the images. Run the following on your desktop once Docker is installed.

```
docker pull ghcr.io/huggingface/text-embeddings-inference:1.5.0
```

Currently, OCI Data Science Model Deployment only supports container images residing in the OCI Registry. Before we can push the pulled TEI container, make sure you have created a repository in your tenancy, either through the Console steps below or with the scripted alternative sketched after this list:

* Go to your tenancy Container Registry
* Click on the Create repository button
* Select Private under Access types
* Set a name for Repository name. We are using "text-embeddings-inference" in the example.
* Click on the Create button
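
As an optional alternative to the Console steps above, the repository can also be created programmatically with the OCI Python SDK's Container Registry (Artifacts) client. This is only a sketch: the auth setup and compartment OCID placeholder are assumptions, so verify the call against your SDK version.

```
# Optional sketch: create the OCIR repository with the OCI Python SDK instead of the Console.
# Assumes the default ~/.oci/config profile; adjust auth as needed.
import oci

config = oci.config.from_file()
artifacts_client = oci.artifacts.ArtifactsClient(config)

repo = artifacts_client.create_container_repository(
    oci.artifacts.models.CreateContainerRepositoryDetails(
        compartment_id="<compartment_ocid>",
        display_name="text-embeddings-inference",
        is_public=False,  # matches the "Private" access type selected above
    )
)
print(repo.data.id)
```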

You may need to `docker login` to the Oracle Cloud Container Registry (OCIR) first, if you haven't done so before, in order to push the image. To log in, use your API Auth Token, which can be created under your Oracle Cloud Account->Auth Token. You need to log in only once. Replace `<region>` with the OCI region you are using.

```
docker login -u '<tenant-namespace>/<username>' <region>.ocir.io
```

If your tenancy is federated with Oracle Identity Cloud Service, use the format `<tenant-namespace>/oracleidentitycloudservice/<username>`. You can then tag and push the container image to the OCI Registry.

```
docker tag ghcr.io/huggingface/text-embeddings-inference:1.5.0 <region>.ocir.io/<tenancy>/text-embeddings-inference:1.5.0
docker push <region>.ocir.io/<tenancy>/text-embeddings-inference:1.5.0
```

## Setup

Install the dependencies in the notebook session. These are needed to prepare the artifacts, create a model, and deploy it in OCI Data Science.

Run this in the terminal of a notebook session:

```
# install and/or update required python packages
pip install oracle-ads oci huggingface_hub -U
```

## Prepare the model artifacts

To prepare the model artifacts for deployment:

* Download the model files from Hugging Face to a local directory using a valid Hugging Face token (only needed for gated models). If you don't have a Hugging Face token, refer to [this page](https://huggingface.co/docs/hub/en/security-tokens) to generate one.
* Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don't have an Object Storage bucket, create one using the OCI SDK or the Console. Make a note of the namespace, compartment, and bucket name. Configure policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure the policies in IAM in the Console.
* Create a model catalog entry for the model using the Object Storage path.

## Downloading model from Hugging Face Hub

You can refer to the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/en/index) for details. Here, we'll deploy one of the most popular embedding models from the Hub that is supported by TEI.

Run this in the terminal of a notebook session:

```
# Login to huggingface
huggingface-cli login --token "<your-huggingface-token>"

# download the model to a local folder. Here, BAAI/bge-base-en-v1.5 can be replaced by other models available on Hugging Face Hub.
huggingface-cli download BAAI/bge-base-en-v1.5 --local-dir BAAI/bge-base-en-v1.5
```
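
If you prefer to stay in Python rather than using the CLI, the same download can be done with `huggingface_hub.snapshot_download`; a minimal sketch (the token argument is only required for gated models):

```
# Python alternative to the huggingface-cli commands above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/bge-base-en-v1.5",
    local_dir="BAAI/bge-base-en-v1.5",
    # token="<your-huggingface-token>",  # only needed for gated models
)
```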

## Upload Model to OCI Object Storage

Once the model is downloaded, use the terminal to upload the artifacts to Object Storage. Make sure that the bucket is versioned, as mentioned above.

Run this in the terminal of a notebook session:

```
oci os object bulk-upload -bn <bucket> -ns <namespace> --auth resource_principal --prefix BAAI/bge-base-en-v1.5/ --src-dir BAAI/bge-base-en-v1.5/ --no-overwrite
```
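
If you'd rather do the upload from Python, a rough equivalent of the bulk-upload command using the OCI Python SDK with resource principal auth is sketched below; the bucket and namespace placeholders are the same as in the CLI command, and the object names keep the `BAAI/bge-base-en-v1.5/` prefix expected later by the deployment.

```
# Rough Python equivalent of the `oci os object bulk-upload` command above.
import os
import oci

signer = oci.auth.signers.get_resource_principals_signer()
object_storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

namespace = "<namespace>"
bucket = "<bucket>"
src_dir = "BAAI/bge-base-en-v1.5"

for root, _, files in os.walk(src_dir):
    for name in files:
        path = os.path.join(root, name)
        # object name keeps the same model-folder prefix, e.g. BAAI/bge-base-en-v1.5/config.json
        with open(path, "rb") as f:
            object_storage.put_object(namespace, bucket, path, f)
```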

## Create Model by reference using ADS

Create a notebook using the default Python kernel, with the Python libraries specified in the setup section installed.

We first set up the variables needed for creating and deploying the model.

```
import ads

ads.set_auth("resource_principal")

# Extract region information from the Notebook environment variables and signer.
region = ads.common.utils.extract_region()

hf_model_name = "BAAI/bge-base-en-v1.5"
artifact_path = f"oci://<bucket>@<namespace>/{hf_model_name}"
project_id = "<project_ocid>"
compartment_id = "<compartment_ocid>"

log_group_id = "ocid1.loggroup.oc1.xxx.xxxxx"
log_id = "ocid1.log.oc1.xxx.xxxxx"

instance_shape = "VM.GPU.A10.1"
container_image = "<region>.ocir.io/<tenancy>/text-embeddings-inference:1.5.0"
```

Next, create a model catalog entry with the artifact in the Object Storage bucket where it was uploaded.

```
from ads.model.datascience_model import DataScienceModel

model = (
    DataScienceModel()
    .with_compartment_id(compartment_id)
    .with_project_id(project_id)
    .with_display_name(hf_model_name)
    .with_artifact(artifact_path)
).create(model_by_reference=True)
```
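
The `create()` call returns the model catalog entry; its OCID (`model.id`) is what the container runtime below refers to via `.with_model_uri(...)`, so it can be useful to print it for reference.

```
# The model OCID is reused by the deployment configuration below.
print(model.id)
```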

## Deploy embedding model

In order to deploy the model we just created, we set up the infrastructure and container runtime first.

### Import Model Deployment Modules

```
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentContainerRuntime,
    ModelDeploymentInfrastructure,
    ModelDeploymentMode,
)
```

### Setup Model Deployment Infrastructure

```
infrastructure = (
    ModelDeploymentInfrastructure()
    .with_project_id(project_id)
    .with_compartment_id(compartment_id)
    .with_shape_name(instance_shape)
    .with_bandwidth_mbps(10)
    .with_replica(1)
    .with_web_concurrency(10)
    .with_access_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
    .with_predict_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
)
```

### Configure Model Deployment Runtime

We set the `MODEL_DEPLOY_PREDICT_ENDPOINT` environment variable to `/v1/embeddings` so that we can access the corresponding endpoint of the TEI container. One additional configuration we need to add is `cmd_var`, which specifies the location of the artifacts that will be downloaded within the model deployment. For models created by reference, the default artifact location is `/opt/ds/model/deployed_model/`, and we need to append the Object Storage bucket prefix to this path.

```
env_var = {
    'MODEL_DEPLOY_PREDICT_ENDPOINT': '/v1/embeddings'
}
# note that the model path inside the container will have the format /opt/ds/model/deployed_model/{artifact_path_prefix}
cmd_var = ["--model-id", "/opt/ds/model/deployed_model/BAAI/bge-base-en-v1.5/", "--port", "8080", "--hostname", "0.0.0.0"]

container_runtime = (
    ModelDeploymentContainerRuntime()
    .with_image(container_image)
    .with_server_port(8080)
    .with_health_check_port(8080)
    .with_env(env_var)
    .with_cmd(cmd_var)
    .with_deployment_mode(ModelDeploymentMode.HTTPS)
    .with_model_uri(model.id)
    .with_region(region)
)
```

### Deploy Model Using Container Runtime

Once the infrastructure and runtime are configured, we can deploy the model.

```
deployment = (
    ModelDeployment()
    .with_display_name(f"{hf_model_name} with TEI docker container")
    .with_description(f"Deployment of {hf_model_name} MD with text-embeddings-inference:1.5.0 container")
    .with_infrastructure(infrastructure)
    .with_runtime(container_runtime)
).deploy(wait_for_completion=False)
```
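
Because `wait_for_completion=False` returns immediately, the deployment keeps provisioning in the background. You can monitor its state and logs from the OCI Console; if your `oracle-ads` version provides it, the deployment object's `watch()` helper can also stream the deployment logs from the notebook, as sketched below (an assumption about the ADS API, not a guaranteed interface).

```
# Optional: stream deployment logs from the notebook while the deployment provisions.
# Assumption: ModelDeployment.watch() is available in your oracle-ads version;
# otherwise monitor the deployment from the OCI Console.
deployment.watch()
```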

## Inference

Once the model deployment has reached the Active state, we can invoke the model deployment endpoint to interact with the embedding model. More details on the different ways of accessing MD endpoints are documented [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/ai-quick-actions/model-deployment-tips.md#inferencing-model).

```
import requests

sentences = [
    "The car sped down the highway at an incredible speed.",
    "A vehicle raced along the freeway, moving very fast.",
    "The child was playing in the park with a ball.",
]
endpoint = f"https://modeldeployment.{region}.oci.customer-oci.com/{deployment.model_deployment_id}/predict"

response = requests.post(
    endpoint,
    json={"input": sentences},
    auth=ads.common.auth.default_signer()["signer"],
    headers={},
).json()
```

The raw output (`response`) contains a list with an embedding for each of the three sentences above (truncated below).

```
{'object': 'list',
 'data': [{'object': 'embedding',
   'embedding': [-0.00735207,
    -0.045759525,
    0.061242294,
    0.013910536,
    0.048454784,
    -0.0059445454,
    0.007921069,
    0.029093834,
    0.04836494,
    ...
    ...
    ...
    -0.005862751,
    0.055649005],
   'index': 2}],
 'model': '/opt/ds/model/deployed_model/BAAI/bge-base-en-v1.5/',
 'usage': {'prompt_tokens': 39, 'total_tokens': 39}}
```

## Testing Embeddings generated by the model

Here, we have three sentences: two of them have a similar meaning, and the third one is distinct. We'll run a simple test to see how similar or dissimilar these sentences are, using cosine similarity as the comparison metric.

```
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns

embeddings = [sentence['embedding'] for sentence in response['data']]
similarity_matrix = cosine_similarity(embeddings)

labels = [f"Sentence {i+1}" for i in range(len(sentences))]

# visualize the similarity matrix using a heatmap
plt.figure(figsize=(6, 4))
sns.heatmap(similarity_matrix, annot=True, cmap='coolwarm', xticklabels=labels, yticklabels=labels)

# add title and labels for better clarity
plt.title('Cosine Similarity Heatmap of Sentences')
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()
```
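
If you are working in an environment without a display, you can also inspect the raw similarity values directly instead of rendering the heatmap; a small optional addition:

```
# Optional: print the raw cosine-similarity values.
import numpy as np

print(np.round(similarity_matrix, 3))
```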

![image](cosine_similarity_embedding_with_tei.png)

The above heatmap shows that the embedding model captures the semantic similarity between the first two sentences while distinguishing the third as different.
