Commit 178430f

update doc
1 parent 987bf97 commit 178430f

File tree

1 file changed: +19 -11 lines changed


LLM/embedding/deploy-embedding-model-tei.md

Lines changed: 19 additions & 11 deletions
@@ -12,6 +12,14 @@ supported by TEI. While TEI offers `/embed` endpoint as default method to get em
 will use the OpenAI compatible route, i.e. `/v1/embeddings`. For more details, check the list of endpoints available
 [here](https://huggingface.github.io/text-embeddings-inference/#/).
 
+
+## Overview
+This guide demonstrates how to deploy and run inference on embedding models with the Oracle Data Science Service
+through a Bring Your Own Container (BYOC) approach. In this example, we use the `BAAI/bge-base-en-v1.5` model
+downloaded from Hugging Face, served by a container powered by Text Embeddings Inference (TEI).
+
 ## Pre-Requisites
 To be able to run the example on this page, ensure you have access to an Oracle Data Science notebook in your tenancy.
@@ -28,7 +36,7 @@ This example requires a desktop tool to build, run, launch and push the containe
 * [Rancher Desktop](https://rancherdesktop.io/)
 
 
-# Prepare Inference Container
+## Prepare Inference Container
 TEI ships with multiple docker images that we can use to deploy an embedding model on the OCI Data Science platform.
 For more details on images, visit the official GitHub repository section
 [here](https://github.com/huggingface/text-embeddings-inference/tree/main?tab=readme-ov-file#docker-images).
@@ -63,7 +71,7 @@ docker tag ghcr.io/huggingface/text-embeddings-inference:1.5.0 -t <region>.ocir.
 docker push <region>.ocir.io/<tenancy>/text-embeddings-inference:1.5.0
 ```
 
-# Setup
+## Setup
 
 Install dependencies in the notebook session. These are needed to prepare the artifacts, create a model,
 and deploy it in OCI Data Science.
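Editorial note: the `docker tag` line visible in the hunk context above carries a stray `-t` flag, but `docker tag` takes only a source reference and a target reference. A corrected retag-and-push template would look like the following (the `<region>` and `<tenancy>` placeholders must be substituted before running, so this is a template rather than a runnable command sequence):

```shell
# docker tag takes no -t flag: docker tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
docker pull ghcr.io/huggingface/text-embeddings-inference:1.5.0
docker tag ghcr.io/huggingface/text-embeddings-inference:1.5.0 <region>.ocir.io/<tenancy>/text-embeddings-inference:1.5.0
docker push <region>.ocir.io/<tenancy>/text-embeddings-inference:1.5.0
```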
@@ -74,7 +82,7 @@ Run this in the terminal in a notebook session:
 pip install oracle-ads oci huggingface_hub -U
 ```
 
-# Prepare the model artifacts
+## Prepare the model artifacts
 
 To prepare model artifacts for deployment:
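The preparation steps themselves fall outside this diff's hunks. Under the assumption that they use `huggingface_hub` (installed in the Setup step above), a minimal sketch of fetching the model might look like the following; `fetch_model` and its `local_dir` layout are hypothetical illustrations, chosen to mirror the Object Storage prefix used in the upload command below:

```python
# Hypothetical helper (not the doc's elided steps): download the model
# snapshot from Hugging Face into a local directory whose name matches the
# bucket prefix BAAI/bge-base-en-v1.5/ used in the bulk-upload command.
def fetch_model(repo_id="BAAI/bge-base-en-v1.5", local_dir=None):
    # Lazy import so the sketch only needs huggingface_hub when actually run.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir or repo_id)
```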
@@ -104,7 +112,7 @@ Run this in the terminal in a notebook session:
104112
oci os object bulk-upload -bn <bucket> -ns <namespace> --auth resource_principal --prefix BAAI/bge-base-en-v1.5/ --src-dir BAAI/bge-base-en-v1.5/ --no-overwrite
105113
```
106114

107-
# Create Model by reference using ADS
115+
## Create Model by reference using ADS
108116

109117
Create a notebook with the default python kernel where the python library in the setup section.
110118

@@ -140,11 +148,11 @@ model = (DataScienceModel()
 ).create(model_by_reference=True)
 ```
 
-# Deploy embedding model
+## Deploy embedding model
 
 To deploy the model we just created, we first set up the infrastructure and container runtime.
 
-## Import Model Deployment Modules
+### Import Model Deployment Modules
 
 ```
 from ads.model.deployment import (
@@ -155,7 +163,7 @@ from ads.model.deployment import (
 )
 ```
 
-## Setup Model Deployment Infrastructure
+### Setup Model Deployment Infrastructure
 
 ```
 infrastructure = (
@@ -177,7 +185,7 @@ infrastructure = (
 )
 ```
 
-## Configure Model Deployment Runtime
+### Configure Model Deployment Runtime
 
 We set the `MODEL_DEPLOY_PREDICT_ENDPOINT` environment variable to `/v1/embeddings` so that we can
 access the corresponding endpoint from the TEI container. One additional configuration we need to add is `cmd_var`, which
@@ -204,7 +212,7 @@ container_runtime = (
 )
 ```
 
-## Deploy Model Using Container Runtime
+### Deploy Model Using Container Runtime
 
 Once the infrastructure and runtime are configured, we can deploy the model.
 ```
@@ -217,7 +225,7 @@ deployment = (
 ).deploy(wait_for_completion=False)
 ```
 
-# Inference
+## Inference
 
 Once the model deployment has reached the Active state, we can invoke the model deployment endpoint to interact with the model.
 More details on different ways of accessing MD endpoints are documented [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/ai-quick-actions/model-deployment-tips.md#inferencing-model).
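As a sketch of what a call against the OpenAI-compatible `/v1/embeddings` route involves, the snippet below builds the request body and parses a response shaped like the sample output shown in the doc. The endpoint URL and OCI auth are deployment-specific and omitted here (see the linked model-deployment tips); `build_request_body` and the sample sentences are illustrative assumptions:

```python
import json

# Build an OpenAI-compatible embeddings request body: {"input": [...]}.
def build_request_body(sentences):
    return json.dumps({"input": sentences})

body = build_request_body([
    "sentence one",
    "sentence two",
    "sentence three",
])

# Parse a response of the shape shown in the doc's sample output:
# 'data' holds one embedding object per input sentence.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": i, "embedding": [0.1, 0.2, 0.3]}
        for i in range(3)
    ],
    "usage": {"prompt_tokens": 39, "total_tokens": 39},
}
vectors = [item["embedding"] for item in sample_response["data"]]
```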
@@ -262,7 +270,7 @@ The raw output (response) has an array of three lists with embedding for the abo
 'usage': {'prompt_tokens': 39, 'total_tokens': 39}}
 ```
 
-# Testing Embeddings generated by the model
+## Testing Embeddings generated by the model
 
 Here, we have 3 sentences: two have similar meaning, and the third is distinct. We'll run a simple test to
 see how similar or dissimilar these sentences are, using cosine similarity as the comparison metric.
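The cosine-similarity comparison described here can be sketched with the standard library alone. The toy 3-dimensional vectors below are stand-ins for the 768-dimensional embeddings that `BAAI/bge-base-en-v1.5` returns; they are illustrative, not real model output:

```python
import math

# Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1;
# values near 1 mean the vectors (and hence sentences) are similar.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings: two "similar" sentences and one distinct one.
similar_1 = [0.9, 0.1, 0.0]
similar_2 = [0.8, 0.2, 0.1]
distinct = [0.0, 0.1, 0.9]

print(cosine_similarity(similar_1, similar_2))  # high, close to 1
print(cosine_similarity(similar_1, distinct))   # low, close to 0
```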
