Commit 26969c1 (parent: 397c0ce)

Updated Readme and Makefile to run the Gradio app on a Container Instance
2 files changed: 63 additions, 3 deletions
model-deployment/containers/llm/mistral/Makefile

Lines changed: 17 additions & 2 deletions

@@ -4,9 +4,12 @@ CONTAINER_REGISTRY:=${REGION_KEY}.ocir.io
 TGI_INFERENCE_IMAGE:=${CONTAINER_REGISTRY}/${TENANCY}/text-generation-interface-odsc:0.9.3
 TGI_CONTAINER_NAME:=tgi-odsc
 
-VLLM_INFERENCE_IMAGE:=${CONTAINER_REGISTRY}/${TENANCY}/vllm-odsc:0.1.4
+VLLM_INFERENCE_IMAGE:=${CONTAINER_REGISTRY}/${TENANCY}/vllm-odsc:0.2.0
 VLLM_CONTAINER_NAME:=vllm-odsc
 
+GRADIO_IMAGE:=${CONTAINER_REGISTRY}/${TENANCY}/gradio-odsc:0.1.0
+GRADIO_CONTAINER_NAME:=gradio-odsc
+
 MODEL_DIR:=${PWD}/hfdata
 TARGET_DIR:=/home/datascience
 HF_DIR=/home/datascience/.cache
@@ -19,7 +22,19 @@ params:="--max-batch-prefill-tokens 1024"
 local_model:=/opt/ds/model/deployed_model
 tensor_parallelism:=1
 
+VLLM:=1
+API_SPEC:=openai
+
+IAM_TYPE:=security_token
+IAM_PROFILE:=custboat
+
 build.app:
 	docker build --network host -t ${GRADIO_IMAGE} -f Dockerfile.gradio .
+run.app.vllm:
+	docker run --rm --network host -e OCI_IAM_TYPE=${IAM_TYPE} -e OCI_CONFIG_PROFILE=${IAM_PROFILE} -e MODEL=${model} -e VLLM=${VLLM} -e API_SPEC=${API_SPEC} --name ${GRADIO_CONTAINER_NAME} ${GRADIO_IMAGE}
+run.app.tgi:
+	docker run --rm --network host -e OCI_IAM_TYPE=${IAM_TYPE} -e OCI_CONFIG_PROFILE=${IAM_PROFILE} -e MODEL=${model} --name ${GRADIO_CONTAINER_NAME} ${GRADIO_IMAGE}
 app:
-	MODEL=${model} gradio app.py
+	MODEL=${model} gradio app.py
+push.app:
+	docker push ${GRADIO_IMAGE}
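
For reference, a typical local flow with the new targets would look like this (a sketch: `model` is the Hugging Face model ID passed on the `make` command line, and the image and profile values come from the variables defined above):

```bash
# Build the Gradio app image from Dockerfile.gradio
make build.app

# Run the app locally against the vLLM (OpenAI API spec) endpoint,
# authenticating with the security_token profile defined in the Makefile
make run.app.vllm model=mistralai/Mistral-7B-Instruct-v0.1

# Push the image to OCIR once it works locally
make push.app
```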

model-deployment/containers/llm/mistral/README.md

Lines changed: 46 additions & 1 deletion
@@ -209,6 +209,8 @@ oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --requ
 
 ## Inference
 
+### Local Inference
+
 * Once the model is deployed and shown as `Active`, you can execute inference against it; the easiest way is to use the integrated `Gradio` application in this example
 * Go to the model you've just deployed and click on it
 * On the left side, under `Resources`, select `Invoking your model`
@@ -260,6 +262,49 @@ oci raw-request --http-method POST --target-uri https://<MD_OCID>/predict --requ
 "top_p":0.8}'
 ```
 
+### Using OCI Container Instance
+
+* Once you have tested the inference locally, you can build the Gradio container by running:
+```bash
+make build.app
+```
+* Before pushing the newly built container, make sure that you've created the `gradio-odsc` repository in your tenancy:
+* Go to your tenancy [Container Registry](https://cloud.oracle.com/compute/registry/containers)
+* Click on the `Create repository` button
+* Select `Private` under Access types
+* Set `gradio-odsc` as the `Repository name`
+* Click on the `Create` button
+
+* You may need to `docker login` to the Oracle Cloud Container Registry (OCIR) before you can push the image, if you haven't done so already. To log in, use your [API Auth Token](https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm), which can be created under `Oracle Cloud Account->Auth Token`. You only need to log in once.
+
+```bash
+docker login -u '<tenancy-namespace>/<username>' <region>.ocir.io
+```
+
+If your tenancy is **federated** with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>`
+
+* Push the container image to OCIR:
+
+```bash
+make push.app
+```
+* To run a Container Instance, go to [Container Instances](https://console.us-ashburn-1.oraclecloud.com/container-instances)
+* Click on `Create container instance`
+* Select the compartment in the `Create in compartment` option
+* Leave `Placement` and `Shape` as the default options
+* Within `Network`, select your `Virtual cloud network` and `Subnet`
+* Click on `Next` at the bottom; it redirects to the `Configure containers` page
+* Under `Image`, select the OCIR repository and image pushed earlier
+* Provide the custom environment variables for the Gradio app in the `Environment variables` section
+* Key: `PORT`, Value: `5000` (the port on which you want to run the app)
+* Key: `MODEL`, Value: `mistralai/Mistral-7B-Instruct-v0.1`
+* Key: `OCI_IAM_TYPE`, Value: `resource_principal`
+* Key: `VLLM`, Value: `1` (for the vLLM inference endpoint)
+* Key: `API_SPEC`, Value: `openai` (for the vLLM OpenAI-compatible inference endpoint)
+* Click on `Next` at the bottom to review the configuration, then click `Create`
+
+* Once the container is up, you can open the application at `http://<Private IP address>:<PORT>/` and start chatting with the model deployed on the OCI Data Science Service.
+
 ## Deploying using ADS
 
 Instead of using the console, you can also deploy using the ADS from your local machine. Make sure that you've also created and set up your [API Auth Token](https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm) to execute the commands below.
@@ -295,7 +340,7 @@ Following are identified as the most probable failure cases while deploying larg
 #### Reason
 Insufficient model deployment timeout.
 
-#### Symptoms
+#### Symptom
 The Work Request logs will show the following error:
 Workflow timed out. Maximum runtime was: <deployment_timeout> minutes.
 
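
As an alternative to the console walkthrough added in this commit, the Container Instance could also be created from the OCI CLI. The sketch below is illustrative only: check the `container-instances` parameter schema against your CLI version, every OCID, the availability domain, and the shape are placeholders, and a `Private` OCIR repository additionally requires image pull secrets, which are omitted here.

```bash
# Sketch only: create a Container Instance running the Gradio image.
# Placeholders: compartment/subnet OCIDs, availability domain, shape.
# A Private OCIR repository also needs image pull secrets (not shown).
oci container-instances container-instance create \
  --display-name gradio-odsc \
  --compartment-id ocid1.compartment.oc1..<unique_id> \
  --availability-domain <availability_domain> \
  --shape CI.Standard.E4.Flex \
  --shape-config '{"ocpus": 1, "memoryInGBs": 16}' \
  --vnics '[{"subnetId": "ocid1.subnet.oc1..<unique_id>"}]' \
  --containers '[{
    "imageUrl": "<region>.ocir.io/<tenancy-namespace>/gradio-odsc:0.1.0",
    "environmentVariables": {
      "PORT": "5000",
      "MODEL": "mistralai/Mistral-7B-Instruct-v0.1",
      "OCI_IAM_TYPE": "resource_principal",
      "VLLM": "1",
      "API_SPEC": "openai"
    }
  }]'
```

Once the instance is `Active`, the app should be reachable at `http://<Private IP address>:5000/`, matching the console flow described in the README changes above.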
