| 1 | +# Overview |
| 2 | + |
| 3 | +This repo provides two approaches to manage the inference server for LLM deployment in OCI Data Science: |
| 4 | + |
| 5 | +* [Text Generation Inference](https://github.com/huggingface/text-generation-inference) from Hugging Face. |
| 6 | +* [vLLM](https://github.com/vllm-project/vllm), developed at UC Berkeley. |
| 7 | + |
| 8 | +## Prerequisites |
| 9 | + |
| 10 | +* This is a Limited Availability (LA) feature. To be allowlisted for it, reach out to us by email at `[email protected]`. |
| 11 | +* Configure your [API Auth Token](https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm) to be able to run and test your code locally. |
| 12 | +* Install [Docker](https://docs.docker.com/get-docker) or [Rancher Desktop](https://rancherdesktop.io/) as a Docker alternative. |
| 13 | + |
| 14 | +## Required IAM Policies |
| 15 | + |
| 16 | +Public [documentation](https://docs.oracle.com/en-us/iaas/data-science/using/policies.htm). |
| 17 | + |
| 18 | +### Bring your own container [policies](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm#model_dep_policies_auth__access-logging-service#model_dep_policies_auth__access-custom-container) |
| 19 | +Create a dynamic group with the matching rule `ALL { resource.type = 'datasciencemodeldeployment' }`, then grant it read access to the repository: |
| 20 | + |
| 21 | +`allow dynamic-group <dynamic-group-name> to read repos in compartment <compartment-name> where ANY {request.operation='ReadDockerRepositoryMetadata',request.operation='ReadDockerRepositoryManifest',request.operation='PullDockerLayer' }` |
| 22 | + |
| 23 | +#### If the repository is in the root compartment, allow read for the tenancy |
| 24 | + |
| 25 | +`allow dynamic-group <dynamic-group-name> to read repos in tenancy where ANY { |
| 26 | + request.operation='ReadDockerRepositoryMetadata', |
| 27 | + request.operation='ReadDockerRepositoryManifest', |
| 28 | + request.operation='PullDockerLayer' |
| 29 | +}` |
| 30 | + |
| 31 | +#### For user level policies |
| 32 | + |
| 33 | +`allow any-user to read repos in tenancy where ALL { request.principal.type = 'datasciencemodeldeployment' }` |
| 34 | + |
| 35 | +`allow any-user to read repos in compartment <compartment-name> where ALL { request.principal.type = 'datasciencemodeldeployment'}` |
| 36 | + |
| 37 | +For all other Data Science policies, refer to these [details](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/distributed_training/README.md#3-oci-policies). |
| 38 | + |
| 39 | +## Build TGI Container |
| 40 | +To build the containers required for this deployment, complete the following steps: |
| 41 | + |
| 42 | +* Check out this repository |
| 43 | +* Change to the directory `model-deployment/containers/llm/inference-images` |
| 44 | + |
| 45 | + ```bash |
| 46 | + cd model-deployment/containers/llm/inference-images |
| 47 | + ``` |
| 48 | +* This example uses [OCI Container Registry](https://docs.oracle.com/en-us/iaas/Content/Registry/Concepts/registryoverview.htm) to store the container image required for the deployment. For the `Makefile` to build the container and push it to Oracle Cloud Container Registry, set the `TENANCY_NAME` and `REGION_KEY` environment variables in your local terminal. `TENANCY_NAME` is the name of your tenancy, which you can find under your [account settings](https://cloud.oracle.com/tenancy). `REGION_KEY` is the three-letter key of the region you want to use for this example, such as IAD for Ashburn or FRA for Frankfurt. Region keys are listed in the public documentation for [Regions and Availability Domains](https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm). |
| 49 | + |
| 50 | + ```bash |
| 51 | + export TENANCY_NAME=<your-tenancy-name> |
| 52 | + export REGION_KEY=<region-key> |
| 53 | + ``` |
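To make the role of the two variables concrete, here is a minimal sketch of how an OCIR image reference is typically composed from them. The helper name `ocir_image_ref` and the `latest` default tag are hypothetical, and the sketch assumes the `Makefile` places `TENANCY_NAME` in the namespace position of the path; check the `Makefile` for the exact naming it uses.

```bash
# Hypothetical helper (not part of the Makefile): compose an OCIR image
# reference of the form <region-key>.ocir.io/<tenancy>/<repository>:<tag>.
ocir_image_ref() (
  tenancy="$1"
  region_key="$2"
  repo="$3"
  tag="${4:-latest}"   # assumed default tag, for illustration only

  if [ -z "$tenancy" ] || [ -z "$region_key" ] || [ -z "$repo" ]; then
    echo "usage: ocir_image_ref <tenancy> <region-key> <repo> [tag]" >&2
    return 1
  fi

  # OCIR hostnames use the lowercase region key, e.g. iad.ocir.io
  key_lower="$(printf '%s' "$region_key" | tr '[:upper:]' '[:lower:]')"
  printf '%s.ocir.io/%s/%s:%s\n' "$key_lower" "$tenancy" "$repo" "$tag"
)

# Example with placeholder values:
ocir_image_ref "mytenancy" "IAD" "text-generation-interface-odsc"
# prints: iad.ocir.io/mytenancy/text-generation-interface-odsc:latest
```

If either variable is unset, a reference built this way would be malformed, which is why both need to be exported before running the `make` targets.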
| 54 | + |
| 55 | +You can find the official documentation about OCI Data Science Model Deployment: [https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_create.htm] |
| 56 | + |
| 57 | +* Build the TGI container image. This step can take a while. |
| 58 | + |
| 59 | + ```bash |
| 60 | + make build.tgi |
| 61 | + ``` |
| 62 | + |
| 63 | +* Before pushing the newly built container, make sure you have created the `text-generation-interface-odsc` repository in your tenancy. |
| 64 | + * Go to your tenancy [Container Registry](https://cloud.oracle.com/compute/registry/containers) |
| 65 | + * Click the `Create repository` button |
| 66 | + * Select `Private` under Access types |
| 67 | + * Set `text-generation-interface-odsc` as the `Repository name` |
| 68 | + * Click the `Create` button |
| 69 | + |
| 70 | +* If you haven't already, you may need to `docker login` to the Oracle Cloud Container Registry (OCIR) before you can push the image. To log in, use your [API Auth Token](https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm), which can be created under your `Oracle Cloud Account->Auth Token`. You only need to log in once. |
| 71 | + |
| 72 | + ```bash |
| 73 | + docker login -u '<tenancy-namespace>/<username>' <region>.ocir.io |
| 74 | + ``` |
| 75 | + |
| 76 | + If your tenancy is **federated** with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>` |
| 77 | + |
| 78 | +* Push the container image to OCIR |
| 79 | + |
| 80 | + ```bash |
| 81 | + make push.tgi |
| 82 | + ``` |
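The two username formats from the `docker login` step above can be sketched as a small helper. The function name `ocir_login_user`, the `yes`/`no` flag, and the `OCI_AUTH_TOKEN` variable are hypothetical and used only for illustration; `--password-stdin` is a standard `docker login` option.

```bash
# Hypothetical helper: build the OCIR login username.
# Pass "yes" as the third argument for a tenancy federated with
# Oracle Identity Cloud Service.
ocir_login_user() (
  namespace="$1"
  username="$2"
  federated="${3:-no}"

  if [ "$federated" = "yes" ]; then
    printf '%s/oracleidentitycloudservice/%s\n' "$namespace" "$username"
  else
    printf '%s/%s\n' "$namespace" "$username"
  fi
)

# Example with placeholder values:
ocir_login_user "mynamespace" "jdoe@example.com" yes
# prints: mynamespace/oracleidentitycloudservice/jdoe@example.com

# Not run here: a non-interactive login, assuming the auth token is stored
# in a variable named OCI_AUTH_TOKEN (hypothetical name):
#   printf '%s' "$OCI_AUTH_TOKEN" | \
#     docker login -u "$(ocir_login_user mynamespace jdoe@example.com)" \
#       --password-stdin iad.ocir.io
```

Piping the token through standard input keeps it out of the shell history and the process list, which is generally preferable to passing it with `-p`.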
| 83 | + |
| 84 | +## Build vLLM Container |
| 85 | + |
| 86 | +You can find the official documentation about OCI Data Science Model Deployment: [https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_create.htm] |
| 87 | + |
| 88 | +* Build the vLLM container image. This step can take a while. |
| 89 | + |
| 90 | + ```bash |
| 91 | + make build.vllm |
| 92 | + ``` |
| 93 | + |
| 94 | +* Before pushing the newly built container, make sure you have created the `vllm-odsc` repository in your tenancy. |
| 95 | + * Go to your tenancy [Container Registry](https://cloud.oracle.com/compute/registry/containers) |
| 96 | + * Click the `Create repository` button |
| 97 | + * Select `Private` under Access types |
| 98 | + * Set `vllm-odsc` as the `Repository name` |
| 99 | + * Click the `Create` button |
| 100 | + |
| 101 | +* If you haven't already, you may need to `docker login` to the Oracle Cloud Container Registry (OCIR) before you can push the image. To log in, use your [API Auth Token](https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm), which can be created under your `Oracle Cloud Account->Auth Token`. You only need to log in once. |
| 102 | + |
| 103 | + ```bash |
| 104 | + docker login -u '<tenancy-namespace>/<username>' <region>.ocir.io |
| 105 | + ``` |
| 106 | + |
| 107 | + If your tenancy is **federated** with Oracle Identity Cloud Service, use the format `<tenancy-namespace>/oracleidentitycloudservice/<username>` |
| 108 | + |
| 109 | +* Push the container image to OCIR |
| 110 | + |
| 111 | + ```bash |
| 112 | + make push.vllm |
| 113 | + ``` |
| 114 | + |
| 115 | + |
| 116 | + |
| 117 | +### Advanced debugging options: Code debugging inside the container using job |
| 118 | +For more detailed debugging, refer to [README-DEBUG.md](./README-DEBUG.md). |
| 119 | + |
| 120 | +## Additional Make Commands |
| 121 | + |
| 122 | +### TGI containers |
| 123 | + |
| 124 | +`make build.tgi` to build the container |
| 125 | + |
| 126 | +`make run.tgi` to run the container |
| 127 | + |
| 128 | +`make shell.tgi` to launch the container with a shell prompt |
| 129 | + |
| 130 | +`make stop.tgi` to stop the running container |
| 131 | + |
| 132 | +### vLLM containers |
| 133 | + |
| 134 | +`make build.vllm` to build the container |
| 135 | + |
| 136 | +`make run.vllm` to run the container |
| 137 | + |
| 138 | +`make shell.vllm` to launch the container with a shell prompt |
| 139 | + |
| 140 | +`make stop.vllm` to stop the running container |