Commit ee30107

Committed: Updating AQUA stack
2 parents 14a22f2 + 56017f4

28 files changed: +32516 −921 lines

ai-quick-actions/README.md

Lines changed: 9 additions & 0 deletions
@@ -11,6 +11,15 @@ Manager (ORM) or manually. For further information see [Policies](policies/READM
---

**What's New:**

How-To Blogs:

1. [Deploy Meta-Llama-3-8B-Instruct with Oracle Service Managed vLLM(0.3.0) Container](llama3-with-smc.md)
2. [Deploy ELYZA-japanese-Llama-2-13b-instruct with Oracle Service Managed vLLM(0.3.0) Container](deploy-with-smc.md)

---

- [Policies](policies/README.md)
- [CLI](cli-tips.md)
- [Model Deployment](model-deployment-tips.md)

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
# Deploy ELYZA-japanese-Llama-2-13b-instruct with Oracle Service Managed vLLM(0.3.0) Container

![ELYZA](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b/resolve/main/key_visual.png)

This how-to shows how to use the Oracle Data Science Service Managed Containers, part of the Quick Actions feature, to run inference with a model downloaded from Hugging Face. We will use [ELYZA-japanese-Llama-2-13b-instruct](https://huggingface.co/collections/elyza/elyza-japanese-llama-2-13b-6589ba0435f23c0f1c41d32a) from ELYZA, a company known for its LLM research and based out of the University of Tokyo. ELYZA starts from an English-dominant pre-trained model, because of the prevalence of English training data, and continues pre-training on an additional 18 billion tokens of Japanese data.

## Required IAM Policies

Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#required-iam-policies) to grant access to OCI services.

## Setup

```python
# Install required python packages
!pip install oracle-ads
!pip install oci
!pip install huggingface_hub
```

```python
# Uncomment this code and set the correct proxy URLs if you need to set up a proxy for internet access
# import os
# os.environ['http_proxy'] = "http://myproxy"
# os.environ['https_proxy'] = "http://myproxy"

# Use os.environ['no_proxy'] to route traffic directly
```

```python
import ads

ads.set_auth("resource_principal")
```

```python
# Extract region information from the Notebook environment variables and signer.
ads.common.utils.extract_region()
```

### Common variables

```python
import os

# change as required for your environment
compartment_id = os.environ["PROJECT_COMPARTMENT_OCID"]
project_id = os.environ["PROJECT_OCID"]

log_group_id = "ocid1.loggroup.oc1.xxx.xxxxx"
log_id = "ocid1.log.oc1.xxx.xxxxx"

instance_shape = "VM.GPU.A10.2"
container_image = "dsmc://odsc-vllm-serving:0.3.0.7"
region = "us-ashburn-1"
```

The container image referenced above (`dsmc://odsc-vllm-serving:0.3.0.7`) is an Oracle Service Managed container that was built with:

- Oracle Linux 8 - Slim
- CUDA 12.4
- cuDNN 9
- Torch 2.1.2
- Python 3.11.5
- vLLM v0.3.0

## Prepare The Model Artifacts

To prepare the model artifacts for LLM model deployment:

- Download the model files from Hugging Face to a local directory using a valid Hugging Face token (only needed for gated models). If you don't have a Hugging Face token, refer to [this guide](https://huggingface.co/docs/hub/en/security-tokens) to generate one.
- Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don't have an Object Storage bucket, create one using the OCI SDK or the Console. Make a note of the `namespace`, `compartment`, and `bucketname`. Configure the policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure the policies in IAM in the Console.
- Create a model catalog entry for the model using the Object Storage path.

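The Object Storage location from the steps above is later referenced with an `oci://<bucket>@<namespace>/<prefix>` URI. A minimal sketch of assembling and sanity-checking that path (the helper name is ours, not part of ADS):

```python
def object_storage_uri(bucket: str, namespace: str, prefix: str) -> str:
    """Build the oci://<bucket>@<namespace>/<prefix> URI used when registering a model by reference."""
    if not prefix.endswith("/"):
        prefix += "/"  # a trailing slash keeps the prefix scoped to the model folder
    return f"oci://{bucket}@{namespace}/{prefix}"

# Example with placeholder values
print(object_storage_uri("my-versioned-bucket", "my-namespace", "ELYZA-japanese-Llama-2-13b-instruct"))
# oci://my-versioned-bucket@my-namespace/ELYZA-japanese-Llama-2-13b-instruct/
```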
### Model Download from HuggingFace Model Hub

```python
# Login to Hugging Face with your token
HUGGINGFACE_TOKEN = "<HUGGINGFACE_TOKEN>"  # Your Hugging Face token
!huggingface-cli login --token $HUGGINGFACE_TOKEN
```

Models in the HuggingFace hub are stored in their own repository. [This guide](https://huggingface.co/docs/huggingface_hub/guides/download#download-an-entire-repository) provides more information on using `snapshot_download()` to download an entire repository at a given revision.

```python
# Download the ELYZA model from Hugging Face to a local folder.
from huggingface_hub import snapshot_download
from tqdm.auto import tqdm

model_name = "elyza/ELYZA-japanese-Llama-2-13b-instruct"
local_dir = "models/ELYZA-japanese-Llama-2-13b-instruct"

snapshot_download(repo_id=model_name, local_dir=local_dir, force_download=True, tqdm_class=tqdm)

print(f"Downloaded model {model_name} to {local_dir}")
```

## Upload Model to OCI Object Storage

```python
model_prefix = "ELYZA-japanese-Llama-2-13b-instruct/"  # "<bucket_prefix>"
bucket = "<bucket_name>"  # this should be a versioned bucket
namespace = "<bucket_namespace>"

!oci os object bulk-upload --src-dir $local_dir --prefix $model_prefix -bn $bucket -ns $namespace --auth "resource_principal"
```

## Create Model by Reference using ADS

```python
from ads.model.datascience_model import DataScienceModel

artifact_path = f"oci://{bucket}@{namespace}/{model_prefix}"

model = (
    DataScienceModel()
    .with_compartment_id(compartment_id)
    .with_project_id(project_id)
    .with_display_name("ELYZA-japanese-Llama-2-13b-instruct")
    .with_artifact(artifact_path)
)

model.create(model_by_reference=True)
```

### Import Model Deployment Modules

```python
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentContainerRuntime,
    ModelDeploymentInfrastructure,
    ModelDeploymentMode,
)
```

### Setup Model Deployment Infrastructure

```python
infrastructure = (
    ModelDeploymentInfrastructure()
    .with_project_id(project_id)
    .with_compartment_id(compartment_id)
    .with_shape_name(instance_shape)
    .with_bandwidth_mbps(10)
    .with_replica(1)
    .with_web_concurrency(10)
    .with_access_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
    .with_predict_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
)
```

### Configure Model Deployment Runtime

```python
env_var = {
    'BASE_MODEL': model_prefix,
    'PARAMS': '--served-model-name odsc-llm --seed 42',
    'MODEL_DEPLOY_PREDICT_ENDPOINT': '/v1/completions',
    'MODEL_DEPLOY_ENABLE_STREAMING': 'true',
}

container_runtime = (
    ModelDeploymentContainerRuntime()
    .with_image(container_image)
    .with_server_port(8080)
    .with_health_check_port(8080)
    .with_env(env_var)
    .with_deployment_mode(ModelDeploymentMode.HTTPS)
    .with_model_uri(model.id)
    .with_region(region)
    .with_overwrite_existing_artifact(True)
    .with_remove_existing_artifact(True)
)
```

### Deploy Model Using Container Runtime

```python
deployment = (
    ModelDeployment()
    .with_display_name("ELYZA-japanese-Llama-2-13b-Instruct MD with vLLM SMC")
    .with_description("Deployment of ELYZA-japanese-Llama-2-13b-Instruct MD with vLLM(0.3.0) container")
    .with_infrastructure(infrastructure)
    .with_runtime(container_runtime)
).deploy(wait_for_completion=False)
```

### Inference

Once the model deployment has reached the Active state, we can invoke the model deployment endpoint to interact with the LLM. More details on the different ways of accessing MD endpoints are documented [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/ai-quick-actions/model-deployment-tips.md).

```python
import requests
import ads

ads.set_auth("resource_principal")

requests.post(
    f"https://modeldeployment.us-ashburn-1.oci.customer-oci.com/{deployment.model_deployment_id}/predict",
    json={
        "model": "odsc-llm",  # do not change this with system managed container deployments
        "prompt": "{bos_token}{b_inst} {system} {prompt} {e_inst}".format(
            bos_token="<s>",
            b_inst="[INST]",
            system="<<SYS>>\nあなたは誠実で優秀な日本人のアシスタントです。\n<</SYS>>\n\n",
            prompt="活性化関数とは何ですか",
            e_inst="[/INST]",
        ),
        "max_tokens": 250,
        "temperature": 0.7,
        "top_p": 0.8,
    },
    auth=ads.common.auth.default_signer()["signer"],
    headers={},
).json()
```

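The `prompt` field in the request is assembled in the Llama-2 instruction format that the ELYZA instruct models expect. As an illustration (this helper is ours, not part of the notebook), the same string can be built with:

```python
DEFAULT_SYSTEM = "<<SYS>>\nあなたは誠実で優秀な日本人のアシスタントです。\n<</SYS>>\n\n"

def build_prompt(user_prompt: str, system: str = DEFAULT_SYSTEM) -> str:
    """Wrap a user message in the Llama-2 chat template: <s>[INST] <<SYS>>...<</SYS>> message [/INST]"""
    return "{bos_token}{b_inst} {system} {prompt} {e_inst}".format(
        bos_token="<s>",
        b_inst="[INST]",
        system=system,
        prompt=user_prompt,
        e_inst="[/INST]",
    )

prompt = build_prompt("活性化関数とは何ですか")
```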
#### Output

The LLM produced a good answer to the prompt *"What is an activation function?"*; translated to English, it reads:

> An activation function is a function that initializes the output of a neuron in a neural network that is part of a machine learning algorithm.
> If the output of the neuron is x and the activation function is f, the new output y of the neuron can be calculated using the following formula.
>
> y = f(x)
>
> Generally, neuron activation functions often have the following characteristics.
>
> 1. Monotonicity: In general, the relationship between the input and output of an activation function is monotonic.
> 2. Differentiability: The derivative of an activation function can be defined

(The answer cuts off mid-sentence because generation stopped at `max_tokens`, as the `finish_reason` of `"length"` below indicates.)

```python
{
    "id": "cmpl-807794826bf3438397e5e0552c0dcba8",
    "object": "text_completion",
    "created": 7343,
    "model": "odsc-llm",
    "choices": [
        {
            "index": 0,
            "text": "活性化関数とは、機械学習アルゴリズムの一部であるニューラルネットワークにおいて、ニューロンの出力を初期化するための関数のことです。\n\nニューロンの出力をxとし、活性化関数をfとすると、ニューロンの新たな出力yは以下の式で求められます。\n\ny = f(x)\n\n一般に、ニューロンの活性化関数は以下のような特性を持つものが多く使われています。\n\n1. 単調性: 活性化関数は、入力と出力の関係が単調であることが一般的です。\n2. 可微分性: 活性化関数は、微分が定義でき",
            "logprobs": None,
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 86, "total_tokens": 336, "completion_tokens": 250},
}
```
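To pull the generated text and token count out of a response shaped like the one above, a small helper (ours, purely illustrative) might look like:

```python
def parse_completion(response: dict) -> tuple[str, int]:
    """Return (generated_text, completion_tokens) from an OpenAI-style completion response."""
    text = response["choices"][0]["text"]
    completion_tokens = response["usage"]["completion_tokens"]
    return text, completion_tokens

# Example with a truncated stand-in for the response above
sample = {
    "choices": [{"index": 0, "text": "活性化関数とは、...", "finish_reason": "length"}],
    "usage": {"prompt_tokens": 86, "total_tokens": 336, "completion_tokens": 250},
}
text, n_tokens = parse_completion(sample)
print(n_tokens)  # 250
```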

0 commit comments

Comments
 (0)