
Commit 28f9190

Merge pull request #48 from oracle-quickstart/model_to_object_storage
Added docs and recipe samples for copying models to OCI object storage.
2 parents e7288f5 + 33e4f5f commit 28f9190

4 files changed: +80 additions, −44 deletions

docs/common_workflows/working_with_large_models/README.md

Lines changed: 14 additions & 35 deletions
````diff
@@ -2,7 +2,7 @@
 ## Deploy Shared Node Pool
 
-Most large models require a large machine to handle inference / finetuning of the model, therefore we reccomend you use a bare-metal shape for the large model.
+Most large models require a large machine to handle inference / finetuning of the model, therefore we recommend you use a bare-metal shape for the large model.
 
 As a first step, for bare-metal shapes we recommend deploying a shared node pool due to the large recycle times. Shared node pools allow us to deploy blueprints onto and off of the node without destroying the node resources. To deploy an H100 shared node pool, here is the JSON for the /deployment API:
````
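The shared node pool payload itself falls outside this hunk. A minimal sketch of such a `/deployment` payload, by analogy with the job recipes added in this commit — `recipe_mode` of `shared_node_pool`, the pool field values, and the H100 bare-metal shape name `BM.GPU.H100.8` are all assumptions, not shown in this diff:

```json
{
    "recipe_id": "example",
    "recipe_mode": "shared_node_pool",
    "deployment_name": "h100_shared_pool",
    "recipe_node_shape": "BM.GPU.H100.8",
    "recipe_node_pool_size": 1,
    "recipe_node_boot_volume_size_in_gbs": 500
}
```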

````diff
@@ -32,51 +32,21 @@ Bare metal shapes can take up to 30 minutes to come online. If you hit a shape v
 ## Download the model to object storage (optional, but recommended)
 
-For repeat deployments, large models take longer to download from huggingface than they do from object storage because of how we've implemented the download from object storage vs how the huggingface download works in code (it's single threaded).
+For repeat deployments, large models take longer to download from huggingface than they do from object storage because of how we've implemented the download from object storage vs how the huggingface download works in code.
 
 **NOTE**: For any step involving a closed model for huggingface (meta models as an example), you will need to use your own token to download the model. We cannot distribute huggingface tokens as it breaks the SLA.
 
 Steps:
 
 1. Create a bucket in object storage in the same region as the shared node pool (decrease copy times). In our example, we will call this something similar to the name of the model we plan to use: `llama3290Bvisioninstruct`
 
-2. Once the bucket is finished creating, copy the model using the following blueprint, replacing `<hf_token>` with your actual huggingface token:
+2. Once the bucket is finished creating, deploy [this blueprint](../../sample_blueprints/download_closed_hf_model_to_object_storage.json) to copy `meta-llama/Llama-3.2-90B-Vision-Instruct` to the bucket you created.
+   - **Note**: The blueprint assumes you created the bucket using the name `llama3290Bvisioninstruct`. If you changed the name, you will also need to modify it in the example blueprint.
 
-```json
-{
-    "recipe_id": "example",
-    "recipe_mode": "job",
-    "deployment_name": "model_to_object",
-    "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:hf_downloader_v1",
-    "recipe_container_command_args": [
-        "meta-llama/Llama-3.2-90B-Vision-Instruct",
-        "--local-dir",
-        "/models",
-        "--max-workers",
-        "4",
-        "--token",
-        "<hf_token>"
-    ],
-    "recipe_container_port": "5678",
-    "recipe_node_shape": "VM.Standard.E4.Flex",
-    "recipe_node_pool_size": 1,
-    "recipe_flex_shape_ocpu_count": 4,
-    "recipe_flex_shape_memory_size_in_gbs": 64,
-    "recipe_node_boot_volume_size_in_gbs": 500,
-    "recipe_ephemeral_storage_size": 450,
-    "output_object_storage": [
-        {
-            "bucket_name": "mymodels",
-            "mount_location": "/models",
-            "volume_size_in_gbs": 450
-        }
-    ]
-}
-```
 
 Once this copy is done, the model can now be used for blueprint deployments. You can track this in API deployment logs.
 
-## Deploy blueprint
+## Deploy the serving blueprint
 
 Once the copy is done, we can now deploy the blueprint using the model, except copying it from our object storage in the same region as our blueprint. Note the bucket name is the name of the bucket you created for your model:
````
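For step 1, the bucket can be created with the OCI CLI; a minimal sketch, using the bucket name from the walkthrough and a placeholder compartment OCID:

```bash
# Create the model bucket in the same region as the shared node pool.
# <compartment_ocid> is a placeholder for your compartment's OCID.
oci os bucket create \
  --name llama3290Bvisioninstruct \
  --compartment-id <compartment_ocid>
```

The serving payload itself is not shown in this hunk; a minimal sketch of its object-storage input, assuming `input_object_storage` accepts a `bucket_name` the way `output_object_storage` does in the samples below:

```json
"input_object_storage": [
    {
        "bucket_name": "llama3290Bvisioninstruct",
        "mount_location": "/models",
        "volume_size_in_gbs": 500
    }
],
```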

````diff
@@ -144,3 +114,12 @@ Then, with the `request.json` in place, run:
 ```bash
 curl -L https://90bvisioninstruct.<base_endpoint>.nip.io/v1/chat/completions --header 'Content-Type: application/json' --data-binary @request.json
 ```
+
+## Complete
+
+At this point, you have successfully deployed 3 separate blueprints and tested the result:
+
+1. Spun up a bare metal shared node pool to deploy blueprints onto
+2. Deployed a blueprint to copy a large model from huggingface to your own object storage
+3. Deployed the model to an inference serving endpoint on your shared node pool
+4. Tested the inference endpoint
````
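The walkthrough assumes a `request.json` chat payload is already in place; a minimal sketch of an OpenAI-style body (the model path and prompt are illustrative):

```json
{
    "model": "/models/meta-llama/Llama-3.2-90B-Vision-Instruct",
    "messages": [
        { "role": "user", "content": "Describe OCI object storage in one sentence." }
    ],
    "max_tokens": 128
}
```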

docs/model_storage/README.md

Lines changed: 10 additions & 9 deletions
````diff
@@ -8,6 +8,14 @@ OCI AI Blueprints will automatically create an ephemeral volume, mount it to the
 ### How To
 
+**Step 1 [OPTIONAL]**:
+
+If serving large models from huggingface, it is recommended to first download them to object storage, because they load much more quickly from object storage than through the python applications that pull them from huggingface directly.
+
+To download a model from huggingface to object storage, check out [this doc](../common_workflows/working_with_large_models/README.md#download-the-model-to-object-storage-optional-but-recommended).
+
+**Step 2:**
+
 You can host your model via object storage by:
 
 1. Creating a PAR for the bucket that contains your model (`par` in the example below)
````
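A PAR for the bucket can be created with the OCI CLI; a minimal sketch, with placeholder names and an expiry you should set appropriately:

```bash
# Create a read-only pre-authenticated request (PAR) for the model bucket.
# <bucket_name> and <par_name> are placeholders.
oci os preauth-request create \
  --bucket-name <bucket_name> \
  --name <par_name> \
  --access-type AnyObjectRead \
  --time-expires 2026-01-01T00:00:00Z
```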
````diff
@@ -17,21 +25,14 @@ You can host your model via object storage by:
 Include the `input_object_storage` JSON object in your deployment payload (`/deployment` POST API):
 
-```
+```json
 "input_object_storage": [
-
     {
-
         "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/IFknABDAjiiF5LATogUbRCcVQ9KL6aFUC1j-P5NSeUcaB2lntXLaR935rxa-E-u1/n/iduyx1qnmway/b/corrino_hf_oss_models/o/",
-
         "mount_location": "/models",
-
         "volume_size_in_gbs": 500,
-
         "include": ["NousResearch/Meta-Llama-3.1-8B-Instruct"]
-
     }
-
 ],
 ```
````

````diff
@@ -45,6 +46,6 @@ Notes:
 - `include` field inside the `input_object_storage` object inside your payload (shown above) is used to specify which folder inside the bucket to download to the ephemeral volume (that the container has access to via the mount_location directory)
 - The entire bucket will be dumped into the ephemeral volume / container mount directory if include is not provided to specify the folder inside the bucket to download
 
-## Option 2: File Storage Service (FSS)
+## Option 2: File Storage Service (FSS) - Full doc coming soon!
 
 [FSS Details](../fss/README.md)
````
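A payload carrying `input_object_storage` is submitted like any other blueprint; a minimal sketch, assuming your payload is saved as `deployment.json` and `<api_base_url>` is a placeholder for your Blueprints API endpoint:

```bash
# POST the blueprint payload to the /deployment API.
curl -X POST https://<api_base_url>/deployment \
  --header 'Content-Type: application/json' \
  --data-binary @deployment.json
```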
sample_blueprints/download_closed_hf_model_to_object_storage.json

Lines changed: 29 additions & 0 deletions (new file)

```json
{
    "recipe_id": "example",
    "recipe_mode": "job",
    "deployment_name": "model_to_object",
    "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:hf_downloader_v1",
    "recipe_container_command_args": [
        "meta-llama/Llama-3.2-90B-Vision-Instruct",
        "--local-dir",
        "/models",
        "--max-workers",
        "4",
        "--token",
        "<hf_token>"
    ],
    "recipe_container_port": "5678",
    "recipe_node_shape": "VM.Standard.E4.Flex",
    "recipe_node_pool_size": 1,
    "recipe_flex_shape_ocpu_count": 4,
    "recipe_flex_shape_memory_size_in_gbs": 64,
    "recipe_node_boot_volume_size_in_gbs": 500,
    "recipe_ephemeral_storage_size": 450,
    "output_object_storage": [
        {
            "bucket_name": "llama3290Bvisioninstruct",
            "mount_location": "/models",
            "volume_size_in_gbs": 450
        }
    ]
}
```
Lines changed: 27 additions & 0 deletions (new file)

```json
{
    "recipe_id": "example",
    "recipe_mode": "job",
    "deployment_name": "model_to_object",
    "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:hf_downloader_v1",
    "recipe_container_command_args": [
        "NousResearch/Meta-Llama-3.1-405B-FP8",
        "--local-dir",
        "/models",
        "--max-workers",
        "16"
    ],
    "recipe_container_port": "5678",
    "recipe_node_shape": "VM.Standard.E4.Flex",
    "recipe_node_pool_size": 1,
    "recipe_flex_shape_ocpu_count": 16,
    "recipe_flex_shape_memory_size_in_gbs": 256,
    "recipe_node_boot_volume_size_in_gbs": 1000,
    "recipe_ephemeral_storage_size": 900,
    "output_object_storage": [
        {
            "bucket_name": "nousllama31405bfp8",
            "mount_location": "/models",
            "volume_size_in_gbs": 800
        }
    ]
}
```
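This second sample is the open-model variant of the blueprint above: no `--token` argument is needed for an open model such as `NousResearch/Meta-Llama-3.1-405B-FP8`, and the worker count, OCPUs, memory, and storage sizes are scaled up for the much larger 405B download.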
