## Deploy Shared Node Pool

Most large models require a large machine to handle inference and finetuning, so we recommend using a bare-metal shape for the large model.

As a first step, for bare-metal shapes we recommend deploying a shared node pool because of their long recycle times. Shared node pools let us deploy blueprints onto and off of a node without destroying the node's resources. To deploy an H100 shared node pool, here is the JSON for the /deployment API:
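
A minimal sketch of that JSON is below. The field names and sizes are assumptions modeled on the shared node pool samples in this repository's sample blueprints, so check those samples for the canonical schema:

```json
{
    "deployment_name": "h100-shared-pool",
    "recipe_mode": "shared_node_pool",
    "shared_node_pool_size": 1,
    "shared_node_pool_shape": "BM.GPU.H100.8",
    "shared_node_pool_boot_volume_size_in_gbs": 1000
}
```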

Bare metal shapes can take up to 30 minutes to come online.

## Download the model to object storage (optional, but recommended)

For repeat deployments, large models download faster from object storage than from huggingface, because of how our object storage download is implemented compared to how the huggingface download works in code.

**NOTE**: For any step involving a closed huggingface model (meta models, for example), you will need to use your own token to download the model. We cannot distribute huggingface tokens, as doing so would break the SLA.

Steps:

1. Create a bucket in object storage in the same region as the shared node pool (this decreases copy times). In our example, we name the bucket after the model we plan to use: `llama3290Bvisioninstruct`. A CLI sketch for this step follows the list.

2. Once the bucket is finished creating, deploy [this blueprint](../../sample_blueprints/download_closed_hf_model_to_object_storage.json) to copy `meta-llama/Llama-3.2-90B-Vision-Instruct` into the bucket you created. The blueprint's contents are reproduced after the list for reference.
   - **Note**: The blueprint assumes you created the bucket with the name `llama3290Bvisioninstruct`. If you chose a different name, you will also need to change it in the blueprint.
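
If you prefer the CLI to the console for step 1, the bucket can be created with the OCI CLI, assuming it is installed and configured; `<compartment_ocid>` below is a placeholder for your own compartment:

```bash
# Create the bucket in the same region as the shared node pool
oci os bucket create \
  --name llama3290Bvisioninstruct \
  --compartment-id <compartment_ocid>
```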

For reference, the linked download blueprint looks similar to the following; replace `<hf_token>` with your actual huggingface token, and make sure the `bucket_name` under `output_object_storage` matches the bucket you created:

```json
{
    "recipe_id": "example",
    "recipe_mode": "job",
    "deployment_name": "model_to_object",
    "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:hf_downloader_v1",
    "recipe_container_command_args": [
      "meta-llama/Llama-3.2-90B-Vision-Instruct",
      "--local-dir",
      "/models",
      "--max-workers",
      "4",
      "--token",
      "<hf_token>"
    ],
    "recipe_container_port": "5678",
    "recipe_node_shape": "VM.Standard.E4.Flex",
    "recipe_node_pool_size": 1,
    "recipe_flex_shape_ocpu_count": 4,
    "recipe_flex_shape_memory_size_in_gbs": 64,
    "recipe_node_boot_volume_size_in_gbs": 500,
    "recipe_ephemeral_storage_size": 450,
    "output_object_storage": [
      {
        "bucket_name": "llama3290Bvisioninstruct",
        "mount_location": "/models",
        "volume_size_in_gbs": 450
      }
    ]
}
```

Once this copy is done, the model can be used for blueprint deployments. You can track the copy's progress in the API deployment logs.
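
A sketch of such a logs call is below; the `deployment_logs` route and the `<deployment_id>` placeholder are assumptions modeled on the /deployment API naming, so confirm the exact path against your API documentation:

```bash
# Fetch logs for a given deployment through the blueprints API
curl -L https://api.<base_endpoint>.nip.io/deployment_logs/<deployment_id>
```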

## Deploy the serving blueprint

With the copy complete, we can now deploy the blueprint that serves the model, this time pulling it from our object storage bucket in the same region as our blueprint rather than from huggingface. Note that the bucket name must match the bucket you created for your model:
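
The full serving blueprint is not reproduced here, so below is a minimal sketch. The `recipe_id`, container port, and sizes are assumptions modeled on this repository's sample serving blueprints (use the published sample for real values); the key idea is that `input_object_storage` mounts your bucket into the container instead of downloading from huggingface:

```json
{
    "recipe_id": "llm_inference_nvidia",
    "recipe_mode": "service",
    "deployment_name": "90bvisioninstruct",
    "recipe_node_shape": "BM.GPU.H100.8",
    "recipe_node_pool_size": 1,
    "recipe_container_port": "8000",
    "input_object_storage": [
      {
        "bucket_name": "llama3290Bvisioninstruct",
        "mount_location": "/models",
        "volume_size_in_gbs": 500
      }
    ]
}
```

The `deployment_name` here is chosen to match the test request below, which targets `90bvisioninstruct.<base_endpoint>.nip.io`.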
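
Next, create a `request.json` payload to test the endpoint with. A minimal sketch for an OpenAI-compatible chat completions API follows; the `model` value is an assumption and should match whatever model name your serving engine reports (the `/v1/models` route lists it):

```json
{
    "model": "/models/Llama-3.2-90B-Vision-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 256
}
```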

Then, with the `request.json` in place, run:

```bash
curl -L https://90bvisioninstruct.<base_endpoint>.nip.io/v1/chat/completions --header 'Content-Type: application/json' --data-binary @request.json
```
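
If the deployment is healthy, the endpoint responds with an OpenAI-style chat completion payload shaped roughly as follows (all values illustrative):

```json
{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "/models/Llama-3.2-90B-Vision-Instruct",
    "choices": [
      {
        "index": 0,
        "message": { "role": "assistant", "content": "The capital of France is Paris." },
        "finish_reason": "stop"
      }
    ]
}
```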

## Complete

At this point, you have successfully deployed three separate blueprints and tested the result:

1. Deployed a bare metal shared node pool to host blueprint deployments
2. Deployed a blueprint to copy a large model from huggingface into your own object storage
3. Deployed the model to an inference serving endpoint on your shared node pool
4. Tested the inference endpoint with a chat completions request