
Commit 8ef9598

PR: Installing onto an existing cluster.
1 parent 170ec3a commit 8ef9598

5 files changed (+170 -5 lines changed)

INSTALLING_ONTO_EXISTING_CLUSTER_README.md

Lines changed: 116 additions & 3 deletions
@@ -6,6 +6,8 @@ This guide helps you install and use **OCI AI Blueprints** for the first time on
2. Retrieve the existing OKE cluster and VCN names from the console.
3. Deploy the **OCI AI Blueprints** application onto the existing cluster.
4. Learn how to add existing nodes in the cluster to be used by blueprints.
5. Deploy a sample recipe to that node.
6. Test your deployment and undeploy it.

---

@@ -58,7 +60,118 @@ Some or all of these policies may be in place as required by OKE. Please review
## Step 4: Add Existing Nodes to Cluster (optional)

If you have existing node pools in your original OKE cluster that you'd like Blueprints to be able to use, follow these steps after the stack is finished:

1. Find the private IP address of the node you'd like to add.
   - Console:
     - Go to the OKE cluster in the console as you did above.
     - Click on "Node pools".
     - Click on the pool with the node you want to add.
     - Identify the private IP address of the node under "Nodes" on the page.
   - Command line with `kubectl` (assumes cluster access is set up):
     - Run `kubectl get nodes`.
     - Run `kubectl describe node <nodename>` on each node until you find the node you want to add.
     - The private IP appears under the `Name` field of the output of `kubectl get nodes` (see the example below).
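For reference, a minimal sketch of the `kubectl` flow above; the IP `10.0.10.164` is just the example value from the sample add-node blueprint later in this commit, so substitute your own node:

```bash
# List nodes; on OKE the NAME column is the node's private IP address
kubectl get nodes -o wide

# Inspect a candidate node (shape, labels, capacity) to confirm it is the one you want to add
kubectl describe node 10.0.10.164
```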
2. Go to the stack and click "Application information". Click the API Url.
   - If you get a warning about security, sometimes it takes a bit for the certificates to get signed. This will go away once that process completes on the OKE side.
3. Log in with the `Admin Username` and `Admin Password` in the Application information tab.
4. Click the link next to "deployment", which will take you to a page with "Deployment List" and a content box.
5. Paste in the sample blueprint JSON found [here](./docs/sample_blueprints/add_node_to_control_plane.json).
6. Modify the "recipe_node_name" field to the private IP address you found in step 1 above.
7. Click "POST". This is a fast operation. (A hedged `curl` sketch of the same POST follows the example output below.)
8. Wait about 20 seconds and refresh the page. It should look like:
```json
[
  {
    "mode": "update",
    "recipe_id": null,
    "creation_date": "2025-03-28 11:12 AM UTC",
    "deployment_uuid": "750a________cc0bfd",
    "deployment_name": "startupaddnode",
    "deployment_status": "completed",
    "deployment_directive": "commission"
  }
]
```
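If you prefer the command line to the browser form, the same POST can in principle be sent with `curl`. This is only a sketch: the exact URL path and the authentication scheme are assumptions (the browser workflow above is the documented path), and `<api-url>` stands for the API Url from "Application information":

```bash
# Assumed equivalent of pasting the blueprint into the "deployment" content box and clicking "POST".
# The "/deployment/" path and basic-auth usage are assumptions; adjust to what your API actually exposes.
curl -L -u "<Admin Username>:<Admin Password>" \
  -H "Content-Type: application/json" \
  -d @docs/sample_blueprints/add_node_to_control_plane.json \
  "<api-url>/deployment/"
```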
## Step 5: Deploy a sample recipe
1. Go to the stack and click "Application information". Click the API Url.
   - If you get a warning about security, sometimes it takes a bit for the certificates to get signed. This will go away once that process completes on the OKE side.
2. Log in with the `Admin Username` and `Admin Password` in the Application information tab.
3. Click the link next to "deployment", which will take you to a page with "Deployment List" and a content box.
4. If you added a node in [Step 4](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-4-add-existing-nodes-to-cluster-optional), use the following shared node pool [blueprint](./docs/sample_blueprints/vllm_inference_sample_shared_pool_blueprint.json).
   - Depending on the node shape, you will need to change `"recipe_node_shape": "BM.GPU.A10.4"` to match your shape (see the sketch after this list for one way to check a node's shape).
5. If you did not add a node, or just want to deploy a fresh node, use the following [blueprint](./docs/sample_blueprints/vllm_inference_sample_blueprint.json).
6. Paste the blueprint you selected into the content box on the deployment page and click "POST".
7. To monitor the deployment, go back to "Api Root" and click "deployment_logs".
   - If you are deploying without a shared node pool, it can take 10-30 minutes to bring up a node, depending on the shape and whether it is bare metal or virtual.
   - If you are deploying with a shared node pool, the blueprint will deploy much more quickly.
   - It is common for a recipe to report "unhealthy" while it is deploying. This is caused by "Warnings" in the pod events when deploying to Kubernetes. You only need to be alarmed when an "error" is reported.
8. Wait for the following steps to complete:
   - Affinity / selection of node -> Directive / commission -> Command / initializing -> Canonical / name assignment -> Service -> Deployment -> Ingress -> Monitor / nominal.
9. When you see the step "Monitor / nominal", you have an inference server running on your node.
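As referenced in the list above, one way to check an existing node's shape (so `recipe_node_shape` matches it) is to read the node's instance-type label. This is a sketch that assumes OKE populates the standard `node.kubernetes.io/instance-type` label with the shape name; fall back to `kubectl describe node` if the column comes back empty:

```bash
# List nodes with the instance-type label (the shape, e.g. BM.GPU.A10.4) as an extra column
kubectl get nodes -L node.kubernetes.io/instance-type
```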
## Step 6: Test your deployment
1. Upon completion of [Step 5](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-5-deploy-a-sample-recipe), test the deployment endpoint.
2. Go to Api Root, then click "deployment_digests". Find the "service_endpoint_domain" on this page.
   - This is `<deployment-name>.<base-url>.nip.io` for those who let us deploy the endpoint. If you use the default recipes above, an example of this would be:

     `vllm-inference-deployment.158-179-30-233.nip.io`

3. `curl` the metrics endpoint:

```bash
curl -L vllm-inference-deployment.158-179-30-233.nip.io/metrics
# HELP vllm:cache_config_info Information of the LLMEngine CacheConfig
# TYPE vllm:cache_config_info gauge
vllm:cache_config_info{block_size="16",cache_dtype="auto",cpu_offload_gb="0",enable_prefix_caching="False",gpu_memory_utilization="0.9",is_attention_free="False",num_cpu_blocks="4096",num_gpu_blocks="10947",num_gpu_blocks_override="None",sliding_window="None",swap_space_bytes="4294967296"} 1.0
# HELP vllm:num_requests_running Number of requests currently running on GPU.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="/models/NousResearch/Meta-Llama-3.1-8B-Instruct"} 0.0
# HELP vllm:num_requests_swapped Number of requests swapped to CPU.
...
```
4. Send an actual POST request:

```bash
curl -L -H "Content-Type: application/json" -d '{"model": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello, how are you?"}], "temperature": 0.7, "max_tokens": 100 }' vllm-inference-deployment.158-179-30-233.nip.io/v1/chat/completions | jq

# response
{
  "id": "chatcmpl-bb9093a3f51cee3e0ebe67ed06da59f0",
  "object": "chat.completion",
  "created": 1743169357,
  "model": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! I'm a helpful assistant, so I'm always ready to assist you with any questions or tasks you may have. How about you? How's your day going so far?",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "total_tokens": 73,
    "completion_tokens": 46,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
```
5. When completed, undeploy the recipe (a hedged `curl` sketch of this follows after this list):
   - Go to Api Root -> deployment.
   - Grab the whole deployment_uuid field for your deployment.
     - "deployment_uuid": "asdfjklafjdskl"
   - Go to Api Root -> undeploy.
   - Paste the field "deployment_uuid" into the content box and wrap it in curly braces {}:
     - {"deployment_uuid": "asdfjklafjdskl"}
   - Click "POST".
6. Monitor the undeploy:
   - Go to Api Root -> deployment_logs.
   - Look for: Directive decommission -> Ingress deleted -> Deployment deleted -> Service deleted -> Directive / decommission / completed.
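As with deployment, the undeploy POST can in principle be scripted rather than pasted into the browser form. This is a sketch under the same assumptions as before (URL path and auth are guesses, `<api-url>` is the API Url from "Application information", and the UUID is the placeholder from the steps above):

```bash
# Assumed equivalent of pasting {"deployment_uuid": ...} into the "undeploy" content box and clicking "POST".
curl -L -u "<Admin Username>:<Admin Password>" \
  -H "Content-Type: application/json" \
  -d '{"deployment_uuid": "asdfjklafjdskl"}' \
  "<api-url>/undeploy/"
```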
## Need Help?
- Check out [Known Issues & Solutions](docs/known_issues/README.md) for troubleshooting common problems.
- For questions or additional support, contact [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).

README.md

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ Looking to install and use OCI AI Blueprints right away? **[Click here](./GETTIN
We recommend following the Getting Started guide if this is your first time.

If you are looking to install OCI AI Blueprints onto an existing OKE cluster which already has running workloads and node pools, visit [this doc](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md).

---
## Introduction
docs/sample_blueprints/add_node_to_control_plane.json

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
{
  "recipe_mode": "update",
  "deployment_name": "startupaddnode",
  "recipe_node_name": "10.0.10.164",
  "recipe_node_labels": {
    "corrino": "a10pool",
    "corrino/pool-shared-any": "true"
  }
}
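The `recipe_node_labels` above are what the Step 4 add-node flow attaches to the node. A hedged way to confirm the labels were applied once the add-node deployment reports "completed" (this assumes `kubectl` access to the cluster and that commissioning applies `recipe_node_labels` directly as Kubernetes node labels):

```bash
# Show the labels on the newly added node and filter for the ones set by this blueprint
kubectl get node 10.0.10.164 --show-labels | tr ',' '\n' | grep corrino
```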

docs/sample_blueprints/vllm_inference_sample_blueprint.json

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
  "recipe_id": "llm_inference_nvidia",
  "recipe_mode": "service",
  "deployment_name": "vLLM Inference Deployment",
- "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:vllmv0.6.2",
+ "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:vllmv0.6.6.pos1",
  "recipe_node_shape": "VM.GPU.A10.2",
  "input_object_storage": [
    {
@@ -38,5 +38,5 @@
    "$(tensor_parallel_size)"
  ],
  "recipe_ephemeral_storage_size": 100,
- "recipe_shared_memory_volume_size_limit_in_mb": 200
+ "recipe_shared_memory_volume_size_limit_in_mb": 1000
}
docs/sample_blueprints/vllm_inference_sample_shared_pool_blueprint.json

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
{
  "recipe_id": "llm_inference_nvidia",
  "recipe_mode": "service",
  "deployment_name": "vLLM Inference Deployment",
  "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:vllmv0.6.6.pos1",
  "recipe_node_shape": "BM.GPU.A10.4",
  "input_object_storage": [
    {
      "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/IFknABDAjiiF5LATogUbRCcVQ9KL6aFUC1j-P5NSeUcaB2lntXLaR935rxa-E-u1/n/iduyx1qnmway/b/corrino_hf_oss_models/o/",
      "mount_location": "/models",
      "volume_size_in_gbs": 500,
      "include": ["NousResearch/Meta-Llama-3.1-8B-Instruct"]
    }
  ],
  "recipe_container_env": [
    {
      "key": "tensor_parallel_size",
      "value": "2"
    },
    {
      "key": "model_name",
      "value": "NousResearch/Meta-Llama-3.1-8B-Instruct"
    },
    {
      "key": "Model_Path",
      "value": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct"
    }
  ],
  "recipe_replica_count": 1,
  "recipe_container_port": "8000",
  "recipe_nvidia_gpu_count": 2,
  "recipe_use_shared_node_pool": true,
  "recipe_node_boot_volume_size_in_gbs": 200,
  "recipe_container_command_args": [
    "--model",
    "$(Model_Path)",
    "--tensor-parallel-size",
    "$(tensor_parallel_size)"
  ],
  "recipe_ephemeral_storage_size": 100,
  "recipe_shared_memory_volume_size_limit_in_mb": 1000
}
