
Commit 8ef9598

PR: Installing onto an existing cluster.
1 parent 170ec3a commit 8ef9598

5 files changed (+170 -5 lines changed)

INSTALLING_ONTO_EXISTING_CLUSTER_README.md

Lines changed: 116 additions & 3 deletions
@@ -6,6 +6,8 @@ This guide helps you install and use **OCI AI Blueprints** for the first time on
2. Retrieve the existing OKE cluster and VCN names from the console.
3. Deploy the **OCI AI Blueprints** application onto the existing cluster.
4. Learn how to add existing nodes in the cluster to be used by blueprints.
5. Deploy a sample recipe to that node.
6. Test your deployment and undeploy it.

---

@@ -58,7 +60,118 @@ Some or all of these policies may be in place as required by OKE. Please review
## Step 4: Add Existing Nodes to Cluster (optional)

If you have existing node pools in your original OKE cluster that you'd like Blueprints to be able to use, follow these steps after the stack is finished:

1. Find the private IP address of the node you'd like to add.
   - Console:
     - Go to the OKE cluster in the console as you did above.
     - Click on "Node pools".
     - Click on the pool with the node you want to add.
     - Identify the private IP address of the node under "Nodes" on the page.
   - Command line with `kubectl` (assumes cluster access is set up):
     - Run `kubectl get nodes`.
     - Run `kubectl describe node <nodename>` on each node until you find the node you want to add.
     - The private IP appears under the `Name` field of the output of `kubectl get nodes` (see the example below).
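For reference, a minimal sketch of the `kubectl` flow above; the IP `10.0.10.164` is just the example value from the sample add-node blueprint later in this commit, so substitute your own node:

```bash
# List nodes; on OKE the NAME column is the node's private IP address
kubectl get nodes -o wide

# Inspect a candidate node (shape, labels, capacity) to confirm it is the one you want to add
kubectl describe node 10.0.10.164
```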
2. Go to the stack and click "Application information". Click the API Url.
   - If you get a warning about security, sometimes it takes a bit for the certificates to get signed. This will go away once that process completes on the OKE side.
3. Log in with the `Admin Username` and `Admin Password` in the Application information tab.
4. Click the link next to "deployment", which will take you to a page with "Deployment List" and a content box.
5. Paste in the sample blueprint JSON found [here](./docs/sample_blueprints/add_node_to_control_plane.json).
6. Modify the "recipe_node_name" field to the private IP address you found in step 1 above.
7. Click "POST". This is a fast operation. (A hedged `curl` sketch of the same POST follows the example output below.)
8. Wait about 20 seconds and refresh the page. It should look like:
```json
[
  {
    "mode": "update",
    "recipe_id": null,
    "creation_date": "2025-03-28 11:12 AM UTC",
    "deployment_uuid": "750a________cc0bfd",
    "deployment_name": "startupaddnode",
    "deployment_status": "completed",
    "deployment_directive": "commission"
  }
]
```
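If you prefer the command line to the browser form, the same POST can in principle be sent with `curl`. This is only a sketch: the exact URL path and the authentication scheme are assumptions (the browser workflow above is the documented path), and `<api-url>` stands for the API Url from "Application information":

```bash
# Assumed equivalent of pasting the blueprint into the "deployment" content box and clicking "POST".
# The "/deployment/" path and basic-auth usage are assumptions; adjust to what your API actually exposes.
curl -L -u "<Admin Username>:<Admin Password>" \
  -H "Content-Type: application/json" \
  -d @docs/sample_blueprints/add_node_to_control_plane.json \
  "<api-url>/deployment/"
```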
## Step 5: Deploy a sample recipe
1. Go to the stack and click "Application information". Click the API Url.
   - If you get a warning about security, sometimes it takes a bit for the certificates to get signed. This will go away once that process completes on the OKE side.
2. Log in with the `Admin Username` and `Admin Password` in the Application information tab.
3. Click the link next to "deployment", which will take you to a page with "Deployment List" and a content box.
4. If you added a node in [Step 4](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-4-add-existing-nodes-to-cluster-optional), use the following shared node pool [blueprint](./docs/sample_blueprints/vllm_inference_sample_shared_pool_blueprint.json).
   - Depending on the node shape, you will need to change `"recipe_node_shape": "BM.GPU.A10.4"` to match your shape (see the sketch after this list for one way to check a node's shape).
5. If you did not add a node, or just want to deploy a fresh node, use the following [blueprint](./docs/sample_blueprints/vllm_inference_sample_blueprint.json).
6. Paste the blueprint you selected into the content box on the deployment page and click "POST".
7. To monitor the deployment, go back to "Api Root" and click "deployment_logs".
   - If you are deploying without a shared node pool, it can take 10-30 minutes to bring up a node, depending on the shape and whether it is bare metal or virtual.
   - If you are deploying with a shared node pool, the blueprint will deploy much more quickly.
   - It is common for a recipe to report "unhealthy" while it is deploying. This is caused by "Warnings" in the pod events when deploying to Kubernetes. You only need to be alarmed when an "error" is reported.
8. Wait for the following steps to complete:
   - Affinity / selection of node -> Directive / commission -> Command / initializing -> Canonical / name assignment -> Service -> Deployment -> Ingress -> Monitor / nominal.
9. When you see the step "Monitor / nominal", you have an inference server running on your node.
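As referenced in the list above, one way to check an existing node's shape (so `recipe_node_shape` matches it) is to read the node's instance-type label. This is a sketch that assumes OKE populates the standard `node.kubernetes.io/instance-type` label with the shape name; fall back to `kubectl describe node` if the column comes back empty:

```bash
# List nodes with the instance-type label (the shape, e.g. BM.GPU.A10.4) as an extra column
kubectl get nodes -L node.kubernetes.io/instance-type
```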
## Step 6: Test your deployment
1. Upon completion of [Step 5](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-5-deploy-a-sample-recipe), test the deployment endpoint.
2. Go to Api Root, then click "deployment_digests". Find the "service_endpoint_domain" on this page.
   - This is `<deployment-name>.<base-url>.nip.io` for those who let us deploy the endpoint. If you use the default recipes above, an example of this would be:

     `vllm-inference-deployment.158-179-30-233.nip.io`

3. `curl` the metrics endpoint:

```bash
curl -L vllm-inference-deployment.158-179-30-233.nip.io/metrics
# HELP vllm:cache_config_info Information of the LLMEngine CacheConfig
# TYPE vllm:cache_config_info gauge
vllm:cache_config_info{block_size="16",cache_dtype="auto",cpu_offload_gb="0",enable_prefix_caching="False",gpu_memory_utilization="0.9",is_attention_free="False",num_cpu_blocks="4096",num_gpu_blocks="10947",num_gpu_blocks_override="None",sliding_window="None",swap_space_bytes="4294967296"} 1.0
# HELP vllm:num_requests_running Number of requests currently running on GPU.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="/models/NousResearch/Meta-Llama-3.1-8B-Instruct"} 0.0
# HELP vllm:num_requests_swapped Number of requests swapped to CPU.
...
```
4. Send an actual POST request:

```bash
curl -L -H "Content-Type: application/json" -d '{"model": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello, how are you?"}], "temperature": 0.7, "max_tokens": 100 }' vllm-inference-deployment.158-179-30-233.nip.io/v1/chat/completions | jq

# response
{
  "id": "chatcmpl-bb9093a3f51cee3e0ebe67ed06da59f0",
  "object": "chat.completion",
  "created": 1743169357,
  "model": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! I'm a helpful assistant, so I'm always ready to assist you with any questions or tasks you may have. How about you? How's your day going so far?",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "total_tokens": 73,
    "completion_tokens": 46,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
```
5. When completed, undeploy the recipe (a hedged `curl` sketch of this follows after this list):
   - Go to Api Root -> deployment.
   - Grab the whole deployment_uuid field for your deployment.
     - "deployment_uuid": "asdfjklafjdskl"
   - Go to Api Root -> undeploy.
   - Paste the field "deployment_uuid" into the content box and wrap it in curly braces {}:
     - {"deployment_uuid": "asdfjklafjdskl"}
   - Click "POST".
6. Monitor the undeploy:
   - Go to Api Root -> deployment_logs.
   - Look for: Directive decommission -> Ingress deleted -> Deployment deleted -> Service deleted -> Directive / decommission / completed.
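As with deployment, the undeploy POST can in principle be scripted rather than pasted into the browser form. This is a sketch under the same assumptions as before (URL path and auth are guesses, `<api-url>` is the API Url from "Application information", and the UUID is the placeholder from the steps above):

```bash
# Assumed equivalent of pasting {"deployment_uuid": ...} into the "undeploy" content box and clicking "POST".
curl -L -u "<Admin Username>:<Admin Password>" \
  -H "Content-Type: application/json" \
  -d '{"deployment_uuid": "asdfjklafjdskl"}' \
  "<api-url>/undeploy/"
```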
## Need Help?
- Check out [Known Issues & Solutions](docs/known_issues/README.md) for troubleshooting common problems.
- For questions or additional support, contact [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).

README.md

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ Looking to install and use OCI AI Blueprints right away? **[Click here](./GETTIN
We recommend following the Getting Started guide if this is your first time.

If you are looking to install OCI AI Blueprints onto an existing OKE cluster which already has running workloads and node pools, visit [this doc](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md).

---
## Introduction
docs/sample_blueprints/add_node_to_control_plane.json

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
{
  "recipe_mode": "update",
  "deployment_name": "startupaddnode",
  "recipe_node_name": "10.0.10.164",
  "recipe_node_labels": {
    "corrino": "a10pool",
    "corrino/pool-shared-any": "true"
  }
}
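The `recipe_node_labels` above are what the Step 4 add-node flow attaches to the node. A hedged way to confirm the labels were applied once the add-node deployment reports "completed" (this assumes `kubectl` access to the cluster and that commissioning applies `recipe_node_labels` directly as Kubernetes node labels):

```bash
# Show the labels on the newly added node and filter for the ones set by this blueprint
kubectl get node 10.0.10.164 --show-labels | tr ',' '\n' | grep corrino
```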

docs/sample_blueprints/vllm_inference_sample_blueprint.json

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
  "recipe_id": "llm_inference_nvidia",
  "recipe_mode": "service",
  "deployment_name": "vLLM Inference Deployment",
- "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:vllmv0.6.2",
+ "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:vllmv0.6.6.pos1",
  "recipe_node_shape": "VM.GPU.A10.2",
  "input_object_storage": [
    {
@@ -38,5 +38,5 @@
    "$(tensor_parallel_size)"
  ],
  "recipe_ephemeral_storage_size": 100,
- "recipe_shared_memory_volume_size_limit_in_mb": 200
+ "recipe_shared_memory_volume_size_limit_in_mb": 1000
}
docs/sample_blueprints/vllm_inference_sample_shared_pool_blueprint.json

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
{
  "recipe_id": "llm_inference_nvidia",
  "recipe_mode": "service",
  "deployment_name": "vLLM Inference Deployment",
  "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:vllmv0.6.6.pos1",
  "recipe_node_shape": "BM.GPU.A10.4",
  "input_object_storage": [
    {
      "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/IFknABDAjiiF5LATogUbRCcVQ9KL6aFUC1j-P5NSeUcaB2lntXLaR935rxa-E-u1/n/iduyx1qnmway/b/corrino_hf_oss_models/o/",
      "mount_location": "/models",
      "volume_size_in_gbs": 500,
      "include": ["NousResearch/Meta-Llama-3.1-8B-Instruct"]
    }
  ],
  "recipe_container_env": [
    {
      "key": "tensor_parallel_size",
      "value": "2"
    },
    {
      "key": "model_name",
      "value": "NousResearch/Meta-Llama-3.1-8B-Instruct"
    },
    {
      "key": "Model_Path",
      "value": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct"
    }
  ],
  "recipe_replica_count": 1,
  "recipe_container_port": "8000",
  "recipe_nvidia_gpu_count": 2,
  "recipe_use_shared_node_pool": true,
  "recipe_node_boot_volume_size_in_gbs": 200,
  "recipe_container_command_args": [
    "--model",
    "$(Model_Path)",
    "--tensor-parallel-size",
    "$(tensor_parallel_size)"
  ],
  "recipe_ephemeral_storage_size": 100,
  "recipe_shared_memory_volume_size_limit_in_mb": 1000
}
