
Commit 668aa10

Major Sample Blueprint Docs Cleanup (#64)
* full clean up of repo
* fix all broken links
* change name of version folder
* fix links in multi node and rdma readmes
* changes to vllm blueprints
* autoscaling readme changes
* lora_finetuning changes
* cpu_inference
* existing oke cluster
* gpu-health-check
* mig
* model storage
* multinode inference
* shared node pools
* teams
* using rdma enabled node pools
* main README
* Add new blueprints for various AI workloads including autoscaling, CPU inference, GPU health checks, multi-node inference, and shared node pools. Introduced RDMA-enabled node pools for enhanced performance and resource management. Updated documentation for each blueprint to provide comprehensive usage instructions.
* Update documentation for AI blueprints: corrected links for LLM Inference and improved formatting in the features section.
* Update CPU Inference Blueprint documentation: simplified title and clarified purpose for better readability.
* Update sample blueprints documentation: changed 'recipe' to 'blueprint' for consistency and clarity in usage instructions.
1 parent c7e0cc1 commit 668aa10

78 files changed (+1,000 −511 lines)


GETTING_STARTED_README.md — 4 additions & 4 deletions

````diff
@@ -15,8 +15,8 @@ This guide helps you install and use **OCI AI Blueprints** for the first time. Y
 
 ## Step 1: Set Up Policies in Your Tenancy
 
-1. If you are **not** a tenancy administrator, ask your admin to set up the required policies in the **root compartment**. These policies are listed [here](docs/iam_policies/README.md).
-2. If you **are** a tenancy administrator, Resource Manager will typically deploy the minimal required policies automatically, but you can reference the same [IAM policies doc](docs/iam_policies/README.md) for advanced or custom configurations if needed.
+1. If you are **not** a tenancy administrator, ask your admin to set up the required policies in the **root compartment**. These policies are listed [here](docs/iam_policies.md).
+2. If you **are** a tenancy administrator, Resource Manager will typically deploy the minimal required policies automatically, but you can reference the same [IAM policies doc](docs/iam_policies.md) for advanced or custom configurations if needed.
 
 ---
 
@@ -70,7 +70,7 @@ Now that your cluster is ready, follow these steps to install OCI AI Blueprints
 
 ## Step 5: Access the AI Blueprints API
 
-1. Follow the instruction to access the AI Blueprints API via web and/or CURL/Postman: [Ways to Access OCI AI Blueprints](./docs/api_documentation/accessing_oci_ai_blueprints/README.md#ways-to-access-oci-ai-blueprints)
+1. Follow the instruction to access the AI Blueprints API via web and/or CURL/Postman: [Ways to Access OCI AI Blueprints](docs/usage_guide.md)
 
 ---
 
@@ -95,5 +95,5 @@ Following this order ensures you do not have leftover services or dependencies i
 
 ## Need Help?
 
-- Check out [Known Issues & Solutions](docs/known_issues/README.md) for troubleshooting common problems.
+- Check out [Known Issues & Solutions](docs/known_issues.md) for troubleshooting common problems.
 - For questions or additional support, contact [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).
````
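For orientation, the CURL route mentioned in Step 5 looks roughly like the sketch below once the stack is up. The `API Url` placeholder and the basic-auth scheme are assumptions inferred from the portal login flow described in the installation docs; the exact paths are documented in the usage guide.

```bash
# Hypothetical values -- copy the real ones from the stack's
# "Application information" tab after install.
export BLUEPRINTS_API="https://<your-api-url>"
export ADMIN_USERNAME="<admin-username>"
export ADMIN_PASSWORD="<admin-password>"

# List current deployments (endpoint name mirrors the portal's "deployment" link)
curl -L -u "$ADMIN_USERNAME:$ADMIN_PASSWORD" "$BLUEPRINTS_API/deployment"
```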

INSTALLING_ONTO_EXISTING_CLUSTER_README.md — 34 additions & 21 deletions

````diff
@@ -22,7 +22,7 @@ Rather than installing blueprints onto a new cluster, a user may want to leverag
 
 ## Step 1: Set Up Policies in Your Tenancy
 
-Some or all of these policies may be in place as required by OKE. Please review the required policies listed [here](docs/iam_policies/README.md) and add any required policies which are missing.
+Some or all of these policies may be in place as required by OKE. Please review the required policies listed [here](docs/iam_policies.md) and add any required policies which are missing.
 
 1. If you are **not** a tenancy administrator, ask your admin to add additional required policies in the **root compartment**.
 2. If you **are** a tenancy administrator, you can either manually add the additional policies to an existing dynamic group, or let the resource manager deploy the required policies during stack creation.
````
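If an admin is adding the policies manually, a sketch of doing so with the OCI CLI follows. The statement shown is a placeholder for the shape of an OCI policy, not the actual required list — that list lives in docs/iam_policies.md — and the group and compartment names are hypothetical.

```bash
# Sketch only: create one policy via the OCI CLI. Replace the statement
# with the real ones from docs/iam_policies.md.
oci iam policy create \
  --compartment-id "$TENANCY_OCID" \
  --name "oci-ai-blueprints-policy" \
  --description "Policies for OCI AI Blueprints" \
  --statements '["Allow dynamic-group blueprints-dg to manage all-resources in compartment blueprints-cmpt"]'
```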
````diff
@@ -45,6 +45,7 @@ Some or all of these policies may be in place as required by OKE. Please review
 - Under the section "OCI AI Blueprints IAM", click the checkbox to create the policies. (If you do not see this, ensure you've selected the correct choices for the questions above.)
 
 - Otherwise, create the policies if you are an admin, or have your admin create the policies.
+
 4. Select "YES" for all other options.
 5. Fill out additional fields for username and password, as well as Home Region.
 6. Under "OKE Cluster & VCN", select the cluster name and vcn name you found in step 2.
````
````diff
@@ -64,8 +65,8 @@ Some or all of these policies may be in place as required by OKE. Please review
 ```
 9. After you've added all the relevant tooling namespaces, apply the stack by hitting "Next", then click the "run apply" box.
 
-
 ## Step 4: Add Existing Nodes to Cluster (optional)
+
 If you have existing node pools in your original OKE cluster that you'd like Blueprints to be able to use, follow these steps after the stack is finished:
 
 1. Find the private IP address of the node you'd like to add.
````
````diff
@@ -82,49 +83,55 @@ If you have existing node pools in your original OKE cluster that you'd like Blu
 - If you get a warning about security, sometimes it takes a bit for the certificates to get signed. This will go away once that process completes on the OKE side.
 3. Login with the `Admin Username` and `Admin Password` in the Application information tab.
 4. Click the link next to "deployment" which will take you to a page with "Deployment List", and a content box.
-5. Paste in the sample blueprint json found [here](./docs/sample_blueprints/add_node_to_control_plane.json).
+5. Paste in the sample blueprint json found [here](docs/sample_blueprints/exisiting_cluster_installation/add_node_to_control_plane.json).
 6. Modify the "recipe_node_name" field to the private IP address you found in step 1 above.
 7. Click "POST". This is a fast operation.
 8. Wait about 20 seconds and refresh the page. It should look like:
+
 ```json
 [
-{
-"mode": "update",
-"recipe_id": null,
-"creation_date": "2025-03-28 11:12 AM UTC",
-"deployment_uuid": "750a________cc0bfd",
-"deployment_name": "startupaddnode",
-"deployment_status": "completed",
-"deployment_directive": "commission"
-}
+  {
+    "mode": "update",
+    "recipe_id": null,
+    "creation_date": "2025-03-28 11:12 AM UTC",
+    "deployment_uuid": "750a________cc0bfd",
+    "deployment_name": "startupaddnode",
+    "deployment_status": "completed",
+    "deployment_directive": "commission"
+  }
 ]
 ```
 
 ## Step 5: Deploy a sample recipe
+
 2. Go to the stack and click "Application information". Click the API Url.
 - If you get a warning about security, sometimes it takes a bit for the certificates to get signed. This will go away once that process completes on the OKE side.
 3. Login with the `Admin Username` and `Admin Password` in the Application information tab.
 4. Click the link next to "deployment" which will take you to a page with "Deployment List", and a content box.
-5. If you added a node from [Step 4](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-4-add-existing-nodes-to-cluster-optional), use the following shared node pool [blueprint](./docs/sample_blueprints/vllm_inference_sample_shared_pool_blueprint.json).
+5. If you added a node from [Step 4](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-4-add-existing-nodes-to-cluster-optional), use the following shared node pool [blueprint](./docs/sample_blueprints/shared_node_pools/vllm_inference_sample_shared_pool_blueprint.json).
 - Depending on the node shape, you will need to change:
-`"recipe_node_shape": "BM.GPU.A10.4"` to match your shape.
-6. If you did not add a node, or just want to deploy a fresh node, use the following [blueprint](./docs/sample_blueprints/vllm_inference_sample_blueprint.json).
+  `"recipe_node_shape": "BM.GPU.A10.4"` to match your shape.
+6. If you did not add a node, or just want to deploy a fresh node, use the following [blueprint](docs/sample_blueprints/llm_inference_with_vllm/vllm-open-hf-model.json).
 7. Paste the blueprint you selected into context box on the deployment page and click "POST"
 8. To monitor the deployment, go back to "Api Root" and click "deployment_logs".
 - If you are deploying without a shared node pool, it can take 10-30 minutes to bring up a node, depending on shape and whether it is bare-metal or virtual.
 - If you are deploying with a shared node pool, the blueprint will deploy much more quickly.
 - It is common for a recipe to report "unhealthy" while it is deploying. This is caused by "Warnings" in the pod events when deploying to kubernetes. You only need to be alarmed when an "error" is reported.
-9. Wait for the following steps to complete:
+9. Wait for the following steps to complete:
 - Affinity / selection of node -> Directive / commission -> Command / initializing -> Canonical / name assignment -> Service -> Deployment -> Ingress -> Monitor / nominal.
 10. When you see the step "Monitor / nominal", you have an inference server running on your node.
 
 ## Step 6: Test your deployment
+
 1. Upon completion of [Step 5](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-5-deploy-a-sample-recipe), test the deployment endpoint.
 2. Go to Api Root, then click "deployment_digests". Find the "service_endpoint_domain" on this page.
+
 - This is <deployment-name>.<base-url>.nip.io for those who let us deploy the endpoint. If you use the default recipes above, an example of this would be:
-
+
 `vllm-inference-deployment.158-179-30-233.nip.io`
+
 3. `curl` the metrics endpoint:
+
 ```bash
 curl -L vllm-inference-deployment.158-179-30-233.nip.io/metrics
 # HELP vllm:cache_config_info Information of the LLMEngine CacheConfig
````
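As a command-line alternative to the portal's content box in Step 4 above, the POST could be issued roughly as sketched below. The endpoint path and auth carry over the same assumptions as earlier; `recipe_node_name` is the one field the steps say to edit, and the real add_node_to_control_plane.json may contain more fields — start from that file rather than this payload.

```bash
# Sketch only: POST the edited add-node JSON instead of pasting it into
# the portal. The IP is the node's private IP from step 1 (hypothetical here).
curl -L -u "$ADMIN_USERNAME:$ADMIN_PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{"recipe_node_name": "10.0.10.123"}' \
  "$BLUEPRINTS_API/deployment"
```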
````diff
@@ -136,7 +143,9 @@ vllm:num_requests_running{model_name="/models/NousResearch/Meta-Llama-3.1-8B-Ins
 # HELP vllm:num_requests_swapped Number of requests swapped to CPU.
 ...
 ```
+
 4. Send an actual post request:
+
 ```bash
 curl -L -H "Content-Type: application/json" -d '{"model": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello, how are you?"}], "temperature": 0.7, "max_tokens": 100 }' vllm-inference-deployment.158-179-30-233.nip.io/v1/chat/completions | jq
 
````
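Because the served endpoint is vLLM's OpenAI-compatible `/v1/chat/completions`, the generated text can be pulled straight out of that response — a small convenience on top of the documented request, using the standard OpenAI-style response path:

```bash
# Same request as above, extracting only the assistant's reply.
# .choices[0].message.content is the standard OpenAI-style response field.
curl -s -L -H "Content-Type: application/json" \
  -d '{"model": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello, how are you?"}], "max_tokens": 100}' \
  vllm-inference-deployment.158-179-30-233.nip.io/v1/chat/completions \
  | jq -r '.choices[0].message.content'
```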

````diff
@@ -168,34 +177,38 @@ curl -L -H "Content-Type: application/json" -d '{"model": "/models/NousResearch/
 "prompt_logprobs": null
 }
 ```
+
 5. When completed, undeploy the recipe:
 - go to Api Root -> deployment
 - Grab the whole deployment_uuid field for your deployment.
 - "deployment_uuid": "asdfjklafjdskl"
 - go to Api Root -> undeploy
 - paste the field "deployment_uuid" into the content box and wrap it in curly braces {}:
-- {"deployment_uuid": "asdfjklafjdskl"}
+  - {"deployment_uuid": "asdfjklafjdskl"}
 - Click "POST"
 6. Monitor the undeploy:
 - go to Api Root -> deployment_logs
 - Look for: Directive decommission -> Ingress deleted -> Deployment deleted -> Service deleted -> Directive / decommission / completed.
 
 ## Step 7: Destroy the stack
+
 Destroying the OCI AI Blueprints stack will not destroy any resources which were created or destroyed outside of the stack such as node pools or helm installs. Only things created by the stack will be destroyed for the stack. To destroy the stack:
 
 1. Go to the console and navigate to Developer Services -> Resource Manager -> Stacks -> Your OCI AI Blueprints stack
 2. Click "Destroy" at the top
 
 ## Multi-Instance GPU Setup
+
 If you have the nvidia gpu operator already installed, and would like to reconfigure it because you plan on using Multi-Instance GPUs (MIG) with your H100 nodes, you will need to manually update / reconfigure your cluster with helm.
 
 This can be done like below:
+
 ```bash
 # Get the deployment name
 helm list -n gpu-operator
 
 NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
-gpu-operator-1742982512 gpu-operator 1 2025-03-26 05:48:41.913183 -0400 EDT deployed gpu-operator-v24.9.2 v24.9.2
+gpu-operator-1742982512 gpu-operator 1 2025-03-26 05:48:41.913183 -0400 EDT deployed gpu-operator-v24.9.2 v24.9.2
 
 # Upgrade the deployment
 helm upgrade gpu-operator-1742982512 nvidia/gpu-operator \
````
````diff
@@ -212,7 +225,7 @@ REVISION: 2
 TEST SUITE: None
 ```
 
-
 ## Need Help?
-- Check out [Known Issues & Solutions](docs/known_issues/README.md) for troubleshooting common problems.
+
+- Check out [Known Issues & Solutions](docs/known_issues.md) for troubleshooting common problems.
 - For questions or additional support, contact [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).
````
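The README's `helm upgrade` flags fall between the two hunks above, so this diff does not show them. Purely for orientation, a MIG-oriented upgrade of the NVIDIA gpu-operator chart typically sets the MIG strategy, roughly as below — an illustrative sketch, not the README's exact command:

```bash
# Illustrative only -- the exact flags are truncated between hunks.
# mig.strategy is a standard gpu-operator chart value ("single" or "mixed").
helm upgrade gpu-operator-1742982512 nvidia/gpu-operator \
  -n gpu-operator \
  --reuse-values \
  --set mig.strategy=mixed
```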

README.md — 31 additions & 23 deletions

````diff
@@ -1,58 +1,66 @@
 # OCI AI Blueprints
+
 **Deploy, scale, and monitor AI workloads with the OCI AI Blueprints platform, and reduce your GPU onboarding time from weeks to minutes.**
 
 OCI AI Blueprints is a streamlined, no-code solution for deploying and managing Generative AI workloads on Kubernetes Engine (OKE). By providing opinionated hardware recommendations, pre-packaged software stacks, and out-of-the-box observability tooling, OCI AI Blueprints helps you get your AI applications running quickly and efficiently—without wrestling with the complexities of infrastructure decisions, software compatibility, and MLOps best practices.
 
-[![Install OCI AI Blueprints](https://raw.githubusercontent.com/oracle-quickstart/oci-ai-blueprints/9d1d61b3b79e61dabe19d1672c3e54704b294a93/docs/install.svg)](./GETTING_STARTED_README.md)
+[![Install OCI AI Blueprints](https://raw.githubusercontent.com/oracle-quickstart/oci-ai-blueprints/9d1d61b3b79e61dabe19d1672c3e54704b294a93/docs/images/install.svg)](./GETTING_STARTED_README.md)
 
 ## Table of Contents
+
 **Getting Started**
+
 - [Install AI Blueprints](./GETTING_STARTED_README.md)
-- [Access AI Blueprints Portal and API](./docs/api_documentation/accessing_oci_ai_blueprints/README.md)
+- [Access AI Blueprints Portal and API](docs/usage_guide.md)
 
 **About OCI AI Blueprints**
-- [What is OCI AI Blueprints?](./docs/about/README.md#what-is-oci-ai-blueprints)
-- [Why use OCI AI Blueprints?](./docs/about/README.md#why-use-oci-ai-blueprints)
-- [Features](./docs/about/README.md#features)
+
+- [What is OCI AI Blueprints?](docs/about.md)
+- [Why use OCI AI Blueprints?](docs/about.md)
+- [Features](docs/about.md)
 - [List of Blueprints](#blueprints)
-- [FAQ](./docs/about/README.md#frequently-asked-questions-faq)
+- [FAQ](docs/about.md)
 - [Support & Contact](https://github.com/oracle-quickstart/oci-ai-blueprints/blob/vkammari/doc_improvements/docs/about/README.md#frequently-asked-questions-faq)
 
 **API Reference**
-- [API Reference Documentation](docs/api_documentation/README.md)
+
+- [API Reference Documentation](docs/api_documentation.md)
 
 **Additional Resources**
+
 - [Publish Custom Blueprints](./docs/custom_blueprints)
-- [Installing Updates](./docs/installing_new_updates)
-- [IAM Policies](./docs/iam_policies/README.md)
-- [Repository Contents](./docs/about/README.md#repository-contents)
-- [Known Issues](docs/known_issues/README.md)
+- [Installing Updates](docs/installing_new_updates.md)
+- [IAM Policies](docs/iam_policies.md)
+- [Repository Contents](docs/about.md)
+- [Known Issues](docs/known_issues.md)
 
 ## Getting Started
+
 Install OCI AI Blueprints by clicking on the button below:
 
-[![Install OCI AI Blueprints](https://raw.githubusercontent.com/oracle-quickstart/oci-ai-blueprints/9d1d61b3b79e61dabe19d1672c3e54704b294a93/docs/install.svg)](./GETTING_STARTED_README.md)
+[![Install OCI AI Blueprints](https://raw.githubusercontent.com/oracle-quickstart/oci-ai-blueprints/9d1d61b3b79e61dabe19d1672c3e54704b294a93/docs/images/install.svg)](./GETTING_STARTED_README.md)
 
 ## Blueprints
 
 Blueprints go beyond basic Terraform templates. Each blueprint:
+
 - Offers validated hardware suggestions (e.g., optimal shapes, CPU/GPU configurations),
 - Includes end-to-end application stacks customized for different GenAI use cases, and
 - Comes with monitoring, logging, and auto-scaling configured out of the box.
 
 After you install OCI AI Blueprints to an OKE cluster in your tenancy, you can deploy these pre-built blueprints:
 
-| Blueprint | Description |
-| --------- | ----------- |
-| [**LLM & VLM Inference with vLLM**](./docs/sample_blueprints/vllm-inference) | Deploy Llama 2/3/3.1 7B/8B models using NVIDIA GPU shapes and the vLLM inference engine with auto-scaling. |
-| [**Fine-Tuning Benchmarking**](./docs/sample_blueprints/lora-benchmarking) | Run MLCommons quantized Llama-2 70B LoRA finetuning on A100 for performance benchmarking. |
-| [**LoRA Fine-Tuning**](./docs/sample_blueprints/lora-fine-tuning) | LoRA fine-tuning of custom or HuggingFace models using any dataset. Includes flexible hyperparameter tuning. |
-| [**Health Check**](./docs/sample_blueprints/gpu-health-check) | Comprehensive evaluation of GPU performance to ensure optimal hardware readiness before initiating any intensive computational workload. |
-| [**CPU Inference**](./docs/sample_blueprints/cpu-inference) | Leverage Ollama to test CPU-based inference with models like Mistral, Gemma, and more. |
-| [**Multi-node Inference with RDMA and vLLM**](./docs/multi_node_inference) | Deploy Llama-405B sized LLMs across multiple nodes with RDMA using H100 nodes with vLLM and LeaderWorkerSet. |
-| [**Scaled Inference with vLLM**](./docs/auto_scaling) | Serve LLMs with auto-scaling using KEDA, which scales to multiple GPUs and nodes using application metrics like inference latency. |
-| [**LLM Inference with MIG**](./docs/mig_multi_instance_gpu) | Deploy LLMs to a fraction of a GPU with Nvidia’s multi-instance GPUs and serve them with vLLM. |
-| [**Job Queuing**](./docs/sample_blueprints/teams) | Take advantage of job queuing and enforce resource quotas and fair sharing between teams. |
+| Blueprint | Description |
+| --------- | ----------- |
+| [**LLM & VLM Inference with vLLM**](docs/sample_blueprints/llm_inference_with_vllm/README.md) | Deploy Llama 2/3/3.1 7B/8B models using NVIDIA GPU shapes and the vLLM inference engine with auto-scaling. |
+| [**Fine-Tuning Benchmarking**](./docs/sample_blueprints/lora-benchmarking) | Run MLCommons quantized Llama-2 70B LoRA finetuning on A100 for performance benchmarking. |
+| [**LoRA Fine-Tuning**](./docs/sample_blueprints/lora-fine-tuning) | LoRA fine-tuning of custom or HuggingFace models using any dataset. Includes flexible hyperparameter tuning. |
+| [**Health Check**](./docs/sample_blueprints/gpu-health-check) | Comprehensive evaluation of GPU performance to ensure optimal hardware readiness before initiating any intensive computational workload. |
+| [**CPU Inference**](./docs/sample_blueprints/cpu-inference) | Leverage Ollama to test CPU-based inference with models like Mistral, Gemma, and more. |
+| [**Multi-node Inference with RDMA and vLLM**](./docs/sample_blueprints/multi-node-inference/) | Deploy Llama-405B sized LLMs across multiple nodes with RDMA using H100 nodes with vLLM and LeaderWorkerSet. |
+| [**Autoscaling Inference with vLLM**](./docs/sample_blueprints/auto_scaling/) | Serve LLMs with auto-scaling using KEDA, which scales to multiple GPUs and nodes using application metrics like inference latency. |
+| [**LLM Inference with MIG**](./docs/sample_blueprints/mig_multi_instance_gpu/) | Deploy LLMs to a fraction of a GPU with Nvidia’s multi-instance GPUs and serve them with vLLM. |
+| [**Job Queuing**](./docs/sample_blueprints/teams) | Take advantage of job queuing and enforce resource quotas and fair sharing between teams. |
 
 ## Support & Contact
 
````
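Deploying any blueprint from the table follows the same POST flow the installation guide walks through. Sketched below with the same assumed `$BLUEPRINTS_API` endpoint and credentials as in the earlier examples; the JSON path is the one the commit links for vLLM inference.

```bash
# Sketch: POST a pre-built blueprint JSON to the deployment endpoint.
# Endpoint path and basic auth are assumptions carried over from the
# portal flow described in the installation README.
curl -L -u "$ADMIN_USERNAME:$ADMIN_PASSWORD" \
  -H "Content-Type: application/json" \
  -d @docs/sample_blueprints/llm_inference_with_vllm/vllm-open-hf-model.json \
  "$BLUEPRINTS_API/deployment"
```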
