
Commit 668aa10

Major Sample Blueprint Docs Cleanup (#64)
* full clean up of repo
* fix all broken links
* change name of version folder
* fix links in multi node and rdma readmes
* changes to vllm blueprints
* autoscaling readme changes
* lora_finetuning changes
* cpu_inference
* existing oke cluster
* gpu-health-check
* mig
* model storage
* multinode inference
* shared node pools
* teams
* using rdma enabled node pools
* main README
* Add new blueprints for various AI workloads including autoscaling, CPU inference, GPU health checks, multi-node inference, and shared node pools. Introduced RDMA-enabled node pools for enhanced performance and resource management. Updated documentation for each blueprint to provide comprehensive usage instructions.
* Update documentation for AI blueprints: corrected links for LLM Inference and improved formatting in the features section.
* Update CPU Inference Blueprint documentation: simplified title and clarified purpose for better readability.
* Update sample blueprints documentation: changed 'recipe' to 'blueprint' for consistency and clarity in usage instructions.
1 parent c7e0cc1 commit 668aa10

78 files changed (+1,000 −511 lines)


GETTING_STARTED_README.md — 4 additions & 4 deletions

````diff
@@ -15,8 +15,8 @@ This guide helps you install and use **OCI AI Blueprints** for the first time. Y
 
 ## Step 1: Set Up Policies in Your Tenancy
 
-1. If you are **not** a tenancy administrator, ask your admin to set up the required policies in the **root compartment**. These policies are listed [here](docs/iam_policies/README.md).
-2. If you **are** a tenancy administrator, Resource Manager will typically deploy the minimal required policies automatically, but you can reference the same [IAM policies doc](docs/iam_policies/README.md) for advanced or custom configurations if needed.
+1. If you are **not** a tenancy administrator, ask your admin to set up the required policies in the **root compartment**. These policies are listed [here](docs/iam_policies.md).
+2. If you **are** a tenancy administrator, Resource Manager will typically deploy the minimal required policies automatically, but you can reference the same [IAM policies doc](docs/iam_policies.md) for advanced or custom configurations if needed.
 
 ---
 
@@ -70,7 +70,7 @@ Now that your cluster is ready, follow these steps to install OCI AI Blueprints
 
 ## Step 5: Access the AI Blueprints API
 
-1. Follow the instruction to access the AI Blueprints API via web and/or CURL/Postman: [Ways to Access OCI AI Blueprints](./docs/api_documentation/accessing_oci_ai_blueprints/README.md#ways-to-access-oci-ai-blueprints)
+1. Follow the instruction to access the AI Blueprints API via web and/or CURL/Postman: [Ways to Access OCI AI Blueprints](docs/usage_guide.md)
 
 ---
 
@@ -95,5 +95,5 @@ Following this order ensures you do not have leftover services or dependencies i
 
 ## Need Help?
 
-- Check out [Known Issues & Solutions](docs/known_issues/README.md) for troubleshooting common problems.
+- Check out [Known Issues & Solutions](docs/known_issues.md) for troubleshooting common problems.
 - For questions or additional support, contact [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).
````
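For orientation, the CURL route mentioned in Step 5 looks roughly like the sketch below once the stack is up. The `API Url` placeholder and the basic-auth scheme are assumptions inferred from the portal login flow described in the installation docs; the exact paths are documented in the usage guide.

```bash
# Hypothetical values -- copy the real ones from the stack's
# "Application information" tab after install.
export BLUEPRINTS_API="https://<your-api-url>"
export ADMIN_USERNAME="<admin-username>"
export ADMIN_PASSWORD="<admin-password>"

# List current deployments (endpoint name mirrors the portal's "deployment" link)
curl -L -u "$ADMIN_USERNAME:$ADMIN_PASSWORD" "$BLUEPRINTS_API/deployment"
```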

INSTALLING_ONTO_EXISTING_CLUSTER_README.md — 34 additions & 21 deletions

````diff
@@ -22,7 +22,7 @@ Rather than installing blueprints onto a new cluster, a user may want to leverag
 
 ## Step 1: Set Up Policies in Your Tenancy
 
-Some or all of these policies may be in place as required by OKE. Please review the required policies listed [here](docs/iam_policies/README.md) and add any required policies which are missing.
+Some or all of these policies may be in place as required by OKE. Please review the required policies listed [here](docs/iam_policies.md) and add any required policies which are missing.
 
 1. If you are **not** a tenancy administrator, ask your admin to add additional required policies in the **root compartment**.
 2. If you **are** a tenancy administrator, you can either manually add the additional policies to an existing dynamic group, or let the resource manager deploy the required policies during stack creation.
````
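If an admin is adding the policies manually, a sketch of doing so with the OCI CLI follows. The statement shown is a placeholder for the shape of an OCI policy, not the actual required list — that list lives in docs/iam_policies.md — and the group and compartment names are hypothetical.

```bash
# Sketch only: create one policy via the OCI CLI. Replace the statement
# with the real ones from docs/iam_policies.md.
oci iam policy create \
  --compartment-id "$TENANCY_OCID" \
  --name "oci-ai-blueprints-policy" \
  --description "Policies for OCI AI Blueprints" \
  --statements '["Allow dynamic-group blueprints-dg to manage all-resources in compartment blueprints-cmpt"]'
```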
````diff
@@ -45,6 +45,7 @@ Some or all of these policies may be in place as required by OKE. Please review
 - Under the section "OCI AI Blueprints IAM", click the checkbox to create the policies. (If you do not see this, ensure you've selected the correct choices for the questions above.)
 
 - Otherwise, create the policies if you are an admin, or have your admin create the policies.
+
 4. Select "YES" for all other options.
 5. Fill out additional fields for username and password, as well as Home Region.
 6. Under "OKE Cluster & VCN", select the cluster name and vcn name you found in step 2.
````
````diff
@@ -64,8 +65,8 @@ Some or all of these policies may be in place as required by OKE. Please review
 ```
 9. After you've added all the relevant tooling namespaces, apply the stack by hitting "Next", then click the "run apply" box.
 
-
 ## Step 4: Add Existing Nodes to Cluster (optional)
+
 If you have existing node pools in your original OKE cluster that you'd like Blueprints to be able to use, follow these steps after the stack is finished:
 
 1. Find the private IP address of the node you'd like to add.
````
````diff
@@ -82,49 +83,55 @@ If you have existing node pools in your original OKE cluster that you'd like Blu
 - If you get a warning about security, sometimes it takes a bit for the certificates to get signed. This will go away once that process completes on the OKE side.
 3. Login with the `Admin Username` and `Admin Password` in the Application information tab.
 4. Click the link next to "deployment" which will take you to a page with "Deployment List", and a content box.
-5. Paste in the sample blueprint json found [here](./docs/sample_blueprints/add_node_to_control_plane.json).
+5. Paste in the sample blueprint json found [here](docs/sample_blueprints/exisiting_cluster_installation/add_node_to_control_plane.json).
 6. Modify the "recipe_node_name" field to the private IP address you found in step 1 above.
 7. Click "POST". This is a fast operation.
 8. Wait about 20 seconds and refresh the page. It should look like:
+
 ```json
 [
-{
-"mode": "update",
-"recipe_id": null,
-"creation_date": "2025-03-28 11:12 AM UTC",
-"deployment_uuid": "750a________cc0bfd",
-"deployment_name": "startupaddnode",
-"deployment_status": "completed",
-"deployment_directive": "commission"
-}
+  {
+    "mode": "update",
+    "recipe_id": null,
+    "creation_date": "2025-03-28 11:12 AM UTC",
+    "deployment_uuid": "750a________cc0bfd",
+    "deployment_name": "startupaddnode",
+    "deployment_status": "completed",
+    "deployment_directive": "commission"
+  }
 ]
 ```
 
 ## Step 5: Deploy a sample recipe
+
 2. Go to the stack and click "Application information". Click the API Url.
 - If you get a warning about security, sometimes it takes a bit for the certificates to get signed. This will go away once that process completes on the OKE side.
 3. Login with the `Admin Username` and `Admin Password` in the Application information tab.
 4. Click the link next to "deployment" which will take you to a page with "Deployment List", and a content box.
-5. If you added a node from [Step 4](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-4-add-existing-nodes-to-cluster-optional), use the following shared node pool [blueprint](./docs/sample_blueprints/vllm_inference_sample_shared_pool_blueprint.json).
+5. If you added a node from [Step 4](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-4-add-existing-nodes-to-cluster-optional), use the following shared node pool [blueprint](./docs/sample_blueprints/shared_node_pools/vllm_inference_sample_shared_pool_blueprint.json).
 - Depending on the node shape, you will need to change:
-`"recipe_node_shape": "BM.GPU.A10.4"` to match your shape.
-6. If you did not add a node, or just want to deploy a fresh node, use the following [blueprint](./docs/sample_blueprints/vllm_inference_sample_blueprint.json).
+  `"recipe_node_shape": "BM.GPU.A10.4"` to match your shape.
+6. If you did not add a node, or just want to deploy a fresh node, use the following [blueprint](docs/sample_blueprints/llm_inference_with_vllm/vllm-open-hf-model.json).
 7. Paste the blueprint you selected into context box on the deployment page and click "POST"
 8. To monitor the deployment, go back to "Api Root" and click "deployment_logs".
 - If you are deploying without a shared node pool, it can take 10-30 minutes to bring up a node, depending on shape and whether it is bare-metal or virtual.
 - If you are deploying with a shared node pool, the blueprint will deploy much more quickly.
 - It is common for a recipe to report "unhealthy" while it is deploying. This is caused by "Warnings" in the pod events when deploying to kubernetes. You only need to be alarmed when an "error" is reported.
-9. Wait for the following steps to complete:
+9. Wait for the following steps to complete:
 - Affinity / selection of node -> Directive / commission -> Command / initializing -> Canonical / name assignment -> Service -> Deployment -> Ingress -> Monitor / nominal.
 10. When you see the step "Monitor / nominal", you have an inference server running on your node.
 
 ## Step 6: Test your deployment
+
 1. Upon completion of [Step 5](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#step-5-deploy-a-sample-recipe), test the deployment endpoint.
 2. Go to Api Root, then click "deployment_digests". Find the "service_endpoint_domain" on this page.
+
 - This is <deployment-name>.<base-url>.nip.io for those who let us deploy the endpoint. If you use the default recipes above, an example of this would be:
-
+
 `vllm-inference-deployment.158-179-30-233.nip.io`
+
 3. `curl` the metrics endpoint:
+
 ```bash
 curl -L vllm-inference-deployment.158-179-30-233.nip.io/metrics
 # HELP vllm:cache_config_info Information of the LLMEngine CacheConfig
````
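As a command-line alternative to the portal's content box in Step 4 above, the POST could be issued roughly as sketched below. The endpoint path and auth carry over the same assumptions as earlier; `recipe_node_name` is the one field the steps say to edit, and the real add_node_to_control_plane.json may contain more fields — start from that file rather than this payload.

```bash
# Sketch only: POST the edited add-node JSON instead of pasting it into
# the portal. The IP is the node's private IP from step 1 (hypothetical here).
curl -L -u "$ADMIN_USERNAME:$ADMIN_PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{"recipe_node_name": "10.0.10.123"}' \
  "$BLUEPRINTS_API/deployment"
```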
````diff
@@ -136,7 +143,9 @@ vllm:num_requests_running{model_name="/models/NousResearch/Meta-Llama-3.1-8B-Ins
 # HELP vllm:num_requests_swapped Number of requests swapped to CPU.
 ...
 ```
+
 4. Send an actual post request:
+
 ```bash
 curl -L -H "Content-Type: application/json" -d '{"model": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello, how are you?"}], "temperature": 0.7, "max_tokens": 100 }' vllm-inference-deployment.158-179-30-233.nip.io/v1/chat/completions | jq
 
````
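Because the served endpoint is vLLM's OpenAI-compatible `/v1/chat/completions`, the generated text can be pulled straight out of that response — a small convenience on top of the documented request, using the standard OpenAI-style response path:

```bash
# Same request as above, extracting only the assistant's reply.
# .choices[0].message.content is the standard OpenAI-style response field.
curl -s -L -H "Content-Type: application/json" \
  -d '{"model": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello, how are you?"}], "max_tokens": 100}' \
  vllm-inference-deployment.158-179-30-233.nip.io/v1/chat/completions \
  | jq -r '.choices[0].message.content'
```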

````diff
@@ -168,34 +177,38 @@ curl -L -H "Content-Type: application/json" -d '{"model": "/models/NousResearch/
 "prompt_logprobs": null
 }
 ```
+
 5. When completed, undeploy the recipe:
 - go to Api Root -> deployment
 - Grab the whole deployment_uuid field for your deployment.
 - "deployment_uuid": "asdfjklafjdskl"
 - go to Api Root -> undeploy
 - paste the field "deployment_uuid" into the content box and wrap it in curly braces {}:
-- {"deployment_uuid": "asdfjklafjdskl"}
+  - {"deployment_uuid": "asdfjklafjdskl"}
 - Click "POST"
 6. Monitor the undeploy:
 - go to Api Root -> deployment_logs
 - Look for: Directive decommission -> Ingress deleted -> Deployment deleted -> Service deleted -> Directive / decommission / completed.
 
 ## Step 7: Destroy the stack
+
 Destroying the OCI AI Blueprints stack will not destroy any resources which were created or destroyed outside of the stack such as node pools or helm installs. Only things created by the stack will be destroyed for the stack. To destroy the stack:
 
 1. Go to the console and navigate to Developer Services -> Resource Manager -> Stacks -> Your OCI AI Blueprints stack
 2. Click "Destroy" at the top
 
 ## Multi-Instance GPU Setup
+
 If you have the nvidia gpu operator already installed, and would like to reconfigure it because you plan on using Multi-Instance GPUs (MIG) with your H100 nodes, you will need to manually update / reconfigure your cluster with helm.
 
 This can be done like below:
+
 ```bash
 # Get the deployment name
 helm list -n gpu-operator
 
 NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
-gpu-operator-1742982512 gpu-operator 1 2025-03-26 05:48:41.913183 -0400 EDT deployed gpu-operator-v24.9.2 v24.9.2
+gpu-operator-1742982512 gpu-operator 1 2025-03-26 05:48:41.913183 -0400 EDT deployed gpu-operator-v24.9.2 v24.9.2
 
 # Upgrade the deployment
 helm upgrade gpu-operator-1742982512 nvidia/gpu-operator \
````
````diff
@@ -212,7 +225,7 @@ REVISION: 2
 TEST SUITE: None
 ```
 
-
 ## Need Help?
-- Check out [Known Issues & Solutions](docs/known_issues/README.md) for troubleshooting common problems.
+
+- Check out [Known Issues & Solutions](docs/known_issues.md) for troubleshooting common problems.
 - For questions or additional support, contact [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).
````
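The README's `helm upgrade` flags fall between the two hunks above, so this diff does not show them. Purely for orientation, a MIG-oriented upgrade of the NVIDIA gpu-operator chart typically sets the MIG strategy, roughly as below — an illustrative sketch, not the README's exact command:

```bash
# Illustrative only -- the exact flags are truncated between hunks.
# mig.strategy is a standard gpu-operator chart value ("single" or "mixed").
helm upgrade gpu-operator-1742982512 nvidia/gpu-operator \
  -n gpu-operator \
  --reuse-values \
  --set mig.strategy=mixed
```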

README.md — 31 additions & 23 deletions

````diff
@@ -1,58 +1,66 @@
 # OCI AI Blueprints
+
 **Deploy, scale, and monitor AI workloads with the OCI AI Blueprints platform, and reduce your GPU onboarding time from weeks to minutes.**
 
 OCI AI Blueprints is a streamlined, no-code solution for deploying and managing Generative AI workloads on Kubernetes Engine (OKE). By providing opinionated hardware recommendations, pre-packaged software stacks, and out-of-the-box observability tooling, OCI AI Blueprints helps you get your AI applications running quickly and efficiently—without wrestling with the complexities of infrastructure decisions, software compatibility, and MLOps best practices.
 
-[![Install OCI AI Blueprints](https://raw.githubusercontent.com/oracle-quickstart/oci-ai-blueprints/9d1d61b3b79e61dabe19d1672c3e54704b294a93/docs/install.svg)](./GETTING_STARTED_README.md)
+[![Install OCI AI Blueprints](https://raw.githubusercontent.com/oracle-quickstart/oci-ai-blueprints/9d1d61b3b79e61dabe19d1672c3e54704b294a93/docs/images/install.svg)](./GETTING_STARTED_README.md)
 
 ## Table of Contents
+
 **Getting Started**
+
 - [Install AI Blueprints](./GETTING_STARTED_README.md)
-- [Access AI Blueprints Portal and API](./docs/api_documentation/accessing_oci_ai_blueprints/README.md)
+- [Access AI Blueprints Portal and API](docs/usage_guide.md)
 
 **About OCI AI Blueprints**
-- [What is OCI AI Blueprints?](./docs/about/README.md#what-is-oci-ai-blueprints)
-- [Why use OCI AI Blueprints?](./docs/about/README.md#why-use-oci-ai-blueprints)
-- [Features](./docs/about/README.md#features)
+
+- [What is OCI AI Blueprints?](docs/about.md)
+- [Why use OCI AI Blueprints?](docs/about.md)
+- [Features](docs/about.md)
 - [List of Blueprints](#blueprints)
-- [FAQ](./docs/about/README.md#frequently-asked-questions-faq)
+- [FAQ](docs/about.md)
 - [Support & Contact](https://github.com/oracle-quickstart/oci-ai-blueprints/blob/vkammari/doc_improvements/docs/about/README.md#frequently-asked-questions-faq)
 
 **API Reference**
-- [API Reference Documentation](docs/api_documentation/README.md)
+
+- [API Reference Documentation](docs/api_documentation.md)
 
 **Additional Resources**
+
 - [Publish Custom Blueprints](./docs/custom_blueprints)
-- [Installing Updates](./docs/installing_new_updates)
-- [IAM Policies](./docs/iam_policies/README.md)
-- [Repository Contents](./docs/about/README.md#repository-contents)
-- [Known Issues](docs/known_issues/README.md)
+- [Installing Updates](docs/installing_new_updates.md)
+- [IAM Policies](docs/iam_policies.md)
+- [Repository Contents](docs/about.md)
+- [Known Issues](docs/known_issues.md)
 
 ## Getting Started
+
 Install OCI AI Blueprints by clicking on the button below:
 
-[![Install OCI AI Blueprints](https://raw.githubusercontent.com/oracle-quickstart/oci-ai-blueprints/9d1d61b3b79e61dabe19d1672c3e54704b294a93/docs/install.svg)](./GETTING_STARTED_README.md)
+[![Install OCI AI Blueprints](https://raw.githubusercontent.com/oracle-quickstart/oci-ai-blueprints/9d1d61b3b79e61dabe19d1672c3e54704b294a93/docs/images/install.svg)](./GETTING_STARTED_README.md)
 
 ## Blueprints
 
 Blueprints go beyond basic Terraform templates. Each blueprint:
+
 - Offers validated hardware suggestions (e.g., optimal shapes, CPU/GPU configurations),
 - Includes end-to-end application stacks customized for different GenAI use cases, and
 - Comes with monitoring, logging, and auto-scaling configured out of the box.
 
 After you install OCI AI Blueprints to an OKE cluster in your tenancy, you can deploy these pre-built blueprints:
 
-| Blueprint | Description |
-| --------- | ----------- |
-| [**LLM & VLM Inference with vLLM**](./docs/sample_blueprints/vllm-inference) | Deploy Llama 2/3/3.1 7B/8B models using NVIDIA GPU shapes and the vLLM inference engine with auto-scaling. |
-| [**Fine-Tuning Benchmarking**](./docs/sample_blueprints/lora-benchmarking) | Run MLCommons quantized Llama-2 70B LoRA finetuning on A100 for performance benchmarking. |
-| [**LoRA Fine-Tuning**](./docs/sample_blueprints/lora-fine-tuning) | LoRA fine-tuning of custom or HuggingFace models using any dataset. Includes flexible hyperparameter tuning. |
-| [**Health Check**](./docs/sample_blueprints/gpu-health-check) | Comprehensive evaluation of GPU performance to ensure optimal hardware readiness before initiating any intensive computational workload. |
-| [**CPU Inference**](./docs/sample_blueprints/cpu-inference) | Leverage Ollama to test CPU-based inference with models like Mistral, Gemma, and more. |
-| [**Multi-node Inference with RDMA and vLLM**](./docs/multi_node_inference) | Deploy Llama-405B sized LLMs across multiple nodes with RDMA using H100 nodes with vLLM and LeaderWorkerSet. |
-| [**Scaled Inference with vLLM**](./docs/auto_scaling) | Serve LLMs with auto-scaling using KEDA, which scales to multiple GPUs and nodes using application metrics like inference latency. |
-| [**LLM Inference with MIG**](./docs/mig_multi_instance_gpu) | Deploy LLMs to a fraction of a GPU with Nvidia’s multi-instance GPUs and serve them with vLLM. |
-| [**Job Queuing**](./docs/sample_blueprints/teams) | Take advantage of job queuing and enforce resource quotas and fair sharing between teams. |
+| Blueprint | Description |
+| --------- | ----------- |
+| [**LLM & VLM Inference with vLLM**](docs/sample_blueprints/llm_inference_with_vllm/README.md) | Deploy Llama 2/3/3.1 7B/8B models using NVIDIA GPU shapes and the vLLM inference engine with auto-scaling. |
+| [**Fine-Tuning Benchmarking**](./docs/sample_blueprints/lora-benchmarking) | Run MLCommons quantized Llama-2 70B LoRA finetuning on A100 for performance benchmarking. |
+| [**LoRA Fine-Tuning**](./docs/sample_blueprints/lora-fine-tuning) | LoRA fine-tuning of custom or HuggingFace models using any dataset. Includes flexible hyperparameter tuning. |
+| [**Health Check**](./docs/sample_blueprints/gpu-health-check) | Comprehensive evaluation of GPU performance to ensure optimal hardware readiness before initiating any intensive computational workload. |
+| [**CPU Inference**](./docs/sample_blueprints/cpu-inference) | Leverage Ollama to test CPU-based inference with models like Mistral, Gemma, and more. |
+| [**Multi-node Inference with RDMA and vLLM**](./docs/sample_blueprints/multi-node-inference/) | Deploy Llama-405B sized LLMs across multiple nodes with RDMA using H100 nodes with vLLM and LeaderWorkerSet. |
+| [**Autoscaling Inference with vLLM**](./docs/sample_blueprints/auto_scaling/) | Serve LLMs with auto-scaling using KEDA, which scales to multiple GPUs and nodes using application metrics like inference latency. |
+| [**LLM Inference with MIG**](./docs/sample_blueprints/mig_multi_instance_gpu/) | Deploy LLMs to a fraction of a GPU with Nvidia’s multi-instance GPUs and serve them with vLLM. |
+| [**Job Queuing**](./docs/sample_blueprints/teams) | Take advantage of job queuing and enforce resource quotas and fair sharing between teams. |
 
 ## Support & Contact
 
````
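Deploying any blueprint from the table follows the same POST flow the installation guide walks through. Sketched below with the same assumed `$BLUEPRINTS_API` endpoint and credentials as in the earlier examples; the JSON path is the one the commit links for vLLM inference.

```bash
# Sketch: POST a pre-built blueprint JSON to the deployment endpoint.
# Endpoint path and basic auth are assumptions carried over from the
# portal flow described in the installation README.
curl -L -u "$ADMIN_USERNAME:$ADMIN_PASSWORD" \
  -H "Content-Type: application/json" \
  -d @docs/sample_blueprints/llm_inference_with_vllm/vllm-open-hf-model.json \
  "$BLUEPRINTS_API/deployment"
```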
