
Commit c32b6d5

Merge branch 'main' into litellm-tutorial
2 parents a24633b + 6439d2a

File tree: 319 files changed (+8047 additions, -2198 deletions)


app-dev/devops-and-containers/oke/README.md

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,6 @@ Reviewed: 20.12.2023
Removed:
- [Disaster Recovery — Notes on Velero and OKE, Part 1: Stateless Pods](https://medium.com/oracledevs/disaster-recovery-notes-on-velero-and-oke-part-1-stateless-pods-b4ba3e737386)

@@ -40,7 +39,8 @@ Reviewed: 20.12.2023
Removed:
- [Test S3 Compatibility - Preparing Backups and DR for OKE and Velero](https://github.com/fharris/oci-s3-compatibility)
Added:
- [Authentication with OAuth2-Proxy, Kubernetes and OCI](https://medium.com/oracledevs/authentication-with-oauth2-proxy-kubernetes-and-oci-6c8d87769184)
- [Code for Authentication with OAuth2-Proxy Kubernetes and OCI](https://github.com/fharris/oauth2-proxy-demo)

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/README.md

Lines changed: 4 additions & 3 deletions
@@ -6,7 +6,7 @@ These resources aim to offer guidance throughout your migration, enabling you to
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"

@@ -18,8 +18,9 @@ Reviewed: 7.2.2024
Added (above the existing "Cyber recovery solution on Oracle Cloud Infrastructure" entry under # Team Publications):
- [Automate Recovery for Oracle Enterprise Performance Management using OCI Full Stack Disaster Recovery](https://docs.oracle.com/en/learn/fsdr-integration-epm/)

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/essbase-discovery-questionnaire/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/essbase-solution-definition/README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ This document serves as an integral asset for individuals and organizations seek
Changed: "Reviewed: 19.4.2024" → "Reviewed: 22.7.2024"

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-architecture-diagrams/README.md

Lines changed: 3 additions & 1 deletion
@@ -8,7 +8,9 @@ They serve as a helpful resource for defining solutions, preparing designs, unde
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"
Added:
- Hyperion EPM System Reference architecture on OCI can be found in the [Architecture Center](https://docs.oracle.com/en/solutions/deploy-hyperion-oci/index.html)

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-discovery-questionnaire/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-essbase-decision-tree/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-fsdr/README.md

Lines changed: 6 additions & 4 deletions
@@ -5,18 +5,20 @@ This GitHub repository provides custom scripts that serve as a starting point fo
Removed:
- host_switch_failover.ps1/sh - script to update host file after switch to the standby region. Windows (PowerShell) or Linux (Bash).
- host_switch_failback.ps1/sh - script to update host file after switch from standby region back to the primary region. Windows (PowerShell) or Linux (Bash).
- Reviewed: 6.6.2024
- Use these scripts in FSDR user defined plan groups [link](https://docs.oracle.com/en-us/iaas/disaster-recovery/doc/add-user-defined-plan-groups.html)
Added:
- host_switch_failover.ps1/sh - script to update the host file after switching to the standby region. Windows (PowerShell) or Linux (Bash) script to be used in a user-defined plan group after starting the compute nodes in the standby region.
- host_switch_failback.ps1/sh - script to update the host file after switching from the standby region back to the primary region. Windows (PowerShell) or Linux (Bash) to be used in a user-defined plan group after starting the compute nodes in the primary region.
- The complete tutorial is available here: [Automate Recovery for Oracle Enterprise Performance Management using OCI Full Stack Disaster Recovery](https://docs.oracle.com/en/learn/fsdr-integration-epm/)
- Reviewed: 22.7.2024
- Use these scripts in FSDR user-defined plan groups [link](https://docs.oracle.com/en-us/iaas/disaster-recovery/doc/add-user-defined-plan-groups.html)

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-solution-definition/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
Changed: "Reviewed date: 19.4.2024" → "Reviewed date: 22.7.2024"

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
# From Fine-Tuning to Serving LLMs with OCI and dstack

dstack is an open-source tool that simplifies AI container orchestration and makes distributed training and deployment of LLMs more accessible. Combining dstack and OCI unlocks a streamlined process for setting up cloud infrastructure for distributed training and scalable model deployment.

This article walks you through fine-tuning a model using dstack on OCI, incorporating best practices from the Hugging Face Alignment Handbook, and then deploying the model using Hugging Face's Text Generation Inference (TGI).

**NOTE**: The experiment described in the article used an OCI cluster of three nodes, each with 2 x A10 GPUs, to fine-tune the Gemma 7B model.
## How dstack works
dstack offers a unified interface for the development, training, and deployment of AI models across any cloud or data center. For example, you can specify a configuration for a training task or a model to be deployed, and dstack will take care of setting up the required infrastructure and orchestrating the containers. One of the advantages dstack offers is that it allows the use of any hardware, frameworks, and scripts.
## Setting up dstack with OCI
With four simple steps, we can use dstack with OCI. First, we need to install the dstack Python package. Since dstack supports multiple cloud providers, we can narrow down the scope to OCI:

```
pip install dstack[oci]
```

Next, we need to configure the OCI-specific credentials in `~/.dstack/server/config.yml`. The example below assumes that you have credentials for the OCI CLI configured. For other configuration options, please follow dstack's official documentation.

```
projects:
- name: main
  backends:
  - type: oci
    creds:
      type: default
```
The third step is to run the dstack server as shown below.

```
dstack server
INFO Applying ~/.dstack/server/config.yml...
INFO Configured the main project in ~/.dstack/config.yml
INFO The admin token is ab6e8759-9cd9-4e84-8d47-5b94ac877ebf
INFO The dstack server 0.18.4 is running at http://127.0.0.1:3000
```

Finally, switch to the folder with your project scripts and initialize dstack.

```
dstack init
```
## Fine-Tuning on OCI with dstack
To fine-tune the Gemma 7B model, we'll be using the Hugging Face Alignment Handbook to ensure the incorporation of best fine-tuning practices. The source code for this tutorial can be obtained from GitHub. Let's dive into the practical steps for fine-tuning your LLM.

Once you switch to the project folder, here's the command to initiate the fine-tuning job on OCI with dstack:
```
ACCEL_CONFIG_PATH=fsdp_qlora_full_shard.yaml \
FT_MODEL_CONFIG_PATH=qlora_finetune_config.yaml \
HUGGING_FACE_HUB_TOKEN=xxxx \
WANDB_API_KEY=xxxx \
dstack run . -f ft.task.dstack.yml
```
The `FT_MODEL_CONFIG_PATH`, `ACCEL_CONFIG_PATH`, `HUGGING_FACE_HUB_TOKEN`, and `WANDB_API_KEY` environment variables are declared inside the `ft.task.dstack.yml` task configuration, and their values are supplied on the command line. `dstack run` submits the task defined in `ft.task.dstack.yml` on OCI.

**NOTE**: dstack automatically copies the current directory's content when executing the task.

Let's explore the key parts of each YAML file (for the full contents, check the repository).

The `qlora_finetune_config.yaml` file is the recipe configuration that tells the Alignment Handbook how you want to fine-tune the LLM:
```
# Model arguments
model_name_or_path: google/gemma-7b
tokenizer_name_or_path: philschmid/gemma-tokenizer-chatml
torch_dtype: bfloat16
bnb_4bit_quant_storage: bfloat16

# LoRA arguments
load_in_4bit: true
use_peft: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
# ...

# Data training arguments
dataset_mixer:
  chansung/mental_health_counseling_conversations: 1.0
dataset_splits:
  - train
  - test
# ...
```
* **Model arguments**

  * `model_name_or_path`: Google's Gemma 7B is chosen as the base model.
  * `tokenizer_name_or_path`: the Alignment Handbook uses the `apply_chat_template()` method of the chosen tokenizer. This tutorial uses the ChatML template instead of Gemma 7B's standard conversation template.
  * `torch_dtype` and `bnb_4bit_quant_storage`: these two values should be set to the same dtype if we want to leverage the FSDP+QLoRA fine-tuning method. Since Gemma 7B is hard to fit into a single A10 GPU, this blog post uses FSDP+QLoRA to shard the model across 2 x A10 GPUs while leveraging the QLoRA technique.
* **LoRA arguments**: LoRA-specific configurations. Since this blog post leverages the FSDP+QLoRA technique, `load_in_4bit` is set to `true`. Other configurations could vary from experiment to experiment.
* **Data training arguments**: we have prepared a dataset based on Amod's mental health counseling conversations dataset. Since the Alignment Handbook only understands data in the form of `[{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}, …]`, which can be interpreted with the tokenizer's `apply_chat_template()` method, the prepared dataset is essentially the original dataset converted into the `apply_chat_template()`-compatible format (see the sketch after this list).
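To make that format concrete, here is a minimal, hypothetical Python sketch of how raw question/answer pairs could be converted into the message format that `apply_chat_template()` expects. The column names `Context` and `Response` and the `messages` output column are assumptions for illustration, not the exact code used to produce the tutorial's dataset:

```
# Hypothetical sketch: convert raw counseling Q/A pairs into the chat-message
# format understood by apply_chat_template(). Column names are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

raw = load_dataset("Amod/mental_health_counseling_conversations", split="train")

def to_chat(example):
    # One user turn (the question) and one assistant turn (the counselor's answer).
    return {
        "messages": [
            {"role": "user", "content": example["Context"]},
            {"role": "assistant", "content": example["Response"]},
        ]
    }

chat_ds = raw.map(to_chat, remove_columns=raw.column_names)

# Sanity check: render one example with the ChatML tokenizer used in the recipe.
tokenizer = AutoTokenizer.from_pretrained("philschmid/gemma-tokenizer-chatml")
print(tokenizer.apply_chat_template(chat_ds[0]["messages"], tokenize=False))
```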
The `fsdp_qlora_full_shard.yaml` file tells accelerate how to use the underlying infrastructure for fine-tuning the LLM:

```
compute_environment: LOCAL_MACHINE
distributed_type: FSDP  # Use Fully Sharded Data Parallelism
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_use_orig_params: false
  fsdp_offload_params: true
  fsdp_sharding_strategy: FULL_SHARD
  # ... (other FSDP configurations)
# ... (other configurations)
```
* `distributed_type`: `FSDP` indicates the use of Fully Sharded Data Parallel (FSDP), a technique that enables training large models that would otherwise not fit on a single GPU.
* `fsdp_config`: These settings define how FSDP operates, such as how the model is sharded (`fsdp_sharding_strategy`) and whether parameters are offloaded to CPU (`fsdp_offload_params`).

![Hybrid shards](assets/images/image2.png "Hybrid shards")

With `distributed_type` set to `FSDP` and `fsdp_config`'s `fsdp_sharding_strategy` set to `FULL_SHARD`, a model will be sharded across multiple GPUs in a single machine. When dealing with multiple compute nodes, each node will host an identical copy of the model, which is itself split across multiple GPUs within that node. This means each partitioned model instance on each node processes different sections or batches of your dataset. To distribute a single model across multiple GPUs spanning multiple nodes, configure the parameter `fsdp_sharding_strategy` as `HYBRID_SHARD`.

Additional parameters like `machine_rank`, `num_machines`, and `num_processes` are also important for coordination. However, it's recommended to set these values dynamically at runtime, as this provides flexibility when switching between different infrastructure setups.
## The power of dstack: simplified configuration
Finally, let's explore the `ft.task.dstack.yml` task configuration that puts everything together and instructs dstack on how to provision infrastructure and run the task.
```
type: task
nodes: 3

python: "3.11"
env:
  - ACCEL_CONFIG_PATH
  - FT_MODEL_CONFIG_PATH
  - HUGGING_FACE_HUB_TOKEN
  - WANDB_API_KEY
commands:
  # ... (setup steps, cloning repo, installing requirements)
  - ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/custom/accel_config.yaml \
      --main_process_ip=$DSTACK_MASTER_NODE_IP \
      --main_process_port=8008 \
      --machine_rank=$DSTACK_NODE_RANK \
      --num_processes=$DSTACK_GPUS_NUM \
      --num_machines=$DSTACK_NODES_NUM \
      scripts/run_sft.py recipes/custom/config.yaml
ports:
  - 6006
resources:
  gpu: 1..2
  shm_size: 24GB
```
**Key points to highlight**:
* **Seamless Integration**: dstack effortlessly integrates with Hugging Face's open source ecosystem. In particular, you can simply use the accelerate library with the configurations that we defined in `fsdp_qlora_full_shard.yaml` as normal.
* **Automatic Configuration**: The `DSTACK_MASTER_NODE_IP`, `DSTACK_NODE_RANK`, `DSTACK_GPUS_NUM`, and `DSTACK_NODES_NUM` variables are automatically managed by dstack, reducing manual setup.
* **Resource Allocation**: dstack makes it easy to specify the number of nodes and the GPUs per node (`gpu: 1..2`) for your fine-tuning job. For this blog post, there are three nodes, each equipped with 2 x A10 (24GB) GPUs.
## Serving your fine-tuned model with dstack
Once your model is fine-tuned, dstack makes it a breeze to deploy it on OCI using Hugging Face's Text Generation Inference (TGI) framework.
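The service example below points at `chansung/mental_health_counseling_merged_v0.1`, whose name suggests the QLoRA adapter produced by the fine-tuning task was merged back into the base model before serving. The article does not show that step; the following is only a minimal, hypothetical Python sketch of such a merge using the peft library, with a placeholder adapter path and output repository name:

```
# Hypothetical sketch: merge a QLoRA adapter into the base model so TGI can
# serve plain model weights. The adapter path and target repo are placeholders.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_path = "path/to/your-qlora-adapter"  # placeholder

# Load the base model with the adapter applied, then fold the LoRA weights in.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_path, torch_dtype=torch.bfloat16)
merged = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(adapter_path)

# Publish the merged weights so the TGI service can pull them at startup.
merged.push_to_hub("your-username/mental_health_counseling_merged")
tokenizer.push_to_hub("your-username/mental_health_counseling_merged")
```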
Here's an example of how you can define a service in dstack:
```
type: service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=chansung/mental_health_counseling_merged_v0.1
commands:
  - text-generation-launcher \
      --max-input-tokens 512 --max-total-tokens 1024 \
      --max-batch-prefill-tokens 512 --port 8000
port: 8000

resources:
  gpu:
    memory: 48GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
  format: tgi
  type: chat
  name: chansung/mental_health_counseling_merged_v0.1
```
**Key advantages of this approach**:
* **Secure HTTPS Gateway**: dstack simplifies the process of setting up a secure HTTPS connection through a gateway, a crucial aspect of production-level model serving.
* **Optimized for Inference**: The TGI framework is designed for efficient text generation inference, ensuring your model delivers responsive and reliable results.
* **Auto-scaling**: dstack allows you to specify an auto-scaling policy, including the minimum and maximum number of model replicas.
At this point, you can interact with the service via the standard curl command, as well as Python's requests, the OpenAI SDK, and Hugging Face's InferenceClient libraries. For instance, the code snippet below shows an example with curl; a Python sketch follows it.

```
curl -X POST https://black-octopus-1.mycustomdomain.com/generate \
  -H "Authorization: Bearer <dstack-token>" \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "I feel bad...", "parameters": {"max_new_tokens": 128}}'
```
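As a Python counterpart to the curl call, here is a minimal sketch using the requests library against the same TGI `/generate` endpoint; the URL and token are the placeholders from the example above:

```
# Minimal sketch mirroring the curl example above; URL and token are placeholders.
import requests

API_URL = "https://black-octopus-1.mycustomdomain.com/generate"
HEADERS = {
    "Authorization": "Bearer <dstack-token>",
    "Content-Type": "application/json",
}
payload = {
    "inputs": "I feel bad...",
    "parameters": {"max_new_tokens": 128},
}

response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # TGI returns JSON containing the generated text
```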
Additionally, for a deployed model, dstack automatically provides a user interface to directly interact with the model:

<p align="center">
  <img src="https://github.com/oracle-devrel/technology-engineering/blob/dstack-tutorial/cloud-infrastructure/ai-infra-gpu/ai-infrastructure/dstack/assets/images/image1.png" width="600">
</p>
## Conclusion

By following the steps outlined in this article, you've unlocked a powerful approach to fine-tuning and deploying LLMs using the combined capabilities of dstack, OCI, and Hugging Face's ecosystem. You can now leverage dstack's user-friendly interface to manage your OCI resources effectively, streamlining the process of setting up distributed training environments for your LLM projects.

Furthermore, the integration with Hugging Face's Alignment Handbook and TGI framework empowers you to fine-tune and serve your models seamlessly, ensuring they're optimized for performance and ready for real-world applications. We encourage you to explore the possibilities further and experiment with different models and configurations to achieve your desired outcomes in the world of natural language processing.

**About the author**: Chansung Park is a Hugging Face fellow and an AI researcher working on LLMs.
