
Commit c32b6d5

Merge branch 'main' into litellm-tutorial
2 parents a24633b + 6439d2a

File tree: 319 files changed (+8047 additions, -2198 deletions)


app-dev/devops-and-containers/oke/README.md

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,6 @@ Reviewed: 20.12.2023
Removed:
- [Disaster Recovery — Notes on Velero and OKE, Part 1: Stateless Pods](https://medium.com/oracledevs/disaster-recovery-notes-on-velero-and-oke-part-1-stateless-pods-b4ba3e737386)

@@ -40,7 +39,8 @@ Reviewed: 20.12.2023
Removed:
- [Test S3 Compatibility - Preparing Backups and DR for OKE and Velero](https://github.com/fharris/oci-s3-compatibility)
Added:
- [Authentication with OAuth2-Proxy, Kubernetes and OCI](https://medium.com/oracledevs/authentication-with-oauth2-proxy-kubernetes-and-oci-6c8d87769184)
- [Code for Authentication with OAuth2-Proxy Kubernetes and OCI](https://github.com/fharris/oauth2-proxy-demo)

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/README.md

Lines changed: 4 additions & 3 deletions
@@ -6,7 +6,7 @@ These resources aim to offer guidance throughout your migration, enabling you to
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"

@@ -18,8 +18,9 @@ Reviewed: 7.2.2024
Added (above the existing "Cyber recovery solution on Oracle Cloud Infrastructure" entry under # Team Publications):
- [Automate Recovery for Oracle Enterprise Performance Management using OCI Full Stack Disaster Recovery](https://docs.oracle.com/en/learn/fsdr-integration-epm/)

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/essbase-discovery-questionnaire/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/essbase-solution-definition/README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ This document serves as an integral asset for individuals and organizations seek
Changed: "Reviewed: 19.4.2024" → "Reviewed: 22.7.2024"

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-architecture-diagrams/README.md

Lines changed: 3 additions & 1 deletion
@@ -8,7 +8,9 @@ They serve as a helpful resource for defining solutions, preparing designs, unde
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"
Added:
- Hyperion EPM System Reference architecture on OCI can be found in the [Architecture Center](https://docs.oracle.com/en/solutions/deploy-hyperion-oci/index.html)

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-discovery-questionnaire/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-essbase-decision-tree/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
Changed: "Reviewed: 7.2.2024" → "Reviewed: 22.7.2024"

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-fsdr/README.md

Lines changed: 6 additions & 4 deletions
@@ -5,18 +5,20 @@ This GitHub repository provides custom scripts that serve as a starting point fo
Removed:
- host_switch_failover.ps1/sh - script to update host file after switch to the standby region. Windows (PowerShell) or Linux (Bash).
- host_switch_failback.ps1/sh - script to update host file after switch from standby region back to the primary region. Windows (PowerShell) or Linux (Bash).
- Reviewed: 6.6.2024
- Use these scripts in FSDR user defined plan groups [link](https://docs.oracle.com/en-us/iaas/disaster-recovery/doc/add-user-defined-plan-groups.html)
Added:
- host_switch_failover.ps1/sh - script to update the host file after switching to the standby region. Windows (PowerShell) or Linux (Bash) script to be used in a user-defined plan group after starting the compute nodes in the standby region.
- host_switch_failback.ps1/sh - script to update the host file after switching from the standby region back to the primary region. Windows (PowerShell) or Linux (Bash) to be used in a user-defined plan group after starting the compute nodes in the primary region.
- The complete tutorial is available here: [Automate Recovery for Oracle Enterprise Performance Management using OCI Full Stack Disaster Recovery](https://docs.oracle.com/en/learn/fsdr-integration-epm/)
- Reviewed: 22.7.2024
- Use these scripts in FSDR user-defined plan groups [link](https://docs.oracle.com/en-us/iaas/disaster-recovery/doc/add-user-defined-plan-groups.html)

cloud-architecture/oracle-apps-hyperion-siebel-gbu/hyperion-essbase/hyperion-solution-definition/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
Changed: "Reviewed date: 19.4.2024" → "Reviewed date: 22.7.2024"

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
# From Fine-Tuning to Serving LLMs with OCI and dstack

dstack is an open-source tool that simplifies AI container orchestration and makes distributed training and deployment of LLMs more accessible. Combining dstack and OCI unlocks a streamlined process for setting up cloud infrastructure for distributed training and scalable model deployment.

This article walks you through fine-tuning a model using dstack on OCI, incorporating best practices from the Hugging Face Alignment Handbook, and then deploying the model using Hugging Face's Text Generation Inference (TGI).

**NOTE**: The experiment described in the article used an OCI cluster of three nodes, each with 2 x A10 GPUs, to fine-tune the Gemma 7B model.
## How dstack works
dstack offers a unified interface for the development, training, and deployment of AI models across any cloud or data center. For example, you can specify a configuration for a training task or a model to be deployed, and dstack will take care of setting up the required infrastructure and orchestrating the containers. One of the advantages dstack offers is that it allows the use of any hardware, frameworks, and scripts.
## Setting up dstack with OCI
With four simple steps, we can use dstack with OCI. First, we need to install the dstack Python package. Since dstack supports multiple cloud providers, we can narrow down the scope to OCI:

```
pip install dstack[oci]
```

Next, we need to configure the OCI-specific credentials in `~/.dstack/server/config.yml`. The example below assumes that you have credentials for the OCI CLI configured. For other configuration options, please follow dstack's official documentation.

```
projects:
- name: main
  backends:
  - type: oci
    creds:
      type: default
```
The third step is to run the dstack server as shown below.

```
dstack server
INFO Applying ~/.dstack/server/config.yml...
INFO Configured the main project in ~/.dstack/config.yml
INFO The admin token is ab6e8759-9cd9-4e84-8d47-5b94ac877ebf
INFO The dstack server 0.18.4 is running at http://127.0.0.1:3000
```

Finally, switch to the folder with your project scripts and initialize dstack.

```
dstack init
```
## Fine-Tuning on OCI with dstack
To fine-tune the Gemma 7B model, we'll be using the Hugging Face Alignment Handbook to ensure the incorporation of best fine-tuning practices. The source code for this tutorial can be obtained from GitHub. Let's dive into the practical steps for fine-tuning your LLM.

Once you switch to the project folder, here's the command to initiate the fine-tuning job on OCI with dstack:
```
ACCEL_CONFIG_PATH=fsdp_qlora_full_shard.yaml \
FT_MODEL_CONFIG_PATH=qlora_finetune_config.yaml \
HUGGING_FACE_HUB_TOKEN=xxxx \
WANDB_API_KEY=xxxx \
dstack run . -f ft.task.dstack.yml
```
The `FT_MODEL_CONFIG_PATH`, `ACCEL_CONFIG_PATH`, `HUGGING_FACE_HUB_TOKEN`, and `WANDB_API_KEY` environment variables are declared inside the `ft.task.dstack.yml` task configuration, and their values are supplied on the command line. `dstack run` submits the task defined in `ft.task.dstack.yml` on OCI.

**NOTE**: dstack automatically copies the current directory's content when executing the task.

Let's explore the key parts of each YAML file (for the full contents, check the repository).

The `qlora_finetune_config.yaml` file is the recipe configuration that tells the Alignment Handbook how you want to fine-tune the LLM:
```
# Model arguments
model_name_or_path: google/gemma-7b
tokenizer_name_or_path: philschmid/gemma-tokenizer-chatml
torch_dtype: bfloat16
bnb_4bit_quant_storage: bfloat16

# LoRA arguments
load_in_4bit: true
use_peft: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
# ...

# Data training arguments
dataset_mixer:
  chansung/mental_health_counseling_conversations: 1.0
dataset_splits:
  - train
  - test
# ...
```
* **Model arguments**

  * `model_name_or_path`: Google's Gemma 7B is chosen as the base model.
  * `tokenizer_name_or_path`: the Alignment Handbook uses the `apply_chat_template()` method of the chosen tokenizer. This tutorial uses the ChatML template instead of Gemma 7B's standard conversation template.
  * `torch_dtype` and `bnb_4bit_quant_storage`: these two values should be set to the same dtype if we want to leverage the FSDP+QLoRA fine-tuning method. Since Gemma 7B is hard to fit into a single A10 GPU, this blog post uses FSDP+QLoRA to shard the model across 2 x A10 GPUs while leveraging the QLoRA technique.
* **LoRA arguments**: LoRA-specific configurations. Since this blog post leverages the FSDP+QLoRA technique, `load_in_4bit` is set to `true`. Other configurations could vary from experiment to experiment.
* **Data training arguments**: we have prepared a dataset based on Amod's mental health counseling conversations dataset. Since the Alignment Handbook only understands data in the form of `[{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}, …]`, which can be interpreted with the tokenizer's `apply_chat_template()` method, the prepared dataset is essentially the original dataset converted into the `apply_chat_template()`-compatible format (see the sketch after this list).
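To make that format concrete, here is a minimal, hypothetical Python sketch of how raw question/answer pairs could be converted into the message format that `apply_chat_template()` expects. The column names `Context` and `Response` and the `messages` output column are assumptions for illustration, not the exact code used to produce the tutorial's dataset:

```
# Hypothetical sketch: convert raw counseling Q/A pairs into the chat-message
# format understood by apply_chat_template(). Column names are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

raw = load_dataset("Amod/mental_health_counseling_conversations", split="train")

def to_chat(example):
    # One user turn (the question) and one assistant turn (the counselor's answer).
    return {
        "messages": [
            {"role": "user", "content": example["Context"]},
            {"role": "assistant", "content": example["Response"]},
        ]
    }

chat_ds = raw.map(to_chat, remove_columns=raw.column_names)

# Sanity check: render one example with the ChatML tokenizer used in the recipe.
tokenizer = AutoTokenizer.from_pretrained("philschmid/gemma-tokenizer-chatml")
print(tokenizer.apply_chat_template(chat_ds[0]["messages"], tokenize=False))
```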
The `fsdp_qlora_full_shard.yaml` file tells accelerate how to use the underlying infrastructure for fine-tuning the LLM:

```
compute_environment: LOCAL_MACHINE
distributed_type: FSDP  # Use Fully Sharded Data Parallelism
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_use_orig_params: false
  fsdp_offload_params: true
  fsdp_sharding_strategy: FULL_SHARD
  # ... (other FSDP configurations)
# ... (other configurations)
```
* `distributed_type`: `FSDP` indicates the use of Fully Sharded Data Parallel (FSDP), a technique that enables training large models that would otherwise not fit on a single GPU.
* `fsdp_config`: These settings define how FSDP operates, such as how the model is sharded (`fsdp_sharding_strategy`) and whether parameters are offloaded to CPU (`fsdp_offload_params`).

![Hybrid shards](assets/images/image2.png "Hybrid shards")

With `distributed_type` set to `FSDP` and `fsdp_config`'s `fsdp_sharding_strategy` set to `FULL_SHARD`, a model will be sharded across multiple GPUs in a single machine. When dealing with multiple compute nodes, each node will host an identical copy of the model, which is itself split across multiple GPUs within that node. This means each partitioned model instance on each node processes different sections or batches of your dataset. To distribute a single model across multiple GPUs spanning multiple nodes, configure the parameter `fsdp_sharding_strategy` as `HYBRID_SHARD`.

Additional parameters like `machine_rank`, `num_machines`, and `num_processes` are also important for coordination. However, it's recommended to set these values dynamically at runtime, as this provides flexibility when switching between different infrastructure setups.
## The power of dstack: simplified configuration
Finally, let's explore the `ft.task.dstack.yml` task configuration that puts everything together and instructs dstack on how to provision infrastructure and run the task.
```
type: task
nodes: 3

python: "3.11"
env:
  - ACCEL_CONFIG_PATH
  - FT_MODEL_CONFIG_PATH
  - HUGGING_FACE_HUB_TOKEN
  - WANDB_API_KEY
commands:
  # ... (setup steps, cloning repo, installing requirements)
  - ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/custom/accel_config.yaml \
      --main_process_ip=$DSTACK_MASTER_NODE_IP \
      --main_process_port=8008 \
      --machine_rank=$DSTACK_NODE_RANK \
      --num_processes=$DSTACK_GPUS_NUM \
      --num_machines=$DSTACK_NODES_NUM \
      scripts/run_sft.py recipes/custom/config.yaml
ports:
  - 6006
resources:
  gpu: 1..2
  shm_size: 24GB
```
**Key points to highlight**:
* **Seamless Integration**: dstack effortlessly integrates with Hugging Face's open source ecosystem. In particular, you can simply use the accelerate library with the configurations that we defined in `fsdp_qlora_full_shard.yaml` as normal.
* **Automatic Configuration**: The `DSTACK_MASTER_NODE_IP`, `DSTACK_NODE_RANK`, `DSTACK_GPUS_NUM`, and `DSTACK_NODES_NUM` variables are automatically managed by dstack, reducing manual setup.
* **Resource Allocation**: dstack makes it easy to specify the number of nodes and the GPUs per node (`gpu: 1..2`) for your fine-tuning job. For this blog post, there are three nodes, each equipped with 2 x A10 (24GB) GPUs.
## Serving your fine-tuned model with dstack
Once your model is fine-tuned, dstack makes it a breeze to deploy it on OCI using Hugging Face's Text Generation Inference (TGI) framework.
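The service example below points at `chansung/mental_health_counseling_merged_v0.1`, whose name suggests the QLoRA adapter produced by the fine-tuning task was merged back into the base model before serving. The article does not show that step; the following is only a minimal, hypothetical Python sketch of such a merge using the peft library, with a placeholder adapter path and output repository name:

```
# Hypothetical sketch: merge a QLoRA adapter into the base model so TGI can
# serve plain model weights. The adapter path and target repo are placeholders.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_path = "path/to/your-qlora-adapter"  # placeholder

# Load the base model with the adapter applied, then fold the LoRA weights in.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_path, torch_dtype=torch.bfloat16)
merged = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(adapter_path)

# Publish the merged weights so the TGI service can pull them at startup.
merged.push_to_hub("your-username/mental_health_counseling_merged")
tokenizer.push_to_hub("your-username/mental_health_counseling_merged")
```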
Here's an example of how you can define a service in dstack:
```
type: service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=chansung/mental_health_counseling_merged_v0.1
commands:
  - text-generation-launcher \
      --max-input-tokens 512 --max-total-tokens 1024 \
      --max-batch-prefill-tokens 512 --port 8000
port: 8000

resources:
  gpu:
    memory: 48GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
  format: tgi
  type: chat
  name: chansung/mental_health_counseling_merged_v0.1
```
**Key advantages of this approach**:
* **Secure HTTPS Gateway**: dstack simplifies the process of setting up a secure HTTPS connection through a gateway, a crucial aspect of production-level model serving.
* **Optimized for Inference**: The TGI framework is designed for efficient text generation inference, ensuring your model delivers responsive and reliable results.
* **Auto-scaling**: dstack allows you to specify an auto-scaling policy, including the minimum and maximum number of model replicas.
At this point, you can interact with the service via the standard curl command, as well as Python's requests, the OpenAI SDK, and Hugging Face's InferenceClient libraries. For instance, the code snippet below shows an example with curl; a Python sketch follows it.

```
curl -X POST https://black-octopus-1.mycustomdomain.com/generate \
  -H "Authorization: Bearer <dstack-token>" \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "I feel bad...", "parameters": {"max_new_tokens": 128}}'
```
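As a Python counterpart to the curl call, here is a minimal sketch using the requests library against the same TGI `/generate` endpoint; the URL and token are the placeholders from the example above:

```
# Minimal sketch mirroring the curl example above; URL and token are placeholders.
import requests

API_URL = "https://black-octopus-1.mycustomdomain.com/generate"
HEADERS = {
    "Authorization": "Bearer <dstack-token>",
    "Content-Type": "application/json",
}
payload = {
    "inputs": "I feel bad...",
    "parameters": {"max_new_tokens": 128},
}

response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # TGI returns JSON containing the generated text
```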
Additionally, for a deployed model, dstack automatically provides a user interface to directly interact with the model:

<p align="center">
  <img src="https://github.com/oracle-devrel/technology-engineering/blob/dstack-tutorial/cloud-infrastructure/ai-infra-gpu/ai-infrastructure/dstack/assets/images/image1.png" width="600">
</p>
## Conclusion

By following the steps outlined in this article, you've unlocked a powerful approach to fine-tuning and deploying LLMs using the combined capabilities of dstack, OCI, and Hugging Face's ecosystem. You can now leverage dstack's user-friendly interface to manage your OCI resources effectively, streamlining the process of setting up distributed training environments for your LLM projects.

Furthermore, the integration with Hugging Face's Alignment Handbook and TGI framework empowers you to fine-tune and serve your models seamlessly, ensuring they're optimized for performance and ready for real-world applications. We encourage you to explore the possibilities further and experiment with different models and configurations to achieve your desired outcomes in the world of natural language processing.

**About the author**: Chansung Park is a Hugging Face fellow and an AI researcher working on LLMs.
