Commit f25d0b6

Merge branch 'main' into lzf.2024.07-01

2 parents ec995e7 + 20639a3

157 files changed: +4151 −41 lines changed
Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
# From Fine-Tuning to Serving LLMs with OCI and dstack

dstack is an open-source tool that simplifies AI container orchestration and makes distributed training and deployment of LLMs more accessible. Combining dstack and OCI unlocks a streamlined process for setting up cloud infrastructure for distributed training and scalable model deployment.

This article walks you through fine-tuning a model using dstack on OCI, incorporating best practices from the Hugging Face Alignment Handbook, and then deploying the model using Hugging Face's Text Generation Inference (TGI).

**NOTE**: The experiment described in this article used an OCI cluster of three nodes, each with 2 x A10 GPUs, to fine-tune the Gemma 7B model.
## How dstack works

dstack offers a unified interface for the development, training, and deployment of AI models across any cloud or data center. For example, you can specify a configuration for a training task or a model to be deployed, and dstack takes care of setting up the required infrastructure and orchestrating the containers. One advantage of dstack is that it works with any hardware, frameworks, and scripts.
## Setting up dstack with OCI

We can set up dstack with OCI in four simple steps. First, install the dstack Python package. Since dstack supports multiple cloud providers, we can narrow the scope to OCI:

```
pip install "dstack[oci]"
```
Next, we need to configure the OCI-specific credentials in `~/.dstack/server/config.yml`. The following assumes that you have credentials for the OCI CLI configured. For other configuration options, please follow dstack's official documentation.

```
projects:
- name: main
  backends:
  - type: oci
    creds:
      type: default
```
The final step is to run the dstack server as shown below.

```
dstack server

INFO Applying ~/.dstack/server/config.yml...
INFO Configured the main project in ~/.dstack/config.yml
INFO The admin token is ab6e8759-9cd9-4e84-8d47-5b94ac877ebf
INFO The dstack server 0.18.4 is running at http://127.0.0.1:3000
```
Then, switch to the folder with your project scripts and initialize dstack:

```
dstack init
```
## Fine-Tuning on OCI with dstack

To fine-tune the Gemma 7B model, we'll use the Hugging Face Alignment Handbook to incorporate best fine-tuning practices. The source code for this tutorial is available on GitHub. Let's dive into the practical steps for fine-tuning your LLM.

Once you switch to the project folder, here's the command to initiate the fine-tuning job on OCI with dstack:
```
ACCEL_CONFIG_PATH=fsdp_qlora_full_shard.yaml \
FT_MODEL_CONFIG_PATH=qlora_finetune_config.yaml \
HUGGING_FACE_HUB_TOKEN=xxxx \
WANDB_API_KEY=xxxx \
dstack run . -f ft.task.dstack.yml
```
The `FT_MODEL_CONFIG_PATH`, `ACCEL_CONFIG_PATH`, `HUGGING_FACE_HUB_TOKEN`, and `WANDB_API_KEY` environment variables are declared inside the `ft.task.dstack.yml` task configuration; `dstack run` then submits that task to OCI.

**NOTE**: dstack automatically copies the current directory's contents when executing the task.

Let's explore the key parts of each YAML file (for the full contents, check the repository).
The `qlora_finetune_config.yaml` file is the recipe configuration that tells the Alignment Handbook how you want to fine-tune the LLM:

```
# Model arguments
model_name_or_path: google/gemma-7b
tokenizer_name_or_path: philschmid/gemma-tokenizer-chatml
torch_dtype: bfloat16
bnb_4bit_quant_storage: bfloat16

# LoRA arguments
load_in_4bit: true
use_peft: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
- q_proj
- k_proj
# ...

# Data training arguments
dataset_mixer:
  chansung/mental_health_counseling_conversations: 1.0
dataset_splits:
- train
- test
# ...
```
* **Model arguments**

  * `model_name_or_path`: Google's Gemma 7B is chosen as the base model.
  * `tokenizer_name_or_path`: the Alignment Handbook uses the `apply_chat_template()` method of the chosen tokenizer. This tutorial uses the ChatML template instead of Gemma 7B's standard conversation template.
  * `torch_dtype` and `bnb_4bit_quant_storage`: these two values must be set to the same dtype to leverage the FSDP+QLoRA fine-tuning method. Since Gemma 7B is hard to fit into a single A10 GPU (7B parameters in bfloat16 already take roughly 14 GB of the A10's 24 GB, before activations and optimizer states), this blog post uses FSDP+QLoRA to shard the model across 2 x A10 GPUs while leveraging the QLoRA technique.

* **LoRA arguments**: LoRA-specific configurations. Since this blog post leverages the FSDP+QLoRA technique, `load_in_4bit` is set to `true`. Other configurations can vary from experiment to experiment.
* **Data training arguments**: we have prepared a dataset based on Amod's mental health counseling conversations dataset. Since the Alignment Handbook only understands data in the form `[{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}, …]`, which the tokenizer's `apply_chat_template()` method can interpret, the prepared dataset is essentially the original dataset converted into this `apply_chat_template()`-compatible format, as sketched below.
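The exact preprocessing used to publish the dataset isn't shown here, but a minimal sketch of such a conversion with the `datasets` library could look like the following (the `Context` and `Response` column names come from Amod's original dataset; `messages` is the column the handbook's SFT script consumes):

```
from datasets import load_dataset

def to_chat_format(example):
    # Map one counseling Q&A pair into the chat schema that
    # tokenizer.apply_chat_template() understands.
    return {
        "messages": [
            {"role": "user", "content": example["Context"]},
            {"role": "assistant", "content": example["Response"]},
        ]
    }

raw = load_dataset("Amod/mental_health_counseling_conversations", split="train")
chat = raw.map(to_chat_format, remove_columns=raw.column_names)
print(chat[0]["messages"])
```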
The `fsdp_qlora_full_shard.yaml` file configures how Accelerate uses the underlying infrastructure for fine-tuning the LLM:

```
compute_environment: LOCAL_MACHINE
distributed_type: FSDP # Use Fully Sharded Data Parallelism
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_use_orig_params: false
  fsdp_offload_params: true
  fsdp_sharding_strategy: FULL_SHARD
  # ... (other FSDP configurations)
# ... (other configurations)
```
* `distributed_type`: `FSDP` indicates the use of Fully Sharded Data Parallel (FSDP), a technique that enables training large models that would otherwise not fit on a single GPU.
* `fsdp_config`: these settings control how FSDP operates, such as how the model is sharded (`fsdp_sharding_strategy`) and whether parameters are offloaded to the CPU (`fsdp_offload_params`).

![Hybrid shards](assets/images/image2.png "Hybrid shards")

With `distributed_type` set to `FSDP` and `fsdp_sharding_strategy` set to `FULL_SHARD`, a single model is sharded across all of the GPUs in the job, even when those GPUs span multiple compute nodes. Setting `fsdp_sharding_strategy` to `HYBRID_SHARD` instead applies full sharding within each node while replicating the model across nodes: each node then hosts an identical copy of the model, itself split across that node's GPUs, and processes different sections or batches of your dataset. A minimal PyTorch-level sketch of the two strategies follows.
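For intuition only — Accelerate drives this API for you, and the tiny stand-in model, single-process group, and backend choice below are simplifying assumptions, not the tutorial's actual training code:

```
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# Minimal single-process setup so the sketch can run; real jobs receive
# these values from the launcher (accelerate launch / torchrun).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(
    backend="nccl" if torch.cuda.is_available() else "gloo",
    rank=0,
    world_size=1,
)

model = nn.Linear(4096, 4096)  # stand-in for a transformer model

# FULL_SHARD: parameters, gradients, and optimizer state are sharded
# across every rank in the job, even when ranks span multiple nodes.
sharded = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

# HYBRID_SHARD: FULL_SHARD within each node plus replication across nodes,
# i.e. each node keeps one complete, internally sharded copy of the model.
# sharded = FSDP(model, sharding_strategy=ShardingStrategy.HYBRID_SHARD)

dist.destroy_process_group()
```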
Additional parameters like `machine_rank`, `num_machines`, and `num_processes` are important for coordination. However, it's recommended to set these values dynamically at runtime, as this provides flexibility when switching between different infrastructure setups.

## The power of dstack: simplified configuration
Finally, let's explore the `ft.task.dstack.yml` configuration that puts everything together and instructs dstack on how to provision infrastructure and run the task:

```
type: task
nodes: 3

python: "3.11"
env:
  - ACCEL_CONFIG_PATH
  - FT_MODEL_CONFIG_PATH
  - HUGGING_FACE_HUB_TOKEN
  - WANDB_API_KEY
commands:
  # ... (setup steps, cloning repo, installing requirements)
  - ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/custom/accel_config.yaml \
      --main_process_ip=$DSTACK_MASTER_NODE_IP \
      --main_process_port=8008 \
      --machine_rank=$DSTACK_NODE_RANK \
      --num_processes=$DSTACK_GPUS_NUM \
      --num_machines=$DSTACK_NODES_NUM \
      scripts/run_sft.py recipes/custom/config.yaml
ports:
  - 6006
resources:
  gpu: 1..2
  shm_size: 24GB
```
**Key points to highlight**:

* **Seamless integration**: dstack effortlessly integrates with Hugging Face's open-source ecosystem. In particular, you can simply use the Accelerate library with the configurations that we defined in `fsdp_qlora_full_shard.yaml`, as usual.
* **Automatic configuration**: the `DSTACK_MASTER_NODE_IP`, `DSTACK_NODE_RANK`, `DSTACK_GPUS_NUM`, and `DSTACK_NODES_NUM` variables are automatically managed by dstack, reducing manual setup.
* **Resource allocation**: dstack makes it easy to specify the number of nodes and GPUs (`gpu: 1..2`) for your fine-tuning job. For this blog post, there are three nodes, each equipped with 2 x A10 (24 GB) GPUs.
## Serving your fine-tuned model with dstack

Once your model is fine-tuned, dstack makes it a breeze to deploy it on OCI using Hugging Face's Text Generation Inference (TGI) framework.

Here's an example of how you can define a service in dstack:
```
type: service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=chansung/mental_health_counseling_merged_v0.1
commands:
  - text-generation-launcher \
      --max-input-tokens 512 --max-total-tokens 1024 \
      --max-batch-prefill-tokens 512 --port 8000
port: 8000

resources:
  gpu:
    memory: 48GB

# (Optional) Enable the OpenAI-compatible endpoint
model:
  format: tgi
  type: chat
  name: chansung/mental_health_counseling_merged_v0.1
```
**Key advantages of this approach**:

* **Secure HTTPS gateway**: dstack simplifies the process of setting up a secure HTTPS connection through a gateway, a crucial aspect of production-level model serving.
* **Optimized for inference**: the TGI framework is designed for efficient text-generation inference, ensuring your model delivers responsive and reliable results.
* **Auto-scaling**: dstack lets you specify an auto-scaling policy, including the minimum and maximum number of model replicas.
At this point, you can interact with the service via the standard curl command, or from Python via the requests library, the OpenAI SDK, or Hugging Face's InferenceClient. For instance, the snippet below shows an example using curl.

```
curl -X POST https://black-octopus-1.mycustomdomain.com/generate \
  -H "Authorization: Bearer <dstack-token>" \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "I feel bad...", "parameters": {"max_new_tokens": 128}}'
```
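Because the `model` mapping above enables the OpenAI-compatible endpoint, the OpenAI SDK also works. The sketch below assumes a dstack gateway is configured; the base URL and token are placeholders, not values from this deployment:

```
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.mycustomdomain.com",  # placeholder gateway URL
    api_key="<dstack-token>",                       # placeholder dstack token
)

completion = client.chat.completions.create(
    model="chansung/mental_health_counseling_merged_v0.1",
    messages=[{"role": "user", "content": "I feel bad..."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```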
Additionally, for a deployed model, dstack automatically provides a user interface to directly interact with the model:

<p align="center">
  <img src="https://github.com/oracle-devrel/technology-engineering/blob/dstack-tutorial/cloud-infrastructure/ai-infra-gpu/ai-infrastructure/dstack/assets/images/image1.png" width="600">
</p>
## Conclusion

By following the steps outlined in this article, you've unlocked a powerful approach to fine-tuning and deploying LLMs using the combined capabilities of dstack, OCI, and Hugging Face's ecosystem. You can now leverage dstack's user-friendly interface to manage your OCI resources effectively, streamlining the process of setting up distributed training environments for your LLM projects.

Furthermore, the integration with Hugging Face's Alignment Handbook and TGI framework empowers you to fine-tune and serve your models seamlessly, ensuring they're optimized for performance and ready for real-world applications. We encourage you to explore the possibilities further and experiment with different models and configurations to achieve your desired outcomes in the world of natural language processing.

**About the author**: Chansung Park is a Hugging Face fellow and an AI researcher working on LLMs.

data-platform/analytical-data-platform-lakehouse/README.md

Lines changed: 2 additions & 0 deletions

@@ -55,6 +55,8 @@ Reviewed: 18.01.2024
 - Blog post describing and comparing the different insert methods in Autonomous Database to support low latency data ingestion for IoT workloads.
 - [Managing Active Metadata with Oracle Data Platform](https://gianlucarossi06.github.io/data-organon/2024/05/31/Active-Metadata-4-OCI-Data-Platform.html)
 - Blog post describing how to define and store active metadata using OCI Data Platform, using a practical example. Active Metadata can be anything stored as custom properties in a data catalog allowing users to understand, for instance, data freshness.
+- [Streaming IoT Data into Object Storage with Streaming service](https://jakubillner.github.io/2024/06/28/streaming-ingest.html)
+- Blog post describing how to ingest and store IoT data for an analytical workload using OCI Streaming, Connector Hub, and Object Storage.


 ## YouTube

data-platform/analytics/oracle-analytics-cloud/README.md

Lines changed: 4 additions & 1 deletion

@@ -13,6 +13,7 @@ Reviewed: 31.10.2023
 ## Specialists Blogs for various features & functionality
 |Content Link |Functionality|Description|
 | ------------ |------------|----------|
+|[Unleash the Power of Template Viewer: Streamlined Testing for Flawless Oracle Analytics Publisher Reports](https://www.linkedin.com/pulse/unleash-power-template-viewer-streamlined-testing-flawless-kasetty-bxiqc/)|Oracle Analytics Publisher Template Viewer|How to leverage Template Viewer to test Oracle Analytics Publisher templates.
 |[Leverage the OCI Modern Data Platform to implement an Enterprise Analytics Solution](https://blogs.oracle.com/coretec/post/leverage-oci-modern-data-platform-to-implement-enterprise-analytics-solution)|OAC-Enterprise Analytics Solution |How to leverage the OCI modern data platform to implement an enterprise analytics solution.
 |[Top 5 reasons Oracle Analytics Cloud stands apart in the ML/AI Analytics landscape](https://blogs.oracle.com/analytics/post/top-5-reasons-oracle-analytics-cloud-stands-apart-in-the-mlai-analytics-landscape)|OAC Machine Learning|What are primary reasons to choose Oracle Analytics Cloud (OAC) from an ML/AI perspective.
 |[Oracle Analytics Cloud: Set up and configure Oracle Analytics Cloud environments using Terraform](https://blogs.oracle.com/analytics/post/oracle-analytics-cloud-set-up-and-configure-oracle-analytics-cloud-environments-using-terraform)|OAC Setup & Configure|How to provision and configure Oracle analytics cloud on OCI using Terraform.

@@ -61,6 +62,8 @@ Reviewed: 31.10.2023
 ## OAC Latest Release and Announcements
 |Content Link |Description|
 | ------------ |------------|
+|[Oracle Analytics Cloud new features - Jul 2024](https://www.youtube.com/watch?v=0BVxTCvDmaQ&list=PL6gBNP-Fr8KXAOF9RgJIU5ykJvD8fHoxj)|Oracle Analytics Cloud Jul-2024 new features videos|
+|[Oracle Analytics Cloud new features - May 2024](https://www.youtube.com/watch?v=eoNmcRZ5wYI&list=PL6gBNP-Fr8KU55dSbzkEKySjSDWlL3BWm)|Oracle Analytics Cloud May-2024 new features videos|
 |[Oracle Analytics Cloud new features - March 2024](https://www.youtube.com/playlist?list=PL6gBNP-Fr8KWlnpaELiCxQJii-F4c7Ehz)|Oracle Analytics Cloud March-2024 new features videos|
 |[Oracle Analytics Cloud new features - January 2024](https://www.youtube.com/playlist?list=PL6gBNP-Fr8KUGvVDRGC8IyXo8yQzUtMiD)|Oracle Analytics Cloud January-2024 new features videos|
 |[Oracle Analytics New Capabilities - November 2023](https://www.youtube.com/playlist?list=PL6gBNP-Fr8KXVh3PVwWfl1nC_TyHi_yl8)|Oracle Analytics Cloud November-2023 release|

@@ -113,7 +116,7 @@ Reviewed: 31.10.2023
 |Content Link |Description|
 | ------------ |------------|
 |[OAC vs PowerBI vs Tableau](https://www.oracle.com/business-analytics/comparison-chart.html)|Comparison of Oracle Analytics Cloud with other leading business analytics products|
-|[Gartner Analytics Review 2023](https://www.youtube.com/watch?v=nYNbpGeu_nw)|Oracle Analytics as visionary in the 2023 Gartner’s Magic Quadrant|
+|[Gartner Analytics Review 2024](https://www.oracle.com/news/announcement/oracle-named-leader-in-2024-gartner-magic-quadrant-for-analytics-and-business-intelligence-platforms-2024-06-24/)|Oracle Named a Leader in the 2024 Gartner® Magic Quadrant™ for Analytics and Business Intelligence Platforms|


 ## Blogs for AI/ML with Oracle Analytics Platform
Binary file not shown.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# AI Vector Search

Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads and allows you to query data based on semantics, rather than keywords. The VECTOR data type is introduced with the release of Oracle Database 23ai, providing the foundation to store vector embeddings alongside business data in the database. Using embedding models, you can transform unstructured data into vector embeddings that can then be used for semantic queries on business data.

Reviewed Date: 17.07.2024
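As an illustration only (not taken from the linked docs), the sketch below shows how a VECTOR column and a semantic query might look from Python with the `python-oracledb` driver against a 23ai database; the connection details, table, and tiny 3-dimensional embeddings are made-up stand-ins for a real embedding pipeline:

```
import array
import oracledb

# Placeholder credentials and DSN for a local 23ai instance.
conn = oracledb.connect(user="vector_user", password="Oracle_4U",
                        dsn="localhost/FREEPDB1")
cur = conn.cursor()

# Store an embedding alongside business data in a VECTOR column.
cur.execute("""
    CREATE TABLE IF NOT EXISTS docs (
        id NUMBER,
        text VARCHAR2(200),
        embedding VECTOR(3, FLOAT32)
    )""")
cur.execute("INSERT INTO docs VALUES (1, 'hello world', :v)",
            v=array.array("f", [0.1, 0.2, 0.3]))

# Semantic query: order rows by vector distance to the query embedding.
cur.execute("""
    SELECT id, text
    FROM docs
    ORDER BY VECTOR_DISTANCE(embedding, :q, COSINE)
    FETCH FIRST 3 ROWS ONLY
""", q=array.array("f", [0.1, 0.2, 0.25]))
print(cur.fetchall())
```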
# Useful Links

## Documentation

- [Oracle.com](https://www.oracle.com/database/ai-vector-search/)
- [Oracle AI Vector Search User's Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/overview-ai-vector-search.html)
- [PL/SQL Packages and Types Reference: DBMS_VECTOR](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector1.html#GUID-F9FCB225-821A-4CCA-92B5-58B9927234FA)
- [PL/SQL Packages and Types Reference: DBMS_VECTOR_CHAIN](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-D80DDBEF-F1A9-4267-9D3C-A54D237D95C1)
- [Oracle AI Vector Search FAQ](https://www.oracle.com/database/ai-vector-search/faq/)

## Blogs & Videos

- [Oracle Announces General Availability of AI Vector Search in Oracle Database 23ai](https://blogs.oracle.com/database/post/oracle-announces-general-availability-of-ai-vector-search-in-oracle-database-23ai)
- [OML4Py: Leveraging ONNX and Hugging Face for AI Vector Search](https://blogs.oracle.com/machinelearning/post/oml4py-leveraging-onnx-and-hugging-face-for-advanced-ai-vector-search)
- [Use AI Vector Search to Build GenAI Apps with Enterprise Data | Oracle DatabaseWorld AI Edition](https://www.youtube.com/watch?v=5o5Ds8KLqVw&list=PLcFwxJMrxygALJRhZCbnjtDBYWCpWXPGz&index=3)

## LiveLabs Workshops

- [Oracle AI Vector Search - 15 Minute Basics](https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=3975&clear=RR,180&session=3449305441143)
- [Oracle AI Vector Search - Basics](https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=1070&clear=RR,180)
- [AI Vector Search - Complete RAG Application using PL/SQL in Oracle Database 23ai](https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=3934&clear=RR,180&session=11020955624236)
- [AI Vector Search - 7 Easy Steps to Building a RAG Application using LangChain](https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=3927&clear=RR,180&session=11020955624236)

# Team Publications

- [Getting started with vectors in 23ai](https://blogs.oracle.com/coretec/post/getting-started-with-vectors-in-23ai)

# License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
REM Creation of tablespace, user, and directory

-- connect as user SYS to FREEPDB1

-- create the tablespace
create bigfile tablespace TBS_VECTOR datafile size 256M autoextend on maxsize 2G;

-- drop the user if it already exists, then re-create it with the new DB_DEVELOPER_ROLE
DROP USER vector_user CASCADE;

create user vector_user identified by "Oracle_4U"
default tablespace TBS_VECTOR temporary tablespace TEMP
quota unlimited on TBS_VECTOR;

grant create mining model to vector_user;
grant DB_DEVELOPER_ROLE to vector_user;

-- create the directory (&directorypath is a SQL*Plus substitution variable that prompts for the OS path)
CREATE OR REPLACE DIRECTORY dm_dump as '&directorypath';
GRANT ALL ON DIRECTORY dm_dump TO vector_user;
