# 🚀 Quickstart: Fine‑Tuning Granite Models with FSDP/DeepSpeed & LoRA/QLoRA Using a Simple Feast Store Example

This notebook guides you through the process of fine-tuning **Large Language Models** using **Feast**, **Kubeflow-Training**, and modern optimization strategies like **FSDP**, **DeepSpeed**, and **LoRA** to boost training performance and efficiency.

In particular, this example demonstrates:
1. How to implement **Fully Sharded Data Parallel (FSDP)** and **DeepSpeed** to distribute training across multiple GPUs, enhancing scalability and speed.
2. How to apply **Low-Rank Adaptation (LoRA)** or **Quantized Low-Rank Adaptation (QLoRA)** via the [PEFT library](https://github.com/huggingface/peft) for parameter-efficient fine-tuning, reducing computational and memory overhead.
3. How to retrieve and manage **training features using Feast**, enabling consistent, scalable, and reproducible ML pipelines.

---

## 🍽️ What is Feast and How Are We Using It?

[Feast (Feature Store)](https://github.com/feast-dev/feast) is a powerful operational data system for machine learning that helps manage, store, and serve features consistently during training and inference. In this workflow, **Feast acts as the centralized source of truth for model features**, decoupling feature engineering from model training.

Specifically, we use Feast to:

- **Define and register feature definitions** for training data using a standardized interface.
- **Ingest and materialize features** from upstream sources (e.g., batch files, data warehouses).
- **Fetch training features** as PyTorch-friendly tensors, ensuring consistency across training and production.
- **Version control feature sets** to improve reproducibility and traceability in experiments.

By integrating Feast into the fine-tuning pipeline, we ensure that the training process is not only scalable but also **robust, modular, and production-ready**.
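
As a rough illustration of the Feast workflow in code, the sketch below defines and retrieves a tiny feature set with the Feast Python SDK. All names here (`sample_id`, `instruction_features`, the parquet path) are hypothetical placeholders, not the definitions used by the notebook's own feature repository.

```python
from datetime import timedelta

import pandas as pd
from feast import Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import String

# Hypothetical entity keyed on a sample identifier.
sample = Entity(name="sample_id", join_keys=["sample_id"])

# Hypothetical upstream batch source (a local parquet file in this sketch).
source = FileSource(path="data/training_samples.parquet", timestamp_field="event_timestamp")

# Feature view exposing the text columns the fine-tuning job consumes.
instruction_features = FeatureView(
    name="instruction_features",
    entities=[sample],
    ttl=timedelta(days=365),
    schema=[Field(name="instruction", dtype=String), Field(name="response", dtype=String)],
    source=source,
)

# Register the definitions against a feature repo (feature_store.yaml lives in repo_path).
store = FeatureStore(repo_path=".")
store.apply([sample, instruction_features])

# Retrieve point-in-time-correct training rows as a pandas DataFrame.
entity_df = pd.read_parquet("data/training_samples.parquet")[["sample_id", "event_timestamp"]]
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["instruction_features:instruction", "instruction_features:response"],
).to_df()
```

From here, `training_df` can be tokenized and wrapped in a PyTorch dataset, which is the hand-off point between feature engineering and model training.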

---

## 💡 Why Use FSDP, DeepSpeed, LoRA, and Feast for Fine-Tuning?

- **Efficient Distributed Training:** Utilize FSDP or DeepSpeed to handle large models by distributing the training process across multiple GPUs, enabling faster and more scalable training runs (a configuration sketch follows this list).
- **Parameter Efficiency with LoRA/QLoRA:** LoRA freezes the base model and trains small low-rank adapter matrices, so only a tiny fraction of the parameters is updated, substantially cutting compute and storage while speeding up training; QLoRA extends this by loading the base model in 4-bit quantization, freezing the quantized weights, and updating only the adapters, which lets massive models fit on limited-memory GPUs (an adapter sketch follows this list).
- **Feature Management with Feast:** Fetch well-defined, version-controlled features seamlessly into your pipeline, boosting reproducibility and easing data integration.
- **Flexible Configuration Management:** Store DeepSpeed and LoRA settings in separate YAML files, allowing for easy modifications and experimentation without altering the core training script.
- **Mixed-Precision Training:** Leverage automatic mixed precision (AMP) to accelerate training and reduce memory usage by combining different numerical precisions.
- **Model Saving and Uploading:** Save the fine-tuned model and tokenizer locally and upload them to an S3 bucket for persistent storage and easy deployment.
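
To make the LoRA/QLoRA bullet concrete, here is a minimal sketch using the PEFT and bitsandbytes integrations in `transformers`. The model id, rank, and target modules are illustrative assumptions, not the exact values used by the notebook:

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ibm-granite/granite-3.1-2b-instruct"  # example Granite checkpoint

# QLoRA: load the frozen base model in 4-bit NF4, computing in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)

# LoRA: attach small low-rank adapters to the attention projections; only these are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically reports well under 1% of weights as trainable
```

Dropping the `BitsAndBytesConfig` and loading the model in bfloat16 instead gives plain LoRA rather than QLoRA; everything else stays the same.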
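For the distributed-training, configuration-file, and mixed-precision bullets, the sketch below shows how they might be wired together when training with the Hugging Face `Trainer`; this is an assumption-laden illustration rather than the notebook's exact setup, and the config file paths are placeholders for whatever DeepSpeed/FSDP settings you keep next to the training script:

```python
from transformers import TrainingArguments

# DeepSpeed variant: point the trainer at a ZeRO config kept in a separate, version-controlled file.
deepspeed_args = TrainingArguments(
    output_dir="output/granite-finetune",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    bf16=True,                                  # automatic mixed precision (bfloat16)
    deepspeed="configs/deepspeed_zero3.json",   # placeholder path to a DeepSpeed config
)

# FSDP variant: shard parameters, gradients, and optimizer state across the workers instead.
fsdp_args = TrainingArguments(
    output_dir="output/granite-finetune",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    bf16=True,
    fsdp="full_shard auto_wrap",
    fsdp_config="configs/fsdp_config.json",     # placeholder path to an FSDP config
)
```

Whichever variant you pick would be passed to the trainer inside the training function that Kubeflow Training launches on each worker, so switching strategies is a configuration change rather than a code rewrite.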

---

## Requirements

* An OpenShift cluster with OpenShift AI (RHOAI) 2.17+ installed:
  * The `dashboard`, `trainingoperator` and `workbenches` components enabled
* Sufficient worker nodes for your configuration(s) with NVIDIA GPUs (Ampere-based or newer recommended)
  * If using PEFT LoRA/QLoRA techniques, smaller NVIDIA GPUs (e.g., AWS G4dn instances) can be sufficient
* AWS S3 storage available

---

By following this notebook, you'll gain hands-on experience in setting up a **feature-rich, efficient, and scalable** fine-tuning pipeline for **Granite language models**, leveraging tooling across model training and feature engineering.

## Setup

* Access the OpenShift AI dashboard, for example from the top navigation bar menu:

* Log in, then go to _Data Science Projects_ and create a project:

* Once the project is created, click on _Create a workbench_:

* Then create a workbench with the following settings:

  * Select the `PyTorch` (or the `ROCm-PyTorch`) notebook image:

  * Select the _Small_ container size and a sufficient persistent storage volume.
  * In the _Environment variables_ section, set the variable type to _Secret_ and provide a key/value pair to store _HF-TOKEN_ as a Kubernetes secret:

  * Click on _Create connection_ to create a workbench connection to your S3-compatible storage bucket:

    * Select the _S3 compatible object storage - v1_ option:

    * Fill in all the required fields, including the _Bucket_ value (it is used in the workbench), then confirm:

  > [!NOTE]
  >
  > * Adding an accelerator is only needed to test the fine-tuned model from within the workbench, so you can omit one to spare an accelerator if needed.
  > * Keep the default 20GB workbench storage; it is enough to run inference from within the workbench.
  > * If you use a connection name other than _s3-data-connection_, adjust the _aws_connection_name_ variable in the notebook to refer to the new name (a sketch at the end of this section shows how the connection's values appear inside the workbench).
|

  * Review the configuration and click _Create workbench_:
* From the _Workbenches_ page, click _Open_ when the workbench you've just created becomes ready:

* From the workbench, clone this repository, i.e., `https://github.com/opendatahub-io/distributed-workloads.git`

* Navigate to the `distributed-workloads/examples/kfto-feast` directory and open the `kfto_feast` notebook

You can now proceed with the instructions from the notebook. Enjoy!
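
Once inside the workbench, the connection and the _HF-TOKEN_ secret you configured above are exposed to the notebook as environment variables. The sketch below shows one way to pick them up; the exact names depend on the key you chose for the secret, and the `AWS_*` variables are the ones an OpenShift AI S3 connection typically injects, so verify them against your own setup:

```python
import os

import boto3

# Hugging Face token stored as a workbench secret (key name assumed; use the key you configured).
hf_token = os.environ.get("HF_TOKEN")

# Values typically injected by the S3-compatible connection attached to the workbench.
s3_endpoint = os.environ.get("AWS_S3_ENDPOINT")
s3_bucket = os.environ.get("AWS_S3_BUCKET")

# Client that can later upload the fine-tuned model and tokenizer to the bucket.
s3_client = boto3.client(
    "s3",
    endpoint_url=s3_endpoint,
    aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
)

print(f"Using bucket '{s3_bucket}' at '{s3_endpoint}'")
```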