
Commit 3b9313b

efazal authored and openshift-merge-bot[bot] committed
Updated the dataset preprocessing function to generate the training dataset as an intersection with the knowledge base.
Signed-off-by: Esa Fazal <[email protected]>
1 parent 77fac3a commit 3b9313b

File tree

2 files changed: +390, -392 lines

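The updated preprocessing function itself is not shown in this excerpt. Purely as an illustration of the idea named in the commit message, here is a minimal sketch of intersecting a Q&A training set with a knowledge base; all names (`qa_pairs`, `knowledge_base`, `passage_id`) are assumptions, not the committed code:

```python
# Hypothetical sketch: keep only Q&A pairs whose source passage also exists
# in the knowledge base, so the retriever can actually find supporting context.
from typing import Dict, List


def filter_to_knowledge_base(
    qa_pairs: List[Dict], knowledge_base: List[Dict]
) -> List[Dict]:
    # Build the set of passage identifiers present in the knowledge base.
    kb_ids = {doc["passage_id"] for doc in knowledge_base}
    # Intersect: drop training examples that reference passages outside the KB.
    return [pair for pair in qa_pairs if pair["passage_id"] in kb_ids]


# Example usage with toy data.
kb = [{"passage_id": "p1", "text": "Feast is a feature store."}]
qa = [
    {"passage_id": "p1", "question": "What is Feast?", "answer": "A feature store."},
    {"passage_id": "p9", "question": "Unrelated?", "answer": "Dropped."},
]
print(filter_to_knowledge_base(qa, kb))  # only the p1 example remains
```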

examples/kfto-sft-feast-rag/README.md

Lines changed: 8 additions & 8 deletions
@@ -1,6 +1,6 @@
 # Fine-Tuning a RAG Model with Feast on OpenShift AI
 
-This project provides an end-to-end example of how to fine-tune a Retrieval-Augmented Generation (RAG) model on **OpenShift AI**. It uses the **Feast** feature store for efficient retrieval of context and the **Kubeflow Training SDK** to orchestrate the distributed fine-tuning job on the cluster.
+This project provides an end-to-end example of how to fine-tune a Retrieval-Augmented Generation (RAG) model on **OpenShift AI**. It uses the **Feast** (Feature Store) for efficient retrieval of context and the **Kubeflow Training SDK** to orchestrate the distributed fine-tuning job on the cluster.
 
 The core idea is to enhance a generator model (like BART) by providing it with relevant documents retrieved from a knowledge base at runtime. This notebook handles the entire lifecycle: ingesting data into the feature store, fine-tuning the RAG model on synthetically generated Q&A pairs, and testing the final artifact.
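As a hedged illustration of that core idea (retrieval-augmented generation with a question encoder and a BART-style generator), here is a minimal sketch using the Hugging Face `transformers` RAG classes and their built-in dummy retrieval index; the notebook itself wires in a custom `FeastRAGRetriever` instead:

```python
# Minimal RAG sketch: retrieve supporting passages, then generate an answer
# with the BART-style generator. Uses the public facebook/rag-sequence-nq
# checkpoint and a dummy index purely for illustration.
from transformers import RagRetriever, RagSequenceForGeneration, RagTokenizer

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

# Encode a question, retrieve documents under the hood, and generate.
inputs = tokenizer("What is a feature store?", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```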

@@ -11,10 +11,10 @@ The core idea is to enhance a generator model (like BART) by providing it with r
 Before you begin, ensure you have the following setup:
 
 * An OpenShift cluster with OpenShift AI (RHOAI) 2.20+ installed:
-  * The `dashboard`, `trainingoperator` and `workbenches` components enabled
-  * Workbench with medium size container, 1 NVIDIA GPU accelerator, and cluster storage of 200GB.
-  * Sufficient worker nodes for your configuration(s) with NVIDIA GPUs (Ampere-based or newer recommended)
-  * A dynamic storage provisioner supporting RWX PVC provisioning
+  * The `dashboard`, `trainingoperator` and `workbenches` components enabled.
+  * Workbench with medium size container, 1 NVIDIA GPU / 1 AMD GPU accelerator, and cluster storage of 200GB.
+  * Sufficient worker nodes for your configuration(s) with NVIDIA GPUs (Ampere-based or newer recommended) or AMD GPUs depending on your environment.
+  * A dynamic storage provisioner supporting RWX PVC provisioning.
 * A standalone Milvus deployment. See example [here](https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main/vector-databases/milvus#deployment).
 
 ***
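Before running the notebook, it can help to confirm that the standalone Milvus deployment is reachable from the workbench. A minimal sketch with `pymilvus`; the service host and port below are assumptions for a typical in-cluster deployment, not values from this repository:

```python
# Quick connectivity check against a standalone Milvus deployment.
from pymilvus import connections, utility

# Hypothetical in-cluster service name and default Milvus port; adjust to
# whatever your Milvus deployment actually exposes.
connections.connect(alias="default", host="milvus-service", port="19530")
print("Milvus server version:", utility.get_server_version())
print("Existing collections:", utility.list_collections())
```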
@@ -29,9 +29,9 @@ You must run this notebook from within an OpenShift AI Workbench. Follow these s
 * Then create a workbench with a preferred name and with the following settings:
   * Select the `PyTorch` (or the `ROCm-PyTorch`) workbench image with the recommended version.
   * Select the `Medium` as the deployment container size.
-  * Add an accelerator (GPU).
+  * Add one NVIDIA / AMD accelerator (GPU) depending on environment.
   * Create a storage that'll be shared between the workbench and the fine-tuning runs.
-    Make sure it uses a storage class with RWX capability and give it enough size according to the size of the model you want to fine-tune.
+    Make sure it uses a storage class with RWX capability and give it enough capacity according to the size of the model you want to fine-tune.
 > [!NOTE]
 > You can attach an existing shared storage if you already have one instead.
 * Review the storage configuration and click "Create workbench"
@@ -64,7 +64,7 @@ The notebook is structured to guide you through the following key stages:
 * It defines a `RagSequenceForGeneration` model, combining a question-encoder with a generator model.
 * It uses a custom `FeastRAGRetriever` to connect the RAG model to the Feast feature store.
 * The notebook uses the Kubeflow `TrainingClient` to submit this `main` function as a distributed `PyTorchJob` to the OpenShift cluster.
-* **Monitoring**: You can monitor the job's progress directly through its logs and visualize metrics using the integrated TensorBoard.
+* **Monitoring**: You can monitor the job's progress directly through its logs and visualize metrics using the integrated TensorBoard dashboard.
 * **Inference and Testing**: After the training job is complete, the final, fine-tuned RAG model is loaded from shared storage for testing.
 
 ***
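For context on how a notebook function typically becomes a distributed `PyTorchJob`, here is a minimal sketch using the Kubeflow Training SDK's `TrainingClient`; the job name, worker count, and container image are assumptions rather than the notebook's exact arguments:

```python
# Sketch: submit a training function as a distributed PyTorchJob and tail its logs.
from kubeflow.training import TrainingClient


def main():
    # Placeholder for the notebook's fine-tuning entrypoint
    # (dataset loading, Feast retrieval, RAG fine-tuning, checkpointing).
    ...


client = TrainingClient()
client.create_job(
    name="rag-finetune",                          # assumed job name
    train_func=main,
    num_workers=2,                                # assumed worker count
    resources_per_worker={"gpu": 1},
    base_image="quay.io/example/pytorch:latest",  # assumed image
)

# Stream the job's logs until it finishes.
client.get_job_logs(name="rag-finetune", follow=True)
```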
