red-hat-data-services
diff --git a/‎examples/rag-llm/README.md
Lines changed: 36 additions & 0 deletions b/‎examples/rag-llm/README.md
Lines changed: 36 additions & 0 deletions
diff --git a/‎examples/rag-llm/docs/01.png
89.8 KB b/‎examples/rag-llm/docs/01.png
89.8 KB
diff --git a/‎examples/rag-llm/docs/02.png
72.6 KB b/‎examples/rag-llm/docs/02.png
72.6 KB
diff --git a/‎examples/rag-llm/docs/03.png
110 KB b/‎examples/rag-llm/docs/03.png
110 KB
diff --git a/‎examples/rag-llm/docs/04a.png
79.8 KB b/‎examples/rag-llm/docs/04a.png
79.8 KB
diff --git a/‎examples/rag-llm/docs/04b.png
50.2 KB b/‎examples/rag-llm/docs/04b.png
50.2 KB
diff --git a/‎examples/rag-llm/docs/04c.png
67.7 KB b/‎examples/rag-llm/docs/04c.png
67.7 KB
diff --git a/‎examples/rag-llm/docs/05.png
117 KB b/‎examples/rag-llm/docs/05.png
117 KB
diff --git a/‎examples/rag-llm/huggingface_rag.ipynb
Lines changed: 205 additions & 0 deletions b/‎examples/rag-llm/huggingface_rag.ipynb
Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,36 @@
+# Retrieval-Augmented Generation on OpenShift AI
+
+These examples show how a user can run Retrieval-Augmented Generation using Jupyter Notebooks provided by OpenShift AI.
+
+The huggingface_rag example is based on this HuggingFace blog post - https://huggingface.co/blog/ngxson/make-your-own-rag
+
+
+## Requirements
+
+* An OpenShift cluster with OpenShift AI (RHOAI) 2.17+ installed:
+  * The `dashboard` and `workbenches` components enabled
+* Sufficient worker node to run workbench with NVIDIA GPUs (Ampere-based or newer recommended) or AMD GPUs (AMD Instinct MI300X or newer recommended)
+
+
+## Setup
+
+* Access the OpenShift AI dashboard, for example from the top navigation bar menu:
+![](./docs/01.png)
+* Log in, then go to _Data Science Projects_ and create a project:
+![](./docs/02.png)
+* Once the project is created, click on _Create a workbench_:
+![](./docs/03.png)
+* Then create a workbench with the following settings:
+    * Select the `PyTorch` (or the `ROCm-PyTorch`) notebook image:
+    ![](./docs/04a.png)
+    * Select the `Medium` container size and add an accelerator:
+    ![](./docs/04b.png)
+    * Keep the default 20GB workbench storage, it is enough to run the inference from within the workbench:
+    * Review the configuration and click "Create workbench":
+    ![](./docs/04c.png)
+* From "Workbenches" page, click on _Open_ when the workbench you've just created becomes ready:
+![](./docs/05.png)
+* From the workbench, clone this repository, i.e., `https://github.com/opendatahub-io/distributed-workloads.git`
+* Navigate to the `distributed-workloads/examples/rag-llm` directory and open one of available notebooks
+
+You can now proceed with the instructions from the notebook. Enjoy!
@@ -0,0 +1,205 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "b29cb9f2-e3c0-44cc-8327-7757c5add287",
+   "metadata": {},
+   "source": [
+    "# Setup\n",
+    "\n",
+    "Install all required dependencies."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bff9c793-7ca5-4f3b-8353-b55d3acb3b4e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install --quiet --upgrade transformers datasets faiss-cpu"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24ef69a7-c616-4b06-b1ad-d3cb98abe7df",
+   "metadata": {},
+   "source": [
+    "#  Hugging Face RAG"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3f0abf06-145f-4644-b25d-823c6ffc58af",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Models\n",
+    "encoder_model = \"facebook/dpr-ctx_encoder-multiset-base\"\n",
+    "generator_model = \"facebook/rag-sequence-nq\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43d74dc3-39f1-4d22-9c74-aaedd0131093",
+   "metadata": {},
+   "source": [
+    "Prepare chunk dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cb54a2f0-aef6-4308-a8b4-07e9c7cca23b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import urllib.request\n",
+    "from datasets import Dataset\n",
+    "\n",
+    "link = \"https://huggingface.co/ngxson/demo_simple_rag_py/raw/main/cat-facts.txt\"\n",
+    "dataset_list = []\n",
+    "\n",
+    "# Retrieve knowledge from provided link, use every line as a separate chunk.\n",
+    "for line in urllib.request.urlopen(link):\n",
+    "    dataset_list.append({\"text\": line.decode('utf-8'), \"title\": \"cats\"})\n",
+    "\n",
+    "print(f'Loaded {len(dataset_list)} entries')\n",
+    "\n",
+    "dataset = Dataset.from_list(dataset_list)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "677c95fe-1d36-4dfe-bf0d-1283857e5ee7",
+   "metadata": {},
+   "source": [
+    "Encode dataset chunks into embeddings (vector representations), append embeddings into dataset.\n",
+    "\n",
+    "Add faiss index for similarity search."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3c118e29-7fbf-4741-a474-3e5a3d46d8c1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import (\n",
+    "    DPRContextEncoder,\n",
+    "    DPRContextEncoderTokenizerFast,\n",
+    ")\n",
+    "import torch\n",
+    "\n",
+    "\n",
+    "torch.set_grad_enabled(False)\n",
+    "\n",
+    "ctx_encoder = DPRContextEncoder.from_pretrained(encoder_model)\n",
+    "ctx_tokenizer = DPRContextEncoderTokenizerFast.from_pretrained(encoder_model)\n",
+    "ds_with_embeddings = dataset.map(lambda example: {'embeddings': ctx_encoder(**ctx_tokenizer(example[\"text\"], return_tensors=\"pt\"))[0][0].numpy()})\n",
+    "ds_with_embeddings.add_faiss_index(column='embeddings')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bc5bb3cd-9785-43fa-b1c4-e16e78b69073",
+   "metadata": {},
+   "source": [
+    "**Specify user query here**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9dc40487-bf1c-49ed-9106-0dc46e38820c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "input_query = \"what is the name of the tiniest cat\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b52d2349-02aa-47d6-a5d0-783e8361feee",
+   "metadata": {},
+   "source": [
+    "Generate response for user query using context from dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8c61b305-5f62-4f99-8577-0708ba5e5f28",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration\n",
+    "\n",
+    "tokenizer = RagTokenizer.from_pretrained(generator_model)\n",
+    "\n",
+    "# Construct retriever to return relevant context from dataset\n",
+    "retriever = RagRetriever.from_pretrained(\n",
+    "    generator_model, index_name=\"custom\", indexed_dataset=ds_with_embeddings\n",
+    ")\n",
+    "\n",
+    "model = RagSequenceForGeneration.from_pretrained(generator_model, retriever=retriever)\n",
+    "\n",
+    "# Move model to GPU\n",
+    "device = 0\n",
+    "model = model.to(device)\n",
+    "\n",
+    "input_dict = tokenizer.prepare_seq2seq_batch(input_query, return_tensors=\"pt\").to(device)\n",
+    "\n",
+    "generated = model.generate(input_ids=input_dict[\"input_ids\"])\n",
+    "print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d52ad0e5-de8c-49f8-8e2c-0a4811e4f095",
+   "metadata": {},
+   "source": [
+    "# Cleaning Up\n",
+    "\n",
+    "Delete model from GPU."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff1c1fd1-ac51-42d4-b879-53d466b2c045",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "\n",
+    "del model, input_dict\n",
+    "torch.cuda.empty_cache()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.11",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}