Commit fddb564

committed
feat: add rag basic tutorials
1 parent 7137548 commit fddb564

7 files changed: +1273 −0 lines changed

notebooks/tutorials/adalflow_rag_documents.ipynb

Lines changed: 443 additions & 0 deletions
Large diffs are not rendered by default.
notebooks/tutorials/adalflow_rag_vanilla.ipynb

Lines changed: 376 additions & 0 deletions
@@ -0,0 +1,376 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"display: flex; justify-content: flex-start; align-items: center; gap: 15px; margin-bottom: 20px;\">\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/SylphAI-Inc/AdalFlow/blob/main/notebooks/tutorials/adalflow_rag_vanilla.ipynb\">\n",
    "        <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "    </a>\n",
    "    <a href=\"https://github.com/SylphAI-Inc/AdalFlow/blob/main/tutorials/adalflow_rag_vanilla.py\" target=\"_blank\" style=\"display: flex; align-items: center;\">\n",
    "        <img src=\"https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png\" alt=\"GitHub\" style=\"height: 20px; width: 20px; margin-right: 5px;\">\n",
    "        <span style=\"vertical-align: middle;\"> Open Source Code </span>\n",
    "    </a>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 🤗 Welcome to AdalFlow!\n",
    "## The PyTorch library to auto-optimize any LLM task pipelines\n",
    "\n",
    "Thanks for trying us out! We're here to provide you with the best LLM application development experience you can dream of 😊 For any questions or concerns, [come talk to us on Discord](https://discord.gg/ezzszrRZvT); we're always here to help! ⭐ <i>Star us on <a href=\"https://github.com/SylphAI-Inc/AdalFlow\">GitHub</a></i> ⭐\n",
    "\n",
    "\n",
    "# Quick Links\n",
    "\n",
    "GitHub repo: https://github.com/SylphAI-Inc/AdalFlow\n",
    "\n",
    "Full tutorials: https://adalflow.sylph.ai/index.html\n",
    "\n",
    "Deep dive on each API: check out the [developer notes](https://adalflow.sylph.ai/tutorials/index.html).\n",
    "\n",
    "Common use cases along with the auto-optimization: check out [Use cases](https://adalflow.sylph.ai/use_cases/index.html).\n",
    "\n",
    "# Author\n",
    "This notebook was created by community contributor [Ajith](https://github.com/ajithvcoder/).\n",
    "\n",
    "# Outline\n",
    "\n",
    "This is a quick introduction to what AdalFlow is capable of. We will cover:\n",
    "\n",
    "* How to use AdalFlow for RAG\n",
    "\n",
    "AdalFlow can be used in a generic manner with any API provider, without worrying much about prompts,\n",
    "model arguments, or parsing results.\n",
    "\n",
    "**Next: Try our [adalflow-rag-for-documents](https://colab.research.google.com/github/SylphAI-Inc/AdalFlow/blob/main/notebooks/tutorials/adalflow_rag_documents.ipynb)**\n",
    "\n",
    "\n",
    "# Installation\n",
    "\n",
    "1. Use `pip` to install the `adalflow` Python package. We will need `openai`, `groq`, and `faiss` (CPU version) from the extra packages.\n",
    "\n",
    "    ```bash\n",
    "    pip install torch --index-url https://download.pytorch.org/whl/cpu\n",
    "    pip install sentence-transformers==3.3.1\n",
    "    pip install \"adalflow[openai,groq,faiss-cpu]\"\n",
    "    ```\n",
    "2. Set up the `openai` and `groq` API keys in the environment variables."
   ]
  },
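  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick, optional sanity check that the install succeeded (a minimal sketch; the `__version__` attribute is an assumption, so we fall back gracefully if a build does not expose it):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Confirm the key packages import cleanly before going further.\n",
    "import adalflow\n",
    "import faiss\n",
    "\n",
    "print(\"adalflow version:\", getattr(adalflow, \"__version__\", \"unknown\"))"
   ]
  },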
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set Environment Variables\n",
    "\n",
    "Note: Enter your API keys in the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting .env\n"
     ]
    }
   ],
   "source": [
    "%%writefile .env\n",
    "\n",
    "OPENAI_API_KEY=\"PASTE-OPENAI_API_KEY_HERE\"\n",
    "GROQ_API_KEY=\"PASTE-GROQ_API_KEY-HERE\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from adalflow.utils import setup_env\n",
    "\n",
    "# Load environment variables - make sure OPENAI_API_KEY and GROQ_API_KEY are set in a .env file in the current folder\n",
    "setup_env(\".env\")"
   ]
  },
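  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, verify the keys were loaded before calling any provider. This minimal sketch only checks that the variables are non-empty, not that the keys are valid:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Fail fast with a clear message instead of a provider-side auth error later.\n",
    "for key in (\"OPENAI_API_KEY\", \"GROQ_API_KEY\"):\n",
    "    assert os.getenv(key), f\"{key} is missing; set it in .env and rerun setup_env\""
   ]
  },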
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ajithdev/AdalFlow/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from typing import List, Dict\n",
    "import numpy as np\n",
    "from sentence_transformers import SentenceTransformer\n",
    "from faiss import IndexFlatL2\n",
    "\n",
    "from adalflow.components.model_client import GroqAPIClient, OpenAIClient\n",
    "from adalflow.core.types import ModelType\n",
    "from adalflow.utils import setup_env"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`AdalflowRAGPipeline` is a class that implements a Retrieval-Augmented Generation (RAG) pipeline with AdalFlow. It integrates:\n",
    "\n",
    "- Embedding models (e.g., Sentence Transformers) for document and query embeddings.\n",
    "- FAISS for vector similarity search.\n",
    "- An LLM client to generate context-aware responses using retrieved documents.\n",
    "\n",
    "A sketch of the retrieval component in isolation follows, before the full pipeline class."
   ]
  },
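  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below shows the vector-search step on its own, with the same `SentenceTransformer` model and `IndexFlatL2` index the pipeline uses. The example texts are made up for illustration; `IndexFlatL2` returns squared L2 distances, so smaller means more similar."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from faiss import IndexFlatL2\n",
    "from sentence_transformers import SentenceTransformer\n",
    "\n",
    "model = SentenceTransformer(\"all-MiniLM-L6-v2\")  # produces 384-dim embeddings\n",
    "index = IndexFlatL2(384)\n",
    "\n",
    "texts = [\"FAISS does vector similarity search.\", \"Paris is in France.\"]\n",
    "index.add(np.array(model.encode(texts)))  # embed and index the documents\n",
    "\n",
    "# Embed the query and fetch the single nearest document.\n",
    "distances, indices = index.search(np.array(model.encode([\"What does FAISS do?\"])), 1)\n",
    "print(texts[indices[0][0]], distances[0][0])"
   ]
  },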
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "class AdalflowRAGPipeline:\n",
    "    def __init__(self,\n",
    "                 model_client=None,\n",
    "                 model_kwargs=None,\n",
    "                 embedding_model='all-MiniLM-L6-v2',\n",
    "                 vector_dim=384,\n",
    "                 top_k_retrieval=1):\n",
    "        \"\"\"\n",
    "        Initialize RAG Pipeline with embedding and retrieval components\n",
    "\n",
    "        Args:\n",
    "            model_client: AdalFlow model client used for generation\n",
    "            model_kwargs (dict): Model arguments (model name, temperature, max_tokens, etc.)\n",
    "            embedding_model (str): Sentence transformer model for embeddings\n",
    "            vector_dim (int): Dimension of embedding vectors\n",
    "            top_k_retrieval (int): Number of documents to retrieve\n",
    "        \"\"\"\n",
    "        # Initialize model client for generation\n",
    "        self.model_client = model_client\n",
    "\n",
    "        # Initialize embedding model\n",
    "        self.embedding_model = SentenceTransformer(embedding_model)\n",
    "\n",
    "        # Initialize FAISS index for vector similarity search\n",
    "        self.index = IndexFlatL2(vector_dim)\n",
    "\n",
    "        # Store document texts and their embeddings\n",
    "        self.documents = []\n",
    "        self.document_embeddings = []\n",
    "\n",
    "        # Retrieval parameters\n",
    "        self.top_k_retrieval = top_k_retrieval\n",
    "\n",
    "        # Conversation history and context\n",
    "        self.conversation_history = \"\"\n",
    "        self.model_kwargs = model_kwargs\n",
    "\n",
    "    def add_documents(self, documents: List[str]):\n",
    "        \"\"\"\n",
    "        Add documents to the RAG pipeline's knowledge base\n",
    "\n",
    "        Args:\n",
    "            documents (List[str]): List of document texts to add\n",
    "        \"\"\"\n",
    "        for doc in documents:\n",
    "            # Embed document\n",
    "            embedding = self.embedding_model.encode(doc)\n",
    "\n",
    "            # Add to index and document store\n",
    "            self.index.add(np.array([embedding]))\n",
    "            self.documents.append(doc)\n",
    "            self.document_embeddings.append(embedding)\n",
    "\n",
    "    def retrieve_relevant_docs(self, query: str) -> List[str]:\n",
    "        \"\"\"\n",
    "        Retrieve most relevant documents for a given query\n",
    "\n",
    "        Args:\n",
    "            query (str): Input query to find relevant documents\n",
    "\n",
    "        Returns:\n",
    "            List[str]: Top k most relevant documents\n",
    "        \"\"\"\n",
    "        # Embed query\n",
    "        query_embedding = self.embedding_model.encode(query)\n",
    "\n",
    "        # Perform similarity search\n",
    "        distances, indices = self.index.search(\n",
    "            np.array([query_embedding]),\n",
    "            self.top_k_retrieval\n",
    "        )\n",
    "\n",
    "        # Retrieve and return top documents\n",
    "        return [self.documents[i] for i in indices[0]]\n",
    "\n",
    "    def generate_response(self, query: str) -> str:\n",
    "        \"\"\"\n",
    "        Generate a response using retrieval-augmented generation\n",
    "\n",
    "        Args:\n",
    "            query (str): User's input query\n",
    "\n",
    "        Returns:\n",
    "            str: Generated response incorporating retrieved context\n",
    "        \"\"\"\n",
    "        # Retrieve relevant documents\n",
    "        retrieved_docs = self.retrieve_relevant_docs(query)\n",
    "\n",
    "        # Construct context-aware prompt\n",
    "        context = \"\\n\\n\".join([f\"Context Document: {doc}\" for doc in retrieved_docs])\n",
    "        full_prompt = f\"\"\"\n",
    "        Context:\n",
    "        {context}\n",
    "\n",
    "        Query: {query}\n",
    "\n",
    "        Generate a comprehensive and informative response that:\n",
    "        1. Uses the provided context documents\n",
    "        2. Directly answers the query\n",
    "        3. Incorporates relevant information from the context\n",
    "        \"\"\"\n",
    "\n",
    "        # Prepare API arguments\n",
    "        api_kwargs = self.model_client.convert_inputs_to_api_kwargs(\n",
    "            input=full_prompt,\n",
    "            model_kwargs=self.model_kwargs,\n",
    "            model_type=ModelType.LLM\n",
    "        )\n",
    "\n",
    "        # Call API and parse response\n",
    "        response = self.model_client.call(\n",
    "            api_kwargs=api_kwargs,\n",
    "            model_type=ModelType.LLM\n",
    "        )\n",
    "        # parse_chat_completion returns a GeneratorOutput; the generated text is in its raw_response field\n",
    "        response_text = self.model_client.parse_chat_completion(response)\n",
    "\n",
    "        # Update conversation history\n",
    "        self.conversation_history += f\"\\nQuery: {query}\\nResponse: {response_text}\"\n",
    "\n",
    "        return response_text\n"
   ]
  },
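  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before wrapping things in a helper, here is a minimal single-query sketch of the class, assuming a valid `GROQ_API_KEY` is loaded; the model name is simply the one used later in this notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal usage sketch: one document, one query.\n",
    "pipeline = AdalflowRAGPipeline(\n",
    "    model_client=GroqAPIClient(),\n",
    "    model_kwargs={\"model\": \"llama-3.2-1b-preview\", \"temperature\": 0.1, \"max_tokens\": 200},\n",
    ")\n",
    "pipeline.add_documents([\"AdalFlow is a PyTorch-like library for building LLM pipelines.\"])\n",
    "print(pipeline.generate_response(\"What is AdalFlow?\"))"
   ]
  },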
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `run_rag_pipeline` function demonstrates how to use the `AdalflowRAGPipeline` for embedding documents, retrieving relevant context, and generating responses:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def run_rag_pipeline(model_client, model_kwargs, documents, queries):\n",
    "    # Build a fresh pipeline for the given provider, index the documents, and answer each query\n",
    "    rag_pipeline = AdalflowRAGPipeline(model_client=model_client, model_kwargs=model_kwargs)\n",
    "\n",
    "    rag_pipeline.add_documents(documents)\n",
    "\n",
    "    # Generate responses\n",
    "    for query in queries:\n",
    "        print(f\"\\nQuery: {query}\")\n",
    "        response = rag_pipeline.generate_response(query)\n",
    "        print(f\"Response: {response}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Query: Does Ajith Kumar has any nick name ?\n",
      "Response: GeneratorOutput(id=None, data=None, error=None, usage=CompletionUsage(completion_tokens=78, prompt_tokens=122, total_tokens=200), raw_response='Based on the provided context documents, Ajith Kumar, also known as Ajithvcoder, has a nickname that he has given himself. According to the context, Ajithvcoder is his nickname that he has chosen for himself.\\n\\nTherefore, the answer to the query is:\\n\\nYes, Ajith Kumar has a nickname that he has given himself, which is Ajithvcoder.', metadata=None)\n",
      "\n",
      "Query: What is the ajithvcoder's favourite food?\n",
      "Response: GeneratorOutput(id=None, data=None, error=None, usage=CompletionUsage(completion_tokens=67, prompt_tokens=109, total_tokens=176), raw_response='Based on the provided context document, I can confidently answer the query as follows:\\n\\nAjithvcoder\\'s favourite food is Hyderabadi Panner Dum Briyani.\\n\\nThis answer is directly supported by the context document, which states: \"ajithvcoder likes Hyderabadi panner dum briyani much.\"', metadata=None)\n",
      "\n",
      "Query: When did ajithvcoder graduated ?\n",
      "Response: GeneratorOutput(id=None, data=None, error=None, usage=CompletionUsage(completion_tokens=57, prompt_tokens=107, total_tokens=164), raw_response=\"Based on the provided context documents, we can determine that Ajith V.Coder graduated on May 2016.\\n\\nHere's a comprehensive and informative response that directly answers the query:\\n\\nAjith V.Coder graduated on May 2016, which is mentioned in the context document.\", metadata=None)\n"
     ]
    }
   ],
   "source": [
    "# Statements about ajithvcoder are included so that we can validate that the LLM answers from these lines only\n",
    "documents = [\n",
    "    \"ajithvcoder is a good person whom the world knows as Ajith Kumar, ajithvcoder is his nick name that AjithKumar gave himself\",\n",
    "    \"The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair.\",\n",
    "    \"ajithvcoder likes Hyderabadi panner dum briyani much.\",\n",
    "    \"The Louvre Museum in Paris is the world's largest art museum, housing thousands of works of art.\",\n",
    "    \"ajithvcoder has a engineering degree and he graduated on May, 2016.\"\n",
    "]\n",
    "\n",
    "# Questions about ajithvcoder are asked so that we can validate\n",
    "# that the LLM answers from the lines given above only\n",
    "queries = [\n",
    "    \"Does Ajith Kumar has any nick name ?\",\n",
    "    \"What is the ajithvcoder's favourite food?\",\n",
    "    \"When did ajithvcoder graduated ?\"\n",
    "]\n",
    "\n",
    "groq_model_kwargs = {\n",
    "    \"model\": \"llama-3.2-1b-preview\",\n",
    "    \"temperature\": 0.1,\n",
    "    \"max_tokens\": 800,\n",
    "}\n",
    "\n",
    "openai_model_kwargs = {\n",
    "    \"model\": \"gpt-3.5-turbo\",\n",
    "    \"temperature\": 0.1,\n",
    "    \"max_tokens\": 800,\n",
    "}\n",
    "\n",
    "# The example below shows that AdalFlow can be used in a generic manner with any API provider\n",
    "# without worrying about prompts and parsing results\n",
    "model_client = GroqAPIClient()\n",
    "run_rag_pipeline(model_client, groq_model_kwargs, documents, queries)\n",
    "run_rag_pipeline(OpenAIClient(), openai_model_kwargs, documents, queries)\n"
   ]
  },
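  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the printed output shows, `generate_response` returns a `GeneratorOutput` rather than a plain string. A small sketch of pulling out just the generated text, reusing the `documents`, `queries`, and `groq_model_kwargs` defined above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract only the generated text from the GeneratorOutput.\n",
    "rag_pipeline = AdalflowRAGPipeline(model_client=GroqAPIClient(), model_kwargs=groq_model_kwargs)\n",
    "rag_pipeline.add_documents(documents)\n",
    "\n",
    "output = rag_pipeline.generate_response(queries[0])\n",
    "print(output.raw_response)  # the raw model text, per the GeneratorOutput printed above"
   ]
  }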
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
