Commit f2c82fa (parent fddb564)
feat: add rst files for rag_vanilla, rag_documents

2 files changed: +585, -0 (this file: +322, -0)
.. raw:: html

   <div style="display: flex; justify-content: flex-start; align-items: center; margin-bottom: 20px;">
      <a href="https://colab.research.google.com/github/SylphAI-Inc/AdalFlow/blob/main/notebooks/tutorials/adalflow_rag_documents.ipynb" target="_blank" style="margin-right: 20px;">
         <img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" style="height: 20px;">
      </a>

      <a href="https://github.com/SylphAI-Inc/AdalFlow/tree/main/tutorials/adalflow_rag_documents.py" target="_blank" style="display: flex; align-items: center;">
         <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" alt="GitHub" style="height: 20px; width: 20px; margin-right: 5px;">
         <span style="vertical-align: middle;"> Open Source Code</span>
      </a>
   </div>

RAG for documents
=============================

Overview
--------

This implementation showcases an end-to-end RAG system capable of handling large-scale text files and
generating context-aware responses. It is both modular and extensible, making it adaptable to various
use cases and LLM APIs.

**Imports**

- **SentenceTransformer**: Creates dense vector embeddings for textual data.
- **FAISS**: Provides efficient similarity search using vector indexing.
- **tiktoken**: Ensures that text preprocessing aligns with the tokenization requirements of the underlying language models, keeping the pipeline robust and efficient.
- **GroqAPIClient and OpenAIClient**: AdalFlow classes for interacting with different LLM providers.
- **ModelType**: Enum for specifying the model type.

.. code-block:: python

    import os
    import tiktoken
    from typing import List, Dict, Tuple
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from faiss import IndexFlatL2

    from adalflow.components.model_client import GroqAPIClient, OpenAIClient
    from adalflow.core.types import ModelType
    from adalflow.utils import setup_env

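The model clients read their API keys (for example ``OPENAI_API_KEY`` and ``GROQ_API_KEY``) from
environment variables. A minimal sketch, assuming the keys live in a local ``.env`` file:

.. code-block:: python

    # Load environment variables (e.g., OPENAI_API_KEY, GROQ_API_KEY) from a .env file.
    # Assumes a .env file exists in the working directory.
    setup_env()
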
The ``AdalflowRAGPipeline`` class sets up the Retrieval-Augmented Generation (RAG) pipeline. Its ``__init__`` method initializes the key components:

- An embedding model (``all-MiniLM-L6-v2`` by default) is loaded using ``SentenceTransformer`` to convert text into dense vector embeddings with a dimensionality of 384.
- A FAISS index (``IndexFlatL2``) is created for similarity-based document retrieval.
- Parameters such as ``top_k_retrieval`` (number of documents to retrieve) and ``max_context_tokens`` (limit on the token count of the context) are configured.
- A ``tiktoken`` tokenizer enables precise token counting, which is crucial when working within LLM context limits.

The method also initializes storage for the documents, their embeddings, and associated metadata for efficient management and retrieval.

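A minimal construction sketch (hypothetical, not part of the tutorial source; the parameter values mirror those used later in this tutorial, and a ``GROQ_API_KEY`` environment variable is assumed to be set):

.. code-block:: python

    # Hypothetical construction example; values mirror the tutorial's defaults.
    pipeline = AdalflowRAGPipeline(
        model_client=GroqAPIClient(),
        model_kwargs={"model": "llama-3.2-1b-preview", "temperature": 0.1, "max_tokens": 800},
        top_k_retrieval=3,       # retrieve the 3 most similar chunks
        max_context_tokens=800,  # cap the context passed to the LLM
    )
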
The remaining methods build the document store and run retrieval-augmented generation on top of these components:

- ``load_text_file`` processes a large text file into manageable chunks by splitting the content into fixed-size line groups, facilitating easier embedding and storage.
- ``add_documents_from_directory`` iterates over the text files in a directory, embeds each chunk, and stores the chunks in the FAISS index along with their metadata.
- ``count_tokens`` leverages the tokenizer to precisely determine the number of tokens in a given text.
- ``retrieve_and_truncate_context`` fetches the most relevant documents from the FAISS index based on the query embedding and truncates the assembled context to adhere to the token limit.
- ``generate_response`` constructs a prompt by combining the retrieved context and the query, invokes the provided model client, and parses the result into a readable format.

Together, these methods integrate text retrieval and generation to handle large-scale document queries effectively.

.. code-block:: python

    class AdalflowRAGPipeline:
        def __init__(self,
                     model_client=None,
                     model_kwargs=None,
                     embedding_model='all-MiniLM-L6-v2',
                     vector_dim=384,
                     top_k_retrieval=3,
                     max_context_tokens=800):
            """
            Initialize RAG Pipeline for handling large text files

            Args:
                model_client: Client used to call the LLM API
                model_kwargs (dict): Generation parameters passed to the model
                embedding_model (str): Sentence transformer model for embeddings
                vector_dim (int): Dimension of embedding vectors
                top_k_retrieval (int): Number of documents to retrieve
                max_context_tokens (int): Maximum tokens to send to LLM
            """
            # Initialize model client for generation
            self.model_client = model_client

            # Initialize tokenizer for precise token counting
            self.tokenizer = tiktoken.get_encoding("cl100k_base")

            # Initialize embedding model
            self.embedding_model = SentenceTransformer(embedding_model)

            # Initialize FAISS index for vector similarity search
            self.index = IndexFlatL2(vector_dim)

            # Store document texts, embeddings, and metadata
            self.documents = []
            self.document_embeddings = []
            self.document_metadata = []

            # Retrieval and context management parameters
            self.top_k_retrieval = top_k_retrieval
            self.max_context_tokens = max_context_tokens

            # Model generation parameters
            self.model_kwargs = model_kwargs

        def load_text_file(self, file_path: str) -> List[str]:
            """
            Load a large text file and split it into manageable chunks

            Args:
                file_path (str): Path to the text file

            Returns:
                List[str]: List of document chunks
            """
            with open(file_path, 'r', encoding='utf-8') as file:
                # Read the entire file
                content = file.read()

            # Split content into chunks (e.g., 10 lines per chunk)
            lines = content.split('\n')
            chunks = []
            chunk_size = 10  # Adjust based on your file structure

            for i in range(0, len(lines), chunk_size):
                chunk = '\n'.join(lines[i:i + chunk_size])
                chunks.append(chunk)

            return chunks

        def add_documents_from_directory(self, directory_path: str):
            """
            Add documents from all text files in a directory

            Args:
                directory_path (str): Path to directory containing text files
            """
            for filename in os.listdir(directory_path):
                if filename.endswith('.txt'):
                    file_path = os.path.join(directory_path, filename)
                    document_chunks = self.load_text_file(file_path)

                    for chunk in document_chunks:
                        # Embed document chunk
                        embedding = self.embedding_model.encode(chunk)

                        # Add to index and document store
                        self.index.add(np.array([embedding]))
                        self.documents.append(chunk)
                        self.document_embeddings.append(embedding)
                        self.document_metadata.append({
                            'filename': filename,
                            'chunk_index': len(self.document_metadata)
                        })

        def count_tokens(self, text: str) -> int:
            """
            Count tokens in a given text

            Args:
                text (str): Input text

            Returns:
                int: Number of tokens
            """
            return len(self.tokenizer.encode(text))

        def retrieve_and_truncate_context(self, query: str) -> str:
            """
            Retrieve relevant documents and truncate them to fit the token limit

            Args:
                query (str): Input query

            Returns:
                str: Concatenated context within the token limit
            """
            # Retrieve relevant documents
            query_embedding = self.embedding_model.encode(query)
            distances, indices = self.index.search(
                np.array([query_embedding]),
                self.top_k_retrieval
            )

            # Collect and truncate context
            context = []
            current_tokens = 0

            for idx in indices[0]:
                # FAISS returns -1 when fewer than top_k documents are indexed
                if idx == -1:
                    continue
                doc = self.documents[idx]
                doc_tokens = self.count_tokens(doc)

                # Stop once adding a document would exceed the token limit
                if current_tokens + doc_tokens <= self.max_context_tokens:
                    context.append(doc)
                    current_tokens += doc_tokens
                else:
                    break

            return "\n\n".join(context)

        def generate_response(self, query: str) -> str:
            """
            Generate a response using retrieval-augmented generation

            Args:
                query (str): User's input query

            Returns:
                str: Generated response incorporating retrieved context
            """
            # Retrieve and truncate context
            retrieved_context = self.retrieve_and_truncate_context(query)

            # Construct context-aware prompt
            full_prompt = f"""
            Context Documents:
            {retrieved_context}

            Query: {query}

            Generate a comprehensive response that:
            1. Directly answers the query
            2. Incorporates relevant information from the context documents
            3. Provides clear and concise information
            """

            # Prepare API arguments
            api_kwargs = self.model_client.convert_inputs_to_api_kwargs(
                input=full_prompt,
                model_kwargs=self.model_kwargs,
                model_type=ModelType.LLM
            )

            # Call API and parse response
            response = self.model_client.call(
                api_kwargs=api_kwargs,
                model_type=ModelType.LLM
            )
            response_text = self.model_client.parse_chat_completion(response)

            return response_text

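As a quick sanity check of the token accounting, ``count_tokens`` can be exercised directly. This is a hypothetical snippet, not part of the tutorial source, and it assumes ``OPENAI_API_KEY`` is set so the client can be constructed:

.. code-block:: python

    # Hypothetical usage: cl100k_base encodes "Hello, world!" as 4 tokens.
    pipeline = AdalflowRAGPipeline(model_client=OpenAIClient(),
                                   model_kwargs={"model": "gpt-3.5-turbo"})
    print(pipeline.count_tokens("Hello, world!"))  # -> 4
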
The ``run_rag_pipeline`` function demonstrates how to use ``AdalflowRAGPipeline``. It initializes the pipeline,
adds documents from a directory, and generates responses for a list of user queries. The function is generic
and can accommodate various LLM API clients, such as ``GroqAPIClient`` or ``OpenAIClient``, highlighting the
pipeline's flexibility and modularity.

.. code-block:: python

    def run_rag_pipeline(model_client, model_kwargs, documents, queries):
        # Example usage of the RAG pipeline
        rag_pipeline = AdalflowRAGPipeline(
            model_client=model_client,
            model_kwargs=model_kwargs,
            top_k_retrieval=1,      # Retrieve the single most relevant chunk
            max_context_tokens=800  # Limit context to 800 tokens
        )

        # Add documents from a directory of text files
        rag_pipeline.add_documents_from_directory(documents)

        # Generate responses
        for query in queries:
            print(f"\nQuery: {query}")
            response = rag_pipeline.generate_response(query)
            print(f"Response: {response}")

This block provides an example of running the pipeline with different models and queries. It specifies:

- The document directory containing the text files.
- Example queries about topics such as the "Crystal Cavern" and "rare trees in Elmsworth."
- Configuration for Groq and OpenAI model parameters, including the model name, temperature, and token limits.

.. code-block:: python

    documents = '../../tutorials/assets/documents'

    queries = [
        "What year was the Crystal Cavern discovered?",
        "What is the name of the rare tree in Elmsworth?",
        "What do local legends claim surrounds the Lunaflits?"
    ]

    groq_model_kwargs = {
        "model": "llama-3.2-1b-preview",
        "temperature": 0.1,
        "max_tokens": 800,
    }

    openai_model_kwargs = {
        "model": "gpt-3.5-turbo",
        "temperature": 0.1,
        "max_tokens": 800,
    }

    # The examples below show that AdalFlow can be used in a generic manner with any
    # API provider, without worrying about prompt construction or result parsing.
    run_rag_pipeline(GroqAPIClient(), groq_model_kwargs, documents, queries)
    run_rag_pipeline(OpenAIClient(), openai_model_kwargs, documents, queries)

The example emphasizes that ``AdalflowRAGPipeline`` can interact seamlessly with multiple API providers,
enabling integration with diverse LLMs without modifying the core logic for prompt construction or
response parsing.

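As a sketch of that flexibility, any other AdalFlow model client can be dropped in with its own kwargs. The client and model name below are illustrative assumptions; check your AdalFlow installation for the clients it actually ships:

.. code-block:: python

    # Hypothetical: swap in another provider without touching the pipeline logic.
    # AnthropicAPIClient and the model name are assumptions; verify availability.
    from adalflow.components.model_client import AnthropicAPIClient

    anthropic_model_kwargs = {
        "model": "claude-3-haiku-20240307",  # assumed model identifier
        "temperature": 0.1,
        "max_tokens": 800,
    }
    run_rag_pipeline(AnthropicAPIClient(), anthropic_model_kwargs, documents, queries)
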
.. admonition:: API reference
   :class: highlight

   - :class:`utils.setup_env`
   - :class:`core.types.ModelType`
   - :class:`components.model_client.OpenAIClient`
   - :class:`components.model_client.GroqAPIClient`
