
Commit bbdd0d9

tgenaitay, bene2k1, and SamyOubouaziz authored

feat(genapi): tutorials for Generative APIs (#3802)

* feat(genapi): rag tutorial basics
* feat(genapi): improved instructions
* feat(genapi): simplified
* feat(genapi): fixed errors and created pixtral tutorial
* feat(genapi): add picture
* feat(genapi): added links
* Apply suggestions from code review
  Co-authored-by: SamyOubouaziz <[email protected]>
* Apply suggestions from code review
  Co-authored-by: SamyOubouaziz <[email protected]>
* Apply suggestions from code review
  Co-authored-by: SamyOubouaziz <[email protected]>
* Update tutorials/how-to-implement-rag-generativeapis/index.mdx
  Co-authored-by: SamyOubouaziz <[email protected]>
* Update tutorials/how-to-implement-rag-generativeapis/index.mdx
  Co-authored-by: SamyOubouaziz <[email protected]>
* Apply suggestions from code review
  Co-authored-by: SamyOubouaziz <[email protected]>
* docs(fix): categories

---------

Co-authored-by: Benedikt Rollik <[email protected]>
Co-authored-by: SamyOubouaziz <[email protected]>
Co-authored-by: Benedikt Rollik <[email protected]>

1 parent 18b3dc2 commit bbdd0d9

File tree

4 files changed: +617 −1 lines changed
Lines changed: 357 additions & 0 deletions
@@ -0,0 +1,357 @@
---
meta:
  title: Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Generative APIs
  description: Step-by-step guide to implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Generative APIs.
content:
  h1: Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Generative APIs
tags: inference API postgresql pgvector object storage RAG langchain AI LLMs embeddings
dates:
  validation: 2024-10-10
  posted: 2024-10-10
categories:
  - managed-inference
---

Retrieval-Augmented Generation (RAG) enhances language models by incorporating relevant information from your own datasets. This hybrid approach improves both the accuracy and contextual relevance of the model's outputs, making it ideal for advanced AI applications.

In this tutorial, you will learn how to implement RAG using LangChain, a leading framework for developing powerful language model applications. We will integrate LangChain with **Scaleway’s Generative APIs**, **Scaleway’s PostgreSQL Managed Database** (using `pgvector` for vector storage), and **Scaleway’s Object Storage** to ensure seamless data management and efficient integration.

## What you will learn

- How to embed text using **Scaleway Generative APIs**
- How to store and query embeddings using **Scaleway’s Managed PostgreSQL Database** with pgvector
- How to manage large datasets efficiently with **Scaleway Object Storage**

<Macro id="requirements" />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/)
- Access to the [Generative APIs service](/ai-data/generative-apis/quickstart/)
- An [Object Storage bucket](/storage/object/how-to/create-a-bucket/) to store the data you want to inject into your LLM
- A [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store your embeddings

## Configure your development environment

### Install required packages

Run the following command to install the required packages:

```sh
pip install langchain psycopg2 python-dotenv langchainhub
```

### Create a .env file

Create a `.env` file and add the following variables. These will store your API keys, database connection details, and other configuration values.

```sh
# .env file

# Scaleway API credentials https://console.scaleway.com/iam/api-keys
## Will be used to authenticate to Scaleway Object Storage and Scaleway Generative APIs
SCW_ACCESS_KEY=your_scaleway_access_key_id
SCW_API_KEY=your_scaleway_secret_key

# Scaleway Managed Database (PostgreSQL) credentials
## Will be used to store embeddings of your proprietary data
SCW_DB_USER=your_scaleway_managed_db_username
SCW_DB_PASSWORD=your_scaleway_managed_db_password
SCW_DB_NAME="rdb"
SCW_DB_HOST=your_scaleway_managed_db_host # The IP address of your Database Instance
SCW_DB_PORT=your_scaleway_managed_db_port # The port number for your Database Instance

# Scaleway Object Storage bucket configuration
## Will be used to store your proprietary data (PDF, CSV, etc.)
SCW_BUCKET_NAME=your_scaleway_bucket_name
SCW_REGION=fr-par
SCW_BUCKET_ENDPOINT="https://s3.{{SCW_REGION}}.scw.cloud" # S3 main endpoint, e.g. https://s3.fr-par.scw.cloud

# Scaleway Generative APIs endpoint
## LLM and embedding models are served through this base URL
SCW_GENERATIVE_APIs_ENDPOINT="https://api.scaleway.ai/v1"
```

## Set up the Scaleway Managed Database

### Connect to your PostgreSQL database

You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking.
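For example, a minimal `psql` invocation using the values from your `.env` file might look like the sketch below. Export the variables first or substitute the actual values inline; depending on your Database Instance configuration, SSL may be required.

```sh
# Sketch: connect with psql using the values defined in your .env file.
# Export these variables first, or substitute the actual values inline.
psql "host=$SCW_DB_HOST port=$SCW_DB_PORT dbname=$SCW_DB_NAME user=$SCW_DB_USER password=$SCW_DB_PASSWORD sslmode=require"
```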

### Install the pgvector extension

[pgvector](https://github.com/pgvector/pgvector) is essential for storing and indexing high-dimensional vectors, which are critical for Retrieval-Augmented Generation (RAG) systems. Ensure that it is installed by executing the following SQL command:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

### Create a table to track processed documents

To avoid reprocessing documents that have already been loaded and vectorized, create a table to keep track of them. This ensures that new documents added to your Object Storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization:

```sql
CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT);
```

### Connect to PostgreSQL programmatically

Connect to your PostgreSQL instance and perform tasks programmatically:

```python
# rag.py file

from dotenv import load_dotenv
import psycopg2
import os

# Load environment variables
load_dotenv()

# Establish the connection to the PostgreSQL database using environment variables
conn = psycopg2.connect(
    database=os.getenv("SCW_DB_NAME"),
    user=os.getenv("SCW_DB_USER"),
    password=os.getenv("SCW_DB_PASSWORD"),
    host=os.getenv("SCW_DB_HOST"),
    port=os.getenv("SCW_DB_PORT")
)

# Create a cursor to execute SQL commands
cur = conn.cursor()
```

## Embeddings and vector store setup

### Import required modules

```python
# rag.py

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
```

### Configure OpenAI Embeddings

We will use the [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) class from LangChain and store the embeddings in PostgreSQL using the PGVector integration.

```python
# rag.py

embeddings = OpenAIEmbeddings(
    openai_api_key=os.getenv("SCW_API_KEY"),
    openai_api_base=os.getenv("SCW_GENERATIVE_APIs_ENDPOINT"),
    model="sentence-t5-xxl",
    tiktoken_enabled=False,
)
```

#### Key parameters

- `openai_api_key`: your API key for accessing the OpenAI-compatible embeddings service, in this case hosted by Scaleway’s Generative APIs.
- `openai_api_base`: the base URL pointing to Scaleway Generative APIs, where the embedding model is hosted. This URL serves as the entry point for API calls that generate embeddings.
- `model="sentence-t5-xxl"`: the specific model used for text embeddings. `sentence-transformers/sentence-t5-xxl` is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems.
- `tiktoken_enabled=False`: disables OpenAI's Tiktoken tokenizer during the embedding process. Since `sentence-t5-xxl` is not an OpenAI model, the raw text is sent to the API and tokenization is handled on the server side.
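Before wiring the embeddings into a vector store, you can optionally run a quick sanity check (this assumes the credentials in your `.env` file are valid):

```python
# Optional sanity check: embed a short string and inspect the vector size.
test_vector = embeddings.embed_query("Hello, Scaleway!")
print(len(test_vector))  # Dimensionality of the returned embedding
```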

### Create a pgvector store

Configure the connection string for your PostgreSQL instance and create a pgvector store to store these embeddings:

```python
# rag.py

connection_string = f"postgresql+psycopg2://{conn.info.user}:{conn.info.password}@{conn.info.host}:{conn.info.port}/{conn.info.dbname}"
vector_store = PGVector(connection=connection_string, embeddings=embeddings)
```

## Load and process documents

At this stage, you need to have proprietary data (e.g., PDF or CSV files) stored in your Scaleway Object Storage bucket.

Below, we will use LangChain's [`S3FileLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html) to load documents and split them into chunks.
Then, we will embed and store them in your PostgreSQL database.

### Import required modules

```python
# rag.py

import logging

import boto3
from langchain_community.document_loaders import S3FileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Logger used to report errors during embedding further below
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
```

### Load metadata for improved efficiency

Loading the metadata for all objects in your bucket up front speeds up the process significantly: it lets you check whether a document has already been embedded without downloading the entire document.

```python
# rag.py

session = boto3.session.Session()
client_s3 = session.client(
    service_name='s3',
    endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT", ""),
    aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""),
    aws_secret_access_key=os.getenv("SCW_API_KEY", "")
)
paginator = client_s3.get_paginator('list_objects_v2')
page_iterator = paginator.paginate(Bucket=os.getenv("SCW_BUCKET_NAME", ""))
```

In this code sample, we:

- Set up a Boto3 session: we initialize a Boto3 session (Boto3 is the AWS SDK for Python, fully compatible with Scaleway Object Storage). This session manages configuration, including credentials and settings, that Boto3 uses for API requests.
- Create an S3 client: we establish an S3 client to interact with the Scaleway Object Storage service.
- Set up pagination for listing objects: we prepare pagination to handle potentially large lists of objects efficiently.
- Iterate through the bucket: this prepares the pagination process, allowing us to list all objects within the specified Scaleway Object Storage bucket seamlessly, as shown in the optional check below.
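To verify that the client can see your objects before processing anything, you can print the keys first. This is a minimal sketch; since iterating may consume the iterator, it recreates `page_iterator` afterwards to be safe:

```python
# Optional: list the object keys visible to the client.
for page in page_iterator:
    for obj in page.get('Contents', []):
        print(obj['Key'])

# Recreate the iterator so the processing loop below starts from scratch.
page_iterator = paginator.paginate(Bucket=os.getenv("SCW_BUCKET_NAME", ""))
```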

### Iterate through metadata

Next, we iterate through the metadata to determine whether each object has already been embedded. If an object has not been processed yet, we embed it and load it into the database.

```python
# rag.py

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0, add_start_index=True, length_function=len, is_separator_regex=False)
for page in page_iterator:
    for obj in page.get('Contents', []):
        cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (obj['Key'],))
        response = cur.fetchone()
        if response is None:
            file_loader = S3FileLoader(
                bucket=os.getenv("SCW_BUCKET_NAME", ""),
                key=obj['Key'],
                endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT", ""),
                aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""),
                aws_secret_access_key=os.getenv("SCW_API_KEY", "")
            )
            file_to_load = file_loader.load()
            chunks = text_splitter.split_text(file_to_load[0].page_content)
            try:
                embeddings_list = [embeddings.embed_query(chunk) for chunk in chunks]
                vector_store.add_embeddings(chunks, embeddings_list)
                # Record the object as processed only after embedding succeeds
                cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", (obj['Key'],))
            except Exception as e:
                logger.error(f"An error occurred: {e}")

conn.commit()
```

- S3FileLoader: loads each file individually from your **Scaleway Object Storage bucket** using the file's `object_key` (extracted from the file's metadata). Only the specific file is loaded from the bucket, minimizing the amount of data retrieved at any given time.
- RecursiveCharacterTextSplitter: breaks each document into smaller chunks of text. This is crucial because embedding models, like those used in Retrieval-Augmented Generation (RAG), typically have a limited context window (the number of tokens they can process at once).
- Embedding the chunks: for each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the `embeddings.embed_query(chunk)` function. This transforms each chunk into a vector representation that can later be used for similarity search.
- Embedding storage: after generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the `vector_store.add_embeddings(chunks, embeddings_list)` method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query.
- Avoiding redundant processing: the script checks the `object_loaded` table in PostgreSQL to see if a document has already been processed (i.e., whether the `object_key` exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources.

#### Why 500 characters?

The chunk size of 500 characters is chosen to fit comfortably within the context size limit of the embedding model used in this tutorial. By keeping chunks small, we avoid exceeding the model's context window, which could lead to truncated embeddings or poor performance during inference.
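If you want to convince yourself that the splitter respects this limit, a small check with hypothetical sample text:

```python
# Sketch: confirm no chunk exceeds the 500-character limit.
# The sample text is hypothetical; any document content works.
sample_text = "Scaleway Generative APIs serve open-weight models. " * 50
sample_chunks = text_splitter.split_text(sample_text)
print(max(len(chunk) for chunk in sample_chunks))  # Expected: <= 500
```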

#### Why store both chunk and embedding?

Storing both the chunk and its corresponding embedding allows for efficient document retrieval later.
When a query is made, the RAG system retrieves the most relevant embeddings, and the corresponding text chunks are used to generate the final response.
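Under the hood, retrieval is a vector similarity query. The SQL below is a sketch of what such a lookup looks like, assuming the default table and column names created by LangChain's PGVector integration (`langchain_pg_embedding`, with `embedding` and `document` columns); the vector literal stands in for a real query embedding:

```sql
-- Sketch: fetch the five chunks closest to a query embedding,
-- using pgvector's cosine distance operator (<=>).
SELECT document
FROM langchain_pg_embedding
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
```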

## Query the RAG system with a pre-defined prompt template

### Import required modules

```python
# rag.py

import time  # Used to pace the streamed output below

from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
```

### Set up the LLM for querying

Now, set up the RAG system to handle queries:

```python
# rag.py

llm = ChatOpenAI(
    base_url=os.getenv("SCW_GENERATIVE_APIs_ENDPOINT"),
    api_key=os.getenv("SCW_API_KEY"),
    model="llama-3.1-8b-instruct",
)

prompt = hub.pull("rlm/rag-prompt")
retriever = vector_store.as_retriever()

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for r in rag_chain.stream("Your question"):
    print(r, end="", flush=True)
    time.sleep(0.1)
```

- LLM initialization: we initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name.
- Prompt setup: the prompt is pulled from the hub using a predefined template, ensuring consistent query formatting.
- Retriever configuration: we set up the retriever to access the vector store, allowing the RAG system to retrieve relevant information based on the query.
- RAG chain construction: we create the RAG chain, which connects the retriever, prompt, LLM, and output parser in a streamlined workflow.
- Query execution: finally, we stream the output of the RAG chain for a specified question, printing each response chunk with a slight delay for better readability. A non-streaming variant is sketched below.
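If you prefer a single complete answer over streaming, the same chain can be called with `invoke()`:

```python
# Get the full answer in one call instead of streaming it.
answer = rag_chain.invoke("Your question")
print(answer)
```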

## Query the RAG system with your own prompt template

Personalizing your prompt template allows you to tailor the responses from your RAG system to better fit your specific needs, significantly improving the relevance and tone of the answers you receive. Below is a detailed guide on how to create a custom prompt for querying the system.

```python
# rag.py

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
import time

llm = ChatOpenAI(
    base_url=os.getenv("SCW_GENERATIVE_APIs_ENDPOINT"),
    api_key=os.getenv("SCW_API_KEY"),
    model="llama-3.1-8b-instruct",
)
prompt = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Always finish your answer with "Thank you for asking". {context} Question: {question} Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(prompt)
retriever = vector_store.as_retriever()
custom_rag_chain = create_stuff_documents_chain(llm, custom_rag_prompt)

context = retriever.invoke("your question")
for r in custom_rag_chain.stream({"question": "your question", "context": context}):
    print(r, end="", flush=True)
    time.sleep(0.1)
```

- Prompt template: the prompt template directs the model's responses. It instructs the model on how to leverage the provided context and emphasizes honesty when the model lacks information.
  To make the responses more engaging, consider adding a light-hearted conclusion or a personalized touch. For example, you might modify the closing line to say, "Thank you for asking! I'm here to help with anything else you need!"
- Retrieving context: the `retriever.invoke()` method fetches relevant information from your vector store based on the user's query. It is essential that this step retrieves high-quality context to ensure that the model's responses are accurate and helpful.
  You can enhance the quality of the context by fine-tuning your embeddings and ensuring that the documents in your vector store are relevant and well-structured.
- Creating the RAG chain: the `create_stuff_documents_chain` function connects the language model with your custom prompt. This integration allows the model to process the retrieved context effectively and formulate a coherent, context-aware response.
  Consider experimenting with different chain configurations to see how they affect the output. For instance, a different chain type may yield varied responses.
- Streaming responses: the loop that streams responses from `custom_rag_chain` provides a dynamic user experience. Instead of waiting for the entire output, users see responses as they are generated, enhancing interactivity.
  You can customize the streaming behavior further, for example by implementing progress indicators or more sophisticated UI elements for your applications.

#### Example use cases

- Customer support: use a custom prompt to answer customer queries effectively, making interactions feel more personalized and engaging (see the sketch after this list).
- Research assistance: tailor prompts to provide concise summaries or detailed explanations on specific topics, enhancing your research capabilities.
- Content generation: personalize prompts for creative writing, generating responses that align with specific themes or tones.
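As an illustration of the customer support case, a variant of the template above might look like the following sketch (the tone and wording are illustrative, not prescribed):

```python
# Sketch: a customer-support flavored prompt, reusing the llm defined above.
support_prompt = """You are a friendly support agent. Use the following pieces of context
to answer the customer's question. If you don't know the answer, say so and suggest
contacting support. {context} Question: {question} Helpful Answer:"""
support_rag_prompt = PromptTemplate.from_template(support_prompt)
support_chain = create_stuff_documents_chain(llm, support_rag_prompt)
```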

## Conclusion

In this tutorial, we explored essential techniques for efficiently processing and storing large document datasets within a Retrieval-Augmented Generation (RAG) system. By leveraging metadata, we ensured that the system avoids redundant data handling, allowing for smooth and efficient operations. Chunking optimizes document processing, maximizing the performance of the language model. Storing embeddings in PostgreSQL via pgvector enables rapid, scalable retrieval, ensuring quick responses to user queries.

Furthermore, you can continually enhance your RAG system by implementing mechanisms to retain chat history. Keeping track of past interactions allows for more contextually aware responses, fostering a more engaging user experience. This historical data can be used to refine your prompts, adapt to user preferences, and improve the overall accuracy of responses.

By integrating Scaleway Object Storage, Managed Database for PostgreSQL with pgvector, and LangChain's embedding tools, you have the foundation to build a powerful RAG system that scales with your data while offering robust information retrieval capabilities. This approach equips you with the tools necessary to handle complex queries and deliver accurate, relevant results efficiently.

With ongoing refinement and adaptation, your RAG system can evolve to meet the changing needs of your users, ensuring that it remains a valuable asset in your AI toolkit.
