
Commit 8419eb7

Laure-dibene2k1 and Benedikt Rollik authored

Apply suggestions from code review

Co-authored-by: Benedikt Rollik <[email protected]>
1 parent 69c9974 commit 8419eb7

File tree

1 file changed: +38 -38 lines changed


tutorials/how-to-implement-rag/index.mdx

Lines changed: 38 additions & 38 deletions
@@ -11,12 +11,12 @@ categories:
 
 Retrieval-Augmented Generation (RAG) supercharges language models by enabling real-time retrieval of relevant information from external datasets. This hybrid approach boosts both the accuracy and contextual relevance of model outputs, making it essential for advanced AI applications.
 
-In this comprehensive guide, you'll learn how to implement RAG using LangChain, one of the leading frameworks for developing robust language model applications. We'll combine LangChain with ***Scaleway’s Managed Inference***, ***Scaleway’s PostgreSQL Managed Database*** (featuring pgvector for vector storage), and ***Scaleway’s Object Storage*** for seamless integration and efficient data management.
+In this comprehensive guide, you will learn how to implement RAG using LangChain, one of the leading frameworks for developing robust language model applications. We will combine LangChain with ***Scaleway’s Managed Inference***, ***Scaleway’s PostgreSQL Managed Database*** (featuring pgvector for vector storage), and ***Scaleway’s Object Storage*** for seamless integration and efficient data management.
 
 ## Why LangChain?
 LangChain simplifies the process of enhancing language models with retrieval capabilities, allowing developers to build scalable, intelligent applications that access external datasets effortlessly. By leveraging LangChain’s modular design and Scaleway’s cloud services, you can unlock the full potential of Retrieval-Augmented Generation.
 
-## What You’ll Learn
+## What you will learn
 - How to embed text using a sentence transformer with ***Scaleway Managed Inference***
 - How to store and query embeddings using ***Scaleway’s Managed PostgreSQL Database*** with pgvector
 - How to manage large datasets efficiently with ***Scaleway Object Storage***
@@ -40,7 +40,7 @@ Run the following command to install the required packages:
 ```sh
 pip install langchain psycopg2 python-dotenv
 ```
-### Step 2: Create a .env File
+### Step 2: Create a .env file
 
 Create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values.
 
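The variable listing itself falls outside this hunk. As a rough illustration of how those values are consumed later, here is a minimal Python sketch that loads the .env file with python-dotenv; every variable name shown (SCW_API_KEY, SCW_DB_HOST, and so on) is a placeholder, not necessarily what the tutorial defines.

```python
# load_env_sketch.py - illustrative only; variable names are placeholders
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from a .env file in the working directory

# Hypothetical variable names; adapt them to whatever your .env actually defines
SCW_API_KEY = os.getenv("SCW_API_KEY")          # Scaleway API key
SCW_DB_HOST = os.getenv("SCW_DB_HOST")          # Managed PostgreSQL host
SCW_DB_PORT = os.getenv("SCW_DB_PORT", "5432")  # PostgreSQL port
SCW_DB_NAME = os.getenv("SCW_DB_NAME")          # database name
SCW_BUCKET_NAME = os.getenv("SCW_BUCKET_NAME")  # Object Storage bucket

print("Loaded configuration for bucket:", SCW_BUCKET_NAME)
```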
@@ -71,26 +71,26 @@ Create a .env file and add the following variables. These will store your API ke
 
 ## Setting Up Managed Databases
 
-### Step 1: Connect to Your PostgreSQL Database
+### Step 1: Connect to your PostgreSQL database
 
-To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking.
+To perform these actions, you will need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking.
 
-### Step 2: Install the pgvector Extension
+### Step 2: Install the pgvector extension
 
 [pgvector](https://github.com/pgvector/pgvector) is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command:
 
 ```sql
 CREATE EXTENSION IF NOT EXISTS vector;
 ```
-### Step 3: Create a Table to Track Processed Documents
+### Step 3: Create a table to track processed documents
 
 To prevent reprocessing documents that have already been loaded and vectorized, you should create a table to keep track of them. This will ensure that new documents added to your object storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization:
 
 ```sql
 CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT);
 ```
 
-### Step 4: Connect to PostgreSQL Programmatically
+### Step 4: Connect to PostgreSQL programmatically
 
 Connect to your PostgreSQL instance and perform tasks programmatically.
 
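The programmatic connection code referenced in Step 4 sits outside this hunk. A minimal sketch of what it might look like with psycopg2 (installed earlier), assuming the placeholder environment variable names from the previous sketch:

```python
# rag.py - minimal connection sketch; environment variable names are assumptions
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv()

conn = psycopg2.connect(
    host=os.getenv("SCW_DB_HOST"),
    port=os.getenv("SCW_DB_PORT", "5432"),
    dbname=os.getenv("SCW_DB_NAME"),
    user=os.getenv("SCW_DB_USER"),
    password=os.getenv("SCW_DB_PASSWORD"),
)
cur = conn.cursor()

# The setup statements shown above can also be run programmatically
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    "CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT);"
)
conn.commit()
```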
@@ -143,7 +143,7 @@ embeddings = OpenAIEmbeddings(
 )
 ```
 
-#### Key Parameters:
+#### Key parameters:
 - `openai_api_key`: This is your API key for accessing the OpenAI-powered embeddings service, in this case, deployed via Scaleway’s Managed Inference.
 - `openai_api_base`: This is the base URL that points to your deployment of the sentence-transformers/sentence-t5-xxl model on Scaleway's Managed Inference. This URL serves as the entry point to make API calls for generating embeddings.
 - `model="sentence-transformers/sentence-t5-xxl"`: This defines the specific model being used for text embeddings. sentence-transformers/sentence-t5-xxl is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems.
@@ -159,7 +159,7 @@ In the context of using Scaleway’s Managed Inference and the `sentence-t5-xxl`
 Moreover, leaving `tiktoken_enabled` as `True` causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior.
 By setting `tiktoken_enabled=False`, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure.
 
-### Step 3: Create a PGVector Store
+### Step 3: Create a PGVector store
 
 Configure the connection string for your PostgreSQL instance and create a PGVector store to store these embeddings.
 
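The embeddings client and the PGVector store discussed above are only partially visible in these hunks. A condensed sketch of that configuration, assuming the langchain-postgres integration (whose PGVector takes connection= and embeddings=, matching the call shown in the next hunk header) and the same placeholder variable names as above; the connection-string driver prefix depends on the PostgreSQL driver you installed:

```python
# rag.py - condensed configuration sketch; endpoint and credential names are placeholders
import os

from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector  # assuming the langchain-postgres package

load_dotenv()

embeddings = OpenAIEmbeddings(
    openai_api_key=os.getenv("SCW_API_KEY"),
    openai_api_base=os.getenv("SCW_INFERENCE_EMBEDDINGS_ENDPOINT"),
    model="sentence-transformers/sentence-t5-xxl",
    tiktoken_enabled=False,  # send raw text, not pre-tokenized ids (see explanation above)
)

# Connection string format assumed from the langchain-postgres documentation
connection_string = (
    f"postgresql+psycopg://{os.getenv('SCW_DB_USER')}:{os.getenv('SCW_DB_PASSWORD')}"
    f"@{os.getenv('SCW_DB_HOST')}:{os.getenv('SCW_DB_PORT', '5432')}/{os.getenv('SCW_DB_NAME')}"
)

vector_store = PGVector(connection=connection_string, embeddings=embeddings)
```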
@@ -172,11 +172,11 @@ vector_store = PGVector(connection=connection_string, embeddings=embeddings)
 
 PGVector: This creates the vector store in your PostgreSQL database to store the embeddings.
 
-## Load and Process Documents
+## Load and process documents
 
 Use the [`S3FileLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html) to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database.
 
-### Step 1: Import Required Modules
+### Step 1: Import required modules
 
 ```python
 #rag.py
@@ -188,9 +188,9 @@ from langchain_openai import OpenAIEmbeddings
 
 ```
 
-### Step 2: Load Metadata for Improved Efficiency
+### Step 2: Load metadata for improved efficiency
 
-Load Metadata for Improved Efficiency: By loading the metadata for all objects in your bucket, you can speed up the process significantly. This allows you to quickly check if a document has already been embedded without the need to load the entire document.
+Load metadata for improved efficiency: By loading the metadata for all objects in your bucket, you can speed up the process significantly. This allows you to quickly check if a document has already been embedded without the need to load the entire document.
 
 ```python
 # rag.py
@@ -205,14 +205,14 @@ page_iterator = paginator.paginate(Bucket=BUCKET_NAME)
 ```
 
 In this code sample we:
-- Set Up a Boto3 Session: We initialize a Boto3 session, which is the AWS SDK for Python, fully compatible with Scaleway Object Storage. This session manages configuration, including credentials and settings, that Boto3 uses for API requests.
-- Create an S3 Client: We establish an S3 client to interact with the Scaleway Object storage service.
-- Set Up Pagination for Listing Objects: We prepare pagination to handle potentially large lists of objects efficiently.
-- Iterate Through the Bucket: This initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly.
+- Set up a Boto3 session: We initialize a Boto3 session, which is the AWS SDK for Python, fully compatible with Scaleway Object Storage. This session manages configuration, including credentials and settings, that Boto3 uses for API requests.
+- Create an S3 client: We establish an S3 client to interact with the Scaleway Object Storage service.
+- Set up pagination for listing objects: We prepare pagination to handle potentially large lists of objects efficiently.
+- Iterate through the bucket: This initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly.
 
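The code sample the list above describes is elided from this hunk apart from its last line. A minimal sketch of that listing setup, assuming Scaleway-style placeholder variables for the endpoint, region, and credentials:

```python
# rag.py - sketch of the object-listing setup described above; variable names are assumptions
import os

import boto3
from dotenv import load_dotenv

load_dotenv()

BUCKET_NAME = os.getenv("SCW_BUCKET_NAME")

# Boto3 session pointed at Scaleway Object Storage (S3-compatible)
session = boto3.session.Session()
s3 = session.client(
    service_name="s3",
    endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT"),          # e.g. https://s3.fr-par.scw.cloud
    region_name=os.getenv("SCW_DEFAULT_REGION", "fr-par"),  # adjust to your bucket's region
    aws_access_key_id=os.getenv("SCW_ACCESS_KEY"),
    aws_secret_access_key=os.getenv("SCW_SECRET_KEY"),
)

# Paginate through the bucket so large listings are handled efficiently;
# the pages are consumed in the processing loop shown in the next step
paginator = s3.get_paginator("list_objects_v2")
page_iterator = paginator.paginate(Bucket=BUCKET_NAME)
```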
-### Step 3: Iterate Through Metadata
+### Step 3: Iterate through metadata
 
-Iterate Through Metadata: Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database.
+Iterate through metadata: Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database.
 
 ```python
 # rag.py
@@ -245,9 +245,9 @@ conn.commit()
 
 - S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time.
 - RecursiveCharacterTextSplitter: The RecursiveCharacterTextSplitter breaks each document into smaller chunks of text. This is crucial because embeddings models, like those used in Retrieval-Augmented Generation (RAG), typically have a limited context window (the number of tokens they can process at once).
-- Embedding the Chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search.
-- Embedding Storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query.
-- Avoiding Redundant Processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources.
+- Embedding the chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search.
+- Embedding storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query.
+- Avoiding redundant processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources.
 
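The processing loop these bullets describe is likewise elided from the hunk. The outline below sketches its shape, reusing objects from the earlier sketches (conn, cur, vector_store, embeddings, page_iterator, BUCKET_NAME) and the 500-character chunk size discussed just below; the chunk overlap and the keyword form of add_embeddings are assumptions, not the tutorial's exact code:

```python
# rag.py - outline of the per-object processing loop described above
import os

from langchain_community.document_loaders import S3FileLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)  # overlap is illustrative

for page in page_iterator:
    for obj in page.get("Contents", []):
        object_key = obj["Key"]

        # Skip objects that were already embedded (tracked in the object_loaded table)
        cur.execute("SELECT 1 FROM object_loaded WHERE object_key = %s", (object_key,))
        if cur.fetchone():
            continue

        # Load only this specific object from the bucket
        loader = S3FileLoader(
            bucket=BUCKET_NAME,
            key=object_key,
            endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT"),
            aws_access_key_id=os.getenv("SCW_ACCESS_KEY"),
            aws_secret_access_key=os.getenv("SCW_SECRET_KEY"),
        )
        for document in loader.load():
            for chunk in text_splitter.split_text(document.page_content):
                embedding = embeddings.embed_query(chunk)
                vector_store.add_embeddings(texts=[chunk], embeddings=[embedding])

        # Remember that this object has been processed
        cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", (object_key,))
        conn.commit()
```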
 #### Why 500 characters?
 
@@ -262,7 +262,7 @@ When a query is made, the RAG system will retrieve the most relevant embeddings,
 
 ### Query the RAG System with a pre-defined prompt template
 
-### Step 1: Import Required Modules
+### Step 1: Import required modules
 
 ```python
 #rag.py
@@ -273,7 +273,7 @@ from langchain_core.runnables import RunnablePassthrough
 
 ```
 
-### Step 2: Setup LLM for Querying
+### Step 2: Setup LLM for querying
 
 Now, set up the RAG system to handle queries
 
@@ -301,15 +301,15 @@ for r in rag_chain.stream("Your question"):
     print(r, end="", flush=True)
     time.sleep(0.1)
 ```
-- LLM Initialization: We initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name.
+- LLM initialization: We initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name.
 
-- Prompt Setup: The prompt is pulled from the hub using a pre-defined template, ensuring consistent query formatting.
+- Prompt setup: The prompt is pulled from the hub using a pre-defined template, ensuring consistent query formatting.
 
-- Retriever Configuration: We set up the retriever to access the vector store, allowing the RAG system to retrieve relevant information based on the query.
+- Retriever configuration: We set up the retriever to access the vector store, allowing the RAG system to retrieve relevant information based on the query.
 
-- RAG Chain Construction: We create the RAG chain, which connects the retriever, prompt, LLM, and output parser in a streamlined workflow.
+- RAG chain construction: We create the RAG chain, which connects the retriever, prompt, LLM, and output parser in a streamlined workflow.
 
-- Query Execution: Finally, we stream the output of the RAG chain for a specified question, printing each response with a slight delay for better readability.
+- Query execution: Finally, we stream the output of the RAG chain for a specified question, printing each response with a slight delay for better readability.
 
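Only the tail of the query code is visible in this hunk. A sketch of the full pipeline the bullets describe, where the model name, the rlm/rag-prompt hub id, and the endpoint variable are illustrative placeholders (vector_store comes from the earlier sketch):

```python
# rag.py - sketch of the query pipeline described above; model name and prompt id are examples
import os
import time

from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"),  # your Managed Inference LLM endpoint
    api_key=os.getenv("SCW_API_KEY"),
    model="llama-3.1-8b-instruct",  # placeholder: use the model deployed on your endpoint
)

prompt = hub.pull("rlm/rag-prompt")       # example of a pre-defined RAG prompt from the hub
retriever = vector_store.as_retriever()   # vector_store from the earlier sketch


def format_docs(docs):
    # Join the retrieved chunks into a single context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for r in rag_chain.stream("Your question"):
    print(r, end="", flush=True)
    time.sleep(0.1)
```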
 ### Query the RAG system with your own prompt template
 
@@ -339,22 +339,22 @@ for r in custom_rag_chain.stream({"question":"your question", "context": context
     time.sleep(0.1)
 ```
 
-- Prompt Template: The prompt template is meticulously crafted to direct the model's responses. It clearly instructs the model on how to leverage the provided context and emphasizes the importance of honesty in cases where it lacks information.
+- Prompt template: The prompt template is meticulously crafted to direct the model's responses. It clearly instructs the model on how to leverage the provided context and emphasizes the importance of honesty in cases where it lacks information.
 To make the responses more engaging, consider adding a light-hearted conclusion or a personalized touch. For example, you might modify the closing line to say, "Thank you for asking! I'm here to help with anything else you need!"
-Retrieving Context:
+Retrieving context:
 - The retriever.invoke(new_message) method fetches relevant information from your vector store based on the user’s query. It's essential that this step retrieves high-quality context to ensure that the model's responses are accurate and helpful.
 You can enhance the quality of the context by fine-tuning your embeddings and ensuring that the documents in your vector store are relevant and well-structured.
-Creating the RAG Chain:
+Creating the RAG chain:
 - The create_stuff_documents_chain function connects the language model with your custom prompt. This integration allows the model to process the retrieved context effectively and formulate a coherent and context-aware response.
 Consider experimenting with different chain configurations to see how they affect the output. For instance, using a different chain type may yield varied responses.
-Streaming Responses:
+Streaming responses:
 - The loop that streams responses from the custom_rag_chain provides a dynamic user experience. Instead of waiting for the entire output, users can see responses as they are generated, enhancing interactivity.
 You can customize the streaming behavior further, such as implementing progress indicators or more sophisticated UI elements for applications.
 
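The custom prompt and chain themselves sit outside this hunk. A sketch of that flow, reusing llm and retriever from the previous sketch; the prompt wording is illustrative only:

```python
# rag.py - sketch of the custom-prompt flow described above; prompt wording is illustrative
import time

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# Custom prompt: tells the model to rely on the retrieved context and to admit
# when the context does not contain the answer
custom_prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant. Use the provided context to answer the question.
If the context does not contain the answer, say that you do not know.

Context:
{context}

Question: {question}"""
)

new_message = "your question"
context = retriever.invoke(new_message)  # retriever from the earlier sketch

custom_rag_chain = create_stuff_documents_chain(llm, custom_prompt)

for r in custom_rag_chain.stream({"question": new_message, "context": context}):
    print(r, end="", flush=True)
    time.sleep(0.1)
```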
-#### Example Use Cases
-- Customer Support: Use a custom prompt to answer customer queries effectively, making the interactions feel more personalized and engaging.
-- Research Assistance: Tailor prompts to provide concise summaries or detailed explanations on specific topics, enhancing your research capabilities.
-- Content Generation: Personalize prompts for creative writing, generating responses that align with specific themes or tones.
+#### Example use cases
+- Customer support: Use a custom prompt to answer customer queries effectively, making the interactions feel more personalized and engaging.
+- Research assistance: Tailor prompts to provide concise summaries or detailed explanations on specific topics, enhancing your research capabilities.
+- Content generation: Personalize prompts for creative writing, generating responses that align with specific themes or tones.
 
 ### Conclusion
 