Commit 5efc63e

fix(genapi): Remove direct connection to database
1 parent c9b993f commit 5efc63e

1 file changed: +10 -39 lines changed

tutorials/how-to-implement-rag-generativeapis/index.mdx

Lines changed: 10 additions & 39 deletions
@@ -40,7 +40,7 @@ Run the following command to install the required python packages:
 pip install langchain langchainhub langchain_openai langchain_community langchain_postgres unstructured "unstructured[pdf]" libmagic psycopg2 python-dotenv boto3
 ```
 
-If you are on MacOS, run the following command to install dependencies required by `unstructured` package:
+If you are on MacOS, run the following command to install dependencies required by the `unstructured` package:
 ```sh
 brew install libmagic poppler tesseract qpdf
 ```
@@ -78,44 +78,15 @@ Create a .env file and add the following variables. These will store your API ke
 
 ## Setting Up Scaleway Managed Database
 
-### Connect to your PostgreSQL database
-
-You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking.
-
-### Install the pgvector extension
-
-[pgvector](https://github.com/pgvector/pgvector) is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command:
-
-```sql
-CREATE EXTENSION IF NOT EXISTS vector;
-```
-
 ### Connect to PostgreSQL programmatically
 
 Connect to your PostgreSQL instance and perform tasks programmatically.
 
 ```python
 # rag.py file
 
-from dotenv import load_dotenv
-import psycopg2
-import os
-import logging
-
-# Load environment variables
-load_dotenv()
 
-# Establish connection to PostgreSQL database using environment variables
-conn = psycopg2.connect(
-    database=os.getenv("SCW_DB_NAME"),
-    user=os.getenv("SCW_DB_USER"),
-    password=os.getenv("SCW_DB_PASSWORD"),
-    host=os.getenv("SCW_DB_HOST"),
-    port=os.getenv("SCW_DB_PORT")
-)
 
-# Create a cursor to execute SQL commands
-cur = conn.cursor()
 ```
 
 ## Embeddings and vector store setup
@@ -124,9 +95,14 @@ cur = conn.cursor()
 
 ```python
 # rag.py
+from dotenv import load_dotenv
+import os
 
 from langchain_openai import OpenAIEmbeddings
 from langchain_postgres import PGVector
+
+# Load environment variables from .env file
+load_dotenv()
 ```
 
 ### Configure embeddings client
### Configure embeddings client
@@ -181,19 +157,14 @@ By loading the metadata for all objects in your bucket, you can speed up the pro
 # rag.py
 
 session = boto3.session.Session()
-client_s3 = session.client(service_name='s3', endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT", ""),
-                           aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""),
-                           aws_secret_access_key=os.getenv("SCW_SECRET_KEY", ""))
+client_s3 = session.client(service_name='s3',
+                           endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT", ""),
+                           aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""),
+                           aws_secret_access_key=os.getenv("SCW_SECRET_KEY", ""))
 paginator = client_s3.get_paginator('list_objects_v2')
 page_iterator = paginator.paginate(Bucket=os.getenv("SCW_BUCKET_NAME", ""))
 ```
 
-In this code sample, we:
-- Set up a Boto3 session: we initialize a Boto3 session, which is the AWS SDK for Python, fully compatible with Scaleway Object Storage. This session manages configuration, including credentials and settings, that Boto3 uses for API requests.
-- Create an Amazon S3 client: we establish an Amazon client to interact with the Scaleway Object Storage service.
-- Set up pagination for listing objects: we prepare pagination to handle potentially large lists of objects efficiently.
-- Iterate through the bucket: this initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly.
-
 ### Iterate through metadata
 
 Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database.
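
The iteration step described in the last context line could be sketched as follows. This is a minimal illustration, not the tutorial's actual code: `iter_unprocessed_keys`, `sample_pages`, and `already_embedded` are hypothetical names, and the set of already embedded keys stands in for whatever lookup the tutorial performs against the database. Each page is assumed to follow the `list_objects_v2` response shape, with objects listed under `Contents`.

```python
from typing import Iterable, Iterator, Set


def iter_unprocessed_keys(pages: Iterable[dict], embedded_keys: Set[str]) -> Iterator[str]:
    """Yield object keys from paginated listing results that are not yet embedded."""
    for page in pages:
        # A page carries no "Contents" entry when it holds no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key not in embedded_keys:
                yield key


# Stand-in pages shaped like list_objects_v2 responses (hypothetical sample data).
sample_pages = [
    {"Contents": [{"Key": "docs/a.pdf"}, {"Key": "docs/b.pdf"}]},
    {},  # an empty page
    {"Contents": [{"Key": "docs/c.pdf"}]},
]
# Hypothetical set of keys that were embedded in an earlier run.
already_embedded = {"docs/b.pdf"}

pending = list(iter_unprocessed_keys(sample_pages, already_embedded))
print(pending)
```

With the `page_iterator` built in the hunk above, the same function could be called as `iter_unprocessed_keys(page_iterator, already_embedded)`, so only objects that still need embedding are downloaded and processed.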
