fix(genapi): Remove direct connection to database

fpagny · web-flow · commit 5efc63e6da8d · 2024-11-26T12:35:11.000+01:00
diff --git a/tutorials/how-to-implement-rag-generativeapis/index.mdx b/tutorials/how-to-implement-rag-generativeapis/index.mdx
@@ -40,7 +40,7 @@ Run the following command to install the required python packages:
    pip install langchain langchainhub langchain_openai langchain_community langchain_postgres unstructured "unstructured[pdf]" libmagic psycopg2 python-dotenv boto3
    ```
 
-If you are on MacOS, run the following command to install dependencies required by `unstructured` package:
+If you are on MacOS, run the following command to install dependencies required by the `unstructured` package:
    ```sh
    brew install libmagic poppler tesseract qpdf
    ```
@@ -78,44 +78,15 @@ Create a .env file and add the following variables. These will store your API ke
 
 ## Setting Up Scaleway Managed Database
 
-### Connect to your PostgreSQL database
-
-You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking.
-
-### Install the pgvector extension
-
-[pgvector](https://github.com/pgvector/pgvector) is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command:
-
-```sql
-   CREATE EXTENSION IF NOT EXISTS vector;
-```
-
 ### Connect to PostgreSQL programmatically
 
 Connect to your PostgreSQL instance and perform tasks programmatically.
 
  ```python
  # rag.py file
 
-from dotenv import load_dotenv
-import psycopg2
-import os
-import logging
-
-# Load environment variables
-load_dotenv()
 
-# Establish connection to PostgreSQL database using environment variables
-conn = psycopg2.connect(
-        database=os.getenv("SCW_DB_NAME"),
-        user=os.getenv("SCW_DB_USER"),
-        password=os.getenv("SCW_DB_PASSWORD"),
-        host=os.getenv("SCW_DB_HOST"),
-        port=os.getenv("SCW_DB_PORT")
-    )
 
-# Create a cursor to execute SQL commands
-cur = conn.cursor()
    ```
 
 ## Embeddings and vector store setup
@@ -124,9 +95,14 @@ cur = conn.cursor()
 
 ```python
 # rag.py
+from dotenv import load_dotenv
+import os
 
 from langchain_openai import OpenAIEmbeddings
 from langchain_postgres import PGVector
+
+# Load environment variables from .env file
+load_dotenv()
 ```
 
 ### Configure embeddings client
@@ -181,19 +157,14 @@ By loading the metadata for all objects in your bucket, you can speed up the pro
 # rag.py
 
 session = boto3.session.Session()
-client_s3 = session.client(service_name='s3', endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT", ""),
-                               aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""),
-                               aws_secret_access_key=os.getenv("SCW_SECRET_KEY", ""))
+client_s3 = session.client(service_name='s3', 
+                           endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT", ""),
+                           aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""),
+                           aws_secret_access_key=os.getenv("SCW_SECRET_KEY", ""))
 paginator = client_s3.get_paginator('list_objects_v2')
 page_iterator = paginator.paginate(Bucket=os.getenv("SCW_BUCKET_NAME", ""))
 ```
 
-In this code sample, we:
-- Set up a Boto3 session: we initialize a Boto3 session, which is the AWS SDK for Python, fully compatible with Scaleway Object Storage. This session manages configuration, including credentials and settings, that Boto3 uses for API requests.
-- Create an Amazon S3 client: we establish an Amazon client to interact with the Scaleway Object Storage service.
-- Set up pagination for listing objects: we prepare pagination to handle potentially large lists of objects efficiently.
-- Iterate through the bucket: this initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly.
-
 ### Iterate through metadata
 
 Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database.
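
The paragraph above describes walking the object metadata and skipping anything already embedded. A minimal sketch of that filtering step follows. It is not taken from this commit, and both `iter_new_object_keys` and the in-memory `already_embedded` set are hypothetical stand-ins for whatever tracking the tutorial actually uses. The only thing assumed about the input is the page shape returned by the `list_objects_v2` paginator, where each page is a dict with an optional `Contents` list of objects carrying a `Key`.

```python
from typing import Iterable, Iterator


def iter_new_object_keys(pages: Iterable[dict], already_embedded: set[str]) -> Iterator[str]:
    """Yield the keys of objects that have not been embedded yet.

    `pages` is any iterable of dicts shaped like list_objects_v2 pages,
    so the `page_iterator` from the snippet above could be passed directly.
    `already_embedded` is a hypothetical set of keys processed earlier.
    """
    for page in pages:
        # Pages for an empty listing carry no "Contents" entry at all.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key in already_embedded:
                continue  # skip objects that were embedded on a previous run
            yield key


# Hypothetical sample data standing in for real bucket pages.
sample_pages = [
    {"Contents": [{"Key": "docs/a.pdf"}, {"Key": "docs/b.pdf"}]},
    {},  # an empty page
    {"Contents": [{"Key": "docs/c.pdf"}]},
]
new_keys = list(iter_new_object_keys(sample_pages, {"docs/b.pdf"}))
```

Keeping the filter as a generator over plain dicts means it can be exercised without network access and then pointed at the real paginator unchanged.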