If you are on macOS, run the following command to install the system dependencies required by the `unstructured` package:
```sh
brew install libmagic poppler tesseract qpdf
```
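To confirm that the Homebrew packages are visible to Python before running `unstructured`, a small check like the following can help. It is a sketch under assumptions: poppler is detected through its `pdftoppm` utility, libmagic is located as a shared library via `ctypes.util.find_library`, and the helper name `check_system_dependencies` is hypothetical.

```python
import ctypes.util
import shutil

# Executables expected on PATH after `brew install` (assumption: poppler is
# represented by its `pdftoppm` command-line tool).
EXPECTED_BINARIES = ("tesseract", "qpdf", "pdftoppm")


def check_system_dependencies() -> dict[str, bool]:
    """Report which of the system dependencies can be found (hypothetical helper)."""
    status = {name: shutil.which(name) is not None for name in EXPECTED_BINARIES}
    # libmagic is a shared library, not a command, so look it up with ctypes
    # instead of shutil.which.
    status["libmagic"] = ctypes.util.find_library("magic") is not None
    return status


if __name__ == "__main__":
    for name, found in check_system_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any entry reported as `MISSING` points to a package that did not install or is not on your `PATH`.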
## Setting Up Scaleway Managed Database
### Connect to your PostgreSQL database
You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking.
### Install the pgvector extension
[pgvector](https://github.com/pgvector/pgvector) stores and indexes high-dimensional vectors, which retrieval-augmented generation (RAG) systems rely on for similarity search. Make sure it is enabled in your database by executing the following SQL command:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
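If you would rather enable the extension from Python than from a SQL client, a minimal sketch follows. It assumes a DB-API 2.0 connection such as the one `psycopg2.connect` returns; the helper name `ensure_pgvector` is hypothetical. After creating the extension it reads the installed version back from the `pg_extension` catalog, which confirms the extension is active.

```python
from typing import Optional


def ensure_pgvector(conn) -> Optional[str]:
    """Enable pgvector and return its installed version (hypothetical helper).

    `conn` is any DB-API 2.0 connection, e.g. one from psycopg2.connect(...).
    Returns None if the extension is still not listed afterwards.
    """
    cur = conn.cursor()
    try:
        # IF NOT EXISTS makes the statement safe to run on every startup.
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
        # pg_extension lists the extensions enabled in the current database.
        cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector';")
        row = cur.fetchone()
    finally:
        cur.close()
    # psycopg2 opens a transaction implicitly, so commit to persist the change.
    conn.commit()
    return row[0] if row else None
```

Because of `IF NOT EXISTS`, calling `ensure_pgvector(conn)` on a database where pgvector is already enabled simply returns its version.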
### Connect to PostgreSQL programmatically
Connect to your PostgreSQL instance from Python so that the following steps can run SQL commands programmatically.
```python
# rag.py file
from dotenv import load_dotenv
import psycopg2
import os
import logging

# Load environment variables
load_dotenv()

# Establish connection to PostgreSQL database using environment variables
conn = psycopg2.connect(
    database=os.getenv("SCW_DB_NAME"),
    user=os.getenv("SCW_DB_USER"),
    password=os.getenv("SCW_DB_PASSWORD"),
    host=os.getenv("SCW_DB_HOST"),
    port=os.getenv("SCW_DB_PORT")
)

# Create a cursor to execute SQL commands
cur = conn.cursor()
```
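Because `os.getenv` returns `None` for an unset variable, a typo in `.env` only surfaces later as a confusing connection error. A minimal fail-fast sketch, assuming the same `SCW_DB_*` variable names as above; the helper `db_connect_kwargs` is hypothetical:

```python
import os
from collections.abc import Mapping

# Maps psycopg2.connect keyword arguments to the environment variables rag.py reads.
REQUIRED_DB_VARS = {
    "database": "SCW_DB_NAME",
    "user": "SCW_DB_USER",
    "password": "SCW_DB_PASSWORD",
    "host": "SCW_DB_HOST",
    "port": "SCW_DB_PORT",
}


def db_connect_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build psycopg2.connect kwargs, raising if any variable is missing or empty."""
    missing = [var for var in REQUIRED_DB_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    kwargs = {arg: env[var] for arg, var in REQUIRED_DB_VARS.items()}
    # Converting the port catches non-numeric values before connecting.
    kwargs["port"] = int(kwargs["port"])
    return kwargs
```

The connection call then becomes `conn = psycopg2.connect(**db_connect_kwargs())`, placed after `load_dotenv()` so the `.env` values are already present in `os.environ`.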
## Embeddings and vector store setup
```python
# rag.py
from dotenv import load_dotenv
import os

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

# Load environment variables from .env file
load_dotenv()
```
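`PGVector` from `langchain_postgres` accepts a SQLAlchemy-style connection string (or an engine) rather than a psycopg2 connection. A minimal sketch for assembling that string from the same `SCW_DB_*` variables, assuming the psycopg 3 driver prefix `postgresql+psycopg://` used in the `langchain_postgres` documentation; the helper name is hypothetical:

```python
import os
from collections.abc import Mapping
from urllib.parse import quote


def pgvector_connection_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a connection URL for langchain_postgres.PGVector (hypothetical helper)."""
    # Percent-encode credentials so characters such as '@' or '/' in the
    # password do not break URL parsing.
    user = quote(env["SCW_DB_USER"], safe="")
    password = quote(env["SCW_DB_PASSWORD"], safe="")
    host = env["SCW_DB_HOST"]
    port = env["SCW_DB_PORT"]
    database = env["SCW_DB_NAME"]
    # Assumes the psycopg 3 driver is installed alongside langchain_postgres.
    return f"postgresql+psycopg://{user}:{password}@{host}:{port}/{database}"
```

With an embeddings client in hand, the store could then be created along the lines of `PGVector(embeddings=embeddings, collection_name="documents", connection=pgvector_connection_url(), use_jsonb=True)`, where `"documents"` is only a placeholder collection name.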
### Configure embeddings client
- Set up a Boto3 session: we initialize a session with Boto3, the AWS SDK for Python, which works with Scaleway Object Storage because the service is S3-compatible. The session manages configuration, including the credentials and settings that Boto3 uses for API requests.
- Create an Amazon S3 client: we create an S3 client to interact with the Scaleway Object Storage service.
- Set up pagination for listing objects: we prepare a paginator to handle potentially large lists of objects efficiently.
- Iterate through the bucket: this starts the pagination process and lists every object in the specified Scaleway Object Storage bucket.
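The four steps above can be sketched as follows. This is an illustration rather than the tutorial's own code: the environment variable names `SCW_ACCESS_KEY` and `SCW_SECRET_KEY`, the default `fr-par` region, the `https://s3.<region>.scw.cloud` endpoint pattern, and both helper names are assumptions.

```python
import os


def make_s3_client(region: str = "fr-par"):
    """Create an S3 client for Scaleway Object Storage (hypothetical helper).

    Assumes credentials in SCW_ACCESS_KEY / SCW_SECRET_KEY and an endpoint of
    the form https://s3.<region>.scw.cloud.
    """
    import boto3  # imported lazily so list_all_objects has no hard dependency

    session = boto3.session.Session()
    return session.client(
        service_name="s3",
        region_name=region,
        endpoint_url=f"https://s3.{region}.scw.cloud",
        aws_access_key_id=os.getenv("SCW_ACCESS_KEY"),
        aws_secret_access_key=os.getenv("SCW_SECRET_KEY"),
    )


def list_all_objects(s3_client, bucket: str) -> list[dict]:
    """Return metadata for every object in `bucket`, following all pages."""
    # list_objects_v2 returns at most 1,000 keys per call; the paginator
    # issues the follow-up requests with continuation tokens for us.
    paginator = s3_client.get_paginator("list_objects_v2")
    objects = []
    for page in paginator.paginate(Bucket=bucket):
        # A page for an empty bucket has no "Contents" key.
        objects.extend(page.get("Contents", []))
    return objects
```

For example, `list_all_objects(make_s3_client(), "my-bucket")` returns one dictionary per object, with keys such as `Key`, `Size`, and `LastModified`; the bucket name is a placeholder.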
### Iterate through metadata
Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database.
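The skip-or-embed decision can be sketched as below. This assumes the keys already embedded have been loaded into a Python set, for instance from whatever table your database uses for document tracking (its schema is not shown in this section). `embed_and_store` is a hypothetical callback standing in for downloading, embedding, and inserting a single object.

```python
from collections.abc import Callable, Iterable


def process_new_objects(
    objects: Iterable[dict],
    already_embedded: set[str],
    embed_and_store: Callable[[str], None],
) -> list[str]:
    """Embed only objects whose keys have not been processed yet.

    `embed_and_store` is a hypothetical callback for the download, embedding,
    and database-insert steps of the pipeline.
    """
    processed = []
    for obj in objects:
        key = obj["Key"]
        if key in already_embedded:
            continue  # embedded in an earlier run, so skip it
        embed_and_store(key)
        already_embedded.add(key)  # also guards against duplicates in this run
        processed.append(key)
    return processed
```

Keeping the membership check in a set makes each lookup constant-time on average, so the check stays cheap even for buckets with many objects.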