## Will be used to store embeddings of your proprietary data
You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html), to run the following command:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
### Create a table to track processed documents
To prevent reprocessing documents that have already been loaded and vectorized, you should create a table to keep track of them. This will ensure that new documents added to your object storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization:
```sql
CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT);
```
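The skip-if-seen logic this table enables can be sketched in plain Python. The function and key names below are illustrative, not part of the tutorial's code; in practice, the already-loaded keys would come from a `SELECT object_key FROM object_loaded` query:

```python
def keys_to_process(bucket_keys, loaded_keys):
    """Return bucket objects not yet recorded in object_loaded."""
    # Preserve bucket order while skipping anything already vectorized.
    loaded = set(loaded_keys)
    return [key for key in bucket_keys if key not in loaded]

# Example: two of three objects were already processed, so only one remains.
new_keys = keys_to_process(
    ["docs/a.pdf", "docs/b.pdf", "docs/c.pdf"],
    ["docs/a.pdf", "docs/c.pdf"],
)
# new_keys == ["docs/b.pdf"]
```

After each object is processed, insert its key into `object_loaded` so subsequent runs skip it.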
### Connect to PostgreSQL programmatically
Connect to your PostgreSQL instance and perform tasks programmatically.
```python
from dotenv import load_dotenv
import psycopg2
import os
import logging

# Load environment variables
load_dotenv()
```
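As a minimal sketch of the next step, the loaded variables can be gathered into connection parameters for `psycopg2.connect()`. The environment variable names below are assumptions; align them with your own `.env` file:

```python
import os

def pg_connection_params():
    # Variable names and defaults are illustrative; match them to your .env file.
    return {
        "host": os.getenv("PG_HOST", "localhost"),
        "port": int(os.getenv("PG_PORT", "5432")),
        "dbname": os.getenv("PG_DATABASE", "rag"),
        "user": os.getenv("PG_USER", ""),
        "password": os.getenv("PG_PASSWORD", ""),
    }

# Once your instance is reachable:
# conn = psycopg2.connect(**pg_connection_params())
```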
```python
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
```
### Configure embeddings client
Configure the [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) class from LangChain to use your API secret key, the Generative APIs endpoint URL, and a supported model (`bge-multilingual-gemma2` in our example).
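As a minimal sketch, this configuration can be collected into keyword arguments for `OpenAIEmbeddings`. The environment variable names and placeholder values are assumptions; take the real key, endpoint URL, and model from your Scaleway console:

```python
import os

# Parameter names follow LangChain's OpenAIEmbeddings; the environment
# variable names and placeholder values here are illustrative.
embedding_kwargs = {
    "openai_api_key": os.getenv("SCW_SECRET_KEY", "<your-api-secret-key>"),
    "openai_api_base": os.getenv("SCW_GENERATIVE_APIS_ENDPOINT", "<your-endpoint-url>"),
    "model": "bge-multilingual-gemma2",
    "tiktoken_enabled": False,  # disable tiktoken tokenization for non-OpenAI models
}
# embeddings = OpenAIEmbeddings(**embedding_kwargs)
```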
### Configure vector store client
Configure the connection to the PostgreSQL instance that stores your vectors.
```python
# rag.py

# Connection string for langchain_postgres; the environment variable names
# are placeholders to adapt to your own configuration, and the driver prefix
# may vary with the psycopg version you have installed.
connection = (
    f"postgresql+psycopg2://{os.getenv('PG_USER')}:{os.getenv('PG_PASSWORD')}"
    f"@{os.getenv('PG_HOST')}:{os.getenv('PG_PORT')}/{os.getenv('PG_DATABASE')}"
)

# `embeddings` is the OpenAIEmbeddings client configured above.
vector_store = PGVector(connection=connection, embeddings=embeddings)
```
By integrating Scaleway Object Storage, Managed Database for PostgreSQL with pgvector, and LangChain’s embedding tools, you have the foundation to build a powerful RAG system that scales with your data while offering robust information retrieval capabilities. This approach equips you with the tools necessary to handle complex queries and deliver accurate, relevant results efficiently.
With ongoing refinement and adaptation, your RAG system can evolve to meet the changing needs of your users, ensuring that it remains a valuable asset in your AI toolkit.