Commit 8704428

Merge pull request #3 from pgEdge/BR-378
BR-378 - Added note about installing Vectorizer with PEP packages
2 parents c6fb505 + 662ff08 commit 8704428

2 files changed: 29 additions & 10 deletions

docs/index.md

Lines changed: 18 additions & 7 deletions
@@ -1,11 +1,16 @@
 # pgEdge Vectorizer
 
-pgEdge Vectorizer is a PostgreSQL extension that automatically chunks text content and generates vector embeddings using background workers. Vectorizer provides a seamless integration between your PostgreSQL database and embedding providers like OpenAI, making it easy to build AI-powered search and retrieval applications.
+pgEdge Vectorizer is a PostgreSQL extension that automatically chunks text
+content and generates vector embeddings using background workers. Vectorizer
+provides a seamless integration between your PostgreSQL database and embedding
+providers like OpenAI, making it easy to build AI-powered search and retrieval
+applications.
 
 pgEdge Vectorizer:
 
 - intelligently splits text into optimal-sized chunks.
-- handles embedding generation asynchronously using background workers without blocking.
+- handles embedding generation asynchronously using background workers
+without blocking.
 - enables easy switching between OpenAI, Voyage AI, and Ollama.
 - processes embeddings efficiently in batches for better API usage.
 - automatically retries failed operations with exponential backoff.
@@ -14,14 +19,20 @@ pgEdge Vectorizer:
 
 ## pgEdge Vectorizer Architecture
 
-pgEdge Vectorizer uses a trigger-based architecture with background workers to process text asynchronously. The following steps describe the processing flow from data insertion to embedding storage:
+pgEdge Vectorizer uses a trigger-based architecture with background workers to
+process text asynchronously. The following steps describe the processing flow
+from data insertion to embedding storage:
 
 1. A trigger detects INSERT or UPDATE operations on the configured table.
-2. The chunking module splits the text into chunks using the configured strategy.
-3. The system inserts chunk records and queue items into the processing queue.
-4. Background workers pick up queue items using SKIP LOCKED for concurrent processing.
+2. The chunking module splits the text into chunks using the configured
+strategy.
+3. The system inserts chunk records and queue items into the processing
+queue.
+4. Background workers pick up queue items using SKIP LOCKED for concurrent
+processing.
 5. The configured provider generates embeddings via its API.
-6. The storage layer updates the chunk table with the generated embeddings.
+6. The storage layer updates the chunk table with the generated
+embeddings.
 
 
 ## Component Diagram

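Reviewer note: the processing flow described in the architecture steps of docs/index.md can be illustrated with a rough, in-memory sketch. This is not the extension's implementation. The function names (`chunk_text`, `with_backoff`, `process_queue`), the character-based chunking, the batch size, and the delay values are all assumptions made for illustration. A plain `deque` stands in for the database queue, so the `SKIP LOCKED` row claiming that lets several workers run concurrently is not modeled here.

```python
import time
from collections import deque


def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into fixed-size character chunks with a small overlap.

    Hypothetical stand-in for the "configured strategy"; a real chunker
    may split on tokens or sentences instead of characters.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                 sleep=time.sleep):
    """Call fn(), retrying on failure with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base, 2*base, 4*base, ... (capped).
            sleep(min(max_delay, base_delay * 2 ** attempt))


def process_queue(queue, embed_batch, batch_size=8, sleep=time.sleep):
    """Drain (chunk_id, text) items in batches; return {chunk_id: embedding}.

    embed_batch is a caller-supplied function (an assumption) that takes a
    list of texts and returns one vector per text, like a provider API call.
    """
    embeddings = {}
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        texts = [text for _, text in batch]
        vectors = with_backoff(lambda: embed_batch(texts), sleep=sleep)
        for (chunk_id, _), vector in zip(batch, vectors):
            embeddings[chunk_id] = vector
    return embeddings
```

A caller would build the queue with something like `deque(enumerate(chunk_text(document)))` and pass a function that wraps the provider's embedding call as `embed_batch`.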
docs/installation.md

Lines changed: 11 additions & 3 deletions
@@ -1,5 +1,11 @@
 # Installing pgEdge Vectorizer
 
+pgEdge Vectorizer automatically chunks text content and generates vector
+embeddings using background workers. You can install pgEdge Vectorizer with
+[pgEdge Enterprise Postgres](https://docs.pgedge.com/enterprise/) packages
+or build Vectorizer from source code from the
+[pgEdge repository](https://github.com/pgEdge/pgedge-vectorizer).
+
 Before installing pgEdge Vectorizer, you need to install:
 
 * a Postgres server, version 14 or above
@@ -8,7 +14,8 @@ Before installing pgEdge Vectorizer, you need to install:
 
 Then, to build Vectorizer:
 
-Clone the [pgedge-vectorizer](https://github.com/pgEdge/pgedge-vectorizer) repository, and move into the repository root:
+Clone the [pgedge-vectorizer](https://github.com/pgEdge/pgedge-vectorizer)
+repository, and move into the repository root:
 
 ```bash
 git clone https://github.com/pgEdge/pgedge-vectorizer.git
@@ -29,14 +36,15 @@ echo "your-api-key" > ~/.pgedge-vectorizer-llm-api-key
 chmod 600 ~/.pgedge-vectorizer-llm-api-key
 ```
 
-Then, modify the `postgresql.conf` file, adding the Vectorizer extension and API key file details:
+Then, modify the `postgresql.conf` file, adding the Vectorizer extension and
+API key file details:
 
 ```ini
 shared_preload_libraries = 'pgedge_vectorizer'
 pgedge_vectorizer.provider = 'openai'
 pgedge_vectorizer.api_key_file = '~/.pgedge-vectorizer-llm-api-key'
 pgedge_vectorizer.model = 'text-embedding-3-small'
-pgedge_vectorizer.databases = 'mydb' # Comma-separated list of databases to monitor
+pgedge_vectorizer.databases = 'mydb' # Comma-separated list of monitored databases
 ```
 
 Restart PostgreSQL; then use your Postgres client to create the `vector` and `pgedge-vectorizer` extensions:

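Reviewer note: before restarting PostgreSQL, the key file named in `pgedge_vectorizer.api_key_file` can be sanity-checked against the `chmod 600` step in docs/installation.md. The sketch below is a hypothetical helper (`check_key_file` is not part of Vectorizer) and assumes it runs as the same OS user whose home directory holds the file.

```python
import os
import stat


def check_key_file(path):
    """Return a list of problems found with the API key file (empty if none)."""
    path = os.path.expanduser(path)  # resolve a leading "~" like the shell does
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group or other permission bit means the file is wider than 600.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} has mode {mode:o}; expected 600")
    if os.path.getsize(path) == 0:
        problems.append(f"{path} is empty")
    return problems
```

Calling `check_key_file("~/.pgedge-vectorizer-llm-api-key")` returns an empty list when the file exists, is non-empty, and is readable only by its owner.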