|
2 | 2 |
|
3 | 3 | [](https://github.com/cocoindex-io/cocoindex) |
4 | 4 |
|
5 | | -This example demonstrates the **PostgreSQL table source** feature in CocoIndex. It reads data from existing PostgreSQL tables, generates embeddings, and stores them in a separate CocoIndex database with pgvector for semantic search. |
| 5 | +This example demonstrates how to use Postgres tables as the source for CocoIndex. |
| 6 | +It reads structured data from existing PostgreSQL tables, performs calculations, generates embeddings, and stores them in a separate CocoIndex table. |
6 | 7 |
|
7 | 8 | We appreciate a star ⭐ at [CocoIndex GitHub](https://github.com/cocoindex-io/cocoindex) if this is helpful. |
8 | 9 |
|
9 | | -## What This Example Does |
| 10 | +This example contains two flows: |
10 | 11 |
|
11 | | -### 📊 Data Flow |
12 | | -``` |
13 | | -Source PostgreSQL Table (messages) |
14 | | - ↓ [Postgres Source] |
15 | | -Text Processing & Embedding Generation |
16 | | - ↓ [SentenceTransformer] |
17 | | -CocoIndex Database (message_embeddings) with pgvector |
18 | | - ↓ [Semantic Search] |
19 | | -Query Results |
20 | | -``` |
| 12 | +1. `postgres_message_indexing_flow`: Reads from a simple table `source_messages` (single primary key) and generates embeddings for the `message` column. |
| 13 | +2. `postgres_product_indexing_flow`: Reads from a more complex table `source_products` (composite primary key), computes additional fields, and generates embeddings. |
21 | 14 |
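The difference between the two tables can be sketched in plain Python. This is only an illustration, not the example's actual code: `sqlite3` stands in for PostgreSQL so the snippet runs anywhere, and the column names (`id`, `product_category`, `product_name`, `price`, `amount`), the helper `rows_by_key`, and the computed `total_value` field are all assumptions made for the sketch.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_messages (
    id INTEGER PRIMARY KEY,
    message TEXT NOT NULL
);
CREATE TABLE source_products (
    product_category TEXT NOT NULL,
    product_name TEXT NOT NULL,
    price REAL NOT NULL,
    amount INTEGER NOT NULL,
    PRIMARY KEY (product_category, product_name)
);
INSERT INTO source_messages VALUES (1, 'hello'), (2, 'world');
INSERT INTO source_products VALUES
    ('toys', 'robot', 10.0, 3),
    ('books', 'robot', 5.0, 2);
""")


def rows_by_key(conn, table, key_columns):
    """Hypothetical helper: map each row's key tuple to its remaining columns."""
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    result = {}
    for row in cur:
        record = dict(zip(names, row))
        key = tuple(record.pop(c) for c in key_columns)
        result[key] = record
    return result


# Single primary key: one column identifies a row.
messages = rows_by_key(conn, "source_messages", ["id"])

# Composite primary key: both columns together identify a row.
products = rows_by_key(conn, "source_products", ["product_category", "product_name"])

# A computed field, loosely analogous to the "additional fields" of the product flow.
for record in products.values():
    record["total_value"] = record["price"] * record["amount"]
```

Note that `product_name` alone would not be a usable key in this sample data, since both rows share the name `robot`; only the pair of columns is unique.
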
|
22 | | -### 🔧 Key Features |
23 | | -- **PostgreSQL Source**: Read from existing database tables |
24 | | -- **Separate Databases**: Source data and embeddings stored in different databases |
25 | | -- **Automatic Schema**: CocoIndex creates target tables automatically |
26 | | -- **pgvector Integration**: Store embeddings for semantic search |
27 | 15 |
|
28 | 16 | ## Prerequisites |
29 | 17 |
|
30 | 18 | Before running the example, you need to: |
31 | 19 |
|
32 | | -1. **PostgreSQL with pgvector**: Follow the [CocoIndex PostgreSQL setup guide](https://cocoindex.io/docs/getting_started/quickstart) to install and configure PostgreSQL with pgvector extension. |
33 | | - |
34 | | -2. **Two databases**: You'll need two separate databases (names can be anything you choose): |
35 | | - - One database for your source table data |
36 | | - - One database for storing embeddings |
37 | | - |
38 | | -3. **Environment file**: Create a `.env` file with your database configuration: |
39 | | - ```bash |
40 | | - cp .env.example .env |
41 | | - $EDITOR .env |
42 | | - ``` |
43 | | - |
44 | | -## Installation |
45 | | - |
46 | | -Install dependencies: |
47 | | - |
48 | | -```bash |
49 | | -pip install -e . |
50 | | -``` |
51 | | - |
52 | | -## Quick Start |
53 | | - |
54 | | -### Environment Variables Explained |
55 | | - |
56 | | -The example uses these environment variables to configure the PostgreSQL source: |
57 | | - |
58 | | -- **`SOURCE_DATABASE_URL`**: Connection string to your source database containing the table you want to index |
59 | | -- **`COCOINDEX_DATABASE_URL`**: Connection string to the database where CocoIndex will store embeddings |
60 | | -- **`TABLE_NAME`**: Name of the table in your source database to read from |
61 | | -- **`INDEXING_COLUMN`**: The text column to generate embeddings for (this example focuses on one column, but you can index multiple columns) |
62 | | -- **`KEY_COLUMN_FOR_SINGLE_KEY`**: Primary key column name (for tables with single primary key) |
63 | | -- **`KEY_COLUMNS_FOR_MULTIPLE_KEYS`**: Comma-separated primary key columns (for tables with composite primary key) |
64 | | -- **`INCLUDED_COLUMNS`**: Optional - specify which columns to include (defaults to all) |
65 | | -- **`ORDINAL_COLUMN`**: Optional - use for incremental updates |
66 | | - |
67 | | -### Option A: Test with Sample Data (Recommended for first-time users) |
68 | | - |
69 | | -1. **Setup test database with sample data**: |
70 | | - ```bash |
71 | | - python setup_test_database.py |
72 | | - ``` |
73 | | - This will create both `test_simple` (single primary key) and `test_multiple` (composite primary key) tables with sample data. |
74 | | - |
75 | | -2. **Copy the generated environment configuration** to your `.env` file (the script will show you exactly what to copy). |
76 | | - |
77 | | -3. **Run the example**: |
78 | | - ```bash |
79 | | - python main.py |
80 | | - ``` |
| 20 | +1. Install dependencies: |
81 | 21 |
|
82 | | -4. **Test semantic search** by entering queries in the interactive prompt |
| 22 | + ```bash |
| 23 | + pip install -e . |
| 24 | + ``` |
83 | 25 |
|
84 | | -### Option B: Use Your Existing Database |
| 26 | +2. Follow the [CocoIndex PostgreSQL setup guide](https://cocoindex.io/docs/getting_started/quickstart) to install and configure PostgreSQL with the pgvector extension. |
85 | 27 |
|
86 | | -1. **Update your `.env` file** with your database URLs and table configuration: |
87 | | - ```env |
88 | | - # CocoIndex Database (for storing embeddings) |
89 | | - COCOINDEX_DATABASE_URL=postgresql://username:password@localhost:5432/cocoindex |
| 28 | +3. Create source tables `source_messages` and `source_products` with sample data: |
90 | 29 |
|
91 | | - # Source Database (for reading data) |
92 | | - SOURCE_DATABASE_URL=postgresql://username:password@localhost:5432/your_source_db |
| 30 | + ```bash |
| 31 | + $ psql "postgres://cocoindex:cocoindex@localhost/cocoindex" -f ./prepare_source_data.sql |
| 32 | + ``` |
93 | 33 |
|
94 | | - # Table Configuration |
95 | | - TABLE_NAME=your_table_name |
96 | | - KEY_COLUMN_FOR_SINGLE_KEY=id # or KEY_COLUMNS_FOR_MULTIPLE_KEYS=col1,col2 |
97 | | - INDEXING_COLUMN=your_text_column |
98 | | - ORDINAL_COLUMN=your_timestamp_column # optional |
99 | | - ``` |
| 34 | + For simplicity, we use the same database for source and target. You can also set up a separate Postgres database to use as the source database. |
| 35 | + Remember to update `SOURCE_DATABASE_URL` in the `.env` file if you use a separate database. |
100 | 36 |
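To check whether the source and target settings point at the same database, a small script along these lines may help. It is a sketch under assumptions: the helper name `same_database` is made up for illustration, it compares only host, port, and database name, and it assumes both `SOURCE_DATABASE_URL` and `COCOINDEX_DATABASE_URL` are present in the mapping it receives.

```python
from urllib.parse import urlsplit

DEFAULT_POSTGRES_PORT = 5432


def same_database(env):
    """Hypothetical helper: True if both URLs name the same host, port, and database."""
    source = urlsplit(env["SOURCE_DATABASE_URL"])
    target = urlsplit(env["COCOINDEX_DATABASE_URL"])

    def identity(url):
        # Credentials are ignored on purpose; only the location of the database matters.
        return (url.hostname, url.port or DEFAULT_POSTGRES_PORT, url.path)

    return identity(source) == identity(target)


# Usage (after loading .env into the process environment):
#   import os
#   print(same_database(os.environ))
```

Passing a mapping rather than reading `os.environ` inside the function keeps the check easy to try with sample values.
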
|
101 | | -2. **Run the example**: |
102 | | - ```bash |
103 | | - python main.py |
104 | | - ``` |
| 37 | +## Run |
105 | 38 |
|
106 | | -## How It Works |
| 39 | +Update the index, which also sets up the tables on the first run: |
107 | 40 |
|
108 | | -The example demonstrates a simple flow: |
109 | | - |
110 | | -1. **Read from Source**: Uses `cocoindex.sources.Postgres` to read from your existing table |
111 | | -2. **Generate Embeddings**: Processes text and creates embeddings using SentenceTransformers |
112 | | -3. **Store Embeddings**: Exports to the CocoIndex database with automatic table creation |
113 | | -4. **Search**: Provides interactive semantic search over the stored embeddings |
114 | | - |
115 | | -**Note**: This example indexes one text column for simplicity, but you can modify the flow to index multiple columns or add more complex transformations. |
116 | | - |
117 | | -### Key Benefits |
118 | | - |
119 | | -- **Separate Databases**: Keep your source data separate from embeddings |
120 | | -- **Automatic Setup**: CocoIndex creates target tables automatically |
121 | | -- **Real-time Updates**: Live updates as source data changes |
122 | | -- **Interactive Search**: Built-in search interface for testing |
123 | | - |
124 | | -## Database Configuration |
125 | | - |
126 | | -The example uses two separate databases: |
127 | | - |
128 | | -1. **Source Database**: Contains your existing data table |
129 | | -2. **CocoIndex Database**: Stores generated embeddings with pgvector support |
130 | | - |
131 | | -This separation allows you to: |
132 | | -- Keep your production data unchanged |
133 | | -- Scale embeddings independently |
134 | | -- Use different database configurations for each purpose |
135 | | - |
136 | | -## Advanced Usage |
137 | | - |
138 | | -### Primary Key Configuration |
139 | | - |
140 | | -**Single Primary Key**: |
141 | | -```env |
142 | | -KEY_COLUMN_FOR_SINGLE_KEY=id |
143 | | -``` |
144 | | - |
145 | | -**Composite Primary Key**: |
146 | | -```env |
147 | | -KEY_COLUMNS_FOR_MULTIPLE_KEYS=product_category,product_name |
| 41 | +```bash |
| 42 | +cocoindex update --setup main.py |
148 | 43 | ``` |
149 | 44 |
|
150 | | - |
151 | | - |
152 | 45 | ## CocoInsight |
153 | 46 | CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9). |
154 | 47 |
|
|