Commit 993387e

example: finalize the example and README
1 parent af5c342 commit 993387e

4 files changed, 30 additions, 136 deletions


examples/paper_metadata/README.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/c
 
 1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
 
-2. dependencies:
+2. Install dependencies:
 
 ```bash
 pip install -e .

examples/postgres_embedding/.env

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
 # Database Configuration
-# CocoIndex Database (for storing embeddings)
+
+# CocoIndex Database, for CocoIndex internal storage and target
 COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex
 
-# Source Database (for reading data - can be different from CocoIndex DB)
+# Source Database, for data source
 SOURCE_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex
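
After this change, both variables in `.env` point at the same local database. To make the file's `KEY=VALUE` layout concrete, here is a minimal parsing sketch. `parse_env` is a hypothetical helper for illustration only, not how the example loads the file; real loaders such as `python-dotenv` also handle quoting and `export` prefixes, which this sketch ignores.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    Hypothetical helper for illustration: no quoting, escaping,
    or variable interpolation is supported.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment such as "# Database Configuration"
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue  # not a KEY=VALUE line; ignore it
        values[key.strip()] = value.strip()
    return values


# The .env contents after this commit.
ENV_TEXT = """\
# Database Configuration

# CocoIndex Database, for CocoIndex internal storage and target
COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex

# Source Database, for data source
SOURCE_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex
"""

env = parse_env(ENV_TEXT)
print(env["COCOINDEX_DATABASE_URL"] == env["SOURCE_DATABASE_URL"])  # True
```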

examples/postgres_embedding/README.md

Lines changed: 20 additions & 127 deletions
@@ -2,153 +2,46 @@
 
 [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
 
-This example demonstrates the **PostgreSQL table source** feature in CocoIndex. It reads data from existing PostgreSQL tables, generates embeddings, and stores them in a separate CocoIndex database with pgvector for semantic search.
+This example demonstrates how to use Postgres tables as the source for CocoIndex.
+It reads structured data from existing PostgreSQL tables, performs calculations, generates embeddings, and stores them in a separate CocoIndex table.
 
 We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
 
-## What This Example Does
+This example contains two flows:
 
-### 📊 Data Flow
-```
-Source PostgreSQL Table (messages)
-    ↓ [Postgres Source]
-Text Processing & Embedding Generation
-    ↓ [SentenceTransformer]
-CocoIndex Database (message_embeddings) with pgvector
-    ↓ [Semantic Search]
-Query Results
-```
+1. `postgres_message_indexing_flow`: Read from a simpler table `source_messages` (single primary key), and generate embeddings for the `message` column.
+2. `postgres_product_indexing_flow`: Read from a more complex table `source_products` (composite primary key), compute additional fields and generates embeddings.
 
-### 🔧 Key Features
-- **PostgreSQL Source**: Read from existing database tables
-- **Separate Databases**: Source data and embeddings stored in different databases
-- **Automatic Schema**: CocoIndex creates target tables automatically
-- **pgvector Integration**: Store embeddings for semantic search
 
 ## Prerequisites
 
 Before running the example, you need to:
 
-1. **PostgreSQL with pgvector**: Follow the [CocoIndex PostgreSQL setup guide](https://cocoindex.io/docs/getting_started/quickstart) to install and configure PostgreSQL with pgvector extension.
-
-2. **Two databases**: You'll need two separate databases (names can be anything you choose):
-   - One database for your source table data
-   - One database for storing embeddings
-
-3. **Environment file**: Create a `.env` file with your database configuration:
-   ```bash
-   cp .env.example .env
-   $EDITOR .env
-   ```
-
-## Installation
-
-Install dependencies:
-
-```bash
-pip install -e .
-```
-
-## Quick Start
-
-### Environment Variables Explained
-
-The example uses these environment variables to configure the PostgreSQL source:
-
-- **`SOURCE_DATABASE_URL`**: Connection string to your source database containing the table you want to index
-- **`COCOINDEX_DATABASE_URL`**: Connection string to the database where CocoIndex will store embeddings
-- **`TABLE_NAME`**: Name of the table in your source database to read from
-- **`INDEXING_COLUMN`**: The text column to generate embeddings for (this example focuses on one column, but you can index multiple columns)
-- **`KEY_COLUMN_FOR_SINGLE_KEY`**: Primary key column name (for tables with single primary key)
-- **`KEY_COLUMNS_FOR_MULTIPLE_KEYS`**: Comma-separated primary key columns (for tables with composite primary key)
-- **`INCLUDED_COLUMNS`**: Optional - specify which columns to include (defaults to all)
-- **`ORDINAL_COLUMN`**: Optional - use for incremental updates
-
-### Option A: Test with Sample Data (Recommended for first-time users)
-
-1. **Setup test database with sample data**:
-   ```bash
-   python setup_test_database.py
-   ```
-   This will create both `test_simple` (single primary key) and `test_multiple` (composite primary key) tables with sample data.
-
-2. **Copy the generated environment configuration** to your `.env` file (the script will show you exactly what to copy).
-
-3. **Run the example**:
-   ```bash
-   python main.py
-   ```
+1. Install dependencies:
 
-4. **Test semantic search** by entering queries in the interactive prompt
+   ```bash
+   pip install -e .
+   ```
 
-### Option B: Use Your Existing Database
+2. Follow the [CocoIndex PostgreSQL setup guide](https://cocoindex.io/docs/getting_started/quickstart) to install and configure PostgreSQL with pgvector extension.
 
-1. **Update your `.env` file** with your database URLs and table configuration:
-   ```env
-   # CocoIndex Database (for storing embeddings)
-   COCOINDEX_DATABASE_URL=postgresql://username:password@localhost:5432/cocoindex
+3. Create source tables `source_messages` and `source_products` with sample data:
 
-   # Source Database (for reading data)
-   SOURCE_DATABASE_URL=postgresql://username:password@localhost:5432/your_source_db
+   ```bash
+   $ psql "postgres://cocoindex:cocoindex@localhost/cocoindex" -f ./prepare_source_data.sql
+   ```
 
-   # Table Configuration
-   TABLE_NAME=your_table_name
-   KEY_COLUMN_FOR_SINGLE_KEY=id # or KEY_COLUMNS_FOR_MULTIPLE_KEYS=col1,col2
-   INDEXING_COLUMN=your_text_column
-   ORDINAL_COLUMN=your_timestamp_column # optional
-   ```
+   For simplicity, we use the same database for source and target. You can also setup a separate Postgres database to use as the source database.
+   Remember to update the `SOURCE_DATABASE_URL` in `.env` file if you use a separate database.
 
-2. **Run the example**:
-   ```bash
-   python main.py
-   ```
+## Run
 
-## How It Works
+Update index, which will also setup the tables at the first time:
 
-The example demonstrates a simple flow:
-
-1. **Read from Source**: Uses `cocoindex.sources.Postgres` to read from your existing table
-2. **Generate Embeddings**: Processes text and creates embeddings using SentenceTransformers
-3. **Store Embeddings**: Exports to the CocoIndex database with automatic table creation
-4. **Search**: Provides interactive semantic search over the stored embeddings
-
-**Note**: This example indexes one text column for simplicity, but you can modify the flow to index multiple columns or add more complex transformations.
-
-### Key Benefits
-
-- **Separate Databases**: Keep your source data separate from embeddings
-- **Automatic Setup**: CocoIndex creates target tables automatically
-- **Real-time Updates**: Live updates as source data changes
-- **Interactive Search**: Built-in search interface for testing
-
-## Database Configuration
-
-The example uses two separate databases:
-
-1. **Source Database**: Contains your existing data table
-2. **CocoIndex Database**: Stores generated embeddings with pgvector support
-
-This separation allows you to:
-- Keep your production data unchanged
-- Scale embeddings independently
-- Use different database configurations for each purpose
-
-## Advanced Usage
-
-### Primary Key Configuration
-
-**Single Primary Key**:
-```env
-KEY_COLUMN_FOR_SINGLE_KEY=id
-```
-
-**Composite Primary Key**:
-```env
-KEY_COLUMNS_FOR_MULTIPLE_KEYS=product_category,product_name
+```bash
+cocoindex update --setup main.py
 ```
 
-
-
 ## CocoInsight
 CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).
 
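
The rewritten README uses one database for both source and target by default and asks readers to update `SOURCE_DATABASE_URL` in `.env` when the source lives elsewhere. A small check such as the one below could report which of the two setups a pair of URLs describes. It is a hypothetical helper, not part of the example; it assumes PostgreSQL's default port 5432 when a URL omits the port, treats the URL path as the database name, and ignores credentials.

```python
from urllib.parse import urlsplit

DEFAULT_PG_PORT = 5432  # assumption: PostgreSQL's default port when the URL omits it


def database_identity(url: str) -> tuple[str, int, str]:
    """Return (host, port, database name) parsed from a postgres:// URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # empty when the URL has no host; urlsplit lowercases it
    port = parts.port or DEFAULT_PG_PORT
    dbname = parts.path.lstrip("/")  # "/cocoindex" -> "cocoindex"
    return host, port, dbname


def same_database(source_url: str, cocoindex_url: str) -> bool:
    """True when both URLs address the same host, port, and database.

    Credentials are ignored: two users on one database still count as
    the same database.
    """
    return database_identity(source_url) == database_identity(cocoindex_url)


default_url = "postgres://cocoindex:cocoindex@localhost/cocoindex"
print(same_database(default_url, default_url))  # True
print(same_database("postgres://u:p@localhost:5432/source_db", default_url))  # False
```

Comparing a parsed tuple rather than the raw strings means that `localhost` and `localhost:5432` are treated as the same server under the default-port assumption.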

examples/postgres_embedding/main.py

Lines changed: 6 additions & 6 deletions
@@ -2,8 +2,8 @@
 import os
 
 
-@cocoindex.flow_def(name="PostgresMessageEmbedding")
-def postgres_message_embedding_flow(
+@cocoindex.flow_def(name="PostgresMessageIndexing")
+def postgres_message_indexing_flow(
     flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
 ) -> None:
     """
@@ -42,7 +42,7 @@ def postgres_message_embedding_flow(
         )
 
     message_embeddings.export(
-        "message_embeddings",
+        "output",
         cocoindex.targets.Postgres(),
         primary_key_fields=["id"],
         vector_indexes=[
@@ -71,8 +71,8 @@ def make_full_description(
     return f"Category: {category}\nName: {name}\n\n{description}"
 
 
-@cocoindex.flow_def(name="PostgresProductEmbedding")
-def postgres_product_embedding_flow(
+@cocoindex.flow_def(name="PostgresProductIndexing")
+def postgres_product_indexing_flow(
     flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
 ) -> None:
     """
@@ -122,7 +122,7 @@ def postgres_product_embedding_flow(
         )
 
     product_embeddings.export(
-        "product_embeddings",
+        "output",
         cocoindex.targets.Postgres(),
         primary_key_fields=["product_category", "product_name"],
         vector_indexes=[
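
The third hunk's context line shows the body of `make_full_description`, likely one of the "additional fields" the product flow computes before embedding. A plain-Python sketch of that formatting follows. The three `str` parameters are inferred from the f-string; the real function's decorator and exact signature lie outside this diff, so treat this as an approximation rather than the example's code.

```python
def make_full_description(category: str, name: str, description: str) -> str:
    """Combine product fields into one text block.

    Mirrors the return statement visible in the diff; parameter names and
    types are inferred from that f-string and may differ from the source.
    """
    return f"Category: {category}\nName: {name}\n\n{description}"


# Hypothetical product row, for illustration only.
text = make_full_description("Electronics", "Wireless Mouse", "Ergonomic 2.4 GHz mouse.")
print(text)
# Category: Electronics
# Name: Wireless Mouse
#
# Ergonomic 2.4 GHz mouse.
```

Putting the category and name in front of the free-text description lets a single embedding carry all three fields, which is one plausible reason to embed the combined text rather than the description alone.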