cloudwego · ricciii0 · Jan 27, 2026 · Jan 29, 2026
diff --git a/components/indexer/pgvector/README.md b/components/indexer/pgvector/README.md
@@ -0,0 +1,165 @@
+# PGVector Indexer
+
+A pgvector indexer for the Eino framework. It stores documents together with their vector embeddings in PostgreSQL using the pgvector extension, so they can later be queried by vector similarity.
+
+## Features
+
+- **Type-safe vector operations** using `pgvector.Vector` from the official `pgvector-go` library
+- **Batch processing** for efficient embedding and storage
+- **Automatic conflict resolution** with UPSERT semantics
+- **SQL injection protection** with identifier validation
+- **Connection pooling** support via `pgxpool.Pool`
+- **Eino callbacks** integration for observability
+
+## Installation
+
+```bash
+go get github.com/cloudwego/eino-ext/components/indexer/pgvector
+```
+
+## Prerequisites
+
+1. **PostgreSQL** with the pgvector extension installed
+2. **Create the table** before using the indexer:
+
+```sql
+CREATE EXTENSION IF NOT EXISTS vector;
+
+CREATE TABLE documents (
+    id TEXT PRIMARY KEY,
+    content TEXT NOT NULL,
+    embedding vector(1536),  -- adjust dimension based on your model
+    metadata JSONB
+);
+
+-- Optional: create index for vector similarity search
+CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
+-- or
+CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
+```
+
+## Usage
+
+### Basic Example
+
+```go
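+import (
+    "context"
+    "fmt"
+
+    "github.com/cloudwego/eino-ext/components/embedding/openai"
+    "github.com/cloudwego/eino-ext/components/indexer/pgvector"
+    "github.com/cloudwego/eino/schema"
+    "github.com/jackc/pgx/v5/pgxpool"
+)
+
+func main() {
+    ctx := context.Background()
+
+    // Create connection pool
+    pool, err := pgxpool.New(ctx, "postgres://user:pass@localhost/dbname")
+    if err != nil {
+        panic(err)
+    }
+    defer pool.Close()
+
+    // Create an embedder; any embedding.Embedder implementation works.
+    // The OpenAI embedder from eino-ext is shown here as one option.
+    embedder, err := openai.NewEmbedder(ctx, &openai.EmbeddingConfig{
+        APIKey: "your-api-key",
+        Model:  "text-embedding-3-small", // 1536 dimensions, matching vector(1536)
+    })
+    if err != nil {
+        panic(err)
+    }
+
+    // Create indexer
+    indexer, err := pgvector.NewIndexer(ctx, &pgvector.IndexerConfig{
+        Conn:      pool,
+        TableName: "documents",
+        Embedding: embedder,
+        BatchSize: 10,
+    })
+    if err != nil {
+        panic(err)
+    }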
+
+    // Store documents
+    docs := []*schema.Document{
+        {
+            ID:      "doc1",
+            Content: "Hello world",
+            MetaData: map[string]any{
+                "category": "greeting",
+            },
+        },
+        // ... more documents
+    }
+
+    ids, err := indexer.Store(ctx, docs)
+    if err != nil {
+        panic(err)
+    }
+
+    fmt.Printf("Stored %d documents\n", len(ids))
+}
+```
+
+## Configuration
+
+### IndexerConfig
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `Conn` | `PgxConn` | *required* | pgx connection or pool |
+| `TableName` | `string` | `"documents"` | Table name for storing documents |
+| `Embedding` | `embedding.Embedder` | *required for Store* | Embedding model for vectorization |
+| `BatchSize` | `int` | `10` | Batch size for embedding operations |
+
+### Table Schema
+
+The indexer expects a table with this schema:
+
+```sql
+CREATE TABLE table_name (
+    id TEXT PRIMARY KEY,
+    content TEXT NOT NULL,
+    embedding vector(N),  -- N = vector dimension
+    metadata JSONB
+);
+```
+
+## Performance Tips
+
+1. **Use connection pooling** - `pgxpool.Pool` for concurrent access
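+2. **Adjust BatchSize** - Larger batches (for example 10-100) mean fewer embedding calls per `Store`; stay within your embedding provider's per-request limits
+3. **Create vector indexes** - Use HNSW or IVFFlat indexes for similarity search
+4. **Tune index parameters** - For IVFFlat, size `lists` to the data: the pgvector documentation suggests starting at `rows / 1000` for up to 1M rows and `sqrt(rows)` beyond that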
+
+## Dependencies
+
+- `github.com/cloudwego/eino` - Eino framework
+- `github.com/jackc/pgx/v5` - PostgreSQL driver (v5.7.2+, per `go.mod`)
+- `github.com/pgvector/pgvector-go` - pgvector Go library (v0.3.0+)
+
+## Compatibility
+
+- **PostgreSQL**: 12+
+- **pgvector extension**: 0.5.0+
+- **Go**: 1.23+
+
+## Error Handling
+
+The indexer returns detailed errors with context:
+
+```text
+[NewIndexer] database connection not provided
+[Indexer.Store] documents list is empty
+[Indexer.Store] embedding failed: <cause>
+[Indexer.Store] batch execution failed: <cause>
+```
+
+## Testing
+
+Run tests:
+
+```bash
+go test -v ./...
+```
+
+## License
+
+Apache License 2.0
+
+## See Also
+
+- [pgvector Documentation](https://github.com/pgvector/pgvector)
+- [pgvector-go](https://github.com/pgvector/pgvector-go)
+- [Eino Framework](https://github.com/cloudwego/eino)
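
Note on the README's "Batch processing" feature and the `BatchSize` field: the diff shown here does not include the indexer source, so the following is only a minimal sketch of how input could be split into batches before each embedding call. The helper name `batches` is hypothetical, and the fallback to 10 simply mirrors the default documented in the configuration table.

```go
package main

import "fmt"

// batches splits items into consecutive slices of at most size elements.
// A non-positive size falls back to 10, mirroring the README's documented
// default BatchSize. This helper is hypothetical, not part of the package.
func batches[T any](items []T, size int) [][]T {
	if size <= 0 {
		size = 10
	}
	out := make([][]T, 0, (len(items)+size-1)/size)
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		// Each batch is a sub-slice of the input; no elements are copied.
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	texts := []string{"a", "b", "c", "d", "e"}
	for i, b := range batches(texts, 2) {
		fmt.Printf("batch %d: %v\n", i, b)
	}
}
```

Under this sketch, one embedding request would be issued per batch, which is why a larger `BatchSize` reduces round trips.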
diff --git a/components/indexer/pgvector/consts.go b/components/indexer/pgvector/consts.go
@@ -0,0 +1,22 @@
+/*
+ * Copyright 2025 CloudWeGo Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package pgvector
+
+const (
+	// DefaultTableName is the default table name for storing documents and vectors.
+	DefaultTableName = "documents"
+)
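
Note on `DefaultTableName` and the README's "SQL injection protection with identifier validation": table names cannot be passed as query parameters, so they have to be checked before being interpolated into SQL. The validation code is not part of this excerpt; the sketch below shows one conservative way it might look. `resolveTableName` and `identPattern` are hypothetical names, and the pattern accepts only a subset of what PostgreSQL allows in unquoted identifiers.

```go
package main

import (
	"fmt"
	"regexp"
)

// defaultTableName mirrors DefaultTableName in consts.go.
const defaultTableName = "documents"

// identPattern accepts a letter or underscore followed by letters, digits,
// or underscores. This is deliberately stricter than PostgreSQL itself.
var identPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// resolveTableName is a hypothetical helper: it applies the default when the
// name is empty and rejects anything that is not a plain identifier.
func resolveTableName(name string) (string, error) {
	if name == "" {
		return defaultTableName, nil
	}
	// PostgreSQL truncates identifiers longer than 63 bytes, so reject them.
	if len(name) > 63 || !identPattern.MatchString(name) {
		return "", fmt.Errorf("invalid table name %q", name)
	}
	return name, nil
}

func main() {
	for _, name := range []string{"", "my_docs", "docs; DROP TABLE users"} {
		resolved, err := resolveTableName(name)
		fmt.Printf("%q -> %q, err=%v\n", name, resolved, err)
	}
}
```

An allow-list like this is simpler to reason about than trying to escape arbitrary input, at the cost of rejecting some names that PostgreSQL would accept when quoted.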
diff --git a/components/indexer/pgvector/examples/main.go b/components/indexer/pgvector/examples/main.go
@@ -0,0 +1,125 @@
+/*
+ * Copyright 2025 CloudWeGo Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package main
+
+import (
+	"context"
+	"fmt"
+	"log"
+
+	"github.com/cloudwego/eino-ext/components/indexer/pgvector"
+	"github.com/cloudwego/eino/components/embedding"
+	"github.com/cloudwego/eino/schema"
+	"github.com/jackc/pgx/v5/pgxpool"
+)
+
+// This example demonstrates how to use the pgvector indexer.
+// Prerequisites:
+// 1. PostgreSQL installed with pgvector extension
+// 2. Database created: CREATE DATABASE eino_test;
+// 3. Table created:
+//    CREATE EXTENSION IF NOT EXISTS vector;
+//    CREATE TABLE documents (
+//        id TEXT PRIMARY KEY,
+//        content TEXT NOT NULL,
+//        embedding vector(3),  -- 3 dimensions to match mockEmbedder below
+//        metadata JSONB
+//    );
+// 4. Connection string matches your database setup
+
+func main() {
+	ctx := context.Background()
+
+	// Connect to PostgreSQL
+	// Update the connection string to match your database configuration
+	connString := "postgres://test_user:test_password@localhost:5433/eino_test?sslmode=disable"
+	pool, err := pgxpool.New(ctx, connString)
+	if err != nil {
+		log.Fatalf("Failed to connect to database: %v", err)
+	}
+	defer pool.Close()
+
+	// Create indexer config
+	config := &pgvector.IndexerConfig{
+		Conn:      pool,
+		TableName: "documents",
+		Embedding: &mockEmbedder{}, // In production, use real embedder
+		BatchSize: 10,
+	}
+
+	// Create indexer
+	idxr, err := pgvector.NewIndexer(ctx, config)
+	if err != nil {
+		log.Fatalf("Failed to create indexer: %v", err)
+	}
+
+	// Sample documents to index
+	docs := []*schema.Document{
+		{
+			ID:      "doc1",
+			Content: "PostgreSQL is a powerful open-source relational database.",
+			MetaData: map[string]any{
+				"category": "database",
+				"tags":     []string{"postgresql", "sql"},
+			},
+		},
+		{
+			ID:      "doc2",
+			Content: "pgvector is an extension for vector similarity search.",
+			MetaData: map[string]any{
+				"category": "database",
+				"tags":     []string{"pgvector", "extension"},
+			},
+		},
+		{
+			ID:      "doc3",
+			Content: "Machine learning models can be embedded as vectors for similarity search.",
+			MetaData: map[string]any{
+				"category": "ml",
+				"tags":     []string{"ml", "embedding", "search"},
+			},
+		},
+	}
+
+	// Store documents
+	ids, err := idxr.Store(ctx, docs)
+	if err != nil {
+		log.Fatalf("Failed to store documents: %v", err)
+	}
+
+	fmt.Printf("Successfully indexed %d documents\n", len(ids))
+	for _, id := range ids {
+		fmt.Printf("  - %s\n", id)
+	}
+}
+
+// mockEmbedder is a mock embedding implementation for demonstration.
+// In production, replace with a real embedder, for example:
+//
+//	import "github.com/cloudwego/eino-ext/components/embedding/openai"
+//	embedder, err := openai.NewEmbedder(ctx, &openai.EmbeddingConfig{...})
+type mockEmbedder struct{}
+
+func (m *mockEmbedder) EmbedStrings(ctx context.Context, texts []string, opts ...embedding.Option) ([][]float64, error) {
+	// Return mock 3-dimensional vectors for demonstration
+	// In production, your embedder should return vectors matching your model's dimensions
+	result := make([][]float64, len(texts))
+	for i := range result {
+		result[i] = []float64{0.1, 0.2, 0.3}
+	}
+	return result, nil
+}
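
Note on the mock embedder's 3-dimensional output: Eino embedders return `[][]float64`, while `pgvector.NewVector` in `pgvector-go` takes a `[]float32`, and a `vector(N)` column rejects vectors of any other length. A conversion step with an explicit dimension check could catch a mismatch before the insert reaches the database. The sketch below is an assumption about how that might be done; `toFloat32` is a hypothetical helper, not code from this PR.

```go
package main

import "fmt"

// toFloat32 converts one embedding to float32 and rejects vectors whose
// length differs from dim, the dimension declared on the vector(N) column.
// Hypothetical helper for illustration only.
func toFloat32(vec []float64, dim int) ([]float32, error) {
	if len(vec) != dim {
		return nil, fmt.Errorf("embedding has %d dimensions, column expects %d", len(vec), dim)
	}
	out := make([]float32, len(vec))
	for i, v := range vec {
		out[i] = float32(v) // narrowing conversion; some precision is lost
	}
	return out, nil
}

func main() {
	// Matches the mock embedder and the vector(3) column in setup.sql.
	v, err := toFloat32([]float64{0.1, 0.2, 0.3}, 3)
	fmt.Println(v, err)

	// The same vector against a vector(1536) column fails fast.
	_, err = toFloat32([]float64{0.1, 0.2, 0.3}, 1536)
	fmt.Println(err)
}
```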
diff --git a/components/indexer/pgvector/examples/setup.sql b/components/indexer/pgvector/examples/setup.sql
@@ -0,0 +1,19 @@
+-- Setup script for pgvector example
+-- Run this with: psql -h localhost -p 5433 -U test_user -d eino_test -f setup.sql
+
+-- Create pgvector extension
+CREATE EXTENSION IF NOT EXISTS vector;
+
+-- Create documents table
+CREATE TABLE IF NOT EXISTS documents (
+    id TEXT PRIMARY KEY,
+    content TEXT NOT NULL,
+    embedding vector(3),  -- 3 dimensions for the mock embedder
+    metadata JSONB
+);
+
+-- Create index for vector similarity search (optional but recommended)
+CREATE INDEX IF NOT EXISTS documents_embedding_idx ON documents USING hnsw (embedding vector_cosine_ops);
+
+-- Verify setup
+\d documents
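
Note on `vector_cosine_ops`: this operator class indexes the column for pgvector's cosine distance operator `<=>`, which is 1 minus the cosine similarity. As a rough illustration of what the index orders by, here is a plain Go version of that distance. The function name `cosineDistance` is hypothetical and the code is a sketch, not pgvector's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 - cos(a, b). Smaller values mean the vectors
// point in more similar directions; magnitude is ignored.
func cosineDistance(a, b []float64) (float64, error) {
	if len(a) != len(b) || len(a) == 0 {
		return 0, fmt.Errorf("vectors must be non-empty and of equal length")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, fmt.Errorf("cosine distance is undefined for a zero vector")
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB)), nil
}

func main() {
	mock := []float64{0.1, 0.2, 0.3} // what the mock embedder returns
	other := []float64{0.3, 0.2, 0.1}

	same, _ := cosineDistance(mock, mock)
	diff, _ := cosineDistance(mock, other)
	fmt.Printf("same vector: %.4f, different vector: %.4f\n", same, diff)
}
```

Because the mock embedder assigns the same vector to every document, all pairwise distances in the example table come out at approximately zero, so similarity ordering only becomes meaningful once a real embedder is used.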
diff --git a/components/indexer/pgvector/go.mod b/components/indexer/pgvector/go.mod
@@ -0,0 +1,49 @@
+module github.com/cloudwego/eino-ext/components/indexer/pgvector
+
+go 1.23.0
+
+require (
+	github.com/cloudwego/eino v0.6.0
+	github.com/jackc/pgx/v5 v5.7.2
+	github.com/pgvector/pgvector-go v0.3.0
+	github.com/stretchr/testify v1.10.0
+)
+
+require (
+	github.com/bahlo/generic-list-go v0.2.0 // indirect
+	github.com/buger/jsonparser v1.1.1 // indirect
+	github.com/bytedance/gopkg v0.1.3 // indirect
+	github.com/bytedance/sonic v1.14.1 // indirect
+	github.com/bytedance/sonic/loader v0.3.0 // indirect
+	github.com/cloudwego/base64x v0.1.6 // indirect
+	github.com/davecgh/go-spew v1.1.1 // indirect
+	github.com/dustin/go-humanize v1.0.1 // indirect
+	github.com/eino-contrib/jsonschema v1.0.2 // indirect
+	github.com/goph/emperror v0.17.2 // indirect
+	github.com/jackc/pgpassfile v1.0.0 // indirect
+	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
+	github.com/jackc/puddle/v2 v2.2.2 // indirect
+	github.com/json-iterator/go v1.1.12 // indirect
+	github.com/klauspost/cpuid/v2 v2.2.9 // indirect
+	github.com/mailru/easyjson v0.7.7 // indirect
+	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
+	github.com/modern-go/reflect2 v1.0.2 // indirect
+	github.com/nikolalohinski/gonja v1.5.3 // indirect
+	github.com/pelletier/go-toml/v2 v2.0.9 // indirect
+	github.com/pkg/errors v0.9.1 // indirect
+	github.com/pmezard/go-difflib v1.0.0 // indirect
+	github.com/rogpeppe/go-internal v1.14.1 // indirect
+	github.com/sirupsen/logrus v1.9.3 // indirect
+	github.com/slongfield/pyfmt v0.0.0-20220222012616-ea85ff4c361f // indirect
+	github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
+	github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
+	github.com/x448/float16 v0.8.4 // indirect
+	github.com/yargevad/filepathx v1.0.0 // indirect
+	golang.org/x/arch v0.11.0 // indirect
+	golang.org/x/crypto v0.36.0 // indirect
+	golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1 // indirect
+	golang.org/x/sync v0.12.0 // indirect
+	golang.org/x/sys v0.31.0 // indirect
+	golang.org/x/text v0.23.0 // indirect
+	gopkg.in/yaml.v3 v3.0.1 // indirect
+)