
feat(components): add pgvector indexer and retriever with pgvector-go #672

Open
ricciii0 wants to merge 4 commits into cloudwego:main from ricciii0:feat/pgvector-components


@ricciii0

Summary

Add pgvector indexer and retriever implementations for the Eino framework, including SQL injection protection, type-safe vector operations via pgvector-go, and unit tests for both components.

Changes

📦 New Components

  • components/indexer/pgvector: Store documents with vector embeddings
  • components/retriever/pgvector: Retrieve documents by vector similarity

✨ Features

Indexer

  • ✅ Batch processing (configurable batch size)
  • ✅ Type-safe vectors using pgvector-go
  • ✅ UPSERT semantics (automatic conflict resolution)
  • ✅ SQL injection prevention (identifier validation)
  • ✅ Connection pooling support (pgxpool.Pool)
  • ✅ Eino callbacks integration
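The batching and UPSERT behaviour listed above could be implemented roughly as follows. This is a minimal sketch under stated assumptions, not the PR's code: the column names (`id`, `content`, `metadata`, `embedding`), the conflict target `id`, and the helpers `splitBatches` and `buildUpsertSQL` are all hypothetical. Documents are split into batches of at most `BatchSize`, and each batch becomes one multi-row `INSERT ... ON CONFLICT (id) DO UPDATE` with every value passed as a bind parameter.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical column list; the PR's actual table schema may differ.
var upsertCols = []string{"id", "content", "metadata", "embedding"}

// splitBatches partitions n items into [start, end) ranges of at most size items.
// A non-positive size puts everything into a single batch.
func splitBatches(n, size int) [][2]int {
	if size <= 0 {
		size = n
	}
	var out [][2]int
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		out = append(out, [2]int{start, end})
	}
	return out
}

// buildUpsertSQL returns one multi-row INSERT ... ON CONFLICT statement for rows
// documents. quotedTable must already be validated and quoted; every value is
// bound as a positional parameter ($1, $2, ...), never interpolated.
func buildUpsertSQL(quotedTable string, rows int) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "INSERT INTO %s (%s) VALUES ", quotedTable, strings.Join(upsertCols, ", "))
	for r := 0; r < rows; r++ {
		if r > 0 {
			sb.WriteString(", ")
		}
		ph := make([]string, len(upsertCols))
		for c := range upsertCols {
			ph[c] = fmt.Sprintf("$%d", r*len(upsertCols)+c+1)
		}
		sb.WriteString("(" + strings.Join(ph, ", ") + ")")
	}
	// On a duplicate id, overwrite the stored row instead of failing.
	sb.WriteString(" ON CONFLICT (id) DO UPDATE SET content = EXCLUDED.content," +
		" metadata = EXCLUDED.metadata, embedding = EXCLUDED.embedding")
	return sb.String()
}

func main() {
	// 25 documents with a batch size of 10 -> batches of 10, 10 and 5.
	for _, b := range splitBatches(25, 10) {
		fmt.Println(buildUpsertSQL(`"documents"`, b[1]-b[0]))
	}
}
```

One caveat of a multi-row upsert: PostgreSQL rejects a single `ON CONFLICT DO UPDATE` statement that would update the same row twice, so duplicate IDs inside one batch would have to be deduplicated before the statement is sent.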

Retriever

  • ✅ Multiple distance functions (cosine, L2, inner product)
  • ✅ Score threshold filtering
  • ✅ Custom WHERE clause support
  • ✅ SQL injection protection
  • ✅ Optimized single-pass query execution
  • ✅ Eino callbacks integration

🔒 Security

  • Validate all table names and identifiers
  • Quote identifiers to prevent SQL injection
  • Parameter binding for all user inputs
  • Comprehensive security tests

🧪 Testing

  • Indexer: 11 tests (all passing)
    • Basic functionality tests
    • SQL injection prevention
    • Edge cases (empty docs, nil docs, missing embedding)
    • Invalid input validation
  • Retriever: 11 tests (all passing)
    • Distance function validation
    • Score calculation tests
    • Threshold calculation tests
    • SQL injection prevention
    • Invalid input validation

📚 Documentation

  • Complete README files for both components
  • Usage examples with all features
  • Performance tuning guidelines
  • Distance function comparison guide
  • Error handling documentation

📦 Dependencies

  • github.com/cloudwego/eino v0.6.0
  • github.com/jackc/pgx/v5 v5.7.2
  • github.com/pgvector/pgvector-go v0.3.0
  • github.com/stretchr/testify v1.10.0

Test Plan

  • All 22 unit tests passing
  • SQL injection prevention validated
  • Example code tested with real PostgreSQL database
  • Connection pooling verified
  • Distance functions validated
  • Score calculation verified
  • WHERE clause filtering tested
  • Score threshold filtering tested

Example Usage

Indexer

indexer, err := pgvector.NewIndexer(ctx, &pgvector.IndexerConfig{
    Conn:      pool, // e.g. a *pgxpool.Pool
    TableName: "documents",
    Embedding: embedder,
    BatchSize: 10,
})
if err != nil {
    return err
}

docs := []*schema.Document{...}
ids, err := indexer.Store(ctx, docs)
if err != nil {
    return err
}

Retriever

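rtr, err := pgvector.NewRetriever(ctx, &pgvector.RetrieverConfig{
    Conn:             pool,
    TableName:        "documents",
    Embedding:        embedder,
    DistanceFunction: pgvector.DistanceCosine,
    TopK:             5,
})
if err != nil {
    return err
}

// Name the instance rtr, not retriever, so it does not shadow Eino's
// components/retriever package, which provides WithScoreThreshold.
docs, err := rtr.Retrieve(ctx, "search query",
    pgvector.WithWhereClause("metadata->>'category' = 'tech'"),
    retriever.WithScoreThreshold(0.8),
)
if err != nil {
    return err
}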
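A retrieval call that combines a WHERE clause, a score threshold and `TopK` can be answered by one SQL statement. The sketch below shows one way such a statement might be assembled; it is not the PR's query builder, and the column names (`id`, `content`, `metadata`, `embedding`) and the helper `buildSearchSQL` are hypothetical. `$1` is the query embedding and `$2` the distance bound derived from the score threshold. In this sketch the caller's clause is spliced in verbatim, so it must come from trusted application code rather than end-user input.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSearchSQL assembles a single statement for a similarity search.
// Hypothetical column names: id, content, metadata, embedding.
// Bind parameters: $1 = query embedding, $2 = maximum distance (when thresholded).
func buildSearchSQL(quotedTable, op, userWhere string, withThreshold bool, topK int) string {
	dist := "embedding " + op + " $1"
	var conds []string
	if userWhere != "" {
		// Parenthesize so an OR inside the caller's clause cannot bypass the threshold.
		conds = append(conds, "("+userWhere+")")
	}
	if withThreshold {
		conds = append(conds, "("+dist+") <= $2")
	}
	var sb strings.Builder
	fmt.Fprintf(&sb, "SELECT id, content, metadata, %s AS distance FROM %s", dist, quotedTable)
	if len(conds) > 0 {
		sb.WriteString(" WHERE " + strings.Join(conds, " AND "))
	}
	// Order by the operator expression itself so an index built for this operator can be used.
	fmt.Fprintf(&sb, " ORDER BY %s LIMIT %d", dist, topK)
	return sb.String()
}

func main() {
	fmt.Println(buildSearchSQL(`"documents"`, "<=>", "metadata->>'category' = 'tech'", true, 5))
}
```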

Performance Notes

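- Vector storage: components are float32 (pgvector-go's Vector wraps []float32), using 50% less memory than float64
- Batch size: 10 to 100 documents per batch
- Indexing: HNSW recommended for approximate nearest-neighbor search
- Connection pooling: use pgxpool.Pool for concurrent access; a single pgx.Conn is not safe for concurrent use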
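An HNSW index only accelerates queries whose `ORDER BY` uses the operator matching the index's operator class, so the class has to agree with the retriever's distance function. A minimal sketch that generates the DDL; the distance-function keys and the helper `hnswIndexSQL` are hypothetical, while `vector_l2_ops`, `vector_ip_ops`, `vector_cosine_ops`, `m` and `ef_construction` are pgvector's own names (HNSW requires pgvector 0.5.0 or later).

```go
package main

import "fmt"

// hnswOpClass maps hypothetical distance-function names to pgvector's
// HNSW operator classes; the class must match the operator used in ORDER BY.
var hnswOpClass = map[string]string{
	"l2":            "vector_l2_ops",     // <->
	"inner_product": "vector_ip_ops",     // <#>
	"cosine":        "vector_cosine_ops", // <=>
}

// hnswIndexSQL builds a CREATE INDEX statement. Identifiers must already be
// validated and quoted. m and efConstruction are pgvector's HNSW build options
// (defaults 16 and 64).
func hnswIndexSQL(quotedIndex, quotedTable, distance string, m, efConstruction int) (string, error) {
	ops, ok := hnswOpClass[distance]
	if !ok {
		return "", fmt.Errorf("no HNSW operator class for %q", distance)
	}
	return fmt.Sprintf(
		"CREATE INDEX IF NOT EXISTS %s ON %s USING hnsw (embedding %s) WITH (m = %d, ef_construction = %d)",
		quotedIndex, quotedTable, ops, m, efConstruction), nil
}

func main() {
	ddl, err := hnswIndexSQL(`"documents_embedding_idx"`, `"documents"`, "cosine", 16, 64)
	if err != nil {
		panic(err)
	}
	fmt.Println(ddl)
}
```

At query time, `SET hnsw.ef_search = 100;` (the default is 40) trades some speed for higher recall.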

Compatibility

- PostgreSQL: 12+
- pgvector extension: 0.5.0+
- Go: 1.23+

Checklist

- Code follows project style guidelines
- All tests passing
- Documentation updated
- No breaking changes to existing APIs
- Security review passed (SQL injection prevention)

  Add pgvector support for the Eino framework with type-safe vector operations,
  SQL injection protection, and comprehensive testing.

  - Indexer: batch processing, UPSERT, connection pooling
  - Retriever: multiple distance functions, score threshold, WHERE filtering
  - Security: identifier validation, quoted identifiers
  - Testing: 22 tests covering security and functionality
  - Docs: complete README and examples
@CLAassistant

CLAassistant commented Jan 27, 2026

CLA assistant check
All committers have signed the CLA.

@shentongmartin added the labels C-enhancement (This is a PR that adds a new feature or fixes a bug.) and D-integration (Domain: This is an issue related to 3rd party service integrations, excluding LLM providers) on Mar 27, 2026

