
Feature: Batch Embedding Generation for RAG Ingestion #5320

Summary

Implement batch embedding generation in `rag_ingest` to significantly reduce ingestion time by grouping multiple chunks into single HTTP API calls instead of making one call per chunk.

Problem

The current implementation of `rag_ingest` generates vector embeddings one chunk at a time.

Performance Impact:

  • 1000 chunks = 1000 HTTP API calls
  • At ~100ms per HTTP call (typical for embedding APIs)
  • Total: ~100 seconds just for embeddings
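
For context, the existing flow is roughly one round-trip per chunk. A simplified sketch (the helper names `call_embedding_api`, `insert_vector`, and the `Chunk` record are hypothetical stand-ins, not the actual identifiers in `rag_ingest`):

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the real helpers in rag_ingest.
std::vector<float> call_embedding_api(const std::string& text);
void insert_vector(long chunk_id, const std::vector<float>& vec);

struct Chunk { long chunk_id; std::string text; };

// Current behavior: every chunk pays a full HTTP round-trip on its own.
void embed_all(const std::vector<Chunk>& chunks) {
    for (const auto& chunk : chunks) {
        auto vec = call_embedding_api(chunk.text);  // one HTTP call per chunk
        insert_vector(chunk.chunk_id, vec);         // one insert per chunk
    }
}
```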

Solution

Collect chunks into batches and process multiple embeddings per HTTP call.

Performance Improvement:

  • 1000 chunks with `batch_size=16` = ~63 HTTP API calls (⌈1000 / 16⌉ = 63)
  • At ~100ms per HTTP call
  • Total: ~6.3 seconds for embeddings
  • Speedup: ~16x faster

Implementation Details

Changes Made

  1. Added `PendingEmbedding` struct - holds chunk metadata (chunk_id, doc_id, source_id, input_text) for batched processing

  2. Added `flush_embedding_batch()` function - generates embeddings for multiple chunks in a single API call and inserts all vectors into the database

  3. Modified ingestion loop - collects chunks into pending batch, flushes when batch_size is reached, and flushes remaining chunks at the end
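
A minimal sketch of how these three pieces could fit together. Only `PendingEmbedding`, `flush_embedding_batch()`, and the listed fields come from this issue; the types, signatures, `Chunk` record, and the HTTP/database plumbing are assumptions:

```cpp
#include <string>
#include <vector>

// Stand-in for the ingestion loop's chunk record (hypothetical).
struct Chunk { long chunk_id; long doc_id; std::string source_id; std::string text; };

// Holds chunk metadata until the batch is flushed (fields per the list above).
struct PendingEmbedding {
    long chunk_id;
    long doc_id;
    std::string source_id;
    std::string input_text;
};

// Generates embeddings for every pending chunk in a single API call,
// then inserts all returned vectors into rag_vec_chunks.
void flush_embedding_batch(std::vector<PendingEmbedding>& pending) {
    if (pending.empty()) return;  // nothing to flush
    // 1. Build one request whose "input" is the array of pending texts.
    // 2. POST once; the response yields one embedding per input, in order.
    // 3. Insert each (chunk_id, vector) pair into rag_vec_chunks.
    pending.clear();  // the batch has been consumed
}

// Modified ingestion loop: accumulate, flush when batch_size is reached,
// then flush whatever remains at the end.
void ingest(const std::vector<Chunk>& chunks, size_t batch_size) {
    std::vector<PendingEmbedding> pending;
    for (const auto& c : chunks) {
        pending.push_back({c.chunk_id, c.doc_id, c.source_id, c.text});
        if (pending.size() >= batch_size)
            flush_embedding_batch(pending);
    }
    flush_embedding_batch(pending);  // final flush for remaining chunks
}
```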

Configuration

Uses existing `batch_size` from `embedding_json` (default: 16):

```json
{
"enabled": true,
"model": "text-embedding-3-large",
"dim": 1536,
"batch_size": 16,
...
}
```
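
Assuming `embedding_json` is parsed with nlohmann/json (an assumption; adapt to whatever parser `rag_ingest` already uses), reading the setting with its documented default could look like:

```cpp
#include <string>
#include "json.hpp"  // nlohmann::json -- an assumption about the parser in use

// Returns batch_size from embedding_json, falling back to the default of 16.
size_t get_batch_size(const std::string& embedding_json) {
    auto cfg = nlohmann::json::parse(embedding_json);
    int bs = cfg.value("batch_size", 16);          // default documented above
    return bs > 0 ? static_cast<size_t>(bs) : 16;  // guard against bad values
}
```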

Testing Requirements

Unit Testing

  • Batch Size Boundary Tests

    • Test with exactly `batch_size` chunks (should flush exactly once)
    • Test with `batch_size + 1` chunks (should flush twice)
    • Test with < `batch_size` chunks (should flush once at end)
  • Empty Batch Handling

    • Test with 0 chunks (no embedding calls)
    • Test with `enabled=false` (no pending embeddings)
  • Document Boundary Behavior

    • Multiple documents with varying chunk counts
    • Verify pending embeddings carry over between documents
    • Verify final flush processes all remaining embeddings
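
A self-contained sketch of the boundary checks above, using a counting stub in place of the real API call (all names besides `PendingEmbedding` and `flush_embedding_batch` are hypothetical):

```cpp
#include <cassert>
#include <string>
#include <vector>

struct PendingEmbedding { long chunk_id; std::string input_text; };

// Counting stub: records how many flushes happen instead of calling the API.
static int flush_calls = 0;
static void flush_embedding_batch(std::vector<PendingEmbedding>& pending) {
    if (pending.empty()) return;  // an empty final flush must not count
    ++flush_calls;
    pending.clear();
}

// Runs the batched ingestion loop and reports the number of flushes.
static int flushes_for(int n_chunks, size_t batch_size) {
    flush_calls = 0;
    std::vector<PendingEmbedding> pending;
    for (int i = 0; i < n_chunks; ++i) {
        pending.push_back({i, "chunk text"});
        if (pending.size() >= batch_size) flush_embedding_batch(pending);
    }
    flush_embedding_batch(pending);  // final flush for the remainder
    return flush_calls;
}

int main() {
    assert(flushes_for(16, 16) == 1);  // exactly batch_size: one flush
    assert(flushes_for(17, 16) == 2);  // batch_size + 1: two flushes
    assert(flushes_for(5, 16)  == 1);  // < batch_size: one flush at the end
    assert(flushes_for(0, 16)  == 0);  // zero chunks: no embedding calls
    return 0;
}
```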

Integration Testing

  • End-to-End Ingestion

    • Ingest from MySQL with embeddings enabled
    • Verify all chunks have corresponding vectors in `rag_vec_chunks`
    • Compare vector count with chunk count (should match)
  • Performance Benchmarking

    • Time ingestion with 100, 1000, 10000 chunks
    • Compare before/after batching implementation
    • Verify ~16x speedup with `batch_size=16`
  • API Validation

    • Verify batch requests use correct format (OpenAI API: array of inputs)
    • Verify responses contain correct number of embeddings
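
For reference, a batched OpenAI-style embeddings request carries an array of inputs rather than a single string (the chunk texts below are placeholders):

```json
{
  "model": "text-embedding-3-large",
  "input": [
    "first chunk text",
    "second chunk text",
    "third chunk text"
  ]
}
```

The response's `data` array should then contain exactly as many embeddings as there were inputs, in input order.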

Verification SQL

```sql
-- Verify all chunks have embeddings
SELECT COUNT(*) FROM rag_chunks c
LEFT JOIN rag_vec_chunks v ON c.chunk_id = v.chunk_id
WHERE v.chunk_id IS NULL;
-- Expected: 0
```
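
And for the count comparison called out above (chunk count and vector count should match):

```sql
-- Compare totals (the two counts should be equal)
SELECT
  (SELECT COUNT(*) FROM rag_chunks)     AS chunk_count,
  (SELECT COUNT(*) FROM rag_vec_chunks) AS vector_count;
```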

Acceptance Criteria

  • Batching implementation compiles successfully
  • Unit tests pass for batch boundary conditions
  • Integration test with real MySQL source succeeds
  • All chunks have corresponding vectors in `rag_vec_chunks`
  • Performance improvement measured and documented
  • Documentation updated with batching behavior
