Summary
Implement batch embedding generation in `rag_ingest` to significantly reduce ingestion time by grouping multiple chunks into single HTTP API calls instead of making one call per chunk.
Problem
The current implementation of `rag_ingest` generates vector embeddings one chunk at a time.
Performance Impact:
- 1000 chunks = 1000 HTTP API calls
- At ~100ms per HTTP call (typical for embedding APIs)
- Total: ~100 seconds just for embeddings
Solution
Collect chunks into batches and process multiple embeddings per HTTP call.
Performance Improvement:
- 1000 chunks with `batch_size=16` = ~63 HTTP API calls
- At ~100ms per HTTP call
- Total: ~6.3 seconds for embeddings
- Speedup: ~16x faster
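The arithmetic above can be checked with a quick sketch. The 100 ms per-call latency is the estimate used in this issue, not a measured value:

```rust
// Number of HTTP calls needed when chunks are grouped into batches.
fn api_calls(chunks: usize, batch_size: usize) -> usize {
    (chunks + batch_size - 1) / batch_size // ceiling division
}

fn main() {
    let (chunks, batch_size, latency_ms) = (1000usize, 16usize, 100usize);
    let batched = api_calls(chunks, batch_size); // 63 calls
    let unbatched = api_calls(chunks, 1);        // 1000 calls
    println!("batched:   {} calls, ~{:.1} s", batched, (batched * latency_ms) as f64 / 1000.0);
    println!("unbatched: {} calls, ~{:.1} s", unbatched, (unbatched * latency_ms) as f64 / 1000.0);
    println!("speedup:  ~{:.0}x", unbatched as f64 / batched as f64);
}
```

This yields 63 batched calls (~6.3 s) versus 1000 unbatched calls (~100 s), matching the numbers above.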
Implementation Details
Changes Made
- Added `PendingEmbedding` struct - holds chunk metadata (`chunk_id`, `doc_id`, `source_id`, `input_text`) for batched processing
- Added `flush_embedding_batch()` function - generates embeddings for multiple chunks in a single API call and inserts all resulting vectors into the database
- Modified the ingestion loop - collects chunks into a pending batch, flushes when `batch_size` is reached, and flushes any remaining chunks at the end
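The changes above can be sketched as follows. `PendingEmbedding` and `flush_embedding_batch` are named in this issue; the field types and the stubbed `embed_batch` client call are illustrative assumptions, not the actual implementation:

```rust
// Sketch of the batching flow. Field types are assumed for illustration.
struct PendingEmbedding {
    chunk_id: u64,
    doc_id: u64,
    source_id: u64,
    input_text: String,
}

// Stand-in for the real embedding client: one HTTP request per batch,
// returning one vector per input text.
fn embed_batch(texts: &[String]) -> Vec<Vec<f32>> {
    texts.iter().map(|_| vec![0.0_f32; 4]).collect()
}

// Embeds every pending chunk in a single call; the real code would then
// insert all (chunk_id, vector) rows into rag_vec_chunks.
fn flush_embedding_batch(pending: &mut Vec<PendingEmbedding>) -> bool {
    if pending.is_empty() {
        return false; // empty-batch handling: no API call
    }
    let texts: Vec<String> = pending.iter().map(|p| p.input_text.clone()).collect();
    let vectors = embed_batch(&texts);
    assert_eq!(vectors.len(), pending.len()); // one vector per chunk
    pending.clear();
    true
}

// Drives the modified ingestion loop; returns how many flushes ran.
fn run_ingest(n_chunks: u64, batch_size: usize) -> usize {
    let mut pending: Vec<PendingEmbedding> = Vec::new();
    let mut flushes = 0;
    for chunk_id in 0..n_chunks {
        pending.push(PendingEmbedding {
            chunk_id,
            doc_id: 1,
            source_id: 1,
            input_text: format!("chunk {chunk_id}"),
        });
        if pending.len() >= batch_size && flush_embedding_batch(&mut pending) {
            flushes += 1;
        }
    }
    if flush_embedding_batch(&mut pending) {
        flushes += 1; // final flush for any remaining chunks
    }
    flushes
}

fn main() {
    // 33 chunks with batch_size = 16: flushes at 16, at 32, and a final flush of 1.
    println!("flushes = {}", run_ingest(33, 16));
}
```

The final flush outside the loop is what guarantees a partial last batch is never dropped.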
Configuration
Uses existing `batch_size` from `embedding_json` (default: 16):
```json
{
"enabled": true,
"model": "text-embedding-3-large",
"dim": 1536,
"batch_size": 16,
...
}
```
Testing Requirements
Unit Testing
- Batch Size Boundary Tests
- Test with exactly `batch_size` chunks (should flush exactly once)
- Test with `batch_size + 1` chunks (should flush twice)
- Test with < `batch_size` chunks (should flush once at end)
- Empty Batch Handling
- Test with 0 chunks (no embedding calls)
- Test with `enabled=false` (no pending embeddings)
- Document Boundary Behavior
- Multiple documents with varying chunk counts
- Verify pending embeddings carry over between documents
- Verify final flush processes all remaining embeddings
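Assuming one flush per full batch plus a final flush for any remainder, the expected flush counts for the boundary cases above reduce to a ceiling division. A sketch of the expectations (not the actual test code):

```rust
// Expected number of flush_embedding_batch() calls that perform work,
// assuming a flush per full batch plus a final flush for the remainder.
fn expected_flushes(chunks: usize, batch_size: usize) -> usize {
    if chunks == 0 {
        0 // empty batch: no embedding API calls at all
    } else {
        (chunks + batch_size - 1) / batch_size
    }
}

fn main() {
    assert_eq!(expected_flushes(16, 16), 1); // exactly batch_size: one flush
    assert_eq!(expected_flushes(17, 16), 2); // batch_size + 1: two flushes
    assert_eq!(expected_flushes(5, 16), 1);  // < batch_size: one final flush
    assert_eq!(expected_flushes(0, 16), 0);  // zero chunks: no calls
    println!("all boundary cases ok");
}
```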
Integration Testing
- End-to-End Ingestion
- Ingest from MySQL with embeddings enabled
- Verify all chunks have corresponding vectors in `rag_vec_chunks`
- Compare vector count with chunk count (should match)
- Performance Benchmarking
- Time ingestion with 100, 1000, 10000 chunks
- Compare before/after batching implementation
- Verify ~16x speedup with `batch_size=16`
- API Validation
- Verify batch requests use correct format (OpenAI API: array of inputs)
- Verify responses contain correct number of embeddings
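For reference, the OpenAI embeddings endpoint accepts `input` as an array of strings, so a batched request body looks like this (the chunk texts are placeholders):

```json
{
  "model": "text-embedding-3-large",
  "input": [
    "text of chunk 1",
    "text of chunk 2"
  ]
}
```

Each item in the response's `data` array carries an `index` field that maps the embedding back to its position in the `input` array, which is what allows the flushed vectors to be paired with the right `chunk_id`.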
Verification SQL
```sql
-- Verify all chunks have embeddings
SELECT COUNT(*) FROM rag_chunks c
LEFT JOIN rag_vec_chunks v ON c.chunk_id = v.chunk_id
WHERE v.chunk_id IS NULL;
-- Expected: 0
```
Acceptance Criteria
- Batching implementation compiles successfully
- Unit tests pass for batch boundary conditions
- Integration test with real MySQL source succeeds
- All chunks have corresponding vectors in `rag_vec_chunks`
- Performance improvement measured and documented
- Documentation updated with batching behavior
Related
- Design doc: `RAG_POC/embeddings-design.md` (Section 11.2)
- Original PR: Add RAG ingestion with vector embeddings (#5318)
- Branch: `v4.0_rag_ingest_2` (contains implementation)