Open
Description
The DocumentEmbedder.embed_from_directory() method in src/rag/rag.py processes documents one at a time, which is slow for large document collections. We should add multi-threading so that file processing, embedding generation, and database insertion run in parallel, reducing the total time needed to embed a directory.
Proposed Solution
Add multi-threading support to the DocumentEmbedder class to process multiple files concurrently. This would involve:
- Using a thread pool to process files in parallel
- Batching embeddings and database insertions efficiently
- Maintaining thread safety for database operations
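A minimal sketch of how these three points could fit together, not the actual DocumentEmbedder code. The names embed_files_concurrently, embed_fn, and insert_batch_fn are hypothetical stand-ins for whatever the class uses to embed text and write to the database. The sketch assumes the per-file work (reading and embedding) is safe to run in worker threads, and it sidesteps database thread safety by keeping all insertions on the calling thread.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = List[float]
Record = Tuple[str, Vector]  # (file path, embedding)


def embed_files_concurrently(
    paths: Iterable[Path],
    embed_fn: Callable[[str], Vector],                    # hypothetical embedding call
    insert_batch_fn: Callable[[Sequence[Record]], None],  # hypothetical DB write
    max_workers: int = 4,
    batch_size: int = 32,
) -> int:
    """Read and embed files in a thread pool; insert results in batches.

    Returns the number of records passed to insert_batch_fn.
    """

    def process(path: Path) -> Record:
        # Runs in a worker thread: file I/O plus the embedding call.
        text = path.read_text(encoding="utf-8", errors="ignore")
        return str(path), embed_fn(text)

    inserted = 0
    batch: List[Record] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process, p) for p in paths]
        # Results are consumed on the calling thread only, so the
        # database handle is never shared between threads.
        for future in as_completed(futures):
            # result() re-raises any exception from the worker.
            batch.append(future.result())
            if len(batch) >= batch_size:
                insert_batch_fn(batch)
                inserted += len(batch)
                batch = []
    # Flush whatever is left after the last full batch.
    if batch:
        insert_batch_fn(batch)
        inserted += len(batch)
    return inserted
```

Keeping a single writer is the simplest way to stay thread safe, but insertions are then not parallelized. If the database client supports concurrent writes, the insert could instead run in the workers behind a threading.Lock or with one connection per thread.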
Related Code
src/rag/rag.py - DocumentEmbedder class (lines 328-464)