3 files changed, +10 -2 lines

CHANGELOG.md

@@ -7,6 +7,12 @@ and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.
 
 ## [Unreleased]
 
+## [1.3.1] - 2025-12-04
+
+### Fixed
+- Reduced default EMBEDDING_BATCH_SIZE from 32 to 10 to prevent Ollama "cannot decode batches" errors with large uploads
+- New `EMBEDDING_BATCH_SIZE` env var allows tuning for different Ollama configurations
+
 ## [1.3.0] - 2025-12-04
 
 ### Changed

README.md

@@ -94,6 +94,7 @@ docker compose up -d
 | `OLLAMA_MODEL` | `nomic-embed-text` | Embedding model |
 | `CHUNK_SIZE` | `400` | Target chunk size in tokens |
 | `CHUNK_MAX_TOKENS` | `1500` | Maximum chunk size (safe margin for nomic-embed-text 2048 limit) |
+| `EMBEDDING_BATCH_SIZE` | `10` | Chunks per embedding API call (reduce if Ollama errors) |
 
 ## Features
 
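What one "embedding API call" means in the row above: up to `EMBEDDING_BATCH_SIZE` chunk texts sent in a single request. Below is a minimal sketch of such a call, assuming Ollama's `/api/embed` endpoint, which accepts a list of inputs and returns one vector per input; the `embed_batch` name and the `requests` usage are illustrative, not code from this repo:

```python
import requests

OLLAMA_URL = "http://localhost:11434"
EMBEDDING_MODEL = "nomic-embed-text"

def embed_batch(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts in one request (illustrative sketch)."""
    # /api/embed accepts "input" as a string or a list of strings and
    # returns {"embeddings": [...]} with one vector per input text.
    resp = requests.post(
        f"{OLLAMA_URL}/api/embed",
        json={"model": EMBEDDING_MODEL, "input": texts},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embeddings"]
```

Larger batches mean fewer HTTP round trips, but oversized ones are what triggered the "cannot decode batches" failures this release works around, hence the smaller default of 10.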
Original file line number Diff line number Diff line change 1717OLLAMA_URL = os .getenv ('OLLAMA_URL' , 'http://localhost:11434' )
1818EMBEDDING_MODEL = os .getenv ('EMBEDDING_MODEL' , 'nomic-embed-text' )
1919MAX_TOKENS = 2048 # nomic-embed-text context limit (configurable models may differ)
20+ EMBEDDING_BATCH_SIZE = int (os .getenv ('EMBEDDING_BATCH_SIZE' , '10' )) # Reduced from 32 to avoid Ollama batch decode errors
2021
2122
2223def get_embedding (text : str , timeout : int = 60 , max_retries : int = 3 ) -> Optional [list [float ]]:
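Because the value is parsed with `int(os.getenv('EMBEDDING_BATCH_SIZE', '10'))`, it can be tuned per deployment without code changes: set the variable in the process environment, or (assuming the compose file forwards it) in the service's `environment:` block.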
@@ -240,7 +241,7 @@ def safe_embed_chunk(
 def batch_embed_chunks(
     chunks: list[dict],
     max_tokens: int = MAX_TOKENS,
-    batch_size: int = 32
+    batch_size: int = EMBEDDING_BATCH_SIZE
 ) -> list[dict]:
     """
     Embed multiple chunks using batch API for better performance.
@@ -251,7 +252,7 @@ def batch_embed_chunks(
     Args:
         chunks: List of chunk dictionaries with 'text' key
         max_tokens: Maximum tokens per chunk
-        batch_size: Number of texts to embed in a single API call (default: 32)
+        batch_size: Number of texts to embed per API call (default: EMBEDDING_BATCH_SIZE env or 10)
 
     Returns:
         List of successfully embedded chunks (flattened if re-chunking occurred)
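The docstring describes the contract; the loop body itself is not part of this diff. A hedged sketch of the batching logic it implies, reusing the illustrative `embed_batch` helper from the README section above (the real function also re-chunks oversized texts, omitted here):

```python
def batch_embed_chunks_sketch(
    chunks: list[dict],
    batch_size: int = 10,  # EMBEDDING_BATCH_SIZE default in this release
) -> list[dict]:
    """Illustrative only: embed chunks in batch_size groups."""
    embedded: list[dict] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        # One API call per batch; each request stays small enough
        # for Ollama to decode.
        vectors = embed_batch([c["text"] for c in batch])
        for chunk, vector in zip(batch, vectors):
            chunk["embedding"] = vector
            embedded.append(chunk)
    return embedded
```

Lowering `batch_size` from 32 to 10 does not change the total work, only how many texts share a request.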