Commit f6f317b

fix(embedding): reduce EMBEDDING_BATCH_SIZE to 10 to avoid Ollama errors on large uploads
1 parent 6a94423 commit f6f317b

3 files changed: +10, -2 lines changed


CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -7,6 +7,12 @@ and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.
 
 ## [Unreleased]
 
+## [1.3.1] - 2025-12-04
+
+### Fixed
+- Reduced default EMBEDDING_BATCH_SIZE from 32 to 10 to prevent Ollama "cannot decode batches" errors with large uploads
+- New `EMBEDDING_BATCH_SIZE` env var allows tuning for different Ollama configurations
+
 ## [1.3.0] - 2025-12-04
 
 ### Changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -94,6 +94,7 @@ docker compose up -d
 | `OLLAMA_MODEL` | `nomic-embed-text` | Embedding model |
 | `CHUNK_SIZE` | `400` | Target chunk size in tokens |
 | `CHUNK_MAX_TOKENS` | `1500` | Maximum chunk size (safe margin for nomic-embed-text 2048 limit) |
+| `EMBEDDING_BATCH_SIZE` | `10` | Chunks per embedding API call (reduce if Ollama errors) |
 
 ## Features

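A quick note on using the new knob (a minimal sketch, not from the repository): the default is read once at import time via os.getenv, so the variable has to be present in the environment before the module loads, for example via the compose file. The `lib.embedding` import path below is an assumption based on the file layout in this commit.

    import os
    os.environ['EMBEDDING_BATCH_SIZE'] = '5'   # must be set before the module is imported

    from lib import embedding                  # assumed import path for lib/embedding.py
    print(embedding.EMBEDDING_BATCH_SIZE)      # -> 5
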
lib/embedding.py

Lines changed: 3 additions & 2 deletions
@@ -17,6 +17,7 @@
 OLLAMA_URL = os.getenv('OLLAMA_URL', 'http://localhost:11434')
 EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL', 'nomic-embed-text')
 MAX_TOKENS = 2048  # nomic-embed-text context limit (configurable models may differ)
+EMBEDDING_BATCH_SIZE = int(os.getenv('EMBEDDING_BATCH_SIZE', '10'))  # Reduced from 32 to avoid Ollama batch decode errors
 
 
 def get_embedding(text: str, timeout: int = 60, max_retries: int = 3) -> Optional[list[float]]:

@@ -240,7 +241,7 @@ def safe_embed_chunk(
 def batch_embed_chunks(
     chunks: list[dict],
     max_tokens: int = MAX_TOKENS,
-    batch_size: int = 32
+    batch_size: int = EMBEDDING_BATCH_SIZE
 ) -> list[dict]:
     """
     Embed multiple chunks using batch API for better performance.

@@ -251,7 +252,7 @@ def batch_embed_chunks(
     Args:
         chunks: List of chunk dictionaries with 'text' key
         max_tokens: Maximum tokens per chunk
-        batch_size: Number of texts to embed in a single API call (default: 32)
+        batch_size: Number of texts to embed per API call (default: EMBEDDING_BATCH_SIZE env or 10)
 
     Returns:
         List of successfully embedded chunks (flattened if re-chunking occurred)
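
For context, a minimal sketch of the batching pattern this change tunes (an illustration, not the repository's implementation): texts are sent to Ollama in groups of EMBEDDING_BATCH_SIZE via the batch /api/embed endpoint, which accepts a list of inputs and returns one embedding per input. The helper name embed_in_batches and the error handling below are assumptions.

    import os
    import requests

    OLLAMA_URL = os.getenv('OLLAMA_URL', 'http://localhost:11434')
    EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL', 'nomic-embed-text')
    EMBEDDING_BATCH_SIZE = int(os.getenv('EMBEDDING_BATCH_SIZE', '10'))


    def embed_in_batches(chunks: list[dict], batch_size: int = EMBEDDING_BATCH_SIZE) -> list[dict]:
        """Attach an 'embedding' vector to each chunk, sending at most batch_size texts per call."""
        embedded = []
        for start in range(0, len(chunks), batch_size):
            batch = chunks[start:start + batch_size]
            resp = requests.post(
                f"{OLLAMA_URL}/api/embed",
                json={"model": EMBEDDING_MODEL, "input": [c["text"] for c in batch]},
                timeout=60,
            )
            resp.raise_for_status()
            for chunk, vector in zip(batch, resp.json()["embeddings"]):  # one vector per input text
                embedded.append({**chunk, "embedding": vector})
        return embedded

Smaller batches mean more HTTP round trips, but each request stays small enough that Ollama can decode it, which is the trade-off behind lowering the default from 32 to 10.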
