
Commit fec8763

Reduce embedding encode batch size to prevent GPU OOM
The hardcoded batch_size=256 caused CUDA OOM on the 8GB Vast.ai GPU when the backend already has models loaded for serving.
1 parent b48d733 commit fec8763

File tree

1 file changed (+1, -1)

src/lean_explore/util/embedding_client.py

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ def _encode():
         encode_kwargs = {
             "show_progress_bar": False,
             "convert_to_numpy": True,
-            "batch_size": 256,  # Larger batches for GPU utilization
+            "batch_size": 32,
         }
         if is_query:
             encode_kwargs["prompt_name"] = "query"
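
A fixed batch size trades throughput for a hard memory ceiling: the commit simply drops it to 32 so encoding fits alongside the models already resident on the 8GB GPU. An alternative pattern, sketched below, is to retry with a halved batch size when an out-of-memory error occurs. This is not what the commit does; the function name `encode_in_batches` and the use of Python's built-in `MemoryError` as a stand-in for a CUDA OOM exception are illustrative assumptions.

```python
def encode_in_batches(encode_fn, texts, batch_size=32, min_batch_size=1):
    """Encode texts in slices, halving batch_size on out-of-memory errors.

    A generic fallback sketch; encode_fn stands in for a model's encode
    call, and MemoryError stands in for a framework-specific OOM error.
    """
    results = []
    i = 0
    while i < len(texts):
        batch = texts[i : i + batch_size]
        try:
            results.extend(encode_fn(batch))
            i += batch_size  # advance only after a successful batch
        except MemoryError:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further; propagate the OOM
            batch_size //= 2  # retry the same slice with a smaller batch
    return results
```

The fixed `batch_size=32` in the commit is the simpler choice when the available headroom is known; a retry loop like this only pays off when memory pressure varies at runtime.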
