
Commit ef7612b

fix: Always set word embeddings (IBM#41)
A previous OOM fix mistakenly left model.word_embeddings unset whenever the prompt cache was disabled. This caused inference without a prompt cache to fail. Our tests always set up a prompt cache, so they did not catch this case.

Fix: Always invoke self._setup_prompt_encoder() in model.py again.

Result: Inference without a prompt cache works.

-----

Signed-off-by: Joe Runde <[email protected]>
1 parent 7a97fd6 commit ef7612b
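The bug is a consequence of Python's short-circuiting `and`: when the left operand is falsy, the right operand is never evaluated, so a call relied on for its side effects can be silently skipped. A minimal, self-contained sketch of the pattern (the `Model` class below is illustrative only, not the repo's code; the method names mirror the diff):

```python
class Model:
    """Illustrative stand-in, not the real text_generation_server Model."""

    def __init__(self, prompt_cache=None, fixed=True):
        self.prompt_cache = prompt_cache
        self.word_embeddings = None

        if not fixed:
            # Buggy version: with no prompt cache, prompt_cache_set() is False,
            # so `and` short-circuits and _setup_prompt_encoder() never runs --
            # word_embeddings stays None and inference later fails.
            prompt_prefix_supported = (
                self.prompt_cache_set() and self._setup_prompt_encoder()
            )
        else:
            # Fixed version: always call _setup_prompt_encoder() for its side
            # effect, then gate the prefix-cache setup on both conditions.
            prompt_prefix_supported = self._setup_prompt_encoder()
            if prompt_prefix_supported and self.prompt_cache_set():
                pass  # prefix-cache setup would go here

    def prompt_cache_set(self):
        return self.prompt_cache is not None

    def _setup_prompt_encoder(self):
        # Required side effect: populate the embedding table.
        self.word_embeddings = object()  # stand-in for the real embeddings
        return True


assert Model(prompt_cache=None, fixed=False).word_embeddings is None    # the bug
assert Model(prompt_cache=None, fixed=True).word_embeddings is not None  # the fix
```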


server/text_generation_server/models/model.py

Lines changed: 4 additions & 3 deletions
```diff
@@ -45,10 +45,11 @@ def __init__(self, engine: BaseInferenceEngine, dtype: torch.dtype, max_seq_length
         # Check whether model supports position_ids
         self.use_position_ids = "position_ids" in inspect.signature(self.model.forward).parameters
 
-        # Short-circuit: Don't set up the prompt encoder if the prompt cache is not set
-        prompt_prefix_supported = self.prompt_cache_set() and self._setup_prompt_encoder()
+        # 🌶️🌶️🌶️ self._setup_prompt_encoder must be called even if the prompt cache is not used.
+        # A required side-effect is that it sets self.word_embeddings.
+        prompt_prefix_supported = self._setup_prompt_encoder()
 
-        if prompt_prefix_supported:
+        if prompt_prefix_supported and self.prompt_cache_set():
             # Set up prefix cache
 
         if max_seq_length is None:
```
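Since the gap was that every existing test constructed a prompt cache, a regression test for this path only needs to build the model without one and assert the side effect happened. A hypothetical sketch (the fixture and its parameters are assumptions, not the repo's actual test API):

```python
def test_word_embeddings_set_without_prompt_cache(model_factory):
    # `model_factory` is a hypothetical fixture that builds a Model with
    # the prompt cache disabled.
    model = model_factory(prompt_cache=None)
    # The fix guarantees _setup_prompt_encoder() ran, so the embedding
    # table must be populated even though no prompt cache exists.
    assert model.word_embeddings is not None
```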
