
Commit ddc56ee

Set TP argument correctly when instantiating PagedKVCacheManager (IBM#94)
#### Motivation

Users are seeing runtime errors when trying to use TP>1 with speculative decoding.

#### Modifications

Set the tensor parallel argument correctly when instantiating the PagedKVCacheManager.

#### Result

I have verified that this change resolves the reported issue.

#### Related Issues

https://huggingface.co/ibm-fms/llama3-8b-accelerator/discussions/1

Signed-off-by: Thomas Parnell <[email protected]>
1 parent e87d462 commit ddc56ee

1 file changed (+1, -1)

server/text_generation_server/models/paged_causal_lm.py

Lines changed: 1 addition & 1 deletion

@@ -327,7 +327,7 @@ def __init__(
     model_config.num_attention_heads,
     model_config.hidden_size,
     kv_heads=model_config.num_key_value_heads,
-    tensor_parallel_size=1,
+    tensor_parallel_size=self.engine.world_size,
     dtype=dtype,
     device=self.device,
     total_num_gpu_blocks=total_num_gpu_blocks,
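For context, here is a minimal sketch of the corrected construction. The argument names are taken from the diff above; the import path, the `build_kv_cache_manager` helper, and the way the engine and config objects are passed around are assumptions for illustration, not the actual TGIS implementation.

```python
# Sketch only: keyword arguments mirror the diff above; the import path and the
# shape of this helper are assumptions about the fms-extras / TGIS code.
from fms_extras.utils.cache.paged import PagedKVCacheManager  # assumed module path


def build_kv_cache_manager(model_config, engine, dtype, device, total_num_gpu_blocks):
    """Hypothetical helper mirroring the __init__ call patched in this commit."""
    return PagedKVCacheManager(
        model_config.num_attention_heads,
        model_config.hidden_size,
        kv_heads=model_config.num_key_value_heads,
        # Previously hard-coded to 1, which broke TP>1 with speculative decoding;
        # it must match the tensor-parallel world size so each rank sizes its
        # paged KV cache for its own shard of the attention heads.
        tensor_parallel_size=engine.world_size,
        dtype=dtype,
        device=device,
        total_num_gpu_blocks=total_num_gpu_blocks,
    )
```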
