Commit f091ad5

Log number of KVCacheManager blocks at init (#87)
#### Motivation

Users are encountering problems running out of blocks on GPUs with less than 80GB memory.

#### Modifications

This PR simply adds a print out of the number of free blocks at start-up time.

#### Result

This will help us debug the issue with users, e.g., we could suggest to them to change the environment variable `KV_CACHE_MANAGER_NUM_GPU_BLOCKS` to manually increase the number of blocks, but we need to first know what they are starting from.

#### Related Issues

https://huggingface.co/ibm-fms/granite-7b-lab-accelerator/discussions/1

Signed-off-by: Thomas Parnell <[email protected]>
1 parent deb99f6 commit f091ad5

1 file changed: +3 -0 lines changed

server/text_generation_server/models/paged_causal_lm.py

Lines changed: 3 additions & 0 deletions
@@ -333,6 +333,9 @@ def __init__(
             total_num_gpu_blocks=total_num_gpu_blocks,
         )

+        # log number of free blocks at init
+        print("[PagedKVCacheManager] number of free blocks: %d" % (len(self.kv_cache_manager.free_blocks)))
+
     @property
     def batch_type(self) -> Type[PagedCausalLMBatch]:
         return self._batch_type
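
For readers following the debugging workflow described in the commit message, here is a minimal, standalone sketch of the two pieces involved: building the start-up message and reading a manual block-count override. The environment variable name `KV_CACHE_MANAGER_NUM_GPU_BLOCKS` and the message text come from this commit; the helper names `requested_num_gpu_blocks` and `free_blocks_message` are hypothetical, and how the server actually parses the variable is an assumption, not something this diff shows.

```python
import os
from typing import Mapping, Optional, Sequence

# Variable name taken from the commit message.
ENV_VAR = "KV_CACHE_MANAGER_NUM_GPU_BLOCKS"


def requested_num_gpu_blocks(environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the manual block-count override, or None if it is unset.

    Hypothetical helper: the real server may parse this variable differently.
    """
    raw = environ.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        return None
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{ENV_VAR} must be a positive integer, got {raw!r}")
    return value


def free_blocks_message(free_blocks: Sequence[object]) -> str:
    """Build the same message that the added print statement emits."""
    return "[PagedKVCacheManager] number of free blocks: %d" % len(free_blocks)


if __name__ == "__main__":
    # Stand-in for kv_cache_manager.free_blocks; the real object is not shown in this diff.
    print(free_blocks_message(list(range(4))))
    print("manual override:", requested_num_gpu_blocks())
```

Comparing the logged count with the override a user has set (if any) gives the starting point the commit message asks for before suggesting a larger value.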
