Commit d5c1d1d

[None][feat] Add TRTLLM_KVCACHE_AWARE_ROUTER_HASH_TOKENS_PER_BLOCK env var
Allow overriding the KvCacheAwareRouter tokens_per_block via environment variable. When set, the env var takes precedence over the constructor default (32) and the YAML config value. The effective value is logged at router initialization.

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
1 parent bab4fd2 commit d5c1d1d

1 file changed

+8
-0
lines changed

tensorrt_llm/serve/router.py

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,6 @@
 import asyncio
 import heapq
+import os
 from abc import ABC, abstractmethod
 from typing import Awaitable, Callable, Dict, Iterable, List, Optional, Union
 
@@ -626,7 +627,14 @@ def __init__(self,
         self._tokenizers = {}
         # TODO: use max_num_tokens? per server?
         self._max_batch_size = max_batch_size
+        env_tokens_per_block = os.environ.get(
+            "TRTLLM_KVCACHE_AWARE_ROUTER_HASH_TOKENS_PER_BLOCK")
+        if env_tokens_per_block is not None:
+            tokens_per_block = int(env_tokens_per_block)
         self._tokens_per_block = tokens_per_block
+        logger.info(
+            f"KvCacheAwareRouter: tokens_per_block={self._tokens_per_block}"
+        )
 
     def _get_tokenizer(self, model: str):
         if model not in self._tokenizers:
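
The precedence described in the commit message can be tried outside the router. The following is a minimal sketch, not the router's code: `resolve_tokens_per_block` is a hypothetical helper introduced only for illustration, and its `configured` argument stands in for whatever value reaches the constructor (the default of 32 or the YAML config value).

```python
import os

ENV_VAR = "TRTLLM_KVCACHE_AWARE_ROUTER_HASH_TOKENS_PER_BLOCK"


def resolve_tokens_per_block(configured: int = 32) -> int:
    """Hypothetical helper that mirrors the precedence added in this commit."""
    raw = os.environ.get(ENV_VAR)
    if raw is None:
        # Env var unset: keep the value passed in (default or config).
        return configured
    # Env var set: it wins. int() raises ValueError on non-numeric input,
    # which matches the behavior of the committed code.
    return int(raw)


os.environ.pop(ENV_VAR, None)
default_value = resolve_tokens_per_block()      # no env var, constructor default
config_value = resolve_tokens_per_block(64)     # no env var, configured value
os.environ[ENV_VAR] = "128"
override_value = resolve_tokens_per_block(64)   # env var overrides configured value
os.environ.pop(ENV_VAR)

print(default_value, config_value, override_value)
```

Because the value is passed straight to `int()`, an empty or non-numeric setting of the variable raises `ValueError` when the router is constructed rather than falling back to the configured value.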
