Does trtllm-serve enable prefix caching automatically?
I want to serve DeepSeek-R1 with prefix caching enabled. I am deploying it as follows:
trtllm-serve \
  --backend pytorch \
  --max_batch_size $MAX_BATCH_SIZE \
  --max_num_tokens $MAX_NUM_TOKENS \
  --max_seq_len $MAX_SEQ_LENGTH \
  --tp_size 8 \
  --ep_size 4 \
  --pp_size 1 \
  deepseek