Your environment
910c
Describe the bug
After xllm is initialized, the first four requests are very slow: for each of them, the tokenizer takes more than 200 milliseconds to encode and more than 200 milliseconds to decode. The number of affected requests tracks the num_request_handling_threads configuration, which defaults to 4. Normally only one request should be needed to warm up, but xllm needs four warm-up requests.
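
One possible explanation, which this report does not confirm, is that each request-handling thread lazily builds its own tokenizer state, so the first request on every thread pays the initialization cost. The Python sketch below only illustrates that pattern and a matching warm-up; it is not xllm code. `get_tokenizer`, `handle`, and `warm_up` are hypothetical names, and `str.split` stands in for a real tokenizer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 4  # mirrors the default of 4 mentioned in the report

_local = threading.local()
_init_lock = threading.Lock()
init_count = 0  # how many times the (hypothetical) expensive init ran


def get_tokenizer():
    """Hypothetical lazy per-thread tokenizer; assumption, not xllm's design."""
    global init_count
    if not hasattr(_local, "tokenizer"):
        with _init_lock:
            init_count += 1
        _local.tokenizer = str.split  # placeholder for an expensive load
    return _local.tokenizer


def handle(text):
    """Stand-in for request handling: tokenize on the calling thread."""
    return get_tokenizer()(text)


def warm_up(pool, num_threads):
    """Run one dummy request on every worker thread before serving traffic."""
    barrier = threading.Barrier(num_threads)

    def job():
        handle("warm up")
        # Hold this worker until all jobs have started, so each job
        # is forced onto a distinct thread.
        barrier.wait(timeout=10)

    futures = [pool.submit(job) for _ in range(num_threads)]
    for future in futures:
        future.result()


pool = ThreadPoolExecutor(max_workers=NUM_THREADS)
warm_up(pool, NUM_THREADS)
```

Under this assumption, warming up once per handling thread at startup would move the cost out of the first user requests. Sharing a single pre-initialized tokenizer across threads would be an alternative, if the tokenizer is thread-safe.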