[Bug]: The first four LLM requests are time-consuming #530

@magicheng0816

Description

Your environment

Ascend 910C

πŸ› Describe the bug

After xllm initialization, the first four requests take a long time: for each of these requests, the tokenizer needs more than 200 milliseconds for encoding and another 200 milliseconds for decoding. The number of affected requests matches the `num_request_handling_threads` configuration, which defaults to 4. Normally a single warm-up request would be enough, but xllm effectively requires four warm-up requests, one per handling thread. A client-side sketch of such a warm-up is shown below.
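Until the per-thread warm-up cost is fixed in xllm itself, one workaround is to warm every handling thread before real traffic arrives. Below is a minimal client-side sketch, assuming xllm serves an OpenAI-compatible `/v1/completions` endpoint on `localhost:8000` and that `num_request_handling_threads` is left at its default of 4; both the endpoint and the thread count are assumptions about the deployment, not part of this report:

```python
# Hypothetical warm-up workaround: send one request per handling thread
# so each thread's tokenizer is exercised before real traffic arrives.
# ENDPOINT and NUM_HANDLING_THREADS are assumptions; adjust both to
# match your deployment.
import concurrent.futures

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed endpoint
NUM_HANDLING_THREADS = 4  # must match num_request_handling_threads


def warmup_request(i: int) -> float:
    """Send one tiny completion request and return its latency in seconds."""
    payload = {"prompt": "warmup", "max_tokens": 1}
    resp = requests.post(ENDPOINT, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.elapsed.total_seconds()


# Fire the requests concurrently so they land on distinct handling
# threads rather than all being served by the one already-warm thread.
with concurrent.futures.ThreadPoolExecutor(NUM_HANDLING_THREADS) as pool:
    latencies = list(pool.map(warmup_request, range(NUM_HANDLING_THREADS)))

print("warm-up latencies (s):", [round(t, 3) for t in latencies])
```

If the symptom described above is accurate, the first run of this script should show all four requests paying the >200 ms tokenizer cost, and subsequent requests should be fast on every handling thread.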
