[Bug]: The first four LLM requests are time-consuming #530

@magicheng0816

Description

Your environment

Ascend 910C

πŸ› Describe the bug

After xllm initialization, the first four requests take a long time: for each of these requests, the tokenizer needs more than 200 milliseconds for encoding and another 200 milliseconds for decoding. The number of affected requests matches the `num_request_handling_threads` configuration, which defaults to 4. Normally a single warm-up request would be enough, but xllm effectively requires four warm-up requests, one per handling thread. A client-side sketch of such a warm-up is shown below.
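Until the per-thread warm-up cost is fixed in xllm itself, one workaround is to warm every handling thread before real traffic arrives. Below is a minimal client-side sketch, assuming xllm serves an OpenAI-compatible `/v1/completions` endpoint on `localhost:8000` and that `num_request_handling_threads` is left at its default of 4; both the endpoint and the thread count are assumptions about the deployment, not part of this report:

```python
# Hypothetical warm-up workaround: send one request per handling thread
# so each thread's tokenizer is exercised before real traffic arrives.
# ENDPOINT and NUM_HANDLING_THREADS are assumptions; adjust both to
# match your deployment.
import concurrent.futures

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed endpoint
NUM_HANDLING_THREADS = 4  # must match num_request_handling_threads


def warmup_request(i: int) -> float:
    """Send one tiny completion request and return its latency in seconds."""
    payload = {"prompt": "warmup", "max_tokens": 1}
    resp = requests.post(ENDPOINT, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.elapsed.total_seconds()


# Fire the requests concurrently so they land on distinct handling
# threads rather than all being served by the one already-warm thread.
with concurrent.futures.ThreadPoolExecutor(NUM_HANDLING_THREADS) as pool:
    latencies = list(pool.map(warmup_request, range(NUM_HANDLING_THREADS)))

print("warm-up latencies (s):", [round(t, 3) for t in latencies])
```

If the symptom described above is accurate, the first run of this script should show all four requests paying the >200 ms tokenizer cost, and subsequent requests should be fast on every handling thread.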
