Improve time delays calculation based on load

Start work on this after implementing issue #157 

**What would you like to be added**:
Request processing time is currently calculated once, when the request arrives (for non-streaming requests), and is then used without adjustment while the request is being "processed".
For example, if a request arrives when no other requests are in progress, its processing time is calculated as `ttft + inter-token-latency*num-of-tokens`. If more requests arrive during that "processing time", the inter-token latency should increase.
Convert all sleep commands to an "active wait": wait for each token independently, and for each token obtain the timeout to use based on the current load.
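
A minimal sketch of the per-token wait described above, written in Python only for illustration and not taken from the simulator's code. `LoadTracker`, `inter_token_latency`, `process_request`, and the `load_factor` parameter are all hypothetical names, and the linear scaling of latency with the number of concurrent requests is an assumption. The actual load model is not specified in this issue.

```python
import threading
import time


class LoadTracker:
    """Thread-safe count of requests currently in process (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = 0

    def start(self):
        with self._lock:
            self._running += 1

    def finish(self):
        with self._lock:
            self._running -= 1

    def running(self):
        with self._lock:
            return self._running


def inter_token_latency(base_itl, running, load_factor=0.5):
    # ASSUMPTION: latency grows linearly with the number of *other*
    # in-flight requests; the real load model may differ.
    extra = max(running - 1, 0)
    return base_itl * (1.0 + load_factor * extra)


def process_request(tracker, ttft, base_itl, num_tokens, sleep=time.sleep):
    """Wait token by token, re-reading the load before each token.

    Returns the total time waited, in the same units as ttft and base_itl.
    """
    tracker.start()
    total = 0.0
    try:
        # Time to first token is waited once, up front.
        sleep(ttft)
        total += ttft
        for _ in range(num_tokens):
            # The delay is recomputed per token, so requests that arrive
            # mid-way raise the latency of the remaining tokens.
            delay = inter_token_latency(base_itl, tracker.running())
            sleep(delay)
            total += delay
    finally:
        tracker.finish()
    return total
```

With a single request in process, the total reduces to `ttft + inter-token-latency*num-of-tokens`, matching the current calculation. The `sleep` parameter is injectable so the timing logic can be exercised in tests without real waiting.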

<img width="907" height="411" alt="Image" src="https://github.com/user-attachments/assets/afb6aea9-95d7-4e2b-9119-abd2695fc43f" />
 

**Why is this needed**:
To mimic vLLM behavior in a more realistic way.

Improve time delays calculation based on load #159

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improve time delays calculation based on load #159

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions