Unexpected TTFT across several instances #267

@alonsocamaro

What happened:

According to the documentation, by default the seed is different for each instance because it is taken from the nanosecond part of the time at which the instance starts.

I'm running 3 instances and I'm getting exactly the same TTFT for all 3 instances. More precisely, using the following PromQL query:

histogram_quantile(0.3,
  sum by(le, instance) (
    rate(vllm:time_to_first_token_seconds_bucket[30s])
  )
)

I get the same values, or at most minimal differences, across all instances. I have tried changing the quantile and the time window, but I always get the same TTFT values on every instance. How is this possible?
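
One possible explanation, sketched here under stated assumptions rather than confirmed against the simulator's code: if every instance draws TTFT from the same distribution, different seeds change the individual samples but not the distribution. `histogram_quantile` then interpolates linearly inside a coarse bucket, so the per-instance estimates end up very close. The bucket bounds, the normal distribution, and the helper names below are assumptions for illustration; `--time-factor-under-load` is ignored.

```python
import bisect
import random

# Assumed bucket upper bounds in seconds (hypothetical; check /metrics for the real ones).
BUCKETS = [0.5, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, float("inf")]


def cumulative_counts(samples, upper_bounds):
    """Return cumulative bucket counts, like the `le` series of a histogram."""
    ordered = sorted(samples)
    return [bisect.bisect_right(ordered, bound) for bound in upper_bounds]


def histogram_quantile(q, upper_bounds, counts):
    """Simplified re-implementation of PromQL's linear bucket interpolation."""
    rank = q * counts[-1]
    idx = next(i for i, c in enumerate(counts) if c >= rank)
    if upper_bounds[idx] == float("inf"):
        # Quantile falls in the +Inf bucket: report the highest finite bound.
        return upper_bounds[idx - 1]
    lower = upper_bounds[idx - 1] if idx > 0 else 0.0
    below = counts[idx - 1] if idx > 0 else 0
    return lower + (upper_bounds[idx] - lower) * (rank - below) / (counts[idx] - below)


def simulate_instance(seed, n=5000, mean=5.0, std_dev=1.5):
    """Draw n TTFT samples (seconds) with a per-instance seed and estimate p30."""
    rng = random.Random(seed)
    samples = [max(0.0, rng.gauss(mean, std_dev)) for _ in range(n)]
    return histogram_quantile(0.3, BUCKETS, cumulative_counts(samples, BUCKETS))


if __name__ == "__main__":
    for seed in (1, 2, 3):
        print(f"seed={seed} p30={simulate_instance(seed):.3f}s")
```

If this model is roughly right, the three printed values should differ only slightly, which would match what the query shows even though the seeds differ.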

What you expected to happen:

Different TTFT values for each instance over the last X seconds.

How to reproduce it (as minimally and precisely as possible):

I'm running 3 instances with the following parameters:

        - args:
            - --model
            - TinyLlama/TinyLlama-1.1B-Chat-v1.0
            #- --max-model-len
            #- "2048"
            - --served-model-name=HighEndLLM
            - --port
            - "8000"
            - --mode=random
            - --time-to-first-token=5000
            - --enable-kvcache
            - --max-num-seqs=25
            - --time-factor-under-load=3
            - --inter-token-latency=100
            # only if prefill/decode disaggregation is enabled
            #- --kv-cache-transfer-latency=10
            # can't be more than 30%
            - --time-to-first-token-std-dev=1500
            - --inter-token-latency-std-dev=30
            #- --kv-cache-transfer-time-std-dev=3

And sending the following workload:

ab -v 1 -n 10000 -c 200 -T application/json -p /tmp/request.json http://$m/v1/completions
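
To cross-check the result without bucket interpolation, the per-instance mean TTFT can be computed from the histogram's `_sum` and `_count` series. The following is a minimal sketch: it assumes each instance serves Prometheus text metrics at `/metrics`, the function names are hypothetical, and the result is a lifetime mean rather than a 30s window.

```python
import re
import urllib.request

METRIC = "vllm:time_to_first_token_seconds"
# Matches "name{labels} value" or "name value" in the Prometheus text format.
LINE_RE = re.compile(r"^(?P<name>[^\s{]+)(?:\{[^}]*\})?\s+(?P<value>\S+)")


def mean_ttft(metrics_text):
    """Return sum/count for the TTFT histogram, or None if there are no samples."""
    total_sum = 0.0
    total_count = 0.0
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue
        if match["name"] == METRIC + "_sum":
            total_sum += float(match["value"])
        elif match["name"] == METRIC + "_count":
            total_count += float(match["value"])
    return total_sum / total_count if total_count else None


def fetch_metrics(base_url):
    """Fetch the raw metrics text from one instance (assumed /metrics path)."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Calling `mean_ttft(fetch_metrics(url))` once per instance address would show whether the underlying samples differ even when the quantile estimates coincide.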

Anything else we need to know?:

Thanks!

Environment:

ghcr.io/llm-d/llm-d-inference-sim:v0.6.1
