
Over-Saturation stopping #242

@AlonKellner-RedHat

Description

@markurtz This issue depends on #238.
When running a benchmark, there are many conditions that can cause the server to Over-Saturate, meaning that the rate of guidellm-generated requests exceeds the rate of server responses.
This usually skews the measured metrics badly, making them false and misleading.

The proposed feature is to integrate an online Over-Saturation detection algorithm into guidellm.
The algorithm will be used in 3 ways:

  1. Early stopping of a benchmark in which Over-Saturation is detected
  2. Early stopping of a sweep in which Over-Saturation is detected
  3. An indication in the output report that Over-Saturation was detected

Over-Saturation Detection Algorithm

I have evaluated an algorithm (see the internal Red Hat Slack for links and documents) which achieves near-perfect detection, both in terms of accuracy and in minimizing wasted time.
The algorithm basically goes like this (a minimal sketch follows the list):

  1. Wait at least 30 seconds (hyper-param, tunable)
  2. Wait for the median TTFT to exceed 2.5 seconds (hyper-param, tunable)
  3. Check the slope of concurrent requests over time
  4. Check the slope of TTFT over time
  5. If both slopes are consistently positive (within a reasonable margin of error), stop the benchmark
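
Here is a minimal Python sketch of those five steps. The class name, defaults, and the ordinary least-squares slope estimate are all my assumptions for illustration, not existing guidellm code:

```python
import time
from dataclasses import dataclass, field
from statistics import median


@dataclass
class OverSaturationDetector:
    """Illustrative sketch of the proposed detection algorithm.

    All names and default values are hypothetical hyper-params,
    not part of guidellm.
    """

    min_elapsed_sec: float = 30.0     # step 1: minimum warm-up time
    min_median_ttft_sec: float = 2.5  # step 2: median TTFT threshold
    slope_margin: float = 0.0         # step 5: margin of error on the slopes
    window_sec: float = 60.0          # sliding window for slope estimation

    # (timestamp, ttft) and (timestamp, concurrent_requests) samples
    ttft_samples: list = field(default_factory=list)
    concurrency_samples: list = field(default_factory=list)
    started_at: float = field(default_factory=time.monotonic)

    def add_sample(self, ttft: float, concurrency: int) -> None:
        now = time.monotonic()
        self.ttft_samples.append((now, ttft))
        self.concurrency_samples.append((now, concurrency))
        self._trim(now)

    def _trim(self, now: float) -> None:
        # Keep only samples inside the sliding window.
        cutoff = now - self.window_sec
        self.ttft_samples = [s for s in self.ttft_samples if s[0] >= cutoff]
        self.concurrency_samples = [
            s for s in self.concurrency_samples if s[0] >= cutoff
        ]

    @staticmethod
    def _slope(samples: list) -> float:
        """Ordinary least-squares slope of value over time."""
        n = len(samples)
        if n < 2:
            return 0.0
        mean_t = sum(t for t, _ in samples) / n
        mean_v = sum(v for _, v in samples) / n
        cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
        var = sum((t - mean_t) ** 2 for t, _ in samples)
        return cov / var if var else 0.0

    def is_over_saturated(self) -> bool:
        # Step 1: wait at least `min_elapsed_sec`.
        if time.monotonic() - self.started_at < self.min_elapsed_sec:
            return False
        # Step 2: wait for the median TTFT to exceed the threshold.
        if not self.ttft_samples:
            return False
        if median(v for _, v in self.ttft_samples) < self.min_median_ttft_sec:
            return False
        # Steps 3-5: both slopes must be consistently positive.
        return (
            self._slope(self.concurrency_samples) > self.slope_margin
            and self._slope(self.ttft_samples) > self.slope_margin
        )
```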

Is this issue addressed by an existing feature?

Currently, Over-Saturation is partially addressed by:

  1. Throughput/sweep mode measures the max load achievable by a server. In practice, the RPS detected by the current throughput mode is very noisy and usually a large over-estimate, so a sweep typically Over-Saturates the server in its last few constant benchmarks.
  2. Max-error stopping (Feat/max error rate - continued #238) will also help once an Over-Saturated server stops responding and errors accumulate, but it takes a while for timeouts to start arriving, and that time is wasted.

Implementation Discussion

It seems very natural to me to put this logic somewhere between benchmarker.py, aggregator.py, and potentially a new class, hooked in around:

aggregator.add_result(result)

The aggregator.py or the new class could accumulate the information needed to calculate the slopes and margins of error (TTFT and concurrent requests over some window of time, e.g. 1 minute). If Over-Saturation is detected, benchmarker.py would send a stop signal to the scheduler, set the termination reason to "over-saturated-server", complete the benchmark gracefully, and, if it is a sweep, break out of the profile strategy loop. A rough sketch of that wiring follows.
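
For illustration only, here is how benchmarker.py might wire the detector above into that add_result path. Every interface named here (scheduler.results(), scheduler.stop(), the result fields) is an assumption about guidellm internals, not its real API:

```python
def run_benchmark(scheduler, aggregator, detector):
    """Drive a single benchmark to completion or early stop.

    Returns a termination reason string, or None for a normal finish.
    `scheduler.results()`, `scheduler.stop()`, `result.ttft`, and
    `result.concurrency` are assumed interfaces for this sketch.
    """
    for result in scheduler.results():
        aggregator.add_result(result)
        detector.add_sample(ttft=result.ttft, concurrency=result.concurrency)
        if detector.is_over_saturated():
            # Stop the scheduler gracefully and record why we stopped;
            # in a sweep, the caller would also break out of the
            # profile strategy loop when it sees this reason.
            scheduler.stop()
            return "over-saturated-server"
    return None
```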
@markurtz, what do you think?
