Understanding the intuition behind `request-rate`

I have analyzed the `request-rate` and `interval` variables in `benchmark_serving.py` and would like to confirm that my understanding is correct.

My understanding is that the `request-rate` parameter introduces a delay after each request to mimic a target `queries-per-second` (`QPS`) rate. For example, if it is set to 5, then on average 5 requests are sent per second, provided the number of samples is sufficiently large.

That said, the delays are random: they are drawn from an exponential distribution whose standard deviation equals its mean (`1/request_rate`), so individual values vary widely, but the average is fairly consistent:

[`interval = np.random.exponential(1.0 / request_rate)`](https://github.com/google/JetStream/blob/main/benchmarks/benchmark_serving.py#L351)


<img width="856" alt="image" src="https://github.com/user-attachments/assets/4ec1e285-66cb-47ec-9bb1-18dfb215226d">

*The exponential distribution from which `interval` is sampled, with mean `1/request-rate`*
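
To see what this means for the number of requests per second, here is a minimal simulation sketch. It is not the benchmark's send loop: the function name `requests_per_window` and its parameters are hypothetical, and it uses a seeded `np.random.default_rng` instead of the global `np.random.exponential` call so that runs are repeatable. It accumulates sampled intervals into arrival times and counts how many fall into each 1-second window.

```python
import numpy as np


def requests_per_window(request_rate, duration_s, seed=0):
    """Count simulated requests in each 1-second window (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Draw more intervals than needed so the arrivals cover the whole duration.
    num_intervals = int(request_rate * duration_s * 2) + 100
    intervals = rng.exponential(1.0 / request_rate, size=num_intervals)
    # Arrival time of each request is the running sum of the delays.
    arrivals = np.cumsum(intervals)
    arrivals = arrivals[arrivals < duration_s]
    # One histogram bin per second: [0, 1), [1, 2), ...
    counts, _ = np.histogram(arrivals, bins=np.arange(0, duration_s + 1))
    return counts


if __name__ == "__main__":
    counts = requests_per_window(request_rate=5, duration_s=1000)
    print("Mean requests per second:", counts.mean())
    print("Fewest / most in one second:", counts.min(), counts.max())
```

Individual windows should contain noticeably more or fewer than `request_rate` requests, while the mean over many windows stays close to it. Exponentially distributed gaps between events are the defining property of a Poisson arrival process.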

___

When I plot the `interval` values for a given request-rate (e.g., 5), I get the following plot after running it 1000 times:

<img width="871" alt="image" src="https://github.com/user-attachments/assets/04764399-23f8-4f53-83d4-482b75e7848a">

With these statistics:

```
Mean: 0.19030281331325313
Variance: 0.03450950095960781
Standard Deviation: 0.18576733017300917
Minimum: 6.344257769502491e-05
Maximum: 1.2152877332855887
Sum: 190.30281331325313
```

Given that the mean is around 0.2 seconds, the overall QPS is roughly 5 requests per second (1 / 0.19 ≈ 5.25, or equivalently 1000 requests in about 190.3 seconds).
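
For anyone who wants to reproduce statistics like these, a minimal sketch follows. The helper name `interval_stats` and the seed are assumptions of mine, and the exact numbers will differ from the ones above because the random draws differ.

```python
import numpy as np


def interval_stats(request_rate, num_samples=1000, seed=0):
    """Sample delays for a given request rate and summarize them (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Same distribution as np.random.exponential(1.0 / request_rate),
    # drawn num_samples times at once.
    intervals = rng.exponential(1.0 / request_rate, size=num_samples)
    return {
        "Mean": intervals.mean(),
        "Variance": intervals.var(),
        "Standard Deviation": intervals.std(),
        "Minimum": intervals.min(),
        "Maximum": intervals.max(),
        "Sum": intervals.sum(),
    }


if __name__ == "__main__":
    for name, value in interval_stats(request_rate=5).items():
        print(f"{name}: {value}")
```

For an exponential distribution the mean and the standard deviation are both `1/request_rate`, which matches the observed values of about 0.19 for each.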

___

Here are the statistics for `request-rate = 10`:

<img width="898" alt="image" src="https://github.com/user-attachments/assets/ebf3a129-9a05-46e0-b8ba-ee1508aaa537">

```
Mean: 0.0983383984114717
Variance: 0.008309511474488432
Standard Deviation: 0.0911565218428634
Minimum: 3.9151506088517006e-05
Maximum: 0.6463796229426464
Sum: 98.3383984114717
```

___

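In conclusion, the `request-rate` parameter sets the average QPS (queries per second): individual intervals vary widely, but their mean approaches `1/request-rate` when the number of samples is large enough. For the two runs above, 1 / mean gives about 5.25 and 10.17 requests per second.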
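
The dependence on sample count can be checked with a short sketch. The helper `effective_qps` is hypothetical and only models the sampled delays; it ignores the time the real benchmark spends sending requests.

```python
import numpy as np


def effective_qps(request_rate, num_samples, seed=0):
    """Requests divided by total simulated delay (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    intervals = rng.exponential(1.0 / request_rate, size=num_samples)
    return num_samples / intervals.sum()


if __name__ == "__main__":
    # Small sample counts can land far from the target; large ones settle near it.
    for num_samples in (10, 100, 1000, 100_000):
        print(num_samples, effective_qps(request_rate=5, num_samples=num_samples))
```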

I would like to confirm that my understanding is correct and document this in the issues section for anyone else who might have the same question.

Understanding the intuition behind `request-rate` #137
