93 changes: 93 additions & 0 deletions tests/results/dp-perf/2.2.0/2.2.0-oss.md
@@ -0,0 +1,93 @@
# Results

## Test environment

NGINX Plus: false

NGINX Gateway Fabric:

- Commit: 9fbef714ea22a35c4f1a8c97bd5b4e406ae0c1e9
- Date: 2025-10-21T10:57:37Z
- Dirty: false

GKE Cluster:

- Node count: 12
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 16
- RAM per node: 65851524Ki
- Max pods per node: 110
- Zone: us-west1-b
- Instance Type: n2d-standard-16

## Summary:

- 4 out of 5 tests showed slight latency increases, consistent with the trend noted in the 2.1.0 summary
- The latency differences are minimal overall, with most changes under 1%.
- The POST method routing increase of ~2.2% is the most significant change, though still relatively small in absolute terms (~21µs).
- All tests maintained 100% success rates with similar throughput (~1000 req/s), indicating that the slight latency variations are likely within normal performance variance.
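
The relative and absolute figures quoted for the POST test can be cross-checked with a little arithmetic. The sketch below assumes the ~2.2% increase is measured against the previous release's mean latency, which is not included in this file, so the previous mean is derived from the Test5 mean reported here rather than read from a baseline.

```python
# Mean latency for Test5 (POST method routing), taken from this results file.
current_mean_us = 962.748
# Approximate relative increase quoted in the summary.
relative_increase = 0.022

# Assumption: the percentage is relative to the previous release's mean,
# so previous * (1 + increase) == current.
implied_previous_us = current_mean_us / (1 + relative_increase)
implied_delta_us = current_mean_us - implied_previous_us

print(f"implied previous mean: {implied_previous_us:.1f}µs")
print(f"implied absolute increase: {implied_delta_us:.1f}µs")
```

Under that assumption the implied absolute increase lands close to the ~21µs stated in the summary.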

## Test1: Running latte path based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 925.889µs
Latencies [min, mean, 50, 90, 95, 99, max] 681.943µs, 926.463µs, 901.993µs, 1.011ms, 1.053ms, 1.244ms, 30.638ms
Bytes In [total, mean] 4770000, 159.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```
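
Comparing runs across releases is easier once the `Latencies` line is converted to a single unit. The following is a minimal sketch, not part of the test tooling: `parse_latencies` is a hypothetical helper that assumes each value is a single number followed by one of `ns`, `µs`, `us`, `ms`, or `s`, as in the reports in this file. Compound durations such as `1m30s` are not handled.

```python
import re

# Multipliers that convert each supported unit to microseconds.
_UNIT_TO_US = {"ns": 1e-3, "µs": 1.0, "us": 1.0, "ms": 1e3, "s": 1e6}

# A number immediately followed by a unit, e.g. "926.463µs" or "1.011ms".
_DURATION = re.compile(r"([0-9.]+)(ns|µs|us|ms|s)")

_FIELDS = ["min", "mean", "50", "90", "95", "99", "max"]


def parse_latencies(line: str) -> dict[str, float]:
    """Parse a 'Latencies' report line into microseconds keyed by field name."""
    values = [float(num) * _UNIT_TO_US[unit] for num, unit in _DURATION.findall(line)]
    if len(values) != len(_FIELDS):
        raise ValueError(f"expected {len(_FIELDS)} durations, got {len(values)}")
    return dict(zip(_FIELDS, values))


line = (
    "Latencies [min, mean, 50, 90, 95, 99, max] "
    "681.943µs, 926.463µs, 901.993µs, 1.011ms, 1.053ms, 1.244ms, 30.638ms"
)
latencies_us = parse_latencies(line)
print(latencies_us["mean"], latencies_us["99"])
```

The bare percentile labels (`50`, `90`, and so on) are skipped by the pattern because they carry no unit suffix.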

## Test2: Running coffee header based routing

```text
Requests [total, rate, throughput] 30000, 1000.01, 999.98
Duration [total, attack, wait] 30.001s, 30s, 905.82µs
Latencies [min, mean, 50, 90, 95, 99, max] 733.55µs, 951.898µs, 926.202µs, 1.037ms, 1.082ms, 1.248ms, 24.506ms
Bytes In [total, mean] 4800000, 160.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test3: Running coffee query based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 885.866µs
Latencies [min, mean, 50, 90, 95, 99, max] 742.259µs, 965.539µs, 933.535µs, 1.04ms, 1.087ms, 1.345ms, 26.261ms
Bytes In [total, mean] 5040000, 168.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test4: Running tea GET method based routing

```text
Requests [total, rate, throughput] 30000, 1000.01, 999.98
Duration [total, attack, wait] 30.001s, 30s, 879.736µs
Latencies [min, mean, 50, 90, 95, 99, max] 732.423µs, 938.723µs, 917.416µs, 1.022ms, 1.066ms, 1.241ms, 21.039ms
Bytes In [total, mean] 4710000, 157.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test5: Running tea POST method based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 880.839µs
Latencies [min, mean, 50, 90, 95, 99, max] 725.559µs, 962.748µs, 938.978µs, 1.053ms, 1.098ms, 1.261ms, 23.289ms
Bytes In [total, mean] 4710000, 157.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```
96 changes: 96 additions & 0 deletions tests/results/dp-perf/2.2.0/2.2.0-plus.md
@@ -0,0 +1,96 @@
# Results

## Test environment

NGINX Plus: true

NGINX Gateway Fabric:

- Commit: 9fbef714ea22a35c4f1a8c97bd5b4e406ae0c1e9
- Date: 2025-10-21T10:57:37Z
- Dirty: false

GKE Cluster:

- Node count: 12
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 16
- RAM per node: 65851524Ki
- Max pods per node: 110
- Zone: us-west1-b
- Instance Type: n2d-standard-16

## Summary:

- Average latency increased across all tests
- Largest Increase: Header-based routing (+76.461µs, +8.60%)
- Smallest Increase: Path-based routing (+28.988µs, +3.26%)
- Average Overall Increase: ~51.1µs (+5.69% average across all tests)
- Most Impacted: Header and query-based routing (8.60% and 5.91% respectively)
- Method Routing: GET and POST both increased by ~5.3%
- All tests maintained 100% success rate, similar throughput and similar max latencies
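
The percentages above can be reproduced from the absolute deltas. This sketch assumes each delta is measured against the previous release's mean, so the previous mean is the current mean (from Test2 and Test1 in this file) minus the delta; `percent_increase` is a hypothetical helper written for illustration.

```python
def percent_increase(current_us: float, delta_us: float) -> float:
    """Return delta as a percentage of the previous mean (current - delta)."""
    previous_us = current_us - delta_us
    return 100.0 * delta_us / previous_us


# Header-based routing: current mean 964.976µs, reported increase 76.461µs.
header_pct = percent_increase(964.976, 76.461)
# Path-based routing: current mean 917.554µs, reported increase 28.988µs.
path_pct = percent_increase(917.554, 28.988)

print(f"header: {header_pct:.2f}%  path: {path_pct:.2f}%")
```

Both values agree with the summary to within rounding of the last digit.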

## Test1: Running latte path based routing

```text
Requests [total, rate, throughput] 30000, 1000.09, 1000.06
Duration [total, attack, wait] 29.998s, 29.997s, 893.093µs
Latencies [min, mean, 50, 90, 95, 99, max] 702.667µs, 917.554µs, 892.32µs, 1.016ms, 1.066ms, 1.254ms, 21.001ms
Bytes In [total, mean] 4740000, 158.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test2: Running coffee header based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 883.984µs
Latencies [min, mean, 50, 90, 95, 99, max] 752.053µs, 964.976µs, 939.422µs, 1.067ms, 1.123ms, 1.313ms, 16.259ms
Bytes In [total, mean] 4770000, 159.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test3: Running coffee query based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 916.972µs
Latencies [min, mean, 50, 90, 95, 99, max] 745.707µs, 955.274µs, 931.109µs, 1.052ms, 1.102ms, 1.287ms, 17.84ms
Bytes In [total, mean] 5010000, 167.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test4: Running tea GET method based routing

```text
Requests [total, rate, throughput] 30000, 1000.01, 999.98
Duration [total, attack, wait] 30.001s, 30s, 938.936µs
Latencies [min, mean, 50, 90, 95, 99, max] 723.854µs, 955.401µs, 930.464µs, 1.057ms, 1.114ms, 1.306ms, 18.287ms
Bytes In [total, mean] 4680000, 156.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test5: Running tea POST method based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 888.406µs
Latencies [min, mean, 50, 90, 95, 99, max] 736.512µs, 956.475µs, 925.958µs, 1.049ms, 1.105ms, 1.293ms, 21.232ms
Bytes In [total, mean] 4680000, 156.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```
92 changes: 92 additions & 0 deletions tests/results/longevity/2.2.0/2.2.0-oss.md
@@ -0,0 +1,92 @@
# Results

## Test environment

NGINX Plus: false

NGINX Gateway Fabric:

- Commit: e4eed2dad213387e6493e76100d285483ccbf261
- Date: 2025-10-17T14:41:02Z
- Dirty: false

GKE Cluster:

- Node count: 3
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 2
- RAM per node: 4015668Ki
- Max pods per node: 110
- Zone: europe-west2-a
- Instance Type: e2-medium

## Summary:

- Still a lot of non-2xx or 3xx responses, but vastly improved on the last test run.
- This indicates that while most of the Agent - control plane connection issues have been resolved, some issues remain.
- All the observed 502s happened within a single window of time, which at least indicates the system was able to recover, although it is unclear what triggered Agent
- The increase in memory usage for NGF seen in the previous test run appears to have been resolved.
- We observe a steady increase in NGINX memory usage over time which could indicate a memory leak.
- CPU usage remained consistent with past results.
- Errors seem to be related to a cluster upgrade or some other external factor (excluding the resolved InferencePool status error).
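
One way to put a number on "steady increase" is to fit a straight line to memory samples and look at the slope. The sketch below is illustrative only: `slope_per_hour` is a hypothetical helper, and the sample values are synthetic, not read from the memory graph for this run.

```python
def slope_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of memory (MiB) against elapsed time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    return cov / var


# Synthetic (elapsed hours, MiB) samples; NOT taken from this test run.
hypothetical_samples = [(0, 40.0), (24, 42.0), (48, 44.0), (72, 46.0), (96, 48.0)]
growth_mib_per_hour = slope_per_hour(hypothetical_samples)
print(f"{growth_mib_per_hour:.4f} MiB/hour")
```

A slope that stays positive across the whole run, rather than flattening after warm-up, would support the leak hypothesis; a single slope value on its own does not confirm one.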

## Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 202.19ms 150.51ms 2.00s 83.62%
Req/Sec 272.67 178.26 2.59k 63.98%
183598293 requests in 5760.00m, 62.80GB read
Socket errors: connect 0, read 338604, write 82770, timeout 57938
Non-2xx or 3xx responses: 33893
Requests/sec: 531.24
Transfer/sec: 190.54KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 189.21ms 108.25ms 2.00s 66.82%
Req/Sec 271.64 178.03 1.96k 63.33%
182905321 requests in 5760.00m, 61.55GB read
Socket errors: connect 10168, read 332301, write 0, timeout 96
Requests/sec: 529.24
Transfer/sec: 186.76KB
```
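
The headline numbers in the wrk output can be sanity-checked from the totals. This is a minimal sketch using the HTTP figures above; `wrk_summary` is a hypothetical helper, and the error percentage counts only non-2xx/3xx responses, not the socket errors wrk reports separately.

```python
def wrk_summary(total_requests: int, minutes: float, non_2xx_3xx: int) -> tuple[float, float]:
    """Return (requests per second, non-2xx/3xx responses as a percentage)."""
    requests_per_sec = total_requests / (minutes * 60)
    error_pct = 100.0 * non_2xx_3xx / total_requests
    return requests_per_sec, error_pct


# HTTP run: 183598293 requests in 5760 minutes, 33893 non-2xx/3xx responses.
rps, error_pct = wrk_summary(183_598_293, 5760, 33_893)
print(f"{rps:.2f} req/s, {error_pct:.4f}% non-2xx/3xx")
```

The computed rate matches the reported `Requests/sec` to within rounding, and the non-2xx/3xx responses come to roughly 0.02% of all requests.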

## Key Metrics

### Containers memory

![oss-memory.png](oss-memory.png)

### Containers CPU

![oss-cpu.png](oss-cpu.png)

## Error Logs

### nginx-gateway

- msg: Config apply failed, rolling back config; error: error getting file data for name:"/etc/nginx/conf.d/http.conf" hash:"Luqynx2dkxqzXH21wmiV0nj5bHyGiIq7/2gOoM6aKew=" permissions:"0644" size:5430: rpc error: code = NotFound desc = file not found -> happened twice in the 4 days, related to agent reconciliation during token rotation
- {hashFound: jmeyy1p+6W1icH2x2YGYffH1XtooWxvizqUVd+WdzQ4=, hashWanted: Luqynx2dkxqzXH21wmiV0nj5bHyGiIq7/2gOoM6aKew=, level: debug, logger: nginxUpdater.fileService, msg: File found had wrong hash, ts: 2025-10-18T18:11:24Z}
- The error indicates Agent requested a file that had since changed

- msg: Failed to update lock optimistically: the server was unable to return a response in the time allotted, but may still be processing the request (put leases.coordination.k8s.io ngf-longevity-nginx-gateway-fabric-leader-election), falling back to slow path -> same leader election error as on plus, seems out of scope of our product

- msg: no matches for kind "InferencePool" in version "inference.networking.k8s.io/v1" -> Thousands of these, but fixed in PR 4104
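
The `hashFound` and `hashWanted` values in the debug log are both 44-character base64 strings, which is the length of a base64-encoded 32-byte digest. The sketch below assumes, without confirmation from the logs or the NGF source, that they are SHA-256 digests of the file contents; `b64_sha256` is a hypothetical helper for comparing a local file against a logged value.

```python
import base64
import hashlib


def b64_sha256(data: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of data."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Values copied from the debug log entry above.
hash_wanted = "Luqynx2dkxqzXH21wmiV0nj5bHyGiIq7/2gOoM6aKew="
hash_found = "jmeyy1p+6W1icH2x2YGYffH1XtooWxvizqUVd+WdzQ4="

# Both decode to 32 bytes, consistent with (but not proof of) SHA-256.
wanted_len = len(base64.b64decode(hash_wanted))
found_len = len(base64.b64decode(hash_found))
print(wanted_len, found_len, hash_wanted == hash_found)
```

If the assumption holds, running `b64_sha256` over a captured copy of `http.conf` would show which of the two versions that copy corresponds to.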

### nginx

Traffic: nearly 34000 502s

- These all happened in the same window of less than a minute (approx 2025-10-18T18:11:11 - 2025-10-18T18:11:50), and resolved once NGINX restarted
- It's unclear what triggered NGINX to restart, though it does appear a memory spike was observed around this time
- The outage correlates with the config apply error seen in the control plane logs
96 changes: 96 additions & 0 deletions tests/results/longevity/2.2.0/2.2.0-plus.md
@@ -0,0 +1,96 @@
# Results

## Test environment

NGINX Plus: true

NGINX Gateway Fabric:

- Commit: e4eed2dad213387e6493e76100d285483ccbf261
- Date: 2025-10-17T14:41:02Z
- Dirty: false

GKE Cluster:

- Node count: 3
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 2
- RAM per node: 4015668Ki
- Max pods per node: 110
- Zone: europe-west2-a
- Instance Type: e2-medium

## Summary:

- Total of 5 502s observed across the 4 days of the test run
- The increase in memory usage for NGF seen in the previous test run appears to have been resolved.
- We observe a steady increase in NGINX memory usage over time which could indicate a memory leak.
- CPU usage remained consistent with past results.
- Errors seem to be related to a cluster upgrade or some other external factor (excluding the resolved InferencePool status error).

## Key Metrics

### Containers memory

![plus-memory.png](plus-memory.png)

### Containers CPU

![plus-cpu.png](plus-cpu.png)

## Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 203.71ms 108.67ms 2.00s 66.92%
Req/Sec 257.95 167.36 1.44k 63.57%
173901014 requests in 5760.00m, 59.64GB read
Socket errors: connect 0, read 219, write 55133, timeout 27
Non-2xx or 3xx responses: 4
Requests/sec: 503.19
Transfer/sec: 180.96KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 203.89ms 108.72ms 1.89s 66.92%
Req/Sec 257.52 167.02 1.85k 63.64%
173632748 requests in 5760.00m, 58.61GB read
Socket errors: connect 7206, read 113, write 0, timeout 0
Non-2xx or 3xx responses: 1
Requests/sec: 502.41
Transfer/sec: 177.84KB
```


## Error Logs

### nginx-gateway

msg: Failed to update lock optimistically: the server was unable to return a response in the time allotted, but may still be processing the request (put leases.coordination.k8s.io ngf-longevity-nginx-gateway-fabric-leader-election), falling back to slow path -> same leader election error as on oss, seems out of scope of our product

msg: Get "https://34.118.224.1:443/apis/gateway.networking.k8s.io/v1beta1/referencegrants?allowWatchBookmarks=true&resourceVersion=1760806842166968999&timeout=10s&timeoutSeconds=435&watch=true": context canceled -> possible cluster upgrade?

msg: no matches for kind "InferencePool" in version "inference.networking.k8s.io/v1" -> Thousands of these, but fixed in PR 4104

### nginx

Traffic: 5 502s

```
INFO 2025-10-19T00:12:04.220541710Z [resource.labels.containerName: nginx] 10.154.15.240 - - [19/Oct/2025:00:12:04 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"
INFO 2025-10-19T18:38:18.651520548Z [resource.labels.containerName: nginx] 10.154.15.240 - - [19/Oct/2025:18:38:18 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"
INFO 2025-10-20T21:49:05.008076073Z [resource.labels.containerName: nginx] 10.154.15.240 - - [20/Oct/2025:21:49:04 +0000] "GET /tea HTTP/1.1" 502 150 "-" "-"
INFO 2025-10-21T06:43:10.256327990Z [resource.labels.containerName: nginx] 10.154.15.240 - - [21/Oct/2025:06:43:10 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"
INFO 2025-10-21T12:13:05.747098022Z [resource.labels.containerName: nginx] 10.154.15.240 - - [21/Oct/2025:12:13:05 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"
```
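
A breakdown of the 502s by route can be pulled from the access log entries with a small script. This is a sketch under the assumption that each entry contains the quoted request line followed by the status code, as in the five entries above; `count_status_by_path` is a hypothetical helper.

```python
import re
from collections import Counter

# Captures the request path and the status code that follows the request line.
_ACCESS = re.compile(
    r'"(?:GET|POST|PUT|DELETE|HEAD|PATCH|OPTIONS) (\S+) HTTP/[0-9.]+" (\d{3}) '
)


def count_status_by_path(lines, status):
    """Count log entries with the given status code, grouped by request path."""
    counts = Counter()
    for line in lines:
        match = _ACCESS.search(line)
        if match and int(match.group(2)) == status:
            counts[match.group(1)] += 1
    return counts


# Access log portion of the five entries listed above.
log_lines = [
    '10.154.15.240 - - [19/Oct/2025:00:12:04 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"',
    '10.154.15.240 - - [19/Oct/2025:18:38:18 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"',
    '10.154.15.240 - - [20/Oct/2025:21:49:04 +0000] "GET /tea HTTP/1.1" 502 150 "-" "-"',
    '10.154.15.240 - - [21/Oct/2025:06:43:10 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"',
    '10.154.15.240 - - [21/Oct/2025:12:13:05 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"',
]
counts = count_status_by_path(log_lines, 502)
print(dict(counts))
```

For this run the split is four 502s on `/coffee` and one on `/tea`, matching the one non-2xx/3xx response in the HTTPS wrk output and the four in the HTTP output.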

No other errors identified in this test run.
Binary file added tests/results/longevity/2.2.0/oss-cpu.png
Binary file added tests/results/longevity/2.2.0/oss-memory.png
Binary file added tests/results/longevity/2.2.0/plus-cpu.png
Binary file added tests/results/longevity/2.2.0/plus-memory.png