# Benchmarking the Bridge for Fun and Profit

This document shouldn't be taken as a source of "what you can expect if you run this in production", but more as a general showcase of _one_ possible scenario.

Benchmarks were run on 2025-12-14 on two of the smallest available Hetzner Cloud VMs with dedicated CPU cores. The instance type was CCX13: two dedicated CPU cores, 8 GiB of RAM, and 80 GiB of local NVMe storage. According to `/proc/cpuinfo`, these VMs ran on an AMD EPYC CPU clocked at 2.4 GHz, but I don't know the exact core model.

The first machine ran nginx 1.29.4 and also generated the load with `wrk` 4.1.0. The second machine ran PostgreSQL 18.1 alongside v3.0.0 of the bridge. I put the bridge and the database on the same machine so the benchmark wouldn't turn into a test of inter-server bandwidth; the two machines were connected through Hetzner's "private network" feature, which consistently sustained 6 Gbit/s. PostgreSQL, nginx, and the bridge all ran in Docker containers for ease of testing, and the CPU and memory figures below come from `docker stats`.

To give nginx as high a throughput as possible, I used a simple server block that just returns a static string:

```plain
server {
    listen 80 default_server reuseport;
    listen [::]:80 default_server reuseport;
    server_name _;

    add_header Content-Type text/plain;
    return 200 "meow";
}
```
## Default settings

This test ran with the bridge's default settings and no custom queue sizes, i.e. `INSERT_BATCH_SIZE=10` and `QUEUE_SIZE=50`. I used a `wrk` lua script to throttle requests (sketched at the end of this section), let each run go for 60 seconds, and checked the peak values for the containers:

| Throughput | Bridge CPU peak | Bridge RAM peak | DB CPU peak | DB RAM peak |
| ---------- | --------------- | --------------- | ----------- | ----------- |
| 10 req/s   | 0.39%           | 840 KiB         | 1.54%       | 168.4 MiB   |
| 100 req/s  | 1.22%           | 860 KiB         | 1.32%       | 168.4 MiB   |
| 1000 req/s | 9.55%           | 1.14 MiB        | 11.93%      | 171.4 MiB   |
| 5400 req/s | 49.77%          | 1.16 MiB        | 64.73%      | 194.6 MiB   |

Pushing it higher than 5.5k req/s resulted in requests no longer being logged. That was not a system resource limitation per se: the insert batches of 10 rows didn't complete fast enough, so the `QUEUE_SIZE` filled up and incoming packets were dropped.
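
The throttle script itself isn't reproduced here, but a minimal sketch of the approach, built on wrk's `delay()` hook, looks like this (the `target_rate` and `connections` values are assumptions you'd set per run):

```lua
-- Minimal wrk throttle sketch, not the exact script used above.
-- wrk calls delay() before each request and sleeps for the returned
-- number of milliseconds, so every connection paces itself.
local target_rate = 1000 -- desired total req/s across all connections
local connections = 10   -- should match wrk's -c flag

function delay()
  -- each connection contributes target_rate/connections req/s
  return connections / target_rate * 1000
end
```

Note that this pacing is only approximate, which is also why the higher throughputs later in this document couldn't be throttled precisely.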
## Pushing the limits

If I let `wrk` run as fast as it could, it fired around 47k req/s at nginx. In a 60-second test, wrk fired 2.80 million requests, but only 454k of them ended up in the database, so just 16.1% of requests got logged. The bridge peaked at 80.8% CPU and 1.53 MiB RAM; the DB peaked at 58.77% CPU and 188.5 MiB RAM.
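
To compare the number of fired requests against what actually landed in the database, wrk's `done()` hook can print the exact total after a run. This is a small sketch, not part of the original setup:

```lua
-- wrk calls done() once after the run with aggregate results.
function done(summary, latency, requests)
  -- summary.requests is the total number of requests wrk completed;
  -- compare it against a row count of the bridge's target table.
  io.write(string.format("requests sent: %d\n", summary.requests))
end
```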
To increase throughput, I set `INSERT_BATCH_SIZE=2000` and `QUEUE_SIZE=4000`. In general, setting `QUEUE_SIZE` to about twice your `INSERT_BATCH_SIZE` is a good idea if you're optimizing for constant load: if you can't write to the DB fast enough, there's no point in keeping more than two batches around in the input queue anyway.

With those settings, I ran another 60-second `wrk` test that fired 2.87 million requests. Out of those, 2.52 million made it into the database, or 87.1% of everything. The bridge peaked at 97.34% CPU and 6.77 MiB RAM; the database reached 67.03% CPU and 217.9 MiB RAM. These results suggest that the bottleneck in v3.0.0 of the bridge is the parsing side of things, not the database layer.

Based on the 87% storage rate, one could estimate that around 40k req/s is a handleable load (47k × 0.87 ≈ 41k). I couldn't get a `wrk` delay script to be precise enough to throttle to that rate, and I also couldn't get other benchmark tools to work: `autocannon`, for example, was always too slow. So I couldn't fully verify the 40k req/s throughput, but since this heavily depends on your individual CPU anyway, there isn't much point in capturing more precise numbers.

This document should, at least, demonstrate that the bridge can easily handle tens of thousands of requests per second with a really small resource footprint, and I hope it gives you an idea of how to benchmark it on your own infrastructure if needed.