Linux 5.1 introduced io_uring, a new interface for asynchronous IO that makes it possible to do IO with zero (or very few) system calls. It works by keeping a lock-free IO submission queue and a lock-free IO completion queue in memory shared between the kernel and the application, which reduces context switches. The performance improvements are impressive.
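To make the two-queue idea concrete, here is a toy model of a submission ring and a completion ring. It is only a sketch: `ToyRing`, `ToyRequest` and `ToyCompletion` are made-up names, the "kernel" is a plain loop in the same process, and nothing here touches the real io_uring ABI (which uses mmap'ed memory and atomic head/tail updates with memory barriers).

```crystal
# Toy single-producer/single-consumer ring, loosely modelled on io_uring's queues.
# Hypothetical illustration only; not the kernel ABI and not thread-safe.
class ToyRing(T)
  @entries : Array(T?)
  @mask : UInt32
  @head : UInt32 = 0_u32 # advanced only by the consumer
  @tail : UInt32 = 0_u32 # advanced only by the producer

  def initialize(capacity : Int32)
    unless capacity > 0 && (capacity & (capacity - 1)) == 0
      raise ArgumentError.new("capacity must be a power of two")
    end
    @entries = Array(T?).new(capacity, nil)
    @mask = (capacity - 1).to_u32
  end

  # Producer side: returns false when the ring is full.
  def push(value : T) : Bool
    return false if (@tail &- @head) > @mask
    @entries[@tail & @mask] = value
    @tail &+= 1
    true
  end

  # Consumer side: returns nil when the ring is empty.
  def pop : T?
    return nil if @head == @tail
    value = @entries[@head & @mask]
    @head &+= 1
    value
  end
end

record ToyRequest, user_data : UInt64, payload : String
record ToyCompletion, user_data : UInt64, result : Int32

sq = ToyRing(ToyRequest).new(4)    # submission queue
cq = ToyRing(ToyCompletion).new(8) # completion queue

# The application queues work without any system call.
sq.push(ToyRequest.new(1_u64, "hello"))
sq.push(ToyRequest.new(2_u64, "io_uring"))

# Stand-in for the kernel: drain submissions, post completions.
while req = sq.pop
  cq.push(ToyCompletion.new(req.user_data, req.payload.bytesize))
end

# The application reaps results, matching them up by user_data.
while done = cq.pop
  puts "request #{done.user_data} finished with result #{done.result}"
end
```

Because each index is written by only one side, producer and consumer never contend on the same field, which is what lets the real queues be lock-free.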
I began experimenting with replacing LibEvent with IoUring in Crystal, and the results were quite good. I'm currently benchmarking with the sample HTTP Server (without preview_mt):
require "http/server"

{% if flag?(:preview_iouring) %}
  require "./stdlib_patch"
{% end %}

server = HTTP::Server.new do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello world!"
end

address = server.bind_tcp 8080
puts "Listening on http://#{address}"
server.listen

With LibEvent2 (Crystal 1.0.0 unchanged)
$ wrk -t12 -c400 -d30s http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.91ms 2.27ms 207.65ms 71.96%
Req/Sec 6.77k 749.15 11.23k 79.17%
2426542 requests in 30.05s, 233.73MB read
Requests/sec: 80758.47
Transfer/sec: 7.78MB
With io_uring (Crystal 1.0.0, with -Dpreview_iouring)
$ wrk -t12 -c400 -d30s http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 653.79us 357.50us 37.98ms 98.11%
Req/Sec 26.79k 17.49k 185.83k 58.06%
5574025 requests in 30.10s, 536.90MB read
Requests/sec: 185189.62
Transfer/sec: 17.84MB
This is more than double the throughput (about 185k vs. 81k requests per second), and average latency drops from 4.91ms to about 0.65ms! It can probably be tuned even further.
But...
io_uring only got support for sockets with Linux 5.6 (March 2020), and that version definitely isn't used everywhere yet. Even worse: WSL2 used a 5.6 kernel with io_uring disabled (for no reason at all).
That said, I see three paths going forward:
- io_uring is added to the standard library and is enabled behind a -Dpreview_iouring flag. That way only those who can afford it will enable it. The resulting binary wouldn't need to link to libevent2 at all.
- io_uring is added to the standard library and the runtime would detect kernel support for it at initialization, falling back to libevent2 as needed.
- io_uring is released as a shard that does monkey-patching into internal bits of the Crystal module. Its versions would be tied to specific Crystal versions since that module is not protected by SemVer.
Here I would like to understand if options 1 or 2 are desirable (if so, I'll work on a PR).
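For the second path, runtime detection could look roughly like the sketch below. It is not what my patch does, and `LibProbe`, `EventLoopProbe`, `backend_for` and `detect` are hypothetical names. It assumes that `io_uring_setup` is syscall 425 on the target architecture (true for x86_64 and aarch64 as far as I know), and that calling it with a null params pointer fails with `EFAULT` when the kernel supports it, `ENOSYS` when it doesn't, and `EPERM` when it is disabled or sandboxed.

```crystal
# Hypothetical binding; declared in its own lib so it does not clash with LibC.
lib LibProbe
  fun syscall(number : LibC::Long, ...) : LibC::Long
end

module EventLoopProbe
  # Assumed syscall number for io_uring_setup(2); check it for your target.
  SYS_IO_URING_SETUP = 425

  # Maps the errno of a deliberately invalid io_uring_setup call to a backend.
  def self.backend_for(errno : Errno) : Symbol
    case errno
    when Errno::EFAULT, Errno::EINVAL
      :io_uring # the syscall exists and merely rejected our bogus arguments
    else
      :libevent # ENOSYS (old kernel), EPERM (disabled or sandboxed), anything else
    end
  end

  def self.detect : Symbol
    {% if flag?(:linux) %}
      # A null params pointer can never succeed, so no ring is ever created.
      ret = LibProbe.syscall(LibC::Long.new(SYS_IO_URING_SETUP), 0_u32, Pointer(Void).null)
      errno = Errno.value
      ret == -1 ? backend_for(errno) : :libevent
    {% else %}
      :libevent
    {% end %}
  end
end

puts "event loop backend: #{EventLoopProbe.detect}"
```

This only tells you that the syscall is reachable. It says nothing about whether the socket operations are available, so a real implementation would probably also need to probe the individual opcodes before committing to io_uring.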