Linux's IO_Uring interface (2x IO performance!) #10740

@lbguilherme

Linux 5.1 introduced io_uring, a new interface for asynchronous IO that makes it possible to do IO with zero (or very few) system calls. It works by keeping a lock-free IO submission queue and a lock-free IO completion queue in memory shared between the kernel and the application, which reduces context switches. The performance improvements are impressive.
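As a rough illustration of the two-queue idea, here is a toy, single-threaded model. It is only a sketch: `Ring`, `Request` and `Completion` are hypothetical names, nothing here talks to the kernel, and the real rings live in memory mapped from the kernel and are synchronized with memory barriers rather than plain integers.

```crystal
# Toy fixed-capacity ring buffer. The producer advances `tail`, the consumer
# advances `head`; both only ever grow, and the slot is the index modulo capacity.
class Ring(T)
  @slots : Array(T?)

  def initialize(@capacity : Int32)
    @slots = Array(T?).new(@capacity, nil)
    @head = 0 # next slot the consumer reads
    @tail = 0 # next slot the producer writes
  end

  # Returns false when the ring is full instead of blocking.
  def push(item : T) : Bool
    return false if @tail - @head == @capacity
    @slots[@tail % @capacity] = item
    @tail += 1
    true
  end

  # Returns nil when the ring is empty.
  def pop : T?
    return nil if @head == @tail
    item = @slots[@head % @capacity]
    @head += 1
    item
  end
end

record Request, id : Int32, payload : String
record Completion, id : Int32, result : Int32

sq = Ring(Request).new(8)    # submission queue: application produces, "kernel" consumes
cq = Ring(Completion).new(8) # completion queue: "kernel" produces, application consumes

# Application side: queue several requests without making any call per request.
sq.push Request.new(1, "hello")
sq.push Request.new(2, "io_uring")

# Stand-in for the kernel: drain the submissions and post one completion each.
while req = sq.pop
  cq.push Completion.new(req.id, req.payload.bytesize)
end

# Application side: reap the completions.
while done = cq.pop
  puts "request #{done.id} finished with result #{done.result}"
end
```

Running this prints one line per request (results 5 and 8, the byte sizes of the two payloads). The point is only that both sides communicate by moving `head` and `tail` over shared slots, so a batch of operations does not need one system call each.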

I began experimenting with replacing LibEvent with IoUring in Crystal, and the results were quite good. I'm currently benchmarking with the sample HTTP Server (without preview_mt):

```crystal
require "http/server"

{% if flag?(:preview_iouring) %}
  require "./stdlib_patch"
{% end %}

server = HTTP::Server.new do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello world!"
end

address = server.bind_tcp 8080
puts "Listening on http://#{address}"
server.listen
```

With LibEvent2 (Crystal 1.0.0 unchanged)

```
$ wrk -t12 -c400 -d30s http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.91ms    2.27ms 207.65ms   71.96%
    Req/Sec     6.77k   749.15    11.23k    79.17%
  2426542 requests in 30.05s, 233.73MB read
Requests/sec:  80758.47
Transfer/sec:      7.78MB
```

With io_uring (Crystal 1.0.0, with -Dpreview_iouring)

```
$ wrk -t12 -c400 -d30s http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   653.79us  357.50us  37.98ms   98.11%
    Req/Sec    26.79k    17.49k  185.83k    58.06%
  5574025 requests in 30.10s, 536.90MB read
Requests/sec: 185189.62
Transfer/sec:     17.84MB
```

This is more than double the throughput (about 2.3x: 185,190 vs. 80,758 requests/sec), and average latency dropped from 4.91 ms to 0.65 ms. It can probably be tuned even further.

But...

io_uring only gained support for sockets in Linux 5.6 (March 2020), and that version definitely isn't in use everywhere yet. Even worse: WSL2 used a 5.6 kernel with io_uring disabled (for no reason at all).

That said, I see three paths going forward:

  1. io_uring is added to the standard library and is enabled behind a -Dpreview_iouring flag. That way only those who can rely on a recent enough kernel will enable it. The resulting binary wouldn't need to link to libevent2 at all.
  2. io_uring is added to the standard library and the runtime detects kernel support for it at initialization, falling back to libevent2 as needed.
  3. io_uring is released as a shard that monkey-patches internal bits of the Crystal module. Its versions would be tied to specific Crystal versions, since that module is not protected by SemVer.
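To make option 2 more concrete, here is a minimal sketch of a startup check. It is an assumption-laden illustration, not the proposed implementation: `kernel_supports_socket_io_uring?` and `pick_event_loop` are hypothetical helpers, and comparing version strings is not enough on its own, because a kernel can be new enough and still have io_uring disabled (as with WSL2 above). A real runtime would more likely try to set up a ring and fall back if that fails.

```crystal
# Hypothetical helper: true when a kernel release string such as
# "5.10.0-8-amd64" is at least 5.6, the version named above for socket support.
def kernel_supports_socket_io_uring?(release : String) : Bool
  match = release.match(/\A(\d+)\.(\d+)/)
  return false unless match
  major = match[1].to_i
  minor = match[2].to_i
  major > 5 || (major == 5 && minor >= 6)
end

# Hypothetical helper: choose the backend name once, at initialization.
def pick_event_loop(release : String) : Symbol
  kernel_supports_socket_io_uring?(release) ? :io_uring : :libevent
end

# On Linux the running kernel's release string is exposed under /proc;
# anywhere else the empty string makes the check fall back to libevent.
path = "/proc/sys/kernel/osrelease"
release = File.exists?(path) ? File.read(path).strip : ""
puts "kernel #{release.inspect}: would use #{pick_event_loop(release)}"
```

With this shape the choice is made once at startup, so the binary still has to link libevent2, which is the trade-off against option 1.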

I would like to know whether option 1 or 2 is desirable (if so, I'll work on a PR).
