Description
The commit 76ad946 greatly improved the speed of reading and writing metadata blocks (where, for example, the i-node tables are stored). It did so by enabling the write-back cache, bcache, implemented by the lwext4 library.
Reading and writing regular data blocks, i.e. reading and writing file content, involves direct access to the block device without any cache and can be dramatically slow in some workloads. For example, one of the Java unit tests, io.osv.TestDomainPermissions, takes 5-6 times longer to run on the ext image than on the zfs one. (This can only be observed when running on QEMU with cache=none,aio=native; normally, unit tests run with test.py execute with cache=unsafe,aio=threads.) After capturing the block-device strategy tracepoints, we see almost 7K of them on ext compared to ~500 on zfs.
Looking more closely reveals many 4K reads, most likely triggered by page faults when loading memory-mapped ELF files:
0x0000400001f50040 >/usr/lib/jvm/j 1 3.781554322 0.009 virtio_blk_strategy cmd=1, offset=110c9c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.782283354 0.009 virtio_blk_strategy cmd=1, offset=110cac00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.783093732 0.011 virtio_blk_strategy cmd=1, offset=110cbc00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.783909252 0.011 virtio_blk_strategy cmd=1, offset=110ccc00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.784708002 0.009 virtio_blk_strategy cmd=1, offset=111edc00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.785320493 0.009 virtio_blk_strategy cmd=1, offset=110c3c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.786140638 0.010 virtio_blk_strategy cmd=1, offset=111e6c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.786949947 0.010 virtio_blk_strategy cmd=1, offset=110c4c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.787837132 0.011 virtio_blk_strategy cmd=1, offset=111e7c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.788530242 0.009 virtio_blk_strategy cmd=1, offset=111e8c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.789066929 0.009 virtio_blk_strategy cmd=1, offset=110c5c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.789898191 0.011 virtio_blk_strategy cmd=1, offset=111e9c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.790595935 0.009 virtio_blk_strategy cmd=1, offset=111eac00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.791300200 0.008 virtio_blk_strategy cmd=1, offset=110c6c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.791955380 0.008 virtio_blk_strategy cmd=1, offset=110c7c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.792695495 0.008 virtio_blk_strategy cmd=1, offset=110bcc00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.793365419 0.008 virtio_blk_strategy cmd=1, offset=110bdc00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.794072225 0.008 virtio_blk_strategy cmd=1, offset=110b4c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.794635442 0.008 virtio_blk_strategy cmd=1, offset=110bac00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.795311067 0.008 virtio_blk_strategy cmd=1, offset=110b9c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.796014469 0.008 virtio_blk_strategy cmd=1, offset=110b7c00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.796838057 0.010 virtio_blk_strategy cmd=1, offset=110bfc00, bcount=1000
0x0000400001f50040 >/usr/lib/jvm/j 1 3.797541816 0.009 virtio_blk_strategy cmd=1, offset=110b6c00, bcount=1000
Clearly, libext would benefit from a read-ahead and possibly a write-back (or write-through) cache, ideally integrated with core/pagecache.cc.
ZFS is a much better alternative in this case, but it has its own drawbacks: it is large, uses many threads, is slow to boot, and is not as easy to work with as ext on most Linux distributions (read more in this Wiki page). So it would be nice to improve libext in this respect.