ZFS's zstd compression ratio is much less than /usr/bin/zstd #11293
-
System information
Describe the problem you're observing

zstd compression within ZFS generates far less compression than /usr/bin/zstd for a single 1 GB file test.

Describe how to reproduce the problem

Create a compressible 1G file in /dev/shm and compare the results of running /usr/bin/lz4 and /usr/bin/zstd versus copying that file to ZFS filesystems with the same compression algorithms enabled.
So far so good, with zstd outperforming lz4 within ZFS. However, while /usr/bin/lz4 reports compression similar to ZFS+lz4, /usr/bin/zstd outperforms ZFS+zstd by a significant margin (300% vs. 17%).

What am I doing wrong?
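For reference, a rough sketch of the kind of comparison described above; the pool name (`tank`), dataset names, and the file-generation method are assumptions, since the original commands aren't shown:

```sh
# Build a compressible ~1 GB test file in /dev/shm (method is an assumption;
# base64-encoded random data compresses, though far less than real text).
base64 /dev/urandom | head -c 1G > /dev/shm/testfile

# Userland tools at their default level (3 for zstd)
zstd -k -o /dev/shm/testfile.zst /dev/shm/testfile
lz4 -k /dev/shm/testfile /dev/shm/testfile.lz4
ls -l /dev/shm/testfile*

# ZFS datasets with matching compression settings
zfs create -o compression=lz4  tank/lz4test
zfs create -o compression=zstd tank/zstdtest
cp /dev/shm/testfile /tank/lz4test/ && cp /dev/shm/testfile /tank/zstdtest/
sync
zfs get compressratio tank/lz4test tank/zstdtest
```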
Replies: 12 comments 1 reply
-
What is the compression block size of the standalone zstd tool? Have you tried setting at least recordsize=1m for the datasets?
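For example (dataset name is a placeholder; note that recordsize only affects files written after the property is changed):

```sh
zfs set recordsize=1m tank/zstdtest
zfs get recordsize,compression,compressratio tank/zstdtest
# Rewrite the test file so it is stored with the new record size
cp /dev/shm/testfile /tank/zstdtest/testfile.new && sync
```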
-
Changing the /usr/bin/zstd block size from the default of "no split" to 16k makes very little difference.
However, setting recordsize=1m makes a significant difference, and ZFS achieves essentially the same level of compression as /usr/bin/zstd (both at the default level of 3, as I understand the documentation).
Looks like I need to run some larger-scale tests to tune the value of recordsize for my application. I also don't understand what /usr/bin/zstd -B does, since I get exactly the same compression for -B# with a few different values (including 1!), yet it is different from not setting it at all.
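One crude way to approximate the per-record effect outside ZFS, independent of zstd's -B behaviour, is to split the file into record-sized chunks and compress each chunk on its own (paths reuse the hypothetical test file from above):

```sh
# Split into 128k chunks (the ZFS default recordsize) and compress each independently
mkdir /dev/shm/chunks
split -a 3 -b 128k /dev/shm/testfile /dev/shm/chunks/part_
zstd -q --rm /dev/shm/chunks/part_*
du -sh --apparent-size /dev/shm/chunks   # total compressed size across all chunks
```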
-
FWIW, here are the compression values for recordsize=256k
and recordsize=512k
and almost no compression for recordsize=64k.
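A sweep like the following makes the recordsize dependence easier to see in one place (names are placeholders, continuing the hypothetical setup above):

```sh
for rs in 64k 128k 256k 512k 1m; do
  zfs create -o compression=zstd -o recordsize=$rs tank/zstd_$rs
  cp /dev/shm/testfile /tank/zstd_$rs/
done
sync
zfs get -r compressratio tank
```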
-
It appears that zstd is much more sensitive to recordsize than lz4; e.g., lz4 with recordsize=1m makes very little difference over the default value of 128k.
-
Given the significant dependency of the compression ratio on recordsize, it would of course be even better if ZFS would auto-tune this. What is the downside of running a traditional home directory dataset with a larger recordsize?
-
Large recordsize is good for files that are written and read sequentially or in correspondingly large chunks, but it is pretty bad for random access, especially random rewrite, causing read/write inflation and read-modify-write patterns. Files that are smaller than recordsize are stored as-is, so file size is not a big factor for recordsize, but obviously setting it above the typical file size makes no difference.
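In practice that tradeoff usually means choosing recordsize per dataset based on the access pattern, for example (dataset names are placeholders):

```sh
# Illustrating the tradeoff described above
zfs set recordsize=1m  tank/media   # large files, sequential reads/writes
zfs set recordsize=16k tank/db      # small random rewrites (e.g. a database)
```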
-
It very much depends on your specific test file. If you are choosing parameters for your home directory, copy your real home directory there and see what happens. No artificial test file will give you the right answer, since it does not represent your specific case.
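For instance, something along these lines (paths and dataset names are placeholders, reusing the hypothetical datasets from above):

```sh
# Copy real data into a test dataset and check the resulting ratio
rsync -a /home/youruser/ /tank/zstd_1m/home-test/
sync
zfs get compressratio tank/zstd_1m
```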
-
You could also have a look at issue #10201. Your test file might not satisfy ZFS's "criteria" for compression.
-
Some background info on ZSTD: simply put, in normal ZSTD compression, zstd handles all aspects of compression, from cutting your file into blocks to compressing them and so forth. With ZSTD on ZFS, only the ZSTD compression function is run, over the blocks that are generated by ZFS. The compression results of the two are not 1-to-1 comparable, or at least will vary significantly depending on the recordsize set in ZFS. Why? Because compression logic gets more efficient the more data it has to compress. For these reasons, it's always best to compare compression algorithms through ZFS itself, because you can't ignore the effect of the ZFS code stack on compression.
LZ4 simply has a terrible upper compression limit. That makes it hard to compare the effect of recordsize between the two; they are two totally different algorithms with totally different compression curves (performance/ratio and recordsize/ratio).
That's true, it's preferred to use some sort of highly compressible, non-generated, standardised test file like enwik9, both to enable others to repeat your tests and to ensure there isn't any correlation between your file-generation script and the compressor.
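If anyone wants to reproduce with a standard corpus, enwik9 (the first 10^9 bytes of an English Wikipedia dump, used by the Large Text Compression Benchmark) is available from Matt Mahoney's site; the URL below is to the best of my knowledge, and the dataset name is a placeholder:

```sh
wget http://mattmahoney.net/dc/enwik9.zip
unzip enwik9.zip          # extracts a 1,000,000,000-byte file named enwik9
cp enwik9 /tank/zstd_1m/ && sync
zfs get compressratio tank/zstd_1m
```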
-
I also did some tests. Increasing the recordsize from 128K to 1M also increased the compressratio from 1.36x to 1.58x.
-
It would be nice if it were possible to zfs send a recordsize=128k dataset into a zfs receive with recordsize=1M for efficient disk backups, which I attempted to capture in #11313.
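As far as I know, a plain send/receive keeps the original block sizes, so the sketch below does not re-block the data today; the received dataset's recordsize only applies to files written afterwards (names are placeholders):

```sh
zfs snapshot tank/zstd_128k@backup
zfs send tank/zstd_128k@backup | zfs receive -o recordsize=1m backuppool/home
# Existing records stay 128k; only files newly written on the receive side
# would use the 1M recordsize.
```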
-
A detailed explanation that I arrived at over 10 years ago: