@lrusak
Member

@lrusak lrusak commented Sep 5, 2016

As discussed in #644, this removes OPTIMIZATION=normal (-O2).

It also moves some default options from the projects to the distro options.

This should be tested, as it will default all projects to lzo compression instead of gzip. I'm not sure how big an impact this will have.

@stefansaraev
Contributor

on lzo vs gzip: from my past experience, gzip is the best compromise between runtime decompression speed and compression ratio. that's why I completely removed the other squashfs compression methods in my fork.

lzo will make your final images ~5% smaller, with a noticeable drop in decompression speed on arm. you will most likely not see any difference on x86 (with SYSTEM loaded into ram)

you can do a quick read test and check the speeds:

echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null

@MilhouseVH
Contributor

I'll build this for RPi/RPi2/Generic and get some timings later this afternoon (unless anyone beats me to it)

@MilhouseVH
Contributor

Previous attempts to change the squashfs compression algorithm have suggested lz4 HC bested lzo by a comfortable margin. If we're going to make this change then I think we should at least consider all alternatives, and pick the best one.

@stefansaraev
Contributor

some of your kernels won't support that. I don't think it's worth the trouble of backporting huge patches. but that's just me ;)

@MilhouseVH
Contributor

Just saying it should be discussed/considered, I don't want to have to revisit this again any time soon!

@MilhouseVH
Contributor

MilhouseVH commented Sep 5, 2016

Ran some tests on RPi3 with lzo (RPi/RPi2 default), lz4 (-Xhc) and gzip.

First, the file sizes:

lz4 : -rw-r--r-- 1 neil neil 140390400 Sep  5 14:35 LibreELEC-RPi2.arm-8.0-devel-20160905143455-#0905b-g087b668.system
lzo : -rw-r--r-- 1 neil neil 130740224 Sep  5 17:29 LibreELEC-RPi2.arm-8.0-devel-20160905172503-#0905h-g087b668.system
gzip: -rw-r--r-- 1 neil neil 120020992 Sep  5 17:55 LibreELEC-RPi2.arm-8.0-devel-20160905175450-#0905j-g087b668.system

RESULT: compared with lzo, lz4 increases the SYSTEM file by about 10MB, while gzip reduces it by roughly the same amount.

WINNER: gzip
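
The size deltas can be checked directly from the byte counts listed above. A minimal Python sketch, where the dictionary and the `delta_vs` helper are illustrative names (not part of the build system) and MB is taken to mean 10^6 bytes:

```python
# SYSTEM file sizes in bytes, copied from the RPi2 listing above.
SIZES = {
    "lz4": 140390400,
    "lzo": 130740224,
    "gzip": 120020992,
}

def delta_vs(baseline, sizes):
    """Return {name: (delta in MB, delta in percent)} relative to `baseline`."""
    base = sizes[baseline]
    return {
        name: ((size - base) / 1e6, 100.0 * (size - base) / base)
        for name, size in sizes.items()
        if name != baseline
    }

for name, (mb, pct) in delta_vs("lzo", SIZES).items():
    print(f"{name}: {mb:+.1f} MB ({pct:+.1f}%) vs lzo")
```

With lzo as the baseline this works out to roughly +9.7 MB (+7.4%) for lz4 and -10.7 MB (-8.2%) for gzip.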

Now the timings (using seo's method, comment 2):

lz4:

rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650356 bytes (20.6MB) copied, 0.182437 seconds, 113.2MB/s
real    0m 0.18s
user    0m 0.01s
sys     0m 0.11s
rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650356 bytes (20.6MB) copied, 0.186407 seconds, 110.8MB/s
real    0m 0.18s
user    0m 0.01s
sys     0m 0.11s
rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650356 bytes (20.6MB) copied, 0.186963 seconds, 110.4MB/s
real    0m 0.18s
user    0m 0.00s
sys     0m 0.12s

lzo:

rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650324 bytes (20.6MB) copied, 0.184409 seconds, 112.0MB/s
real    0m 0.18s
user    0m 0.00s
sys     0m 0.13s
rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650324 bytes (20.6MB) copied, 0.188123 seconds, 109.8MB/s
real    0m 0.19s
user    0m 0.01s
sys     0m 0.12s
rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650324 bytes (20.6MB) copied, 0.190745 seconds, 108.2MB/s
real    0m 0.19s
user    0m 0.01s
sys     0m 0.12s

gzip:

rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650324 bytes (20.6MB) copied, 0.247372 seconds, 83.5MB/s
real    0m 0.25s
user    0m 0.00s
sys     0m 0.19s
rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650324 bytes (20.6MB) copied, 0.255420 seconds, 80.8MB/s
real    0m 0.25s
user    0m 0.00s
sys     0m 0.19s
rpi22:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42285+1 records in
42285+1 records out
21650324 bytes (20.6MB) copied, 0.255556 seconds, 80.8MB/s
real    0m 0.25s
user    0m 0.00s
sys     0m 0.18s

RESULT: No significant difference between lz4 and lzo, but gzip is noticeably slower.

WINNER: lzo and lz4 tied
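
For anyone repeating the test, the dd summary lines can be averaged rather than compared by eye. This is a rough Python sketch that assumes the summary format shown in the runs above and that the "MB/s" figure dd prints is really MiB/s (which matches the numbers in this thread); the function names are made up for the example:

```python
import re
from statistics import mean

# Matches the summary line printed by dd in the runs above, e.g.
# "21650356 bytes (20.6MB) copied, 0.182437 seconds, 113.2MB/s"
DD_SUMMARY = re.compile(r"(\d+) bytes \(.*?\) copied, ([\d.]+) seconds")

def parse_dd(output):
    """Return a list of (bytes, seconds) tuples found in dd output text."""
    return [(int(b), float(s)) for b, s in DD_SUMMARY.findall(output)]

def mean_throughput_mib(output):
    """Mean throughput in MiB/s over all dd runs found in `output`."""
    return mean(b / s / 2**20 for b, s in parse_dd(output))

# Summary lines copied from the three gzip and three lzo runs above.
GZIP_RUNS = """
21650324 bytes (20.6MB) copied, 0.247372 seconds, 83.5MB/s
21650324 bytes (20.6MB) copied, 0.255420 seconds, 80.8MB/s
21650324 bytes (20.6MB) copied, 0.255556 seconds, 80.8MB/s
"""
LZO_RUNS = """
21650324 bytes (20.6MB) copied, 0.184409 seconds, 112.0MB/s
21650324 bytes (20.6MB) copied, 0.188123 seconds, 109.8MB/s
21650324 bytes (20.6MB) copied, 0.190745 seconds, 108.2MB/s
"""

gzip_rate = mean_throughput_mib(GZIP_RUNS)
lzo_rate = mean_throughput_mib(LZO_RUNS)
print(f"gzip {gzip_rate:.1f} MiB/s, lzo {lzo_rate:.1f} MiB/s, "
      f"lzo/gzip = {lzo_rate / gzip_rate:.2f}")
```

On these six runs lzo averages about 35% higher throughput than gzip. Three runs per algorithm is a small sample, so treat the ratio as indicative only.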

Conclusion

Nice file size saving from gzip, but slower performance. This is probably the reason why gzip is used as the default for Generic, where the SYSTEM file is so much larger and the performance penalty is going to be less noticeable.

On platforms such as ARM, lzo is likely to be the better choice as lzo compresses more efficiently than lz4 (though not as well as gzip) while outperforming gzip (and performance is more critical on these systems).

As far as this PR is concerned, continuing to use gzip for Generic (and Virtual) is probably the right choice, while all other platforms could switch to lzo (I think the only other gzip project is imx6 - the rest already default to lzo).

@MilhouseVH
Contributor

MilhouseVH commented Sep 5, 2016

For reference this is a Generic project built with gzip, lzo and lz4:

gzip: -rw-r--r-- 1 neil neil 216678400 Sep  5 07:33 LibreELEC-Generic.x86_64-8.0-devel-20160905072540-#0904x-g087b668.system
lzo : -rw-r--r-- 1 neil neil 238718976 Sep  5 14:53 LibreELEC-Generic.x86_64-8.0-devel-20160905115135-#0905-g087b668.system
lz4 : -rw-r--r-- 1 neil neil 256512000 Sep  5 14:36 LibreELEC-Generic.x86_64-8.0-devel-20160905143455-#0905b-g087b668.system
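
To put the Generic and RPi2 listings side by side, here is a small Python sketch that computes how much space gzip saves over the other two methods per project. The byte counts are copied from this thread; the structure and the `gzip_saving_mb` name are just for illustration, and MB means 10^6 bytes:

```python
# SYSTEM sizes in bytes, copied from the listings in this thread.
SIZES = {
    "RPi2":    {"gzip": 120020992, "lzo": 130740224, "lz4": 140390400},
    "Generic": {"gzip": 216678400, "lzo": 238718976, "lz4": 256512000},
}

def gzip_saving_mb(project, other):
    """MB saved by building `project` with gzip instead of `other`."""
    sizes = SIZES[project]
    return (sizes[other] - sizes["gzip"]) / 1e6

for project in SIZES:
    for other in ("lzo", "lz4"):
        saved = gzip_saving_mb(project, other)
        pct = 100.0 * saved * 1e6 / SIZES[project][other]
        print(f"{project:8s} gzip vs {other}: {saved:.1f} MB smaller ({pct:.1f}%)")
```

On these numbers gzip saves roughly 10.7 MB over lzo on RPi2 but about 22 MB on Generic, so the absolute saving grows with the size of the image.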

@lrusak
Member Author

lrusak commented Sep 5, 2016

Hmm, ok well I can put the squashfs compression back to be project specific. I didn't think it would be that much of a difference.

@stefansaraev
Contributor

stefansaraev commented Sep 5, 2016

you got 100MB/s + on rpi and sdcard ?

EDIT: you should also check overall boot time (systemd-analyze)

@MilhouseVH
Contributor

> you got 100MB/s + on rpi and sdcard ?

It is an 8GB NOOBS card (Samsung, from RPi Foundation). Overclocked with dtparam=sd_overclock=100 - in tests this card gives a sequential read rate of about 42MB/s.

I guess when the data is compressed the actual read rate is even better.
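
As a back-of-the-envelope check of that guess: if the read were limited only by the card (a simplifying assumption that ignores readahead, decompression cost and any remaining caching), the ratio of the observed rate to the raw rate gives the minimum compression ratio the file would need. A tiny Python sketch, with a hypothetical helper name:

```python
def implied_min_ratio(observed_rate, raw_rate):
    """Smallest compression ratio that would let a purely I/O-bound read of
    compressed data appear to run at `observed_rate` (same units as raw_rate)."""
    return observed_rate / raw_rate

# 113.2 MB/s is the first lz4 dd run earlier in the thread;
# 42 MB/s is the raw sequential read rate quoted above.
print(f"{implied_min_ratio(113.2, 42.0):.2f}")
```

Under that assumption kodi.bin would have to compress by a factor of about 2.7, so the figure is only a plausibility check, not a measurement.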

@vpeter4
Contributor

vpeter4 commented Sep 5, 2016

This is on imx6 cuboxi using sandisk extreme pro sd card

gzip
21679116 bytes (20.7MB) copied, 0.575798 seconds, 35.9MB/s
lz4
21679116 bytes (20.7MB) copied, 0.460767 seconds, 44.9MB/s

15% SYSTEM size increase with lz4 over gzip.

@MilhouseVH
Contributor

RPi1 results are now in... they're a little different to the RPi2!

File sizes:

lzo : -rw-r--r-- 1 neil neil 130822144 Sep  5 09:56 LibreELEC-RPi.arm-8.0-devel-20160905095144-#0904x-g087b668.system
lz4 : -rw-r--r-- 1 neil neil 140500992 Sep  5 14:35 LibreELEC-RPi.arm-8.0-devel-20160905143455-#0905b-g087b668.system
gzip: -rw-r--r-- 1 neil neil 120123392 Sep  5 18:47 LibreELEC-RPi.arm-8.0-devel-20160905184254-#0905k-g087b668.system

Same as before: gzip compresses better than lzo which compresses better than lz4.

lz4:

rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665644 bytes (20.7MB) copied, 0.605761 seconds, 34.1MB/s
real    0m 0.61s
user    0m 0.00s
sys     0m 0.37s
rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665644 bytes (20.7MB) copied, 0.617278 seconds, 33.5MB/s
real    0m 0.62s
user    0m 0.02s
sys     0m 0.36s
rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665644 bytes (20.7MB) copied, 0.612548 seconds, 33.7MB/s
real    0m 0.62s
user    0m 0.01s
sys     0m 0.36s

lzo:

rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665620 bytes (20.7MB) copied, 1.022358 seconds, 20.2MB/s
real    0m 1.04s
user    0m 0.02s
sys     0m 0.60s
rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665620 bytes (20.7MB) copied, 1.011644 seconds, 20.4MB/s
real    0m 1.03s
user    0m 0.03s
sys     0m 0.57s
rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665620 bytes (20.7MB) copied, 1.029859 seconds, 20.1MB/s
real    0m 1.05s
user    0m 0.01s
sys     0m 0.62s

gzip:

rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665644 bytes (20.7MB) copied, 0.772969 seconds, 26.7MB/s
real    0m 0.78s
user    0m 0.02s
sys     0m 0.47s
rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665644 bytes (20.7MB) copied, 0.778225 seconds, 26.5MB/s
real    0m 0.78s
user    0m 0.03s
sys     0m 0.48s
rpi512:~ # echo 3 > /proc/sys/vm/drop_caches; time dd if=/usr/lib/kodi/kodi.bin of=/dev/null
42315+1 records in
42315+1 records out
21665644 bytes (20.7MB) copied, 0.768928 seconds, 26.9MB/s
real    0m 0.78s
user    0m 0.02s
sys     0m 0.48s

Conclusion: lz4 produces a larger file but (in this simplistic test!) outperforms lzo by a large margin (about 67% higher throughput, or roughly 40% less elapsed time). gzip also outperforms lzo on RPi (about 32% higher throughput), and combined with the smaller file size might be the optimal choice on low-end hardware. Certainly on low-end hardware, lzo does not seem the best choice.
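
How large the margin looks depends on whether it is expressed as extra throughput or as reduced elapsed time. The following Python sketch computes both from the RPi1 timings above; the `compare` helper is an illustrative name, and it treats the three files as the same size (they differ by only 24 bytes):

```python
from statistics import mean

# Elapsed seconds copied from the RPi1 dd runs above.
SECONDS = {
    "lz4":  [0.605761, 0.617278, 0.612548],
    "lzo":  [1.022358, 1.011644, 1.029859],
    "gzip": [0.772969, 0.778225, 0.768928],
}

def compare(fast, slow, runs=SECONDS):
    """Return (throughput gain %, elapsed-time reduction %) of `fast` over `slow`."""
    t_fast, t_slow = mean(runs[fast]), mean(runs[slow])
    gain = 100.0 * (t_slow / t_fast - 1.0)
    reduction = 100.0 * (1.0 - t_fast / t_slow)
    return gain, reduction

for algo in ("lz4", "gzip"):
    gain, reduction = compare(algo, "lzo")
    print(f"{algo} vs lzo: {gain:.0f}% more throughput, {reduction:.0f}% less time")
```

For lz4 against lzo this gives about 67% more throughput, which is the same thing as about 40% less time; for gzip against lzo the figures are about 32% and 24%.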

@MilhouseVH
Contributor

One other consideration - if lz4 is ever seriously considered in future - is that any kernels that currently lack lz4 support will not be able to upgrade to a new build that uses lz4 compression for the squashfs (as the old kernel needs to mount the new SYSTEM partition in order to complete the upgrade).

Given this, and the fact that imx6 3.14 currently lacks lz4 support, then perhaps gzip for low-end hardware might be a better choice.
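
One way to check the upgrade constraint ahead of time is to look at the running kernel's configuration for the squashfs decompressor options. The Python sketch below works on the text of a kernel `.config`; how that text is obtained (for example from `/proc/config.gz`, which needs `CONFIG_IKCONFIG_PROC`) is left out, and the sample fragment and function names are hypothetical:

```python
import re

# Kconfig symbol for each squashfs decompressor in the mainline kernel.
SQUASHFS_OPTIONS = {
    "gzip": "CONFIG_SQUASHFS_ZLIB",
    "lzo": "CONFIG_SQUASHFS_LZO",
    "lz4": "CONFIG_SQUASHFS_LZ4",
    "xz": "CONFIG_SQUASHFS_XZ",
}

def supported_compressors(kernel_config):
    """Return the set of squashfs compressors enabled (=y) in a .config text."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", kernel_config, flags=re.MULTILINE))
    return {algo for algo, opt in SQUASHFS_OPTIONS.items() if opt in enabled}

def can_upgrade_to(algo, running_kernel_config):
    """True if the running kernel could mount a SYSTEM image compressed with `algo`."""
    return algo in supported_compressors(running_kernel_config)

# Hypothetical config fragment, for illustration only.
SAMPLE = """\
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_SQUASHFS_LZO=y
# CONFIG_SQUASHFS_LZ4 is not set
"""

print(can_upgrade_to("lz4", SAMPLE))   # False for this sample
print(can_upgrade_to("gzip", SAMPLE))  # True for this sample
```

A kernel that reports no lz4 support here would fail to mount an lz4-compressed SYSTEM image during the upgrade, which is the situation described above.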

@stefansaraev
Contributor

now, you really need to check boot speed. and decide if supporting anything other than gzip for LE (50-200MB root.sqfs) is worth the effort.

I would bet, on low memory / slow cpu devices gzip wins, on high end hardware - it doesn't matter too much for a small squashfs root that's loaded and cached once.

@vpeter4
Contributor

vpeter4 commented Sep 5, 2016

For imx6 with the 3.14 kernel I need to add a small lz4 squashfs patch.

@MilhouseVH
Contributor

Boot speeds (based on time to start Kodi) show slightly different results to the simple tests from earlier...

RPi1:

lz4 : 12:14:28  16.996590 T:1960583168  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:28  16.929241 T:1961586688  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:28  16.953888 T:1961381888  NOTICE: special://profile/ is mapped to: special://masterprofile/

lzo:  12:14:28  14.944477 T:1960579072  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:28  14.958488 T:1960669184  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:28  14.959731 T:1960972288  NOTICE: special://profile/ is mapped to: special://masterprofile/

gzip: 12:14:30  17.949179 T:1960943616  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:29  18.002409 T:1961160704  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:29  18.022547 T:1960755200  NOTICE: special://profile/ is mapped to: special://masterprofile/

lzo is fastest on low end hardware, followed by lz4 (+2 seconds), with gzip slowest by quite a margin (+3 seconds).

RPi2:

lz4 : 12:14:25   6.225009 T:1961660416  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:25   6.254738 T:1962135552  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:25   6.238373 T:1962123264  NOTICE: special://profile/ is mapped to: special://masterprofile/

lzo : 12:14:25   6.135375 T:1961979904  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:25   6.252344 T:1962061824  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:25   6.247582 T:1962016768  NOTICE: special://profile/ is mapped to: special://masterprofile/

gzip: 12:14:25   6.537451 T:1961394176  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:25   6.428099 T:1962106880  NOTICE: special://profile/ is mapped to: special://masterprofile/
      12:14:25   6.631059 T:1962033152  NOTICE: special://profile/ is mapped to: special://masterprofile/      

lzo and lz4 are also both faster than gzip on more capable hardware (though not by much, only +0.3 seconds).
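
The averages behind those deltas can be reproduced from the log lines. This Python sketch assumes the second column is the elapsed time in seconds at which Kodi logged the line (which is how the figures above are being read); the regex and names are illustrative, and the sample lines are trimmed after "NOTICE:" for brevity:

```python
import re
from statistics import mean

# Captures the elapsed-seconds column of lines such as
# "12:14:28  16.996590 T:1960583168  NOTICE: ..."
LOG_LINE = re.compile(r"\d{2}:\d{2}:\d{2}\s+(\d+\.\d+)\s+T:\d+")

def startup_seconds(log_text):
    """Return the elapsed-seconds value from every matching log line."""
    return [float(s) for s in LOG_LINE.findall(log_text)]

# RPi1 lines from the results above.
RPI1 = {
    "lz4": """
12:14:28  16.996590 T:1960583168  NOTICE:
12:14:28  16.929241 T:1961586688  NOTICE:
12:14:28  16.953888 T:1961381888  NOTICE:
""",
    "lzo": """
12:14:28  14.944477 T:1960579072  NOTICE:
12:14:28  14.958488 T:1960669184  NOTICE:
12:14:28  14.959731 T:1960972288  NOTICE:
""",
    "gzip": """
12:14:30  17.949179 T:1960943616  NOTICE:
12:14:29  18.002409 T:1961160704  NOTICE:
12:14:29  18.022547 T:1960755200  NOTICE:
""",
}

means = {algo: mean(startup_seconds(text)) for algo, text in RPI1.items()}
for algo, value in means.items():
    print(f"{algo}: {value:.2f}s ({value - means['lzo']:+.2f}s vs lzo)")
```

For RPi1 this comes out at roughly +2.0 s for lz4 and +3.0 s for gzip relative to lzo; the same function can be fed the RPi2 lines.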

@stefansaraev
Contributor

stefansaraev commented Sep 5, 2016

cool. so I guess lzo wins with newer (mainline, there are some nice squashfs optimizations not present in older kernels) kernels and bigger squashfs root.

EDIT: with slower sdcards results may be slightly different :)

@lrusak
Member Author

lrusak commented Sep 6, 2016

So what's the verdict? Do we leave squashfs compression for individual projects as is, or do we move all projects to a single squashfs compression method?

@CvH
Member

CvH commented Sep 6, 2016

maybe even use gzip for rpi 2/3 - with slow sd cards it could even outperform lzo

@MilhouseVH
Contributor

Yes, I would continue to allow individual projects to choose the squashfs compression method.

lzo performs better on low-end devices, gzip gives better compression and so produces smaller files, and lz4 manages both to produce larger files and to perform poorly on low-end devices.

Based on the results above I'd suggest lzo for lower power devices (best performance) and gzip for Generic/Virtual (as these have much larger files, and are unlikely to notice any slightly reduced performance).

Or to put it another way, gzip for Generic/Virtual (as it is now), and lzo for everything else (again as it is now, with the exception of imx6, as this project is currently using gzip for some reason).

@lrusak
Member Author

lrusak commented Sep 9, 2016

Updated to leave squashfs compression to be project specific
