proj_dis_perf_improvements_1

Tsukasa OI edited this page Nov 17, 2022 · 33 revisions

Project: Disassembler Performance Improvements (Optimization 1)

Benchmarking System

The benchmarks were run on:

  • Ubuntu 22.04 LTS
  • AMD Ryzen 5 PRO 5650G processor

For the parallel runs, I used 6 parallel jobs (-j6), one per physical core. Although the processor has 12 hardware threads, -j12 only slowed the benchmark down.

Aggregate Performance Improvements

Type Expected Improvements (in general)
Binary Files 50-130%
RISC-V ELF Programs 30-40%
RISC-V ELF Libraries 30-500%

These figures are relative to the latest master (commit cf76db71dd68) and were measured on 2022-11-17.

Columns

objdump -d (ELF)

Program Prerequisites HTable/Caching Mapping Syms Notes
Busybox 1.35.1 (RV64GC) 5.0-5.2% 35.7-39.3% 37.3-38.6%
OpenSBI 1.1 (generic fw_*.elf) 8.3-9.2% 65.2-65.8% 69.0-70.3%
Linux kernel 5.19 (vmlinux) 15.6-22.1% 47.7-53.6% 63.2-81.2%
Linux kernel 5.19 (vmlinux.o) 59.3-65.7% 69.4-77.5% 235.9-452.3% Not finally linked
glibc (libc.so.6) 6.2-6.9% 33.8-38.4% 37.3-42.1%

objdump -d (ELF-based archive)

Program Prerequisites HTable/Caching Mapping Syms
glibc (libc.a) 96.0-103.1% 107.9-114.3% 1115.3-1180.4%
newlib (libc.a) 58.0-65.6% 73.5-83.0% 274.9-306.5%

objdump -D (binary)

Program Prerequisites HTable/Caching Mapping Syms
Linux kernel 5.19 (vmlinux) (-2.5)-17.9% 95.9-135.6% 95.0-135.9%
Random files (/dev/urandom) (-4.6)-6.1% 115.5-140.4% 119.2-140.7%
1M (1048576) CSR instructions 21.4% 685.6% 698.2%

Although the CSR optimization is part of "Prerequisites", its effect becomes significant only in "HTable/Caching" and later: the hash table optimization works in strong synergy with the CSR optimization.

gdb: disas of nearly the entire code region

Program Prerequisites HTable/Caching Mapping Syms
Linux kernel 5.19 (vmlinux) with debug info 32.4% 36.5% 36.3%
Linux kernel 5.19 (vmlinux) without debug info 88.4% 101.1% 102.3%
OpenSBI 1.1 (generic fw_*.elf) 62.5-62.9% 77.3-77.7% 76.9-77.5%
1M (1048576) CSR instructions (ELF) 62.0% 305.1% 304.4%

Batch: objdump -d on Linux distribution

Serial Run: All ELF Files Under the Directory

System Path N Prerequisites HTable/Caching Mapping Syms
Ubuntu 22.04 LTS (image for HiFive Unmatched) /usr/bin 563 4.7% 30.5% 30.3%
Debian unstable (as of 2022-07-20) /usr/bin 269 5.3% 30.0% 30.5%
Ubuntu 22.04 LTS (image for HiFive Unmatched) /usr/lib 6797 51.0% 70.7% 185.7%
Debian unstable (as of 2022-07-20) /usr/lib 548 89.2% 110.5% 450.2%

Parallel Run: All (including data-only ELFs)

System N Prerequisites HTable/Caching Mapping Syms
Ubuntu 22.04 LTS (image for HiFive Unmatched) 7666 38.8% 55.7% 127.6%
Debian unstable (as of 2022-07-20) 946 123.3% 130.6% 394.3%

Batch: objdump -D (as binary) on Linux distribution

Serial Run: All ELF Files Under the Directory

System Path N Prerequisites HTable/Caching Mapping Syms
Ubuntu 22.04 LTS (image for HiFive Unmatched) /usr/bin 563 6.1% 77.8% 79.3%
Debian unstable (as of 2022-07-20) /usr/bin 269 5.3% 65.0% 65.7%

Parallel Run: All (including data-only ELFs)

System N Prerequisites HTable/Caching Mapping Syms
Ubuntu 22.04 LTS (image for HiFive Unmatched) 7666 5.7% 81.4% 81.6%
Debian unstable (as of 2022-07-20) 946 5.5% 66.1% 66.0%

objdump -d (ELF): Extreme Examples

Program N Prerequisites HTable/Caching Mapping Syms Mapping Syms (as speedup factor)
OpenSSL 2 126.6-135.2% 134.8-149.3% 2728.1-2821.6% x28.281-29.216
LLVM 1 141.7% 142.1% 23211.5% x233.115

  • OpenSSL
    1. Ubuntu 22.04 LTS (image for HiFive Unmatched) : /usr/lib/riscv64-linux-gnu/libcrypto.so.3
    2. Debian unstable (as of 2022-07-20) : /usr/lib/riscv64-linux-gnu/libcrypto.so.3
  • LLVM
    1. Ubuntu 22.04 LTS (Package libllvm14 Version 1:14.0.0-1ubuntu1) : /usr/lib/riscv64-linux-gnu/libLLVM-14.so.1

For large libraries with many symbols, the effect of the mapping symbol optimization is huge. I expected some improvement, but not this much.

This optimization also benefits the Arm architecture (but not AArch64, which handles mapping symbols differently).

Mapping Symbol Related

Program w/ISA Prerequisites HTable/Caching Mapping Syms
Busybox 1.35.1 (RV64GC) with debug info N 5.6% 36.1% 37.8%
Busybox 1.35.1 (RV64GC) with debug info Y 162.6% 238.3% 244.2%
Linux kernel 5.19 (vmlinux) without debug info N 16.0% 48.0% 63.9%
Linux kernel 5.19 (vmlinux) without debug info Y 248.5% 326.1% 452.6%
Linux kernel 5.19 (vmlinux.o) without debug info N 58.8% 77.7% 236.5%
Linux kernel 5.19 (vmlinux.o) without debug info Y 177.9% 214.6% 436.3%

Files that use mapping symbols with ISA (the "Y" rows in the w/ISA column) also benefit from the new optimizations.