Commit e89ee2a
Slightly improve performance of zend_copy_extra_args
This patch aims to improve the performance when a callback needs `zend_copy_extra_args`.
This turns out to be common with some array functions like array_walk and the array_find family of functions.
In these cases, the callback is often a short function and often only takes a single argument.
Therefore, `zend_copy_extra_args` takes measurable time in the profile.
Looking at VTune reveals that my system stalls on memory loads for op_array and the argument count.
By passing op_array as an argument, we eliminate the load for op_array, and thanks to GCC's
inter-procedural analysis the callee can also reuse the already-loaded argument counts.
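
To illustrate the idea, here is a minimal sketch in C. It is not the php-src code: the `sketch_*` types and both functions are hypothetical stand-ins, assumed only for illustration. The first helper reloads the op_array pointer from the frame, the second receives the pointer the caller already holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the engine's structures. */
typedef struct {
    uint32_t num_args;            /* number of declared parameters */
} sketch_op_array;

typedef struct {
    const sketch_op_array *func;  /* pointer that lives in memory */
    uint32_t passed_args;         /* number of arguments actually passed */
} sketch_frame;

/* "Before" shape: the helper re-derives op_array from the frame, so the
 * read of num_args depends on first loading frame->func. */
uint32_t extra_args_reload(const sketch_frame *frame)
{
    assert(frame != NULL);
    const sketch_op_array *op_array = frame->func;  /* load 1 */
    uint32_t declared = op_array->num_args;         /* load 2, depends on load 1 */
    return frame->passed_args > declared ? frame->passed_args - declared : 0;
}

/* "After" shape: the caller hands over the pointer it already has, so the
 * helper does not need to load it from the frame again. */
uint32_t extra_args_passed(const sketch_frame *frame,
                           const sketch_op_array *op_array)
{
    assert(frame != NULL && op_array != NULL);
    uint32_t declared = op_array->num_args;
    return frame->passed_args > declared ? frame->passed_args - declared : 0;
}
```

Both variants compute the same result. Whether the second one actually saves loads depends on the compiler and on inlining decisions, which is why the measurements and the assembly comparison matter.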
The following synthetic benchmark (courtesy of Tim) improves about 11% in run time performance:
```php
$array = range(1, 10000);
$result = 0;
for ($i = 0; $i < 5000; $i++) {
    $result += array_find($array, static function ($item) {
        return $item === 5000;
    });
}
var_dump($result);
```
Hyperfine stats (on an i7-1185G7) for this benchmark:
```
Benchmark 1: ./sapi/cli/php x.php
Time (mean ± σ): 528.5 ms ± 4.8 ms [User: 524.8 ms, System: 3.4 ms]
Range (min … max): 521.0 ms … 534.4 ms 10 runs
Benchmark 2: ./sapi/cli/php_old x.php
Time (mean ± σ): 586.2 ms ± 5.3 ms [User: 581.8 ms, System: 4.0 ms]
Range (min … max): 578.9 ms … 592.6 ms 10 runs
Summary
./sapi/cli/php x.php ran
1.11 ± 0.01 times faster than ./sapi/cli/php_old x.php
```
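
As a sanity check on how the summary line relates to the two means, here is a small sketch in C. The helper names are hypothetical; the inputs are the mean times reported above.

```c
/* Ratio of old to new mean run time, i.e. hyperfine's "times faster". */
double speedup(double old_ms, double new_ms)
{
    return old_ms / new_ms;
}

/* Fraction of the old run time that the patch saves. */
double time_saved(double old_ms, double new_ms)
{
    return (old_ms - new_ms) / old_ms;
}
```

With the means above, `speedup(586.2, 528.5)` is roughly 1.11, while `time_saved(586.2, 528.5)` is a little under 0.10. The "about 11%" figure therefore refers to the speed ratio; the wall-clock time drops by slightly less than 10%.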
On an Intel i7-4790 I get a performance improvement of about 5% ±1%.
Ilija measured an improvement of around 7% ±2% on his Intel i7-12800H.
On neither of _my_ systems did I measure a noticeable difference in bench.php, micro_bench.php or Symfony demo.
This means we do not see a regression for these other benchmarks.
For reference, this is the resulting hyperfine benchmark on the i7-1185G7 for Symfony demo:
```
Benchmark 1: ../php-src/sapi/cli/php_old --repeat 50 public/index.php
Time (mean ± σ): 742.3 ms ± 4.5 ms [User: 600.8 ms, System: 139.5 ms]
Range (min … max): 736.7 ms … 749.8 ms 10 runs
Benchmark 2: ../php-src/sapi/cli/php --repeat 50 public/index.php
Time (mean ± σ): 738.5 ms ± 3.8 ms [User: 601.2 ms, System: 135.3 ms]
Range (min … max): 735.3 ms … 747.3 ms 10 runs
```
To further confirm that no regressions take place, here is the Valgrind instruction count for 50 runs of the Symfony demo:
Before patch: 4,452,217,516
After patch: 4,452,205,233
The difference of 12,283 instructions (about 0.0003%) is just noise.
---
Looking at the effect on the assembly of zend_init_func_execute_data,
we see one small change on the regular path of execution (besides instruction reordering),
resulting in an extra instruction.
This is around the code that compares the argument count with EX_NUM_ARGS().
Before patch:
```
je zend_init_func_execute_data+297
mov 0x68(%rbx),%rax
mov 0x2c(%r14),%r9d
movq $0x0,0x8(%r14)
mov %r13,0x10(%r14)
mov 0x4(%rbx),%edx
mov %rax,%r15
cmp %r9d,0x20(%rbx)
jb zend_init_func_execute_data+320
```
After patch:
```
je zend_init_func_execute_data+297
mov 0x68(%rbx),%rax
mov 0x20(%rbx),%esi
mov %r13,0x10(%r14)
mov 0x2c(%r14),%r9d
mov 0x4(%rbx),%edi
movq $0x0,0x8(%r14)
mov %rax,%r15
cmp %r9d,%esi
jb zend_init_func_execute_data+320
```
Where previously 0x20(%rbx) was compared directly with %r9d,
it is now stored in a register %esi so that it can be reused
without reloading in zend_copy_extra_args (caused by inter-procedural analysis).
Still, there's the same number of memory loads, just now via an extra move.
There are some changes to the code that calls zend_copy_extra_args,
where some memory loads happen prior to the call.
The memory loads at the start of zend_copy_extra_args have been eliminated, however.

1 parent 1ce79eb, commit e89ee2a
1 file changed, +2 -3 lines changed