Commit e89ee2a

Slightly improve performance of zend_copy_extra_args
This patch aims to improve performance when a callback needs `zend_copy_extra_args`. This turns out to be common with some array functions like array_walk and the array_find family of functions. In these cases, the callback is often a short function that takes only a single argument, so `zend_copy_extra_args` takes measurable time in the profile.

VTune reveals that my system stalls on the memory loads for op_array and the argument count. By passing op_array as an argument, we eliminate the load for op_array, and thanks to GCC's inter-procedural analysis the callee can also use the already-loaded argument counts.

The following synthetic benchmark (courtesy of Tim) improves by about 11% in run-time performance:

```php
$array = range(1, 10000);

$result = 0;
for ($i = 0; $i < 5000; $i++) {
	$result += array_find($array, static function ($item) {
		return $item === 5000;
	});
}
var_dump($result);
```

Hyperfine stats (on an i7-1185G7) for this benchmark:

```
Benchmark 1: ./sapi/cli/php x.php
  Time (mean ± σ):     528.5 ms ±   4.8 ms    [User: 524.8 ms, System: 3.4 ms]
  Range (min … max):   521.0 ms … 534.4 ms    10 runs

Benchmark 2: ./sapi/cli/php_old x.php
  Time (mean ± σ):     586.2 ms ±   5.3 ms    [User: 581.8 ms, System: 4.0 ms]
  Range (min … max):   578.9 ms … 592.6 ms    10 runs

Summary
  ./sapi/cli/php x.php ran
    1.11 ± 0.01 times faster than ./sapi/cli/php_old x.php
```

On an Intel i7-4790 I get a performance improvement of about 5% ± 1%. Ilija measured an improvement of around 7% ± 2% on his Intel i7-12800H.

On neither of _my_ systems did I measure a noticeable difference in bench.php, micro_bench.php or Symfony demo. This means we do not see a regression for these other benchmarks.

For reference, this is the resulting hyperfine benchmark on the i7-1185G7 for Symfony demo:

```
Benchmark 1: ../php-src/sapi/cli/php_old --repeat 50 public/index.php
  Time (mean ± σ):     742.3 ms ±   4.5 ms    [User: 600.8 ms, System: 139.5 ms]
  Range (min … max):   736.7 ms … 749.8 ms    10 runs

Benchmark 2: ../php-src/sapi/cli/php --repeat 50 public/index.php
  Time (mean ± σ):     738.5 ms ±   3.8 ms    [User: 601.2 ms, System: 135.3 ms]
  Range (min … max):   735.3 ms … 747.3 ms    10 runs
```

To further confirm that no regressions take place, here is the valgrind instruction count for 50 runs on Symfony demo:

Before patch: 4,452,217,516
After patch: 4,452,205,233

The difference is just due to noise.

---

Looking at the effect on the assembly of zend_init_func_execute_data, we see one small change on the regular path of execution (besides instruction reordering), resulting in an extra instruction. This is around the code that compares the argument count with EX_NUM_ARGS().

Before patch:

```
je     zend_init_func_execute_data+297
mov    0x68(%rbx),%rax
mov    0x2c(%r14),%r9d
movq   $0x0,0x8(%r14)
mov    %r13,0x10(%r14)
mov    0x4(%rbx),%edx
mov    %rax,%r15
cmp    %r9d,0x20(%rbx)
jb     zend_init_func_execute_data+320
```

After patch:

```
je     zend_init_func_execute_data+297
mov    0x68(%rbx),%rax
mov    0x20(%rbx),%esi
mov    %r13,0x10(%r14)
mov    0x2c(%r14),%r9d
mov    0x4(%rbx),%edi
movq   $0x0,0x8(%r14)
mov    %rax,%r15
cmp    %r9d,%esi
jb     zend_init_func_execute_data+320
```

Where previously 0x20(%rbx) was compared directly with %r9d, it is now loaded into the register %esi so that zend_copy_extra_args can reuse it without reloading (a result of inter-procedural analysis). The number of memory loads is still the same; one of them now just goes through an extra move.

There are some changes to the code that calls zend_copy_extra_args, where some memory loads now happen prior to the call. The memory loads at the start of zend_copy_extra_args have been eliminated, however.
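
The idea behind the patch can be illustrated outside of php-src. The following is a minimal sketch, not the actual Zend code: `demo_op_array`, `demo_frame`, `DEMO_NOINLINE` and every `demo_*` function are hypothetical stand-ins invented for illustration. It contrasts a non-inlined helper that re-derives the op_array pointer from the frame with one that receives the pointer its caller already holds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for zend_op_array and zend_execute_data. */
typedef struct {
	uint32_t num_args;          /* number of declared parameters */
} demo_op_array;

typedef struct {
	const demo_op_array *func;  /* pointer the helper would have to load */
	uint32_t passed_args;       /* number of arguments actually passed */
} demo_frame;

#if defined(__GNUC__)
# define DEMO_NOINLINE __attribute__((noinline))
#else
# define DEMO_NOINLINE
#endif

/* "Before" shape: the helper loads the op_array pointer from the frame,
 * then loads num_args through it (two dependent memory loads). */
static DEMO_NOINLINE uint32_t demo_extra_args_before(const demo_frame *frame)
{
	const demo_op_array *op_array = frame->func;
	assert(frame->passed_args > op_array->num_args);
	return frame->passed_args - op_array->num_args;
}

/* "After" shape: the caller passes the pointer it already has in a register,
 * so the helper no longer needs to load it from the frame. */
static DEMO_NOINLINE uint32_t demo_extra_args_after(const demo_op_array *op_array,
                                                    const demo_frame *frame)
{
	assert(frame->passed_args > op_array->num_args);
	return frame->passed_args - op_array->num_args;
}

/* Caller that already holds op_array and has already compared the counts. */
static uint32_t demo_init_frame(const demo_op_array *op_array, const demo_frame *frame)
{
	if (frame->passed_args > op_array->num_args) {
		return demo_extra_args_after(op_array, frame);
	}
	return 0;
}
```

Both helpers compute the same result. Whether the second form is actually faster depends on the compiler and the CPU, which is why the measurements above matter more than the sketch.
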
1 parent 1ce79eb commit e89ee2a

1 file changed

+2
-3
lines changed


Zend/zend_execute.c

Lines changed: 2 additions & 3 deletions
@@ -4180,9 +4180,8 @@ ZEND_API ZEND_COLD void ZEND_FASTCALL zend_fcall_interrupt(zend_execute_data *ca
  * on the zend_execute_data, and when the executor leaves the function, the
  * args will be freed in zend_leave_helper.
  */
-static zend_never_inline void zend_copy_extra_args(EXECUTE_DATA_D)
+static zend_never_inline void zend_copy_extra_args(const zend_op_array *op_array EXECUTE_DATA_DC)
 {
-	zend_op_array *op_array = &EX(func)->op_array;
 	uint32_t first_extra_arg = op_array->num_args;
 	uint32_t num_args = EX_NUM_ARGS();
 	zval *src;
@@ -4257,7 +4256,7 @@ static zend_always_inline void i_init_func_execute_data(zend_op_array *op_array,
 		num_args = EX_NUM_ARGS();
 		if (UNEXPECTED(num_args > first_extra_arg)) {
 			if (!may_be_trampoline || EXPECTED(!(op_array->fn_flags & ZEND_ACC_CALL_VIA_TRAMPOLINE))) {
-				zend_copy_extra_args(EXECUTE_DATA_C);
+				zend_copy_extra_args(op_array EXECUTE_DATA_CC);
 			}
 		} else if (EXPECTED((op_array->fn_flags & ZEND_ACC_HAS_TYPE_HINTS) == 0)) {
 			/* Skip useless ZEND_RECV and ZEND_RECV_INIT opcodes */
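
The diff switches from `EXECUTE_DATA_D`/`EXECUTE_DATA_C` to `EXECUTE_DATA_DC`/`EXECUTE_DATA_CC` with no comma after `op_array`. As far as I understand the php-src convention, the `DC`/`CC` variants carry their own leading comma, so that the macros can also expand to nothing in builds where execute_data lives in a global register. The sketch below only illustrates that convention: the `DEMO_*` macros, `demo_execute_data` and the `demo_*` functions are simplified, hypothetical stand-ins, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for zend_execute_data. */
typedef struct demo_execute_data {
	uint32_t num_args;
} demo_execute_data;

/* Assumed shape of the convention (simplified):
 *   _D  declares execute_data as the only parameter
 *   _C  passes execute_data as the only argument
 *   _DC declares it after other parameters (leading comma included)
 *   _CC passes it after other arguments (leading comma included) */
#define DEMO_EXECUTE_DATA_D   demo_execute_data *execute_data
#define DEMO_EXECUTE_DATA_C   execute_data
#define DEMO_EXECUTE_DATA_DC  , DEMO_EXECUTE_DATA_D
#define DEMO_EXECUTE_DATA_CC  , DEMO_EXECUTE_DATA_C

/* Sole-parameter form, like the old zend_copy_extra_args signature. */
static uint32_t demo_only_frame(DEMO_EXECUTE_DATA_D)
{
	assert(execute_data != NULL);
	return execute_data->num_args;
}

/* Extra leading parameter, like the new signature: note the missing comma. */
static uint32_t demo_with_leading_arg(uint32_t declared DEMO_EXECUTE_DATA_DC)
{
	assert(execute_data != NULL);
	return execute_data->num_args > declared ? execute_data->num_args - declared : 0;
}

/* Call sites use the matching _C / _CC macro. */
static uint32_t demo_caller(uint32_t declared DEMO_EXECUTE_DATA_DC)
{
	return demo_only_frame(DEMO_EXECUTE_DATA_C)
		+ demo_with_leading_arg(declared DEMO_EXECUTE_DATA_CC);
}
```
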
