
Commit 0de9494

shqking authored and dstogov committed
Initial support of JIT/arm64
SUMMARY

We implemented a prototype of PHP JIT/arm64. Briefly speaking:

1. Build system
   Changes to the build system are made so that PHP JIT can be successfully built and run on ARM-based machines. The major change lies in file zend_jit_arm64.dasc, where the handler for each opcode is generated into machine code. Note that this file is just copied from zend_jit_x86.dasc, and the *unimplemented* parts are substituted with the 'brk' instruction for future work.

2. Registers
   AArch64 registers are defined in file zend_jit_arm64.h. From our perspective, the register usage is quite different from the x86 implementation due to the different ABI, number of registers and addressing modes. We had many questions about this part and will discuss it in detail in the final section.

3. Opcodes
   Several opcodes are partially supported, including INIT_FCALL, DO_UCALL, DO_ICALL, RETURN, ADD, PRE_INC, JMP, QM_ASSIGN, etc. Hence, simple use scenarios such as user function calls, loops, and addition of integer and floating-point numbers are supported.
   18 micro test cases are added under 'ext/opcache/tests/jit/arm64/'. Note that the majority of these test cases are designed for functional JIT; the cases 'hot_func_*.phpt' and 'loop_002.phpt' can trigger tracing JIT.

4. Test
   Our local test environment is an ARM-based server with Ubuntu 20.04 and GCC-10. Note that both HYBRID and CALL VM modes are supported. We suggest running the JIT test cases with the following command. Of all 130 test cases, 66 currently pass.
```
$ make test TESTS='-d opcache.jit=1203 ext/opcache/tests/jit/'
```

DETAILS

1. I-cache flush
   The instruction cache must be flushed for the JIT-ed code on AArch64. See macro JIT_CACHE_FLUSH in file 'zend_jit_internal.h'.

2. Disassembler
   Add initialization and jump-target parsing operations for the AArch64 backend. See the updates in file 'zend_jit_disasm.c'.

3. Redzone
   Enable the redzone for AArch64. See the update in zend_vm_opcodes.h.
   The redzone is intended to prevent 'vm_stack_data' from being optimized out by compilers. It's worth noting that this 16-byte redzone may be reused as temporary storage (treated as extra stack space) in HYBRID mode.

4. Stack space reservation
   The definitions of HYBRID_SPAD, SPAD and NR_SPAD are a bit tricky for x86/64. In AArch64, HYBRID_SPAD and SPAD are both defined as 16. These 16 bytes are pre-allocated for temporary use during the execution of JIT-ed code. Take line 4185 in file zend_jit_arm64.dasc as an example. NR_SPAD is defined as 48, of which 32 bytes are used to save the FP/IP/LR registers. Note that we choose to always reserve HYBRID_SPAD bytes in HYBRID mode, whether or not the redzone is used, for the sake of safety.

5. Stack alignment
   In AArch64 the stack pointer must be 16-byte aligned. Since a shadow stack is used for JIT, it's easy to guarantee the stack alignment by simply moving SP by an offset of 16 or a multiple of 16. That's why NR_SPAD is defined as 48 and we use 32 bytes of it to save the FP/IP/LR registers, even though those three 8-byte registers occupy only 24 bytes; the remaining 8 bytes are padding that keeps SP 16-byte aligned.

6. Global registers
   x27 and x28 are reserved as global registers. See the updates in file zend_jit_vm_helpers.c.

7. Function prologue for CALL mode
   The two callee-saved registers x27 and x28 should be saved in function zend_jit_prologue() in file zend_jit_arm64.dasc. Besides, the LR, i.e. x30, should also be saved, since runtime C helper functions (such as zend_jit_find_func_helper) might be invoked during the execution of JIT-ed code.

8. Regset
   Minor changes are made to regset operations specifically for AArch64. See the updates in file zend_jit_internal.h.

REGISTER USAGE

In this section, we first describe our understanding of register usage and then present our design.

1. Register usage for HYBRID/CALL modes
   Registers are used similarly in HYBRID mode and CALL mode. One difference is how FP and IP are saved.
   In HYBRID mode, they are assigned to global registers, while in CALL mode they are explicitly saved/restored on the VM stack in the prologue/epilogue. The other difference is that the LR register must also be saved/restored in CALL mode, since JIT-ed code is invoked as a normal function.

2. Register usage for functional/tracing JIT
   The way registers are used differs a lot between functional JIT and tracing JIT.
   For functional JIT, runtime C code (e.g. helper functions) may be invoked during the execution of JIT-ed code. Since the operands of *most* opcodes are accessed via their stack slots, i.e. FP + offset, there is no need to save/restore local (caller-saved) registers before/after invoking runtime C code. The exception is Phi nodes, for which registers might be allocated. Currently I don't fully understand why registers are allocated for Phi functions, because I would expect the different SSA versions of a variable merged at a Phi function to occupy the same stack slot (in other words, access via the stack slot should be enough, with no need to allocate registers).
   For tracing JIT, runtime information is recorded for traces (before JIT compilation), and the data types and control flow are concrete as well. Hence it is faster to perform operations and computations in registers rather than in stack slots (as functional JIT does) for these collected hot paths. Besides, runtime C code can be invoked from tracing JIT; however, this only happens for deoptimization, and all registers are saved to the stack beforehand.

3. Candidates for the register allocator
   1) Opcode candidates
      Function zend_jit_opline_supports_reg() determines the candidate opcodes that can use CPU registers.
   2) Register candidates
      Registers in the set "ZEND_REGSET_FP + ZEND_REGSET_GP - ZEND_REGSET_FIXED - ZEND_REGSET_PRESERVED" are available to the register allocator.
      Note that registers in ZEND_REGSET_FIXED are reserved for special purposes, such as the stack pointer, and are excluded from register allocation. Note also that registers in ZEND_REGSET_PRESERVED are callee-saved according to the ABI, so it is safe to leave them out as well.

4. Temporary registers
   Some opcodes need temporary registers to hold intermediate computation results.
   1) Functions zend_jit_get_def_scratch_regset() and zend_jit_get_scratch_regset() return the registers that might be clobbered by certain opcodes. The register allocator therefore spills these scratch registers, if necessary, when it encounters such opcodes.
   2) Macro ZEND_REGSET_LOW_PRIORITY denotes a set of registers that are allocated with low priority, so that they can be used as temporaries while avoiding conflicts as far as possible.

5. Compared to the x86 implementation, in JIT/arm64:
   1) Callee-saved FP registers are included in ZEND_REGSET_PRESERVED for AArch64.
   2) We follow the logic of function zend_jit_opline_supports_reg().
   3) We take 4 GPRs and 2 FPRs out of the register allocator and use them exclusively as temporary registers. Note that these 6 registers are included in set ZEND_REGSET_FIXED. Since they are reserved, the may-clobber registers can be dropped for most opcodes except function calls. Besides, the low-priority register set is defined as empty, since all candidate registers have the same priority. See the updates in function zend_jit_get_scratch_regset() and macro ZEND_REGSET_LOW_PRIORITY.

6. Why do we reserve registers for temporary use?
   1) The AArch64 addressing modes need more temporary registers.
      The addressing modes differ from x86, and temporary registers may be needed for *almost every* opcode. For instance, in AArch64 an immediate must first be moved into a register before it can be stored to memory, whereas in x86 the immediate can be stored directly.
   2) There are more registers in AArch64.
      Compared to the solution in JIT/x86 (where temporary registers are reserved on demand, i.e. different registers for different opcodes under different conditions), our solution is coarse-grained and brute-force, and execution performance might degrade to some extent since fewer candidate registers are available for allocation. We expect the performance loss to be acceptable since AArch64 has more registers.
   3) Based on my understanding, the scratch registers defined for x86 are excluded from the allocator's candidates only with *low probability*, i.e. the allocator can still allocate them. Special handling is then needed, such as checking 'reg != ZREG_R0'. Hence, as we see it, it's simpler to reserve some temporary registers exclusively. See the updates in function zend_jit_math_long_long() for instance: TMP1 can be used directly without checking.

Co-Developed-by: Nick Gasson <[email protected]>
1 parent c939bd2 commit 0de9494

32 files changed: +6704 / -7 lines

Zend/zend_vm_opcodes.h

Lines changed: 2 additions & 1 deletion
@@ -35,7 +35,8 @@
 #endif
 
 #if (ZEND_VM_KIND == ZEND_VM_KIND_HYBRID) && !defined(__SANITIZE_ADDRESS__)
-# if ((defined(i386) && !defined(__PIC__)) || defined(__x86_64__) || defined(_M_X64))
+# if ((defined(i386) && !defined(__PIC__)) || defined(__x86_64__) || \
+     defined(_M_X64) || defined(__aarch64__))
 # define ZEND_VM_HYBRID_JIT_RED_ZONE_SIZE 16
 # endif
 #endif

build/Makefile.global

Lines changed: 2 additions & 0 deletions
@@ -125,6 +125,8 @@ distclean: clean
 	rm -f scripts/man1/phpize.1 scripts/php-config scripts/man1/php-config.1 sapi/cli/php.1 sapi/cgi/php-cgi.1 sapi/phpdbg/phpdbg.1 ext/phar/phar.1 ext/phar/phar.phar.1
 	rm -f sapi/fpm/php-fpm.conf sapi/fpm/init.d.php-fpm sapi/fpm/php-fpm.service sapi/fpm/php-fpm.8 sapi/fpm/status.html
 	rm -f ext/phar/phar.phar ext/phar/phar.php
+	rm -f ext/opcache/jit/zend_jit_x86.c
+	rm -f ext/opcache/jit/zend_jit_arm64.c
 	if test "$(srcdir)" != "$(builddir)"; then \
 		rm -f ext/phar/phar/phar.inc; \
 	fi

ext/opcache/config.m4

Lines changed: 2 additions & 1 deletion
@@ -29,7 +29,7 @@ if test "$PHP_OPCACHE" != "no"; then
 
   if test "$PHP_OPCACHE_JIT" = "yes"; then
     case $host_cpu in
-      i[[34567]]86*|x86*)
+      i[[34567]]86*|x86*|aarch64)
         ;;
       *)
         AC_MSG_WARN([JIT not supported by host architecture])
@@ -77,6 +77,7 @@ if test "$PHP_OPCACHE" != "no"; then
     fi
 
     PHP_SUBST(DASM_FLAGS)
+    PHP_SUBST(DASM_ARCH)
 
     AC_MSG_CHECKING(for opagent in default path)
     for i in /usr/local /usr; do

ext/opcache/config.w32

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ if (PHP_OPCACHE != "no") {
 			dasm_flags += " -D ZTS=1";
 		}
 		DEFINE("DASM_FLAGS", dasm_flags);
+		DEFINE("DASM_ARCH", "x86");
 
 		AC_DEFINE('HAVE_JIT', 1, 'Define to enable JIT');
 		/* XXX read this dynamically */

ext/opcache/jit/Makefile.frag

Lines changed: 3 additions & 3 deletions
@@ -2,11 +2,11 @@
 $(builddir)/minilua: $(srcdir)/jit/dynasm/minilua.c
 	$(BUILD_CC) $(srcdir)/jit/dynasm/minilua.c -lm -o $@
 
-$(builddir)/jit/zend_jit_x86.c: $(srcdir)/jit/zend_jit_x86.dasc $(srcdir)/jit/dynasm/*.lua $(builddir)/minilua
-	$(builddir)/minilua $(srcdir)/jit/dynasm/dynasm.lua $(DASM_FLAGS) -o $@ $(srcdir)/jit/zend_jit_x86.dasc
+$(builddir)/jit/zend_jit_$(DASM_ARCH).c: $(srcdir)/jit/zend_jit_$(DASM_ARCH).dasc $(srcdir)/jit/dynasm/*.lua $(builddir)/minilua
+	$(builddir)/minilua $(srcdir)/jit/dynasm/dynasm.lua $(DASM_FLAGS) -o $@ $(srcdir)/jit/zend_jit_$(DASM_ARCH).dasc
 
 $(builddir)/jit/zend_jit.lo: \
-	$(builddir)/jit/zend_jit_x86.c \
+	$(builddir)/jit/zend_jit_$(DASM_ARCH).c \
 	$(srcdir)/jit/zend_jit_helpers.c \
 	$(srcdir)/jit/zend_jit_disasm.c \
 	$(srcdir)/jit/zend_jit_gdb.c \

ext/opcache/jit/zend_jit.c

Lines changed: 36 additions & 0 deletions
@@ -39,7 +39,14 @@
 #include "Optimizer/zend_call_graph.h"
 #include "Optimizer/zend_dump.h"
 
+#if defined(__x86_64__) || defined(i386)
 #include "jit/zend_jit_x86.h"
+#elif defined (__aarch64__)
+#include "jit/zend_jit_arm64.h"
+#else
+#error "JIT not supported on this platform"
+#endif
+
 #include "jit/zend_jit_internal.h"
 
 #ifdef ZTS
@@ -204,7 +211,12 @@ static bool zend_long_is_power_of_two(zend_long x)
 #define OP2_RANGE() OP_RANGE(ssa_op, op2)
 #define OP1_DATA_RANGE() OP_RANGE(ssa_op + 1, op1)
 
+#if defined(__x86_64__) || defined(i386)
 #include "dynasm/dasm_x86.h"
+#elif defined(__aarch64__)
+#include "dynasm/dasm_arm64.h"
+#endif
+
 #include "jit/zend_jit_helpers.c"
 #include "jit/zend_jit_disasm.c"
 #ifndef _WIN32
@@ -216,7 +228,11 @@ static bool zend_long_is_power_of_two(zend_long x)
 #endif
 #include "jit/zend_jit_vtune.c"
 
+#if defined(__x86_64__) || defined(i386)
 #include "jit/zend_jit_x86.c"
+#elif defined(__aarch64__)
+#include "jit/zend_jit_arm64.c"
+#endif
 
 #if _WIN32
 # include <Windows.h>
@@ -298,15 +314,32 @@ static void handle_dasm_error(int ret) {
 		case DASM_S_RANGE_PC:
 			fprintf(stderr, "DASM_S_RANGE_PC %d\n", ret & 0xffffffu);
 			break;
+#ifdef DASM_S_RANGE_VREG
 		case DASM_S_RANGE_VREG:
 			fprintf(stderr, "DASM_S_RANGE_VREG\n");
 			break;
+#endif
+#ifdef DASM_S_UNDEF_L
 		case DASM_S_UNDEF_L:
 			fprintf(stderr, "DASM_S_UNDEF_L\n");
 			break;
+#endif
+#ifdef DASM_S_UNDEF_LG
+		case DASM_S_UNDEF_LG:
+			fprintf(stderr, "DASM_S_UNDEF_LG\n");
+			break;
+#endif
+#ifdef DASM_S_RANGE_REL
+		case DASM_S_RANGE_REL:
+			fprintf(stderr, "DASM_S_RANGE_REL\n");
+			break;
+#endif
 		case DASM_S_UNDEF_PC:
 			fprintf(stderr, "DASM_S_UNDEF_PC\n");
 			break;
+		default:
+			fprintf(stderr, "DASM_S_%0x\n", ret & 0xff000000u);
+			break;
 	}
 	ZEND_UNREACHABLE();
 }
@@ -391,6 +424,9 @@ static void *dasm_link_and_encode(dasm_State **dasm_state,
 	entry = *dasm_ptr;
 	*dasm_ptr = (void*)((char*)*dasm_ptr + ZEND_MM_ALIGNED_SIZE_EX(size, DASM_ALIGNMENT));
 
+	/* flush the hardware I-cache */
+	JIT_CACHE_FLUSH(entry, entry + size);
+
 	if (trace_num) {
 		zend_jit_trace_add_code(entry, size);
 	}
