Experiment: Estimate opcode sizes #21

arnaud-lb · 2025-08-28T09:20:07Z

Premise: reducing the size of the VM opcode encoding would make the VM faster.

The goal of this branch is to estimate how much the opcode size shrinks with various strategies. It is not meant to be merged and does not actually implement the strategies.

Most of the changes exist to determine accurately which operands are used by each opcode handler.

Strategy 1: Slim ops

Remove op types, opcode, lineno
Use two zend_op sizes: one with 16-bit operands (slim), one with 32-bit operands (wide). The smaller size is used when all operands fit.
Naturally 8-byte aligned
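
A minimal sketch of the slim/wide selection, not the code in this branch. The layouts `slim_op` and `wide_op`, the slot name `ext`, and the helpers `fits_slim` and `encoded_size` are hypothetical names for illustration; the only thing taken from the description above is that the slim form is used when every operand fits in 16 bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slim layout: handler pointer plus four 16-bit slots. */
typedef struct {
    const void *handler;
    uint16_t op1, op2, result, ext;
} slim_op;

/* Hypothetical wide layout: handler pointer plus four 32-bit slots. */
typedef struct {
    const void *handler;
    uint32_t op1, op2, result, ext;
} wide_op;

/* True when every operand value is representable in 16 bits. */
static bool fits_slim(uint32_t op1, uint32_t op2, uint32_t result, uint32_t ext)
{
    return op1 <= UINT16_MAX && op2 <= UINT16_MAX
        && result <= UINT16_MAX && ext <= UINT16_MAX;
}

/* Size in bytes of the encoding that would be chosen for these operands. */
static size_t encoded_size(uint32_t op1, uint32_t op2, uint32_t result, uint32_t ext)
{
    return fits_slim(op1, op2, result, ext) ? sizeof(slim_op) : sizeof(wide_op);
}
```

On a typical 64-bit target these two layouts come out at 16 and 24 bytes, both multiples of 8, which is one way to read "naturally 8-byte aligned".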

Strategy 2: Variable arity ops

Each opcode has its own opline format
Use 3 zend_op sizes: 8-bit, 16-bit, 32-bit. Use the smallest size that can fit all operands.
handler field is replaced by handler_id (16 bits)
op1, op2, result, extended_data are added only if any handler of the opcode uses them. Use the same size for all operands.
OP_DATA oplines are removed. OP_DATA.op1 is moved to the opline that uses it.
op types are added if any handler of the opcode uses them. 4 bits per type.
2-byte aligned (for handler_id)
Variant: 4-byte aligned (for 32-bit operands)
Dispatch would need to fetch the handler from a lookup table (handlers[handler_id])
JIT entry needs to be handled explicitly in the handler of a few opcodes, as we can't update the handler address anymore. Maybe we can reserve an operand to store the JIT function in these opcodes.
OBSERVER handlers are not specialized and would force related opcodes to encode operand types. For now I assume that these handlers will fetch the original op, so we can ignore them when finding which fields/types are used by an opcode.
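
To illustrate the dispatch change, here is a toy interpreter sketch in which each opline starts with a 16-bit `handler_id` and the handler is fetched from a lookup table. Everything in it (`toy_vm`, `toy_handlers`, the two toy opcodes and their byte layout) is made up for illustration and is not the Zend VM; the handler id is stored in native byte order and is not bounds-checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical VM state for the sketch. */
typedef struct {
    const uint8_t *code; /* compressed oplines, 2-byte aligned */
    size_t pc;           /* byte offset of the current opline */
    int acc;             /* single accumulator register */
    int running;
} toy_vm;

typedef void (*toy_handler)(toy_vm *vm);

enum { TOY_ADD_IMM8 = 0, TOY_HALT = 1 };

/* Layout: handler_id (2 bytes), 8-bit operand, 1 padding byte. */
static void toy_add_imm8(toy_vm *vm)
{
    vm->acc += vm->code[vm->pc + 2];
    vm->pc += 4;
}

/* Layout: handler_id (2 bytes) only. */
static void toy_halt(toy_vm *vm)
{
    vm->running = 0;
}

/* Lookup table indexed by handler_id, replacing the per-opline pointer. */
static const toy_handler toy_handlers[] = { toy_add_imm8, toy_halt };

static uint16_t toy_fetch_handler_id(const toy_vm *vm)
{
    uint16_t id;
    memcpy(&id, vm->code + vm->pc, sizeof id); /* avoids unaligned reads */
    return id;
}

static int toy_run(const uint8_t *code)
{
    toy_vm vm = { code, 0, 0, 1 };
    while (vm.running) {
        /* One extra indirection compared to a stored handler pointer. */
        toy_handlers[toy_fetch_handler_id(&vm)](&vm);
    }
    return vm.acc;
}
```

The extra table load per dispatch is the cost that the smaller encoding would have to offset.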

Example for an opcode needing op1, op2, op1_type, result:

struct {
   uint16_t handler_id;
   znode_op8 op1;
   znode_op8 op2;
   znode_op8 result;
   uint8_t op1_type:4;
}; // Total: 6 bytes when 2-byte aligned, 8 bytes when 4-byte aligned

zend_vm_gen.php could generate structs like that for every opcode and each size, so handlers can access fields as usual, as long as the names match and the fields are properly aligned.

We could also generate decoder functions for each handler and size (decode(void *opaque_opline, zend_op *op1, zend_op *op2, ...)) so that slow paths outside of handlers can decode compressed oplines. Types can be inferred from the handler (minus the OBSERVER case).
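
A rough sketch of what one generated decoder could look like for the example opcode above (op1, op2, op1_type, result) in its 8-bit and 16-bit sizes. The type names `opline_ex8`, `opline_ex16`, `decoded_op`, `op_size` and the function `decode_example` are hypothetical, and the signature differs from the `decode(...)` suggested above: it returns a widened copy instead of filling separate outputs. A plain `uint8_t` stands in for the 4-bit type field.

```c
#include <stdint.h>

/* Hypothetical 8-bit encoding of the example opcode. */
typedef struct {
    uint16_t handler_id;
    uint8_t op1, op2, result;
    uint8_t op1_type; /* only the low 4 bits are meaningful */
} opline_ex8;

/* Hypothetical 16-bit encoding of the same opcode. */
typedef struct {
    uint16_t handler_id;
    uint16_t op1, op2, result;
    uint8_t op1_type;
} opline_ex16;

/* Widened form that slow paths outside of handlers could consume. */
typedef struct {
    uint32_t op1, op2, result;
    uint8_t op1_type;
} decoded_op;

typedef enum { OP_SIZE_8, OP_SIZE_16 } op_size;

/* Decode a compressed opline of the given size into the widened form.
 * The caller must pass a pointer to an opline of the matching layout. */
static decoded_op decode_example(const void *opaque_opline, op_size size)
{
    decoded_op out;
    if (size == OP_SIZE_8) {
        const opline_ex8 *op = opaque_opline;
        out.op1 = op->op1;
        out.op2 = op->op2;
        out.result = op->result;
        out.op1_type = op->op1_type & 0x0f;
    } else {
        const opline_ex16 *op = opaque_opline;
        out.op1 = op->op1;
        out.op2 = op->op2;
        out.result = op->result;
        out.op1_type = op->op1_type & 0x0f;
    }
    return out;
}
```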

Results

Results on Symfony Demo (numbers are the total size, in bytes, of compiled opcodes):

cat /tmp/raw.log | awk '{ orig += $4; slim += $6; variable += $9; variable_align += $13 } END { printf "\torig\tslim\tvariable (align 2)\tvariable (align 4)\n"; printf "size (bytes)\t%d\t%d (%d%%)\t%d (%d%%)\t%d (%d%%)\n", orig, slim, (slim-orig)/orig*100, variable, (variable-orig)/orig*100, variable_align, (variable_align-orig)/orig*100 }' | column -s $'\t' -t
 
              orig     slim            variable (align 2)  variable (align 4)
size (bytes)  4271712  2164184 (-49%)  877664 (-79%)       1005980 (-76%)

When not forcing OBSERVER handlers to fetch the original op:

              orig     slim            variable (align 2)  variable (align 4)
size (bytes)  4271712  2164184 (-49%)  994896 (-76%)       1117764 (-73%)

Notes:

Some operands have negative values, so they fit only in 32 bits when interpreted as unsigned. Most notably IS_CONST operands, but also backward jumps and maybe others. This causes 32-bit operands to be used more often than necessary. I've added a special case for IS_CONST.
IS_UNUSED operands often contain uninitialized or truly unused data that could be ignored, but some opcodes actually use IS_UNUSED operands. We should introduce a new operand type or special-case these opcodes. This also causes 32bit operands to be used more often than necessary.
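
The effect described in the first note can be shown with a small sketch that compares the width needed for the same 32-bit operand when read as unsigned versus signed. The helpers `unsigned_width` and `signed_width` are hypothetical and are not the IS_CONST special case mentioned above; they only illustrate why small negative values force the 32-bit form under an unsigned interpretation.

```c
#include <stdint.h>
#include <string.h>

/* Smallest of 8/16/32 bits that holds the raw value read as unsigned. */
static int unsigned_width(uint32_t raw)
{
    if (raw <= UINT8_MAX) return 8;
    if (raw <= UINT16_MAX) return 16;
    return 32;
}

/* Smallest of 8/16/32 bits that holds the same bits read as signed. */
static int signed_width(uint32_t raw)
{
    int32_t v;
    memcpy(&v, &raw, sizeof v); /* reinterpret the bits without a cast */
    if (v >= INT8_MIN && v <= INT8_MAX) return 8;
    if (v >= INT16_MIN && v <= INT16_MAX) return 16;
    return 32;
}
```

For example, an operand holding -16 needs 32 bits as unsigned but only 8 bits as signed, while 200 needs 8 bits as unsigned and 16 bits as signed, so the interpretation would have to be chosen per operand kind.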

Experiment: Estimate opcode sizes

94fb542

arnaud-lb force-pushed the opcode-size branch from 5a8cdb2 to 94fb542 on August 28, 2025 10:36
