Skip to content

Conversation

iluuu1994
Copy link
Member

@iluuu1994 iluuu1994 commented Nov 30, 2024

Small idea I had. Instead of constructing arrays with dynamic elements from scratch, clone a template from a constant though ARRAY_DUP, and then populate the missing element slots through ARRAY_SET_PLACEHOLDER by a byte-offset into the array data stored in op2.num. This optimization is only applied when opcache is enabled, because in that case the array template literal is immutable. This allows for a fast clone using memcpy without walking the array for refcounting.

Benchmarks

All benchmarks are called with:

for ($i = 0; $i < 20_000_000; $i++) {
    test(4);
}
function test($a) {
    return [$a, $a];
}
// 1% slower
// This is the only slower case, we may improve heuristics to default back to `INIT_ARRAY` and `ADD_ARRAY_ELEMENT` in this case.
function test($a) {
    return [$a, $a, $a];
}
// 2% faster
function test($a) {
    return [1, 2, 3, $a];
}
// 33% faster
function test($a) {
    return ['foo' => $a, 'bar' => $a];
}
// 10% faster
function test($a) {
    return [100 => $a, $a, $a];
}
// 6% faster

Unsurprisingly, the biggest benefits can be achieved when most of the array elements are constant. Using constant keys in hashed arrays is also beneficial, since the keys are pre-filled. That said, I did not observe any significant improvements in real applications like Symfony Demo.

If we plan for merge this, type inference and SCCP should be improved. Currently, during compilation, placeholders are set to null. This makes it hard to understand whether these are real constants or placeholders that are later removed. We should use some other placeholder value that can be easily distinguished, either _IS_ERROR or some new one. It will also be possible to eliminate ARRAY_SET_PLACEHOLDER opcodes if op1 is discovered during SCCP.

  • Support partial arrays in SCCP
  • Implement proper type inference
  • Implement JIT support Not needed according to Dmitry

With opache, arrays are marked as immutable. Without opcache, nested arrays need
refcounting, and thus require a loop over its elements.
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be a good idea.

callgrind shows ~0.5% improvement on wordpress, but there is no visible real-time difference. The complication and VM size increase is not significant.

ct_eval and sccp should be cleaned up.
JIT support is not required.

&& throw_op->result_type & (IS_TMP_VAR|IS_VAR)
&& throw_op->opcode != ZEND_ADD_ARRAY_ELEMENT
&& throw_op->opcode != ZEND_ADD_ARRAY_UNPACK
&& throw_op->opcode != ZEND_ARRAY_SET_PLACEHOLDER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would name this ZEND_SET_ARRAY_ELEMET, but this is just a preference.

@staabm
Copy link
Contributor

staabm commented Dec 4, 2024

AFAIR Authoritative composer autoloading depends on a big compile time known static array, which then is used for locating the files.

this might be a use-case worth measuring, as its a vital part in every application (bootstrap overhead of a minimal app)

@iluuu1994
Copy link
Member Author

@staabm Comp-time arrays don't benefit, only array initializers with some dynamic values.

We can make some assumptions about the value of the fields.
@iluuu1994 iluuu1994 force-pushed the faster-array-initialization branch from 39143f8 to df615f2 Compare December 4, 2024 16:52
This performed better for Symfony Demo in my benchmarks, possibly due to better
instruction caching.
Instead of removing elements from a partial array, simply mark the offsets as
BOT. If the array is partial, it will never be propagated, only its offsets. And
offset fetching already checks for BOT.
@iluuu1994
Copy link
Member Author

I think I've squeezed all I can out of this, but still don't get a measurable improvement for real applications. Some other opinions would be nice, @bwoebi?

@bwoebi
Copy link
Member

bwoebi commented Dec 16, 2024

@iluuu1994 I consider the optimization valid, but the impact low. I mean, how often do you actually create a larger array with expressions? And how big are these?

From my experience mixed array initializers containing expressions tend to be mostly small arrays where the impact is low.

The other problem is, probably, that before this patch, the fastest way to write arrays where you have many constants and few expressions, is to initialize the array with the constant part, then assign a few values explicitly. So, probably, where this patch has a measurable impact, users probably have rewritten their code in a way that is currently fastest, but won't be able to harness the optimizations introduced by this patch.

Also, copying arData is redundant if it's packed and nothing pre-initialized. Like in the common [$a, $b] case. (I just had a look at a random codebase of mine. 60% of all (non-empty) array creations were two element packed arrays.)
I am suspecting (but don't know for sure), that going to L3/main memory to copy an arData of no usable values is going to be expensive. Probably also for just a single non-refcounted value in a packed array.
I think the 1% slower is actually going to be more % slower in real code if it cannot read arData from L1. Possibly also the 3-element case (at least not be actually faster). And have most impact, negating many of the gains from larger arrays.

Does it change the real world benchmarks at all if you introduce a dedicated opcode (ZEND_INITIALIZE_SMALL_PACKED_ARRAY, op1, op2) for just that case of 1 or 2 packed values? (and probably also use ZEND_INITIALIZE_SMALL_PACKED_ARRAY + ADD_ARRAY_ELEMENT for 3 packed values)

@iluuu1994
Copy link
Member Author

@bwoebi Thanks for your evaluation. It's particularly well suited for things like:

return [
    'username' => $username,
    'password' => $password,
    'date_of_birth' => $dateOfBirth,
];

But yeah, I think the improvement is too small for the added complexity.

Does it change the real world benchmarks at all if you introduce a dedicated opcode (ZEND_INITIALIZE_SMALL_PACKED_ARRAY, op1, op2) for just that case of 1 or 2 packed values?

One packed value likely won't make a big difference, as this is already possible in one opcode (ZEND_INIT_ARRAY accepts the first key/value). We could see if array pairs, which are common, can benefit.

@iluuu1994
Copy link
Member Author

I have abandoned this idea for now. We might still consider an opcode for creating array pairs.

@iluuu1994 iluuu1994 closed this Jan 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants