Skip to content

Conversation

@iluuu1994
Copy link
Member

@iluuu1994 iluuu1994 commented Nov 24, 2025

https://wiki.php.net/rfc/pattern-matching

Known issues / TODOs:

  • Array elements are fetched with FETCH_DIM_IS to avoid erroring for accessign non-existent indexes. Because undefined indexes evalute to null, ['a' => 42] is ['b' => null] will evaluate to true as a side-effect. This may be solved by returning IS_UNDEF from FETCH_DIM_IS. This may break some expectations, as we currently don't store IS_UNDEF in VARs. This approach was suggested by @bwoebi.
  • The RFC specifies 42 is float should return true. This does not currently work. It's unclear what the behavior for ints outside the range of precisely representable floats should be.
  • Prevent $foo is [$bar, $bar].

Further technical considerations:

  • The implementation is currently emitting various COPY_TMP ops. These could be avoided by creating specialized ops that don't free op1. This is the case for:
    • QM_ASSIGN for bindings. This one may be avoidable by pulling bindings to the tail.
    • TYPE_CHECK for array patterns. Alternatively we could use the new HAS_TYPE opcode.
    • COUNT for array patterns.
    • FETCH_DIM_IS for array patterns. Effectively, we want an IS variant of FETCH_LIST_*.
    • INSTANCEOF for object patterns. Alternatively we could use the new HAS_TYPE opcode.
    • FETCH_OBJ_R for object patterns.
  • mixed could be a no-op. This has some some implications on branching, so it might be enough to let the optimizer elide it (which already happens).
  • The new HAS_TYPE opcode allows performing arbitrary type checks. However, pattern matching doesn't currently combine these types discovered at compile time. I.e. int|float should be a simple type mask, but currently compiles to two HAS_TYPE checks.
  • The implementaiton currently uses JMP[N]Z_EX to immediately redeclare result which it consumes. This is mostly unnecessary, as bools aren't consumed at all. The other side-effect is that JMP[N]Z_EX isn't a smart branch. This is relevant because smart branches skip assigning to result, which is necessary here. We may be able to use JMP[N]Z by explicitly removing the smart branch flags, and making sure the optimizer doesn't re-add them.
  • Avoid zend_pm_kill_last_op_if_jmp(). Instead, make sure the final jump is not emitted in the first place. This is somewhat annoying since it needs handling in all patterns, sometimes in multiple places (zend_pm_compile_container()).
  • CVs are currently used directly. However, some operations (currently only FETCH_DIM_R) may mutate CVs as a side-effect (e.g. in get through references, or globals). IMO, guarding against this is not worth it, as that would require an initial copy and subsequent COPY_TMPs. It's just worth pointing out.

Topic of discussion for the community.
Allow propagating inferred true/false, which doesn't require a concrete value.
zend_stack_push(&context->bindings, &binding);

// FIXME: This can be elided most of the time. It may not for:
// - $a is $b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think I noted in the discussion, I believe $a is $b with a bare variable on the right should just be a compile-time error.

Copy link
Member Author

@iluuu1994 iluuu1994 Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your reservation here is the confusion of $a is $b as $a === $b, but this seems like an arbitrary restriction. For match specifically, it seems somewhat reasonable. For example:

match ($userOrError) {
    is User & $user => login($user),
    is $error => showError($error),
};

The rename serves a purpose in the readability sense. Only preventing standalone bindings in is but not match is possible, but seems even more arbitrary.

I don't have too strong feelings on this, so if you can convince @Crell I'm happy to make the change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your reservation here is the confusion of $a is $b as $a === $b,

Yes. Especially since Python's is for example is a comparison operator.

but this seems like an arbitrary restriction

I feel like this is something that would be pointed out by static analysis tools / IDEs anyways as a possible source of bug, because there is rarely (if ever) a reason to write this. To me it would therefore be helpful to take this footgun away from folks within the language itself so that they don't need to rely on third party tools. I'd be okay with allowing it as $a is ($b) which is similar to how the extra parentheses in:

if (($user = $repository->find($id))) {

}

generally suppress the warnings from IDEs that ask about possible = and == confusion.

Only preventing standalone bindings in is but not match is possible, but seems even more arbitrary.

Even in match it is misleading, because it effectively is a default that doesn't look like one. is mixed&$error or possibly is default&$error would both be clearer to me.


But I'm happy to figure this out as part of the proper ML discussion rather than inside this PR where no one will find it in the future.

It's unclear as of yet whether eliding mixed is useful, especially given the
optimizer already supports it. For objects, mixed is useless. For arrays, we
need some comparison to check whether the offset is IS_UNDEF, so it can't be
fully elided anyway.

Hence, forget about this for now.
Comment on lines +83 to +100
bool(true)
bool(false)
bool(false)
bool(false)
bool(false)
bool(true)
bool(true)
bool(true)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(true)
bool(true)
bool(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help the readability of the test if this would be structured into “these should match” and “these shouldn't”, possibly with a clear marker in-between. The one doesn't need to cross-reference every single line from the expectation and the test code. Or alternatively clearly separated for each of the variables “$foo up next”, “$bar up next”, “null up next”.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants