-
Notifications
You must be signed in to change notification settings - Fork 8k
[RFC] Pattern matching #20574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[RFC] Pattern matching #20574
Conversation
Topic of discussion for the community.
Allow propagating inferred true/false, which doesn't require a concrete value.
Zend/zend_compile.c
Outdated
| zend_stack_push(&context->bindings, &binding); | ||
|
|
||
| // FIXME: This can be elided most of the time. It may not for: | ||
| // - $a is $b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I think I noted in the discussion, I believe $a is $b with a bare variable on the right should just be a compile-time error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your reservation here is the confusion of $a is $b as $a === $b, but this seems like an arbitrary restriction. For match specifically, it seems somewhat reasonable. For example:
match ($userOrError) {
is User & $user => login($user),
is $error => showError($error),
};The rename serves a purpose in the readability sense. Only preventing standalone bindings in is but not match is possible, but seems even more arbitrary.
I don't have too strong feelings on this, so if you can convince @Crell I'm happy to make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your reservation here is the confusion of $a is $b as $a === $b,
Yes. Especially since Python's is for example is a comparison operator.
but this seems like an arbitrary restriction
I feel like this is something that would be pointed out by static analysis tools / IDEs anyways as a possible source of bug, because there is rarely (if ever) a reason to write this. To me it would therefore be helpful to take this footgun away from folks within the language itself so that they don't need to rely on third party tools. I'd be okay with allowing it as $a is ($b) which is similar to how the extra parentheses in:
if (($user = $repository->find($id))) {
}
generally suppress the warnings from IDEs that ask about possible = and == confusion.
Only preventing standalone bindings in
isbut notmatchis possible, but seems even more arbitrary.
Even in match it is misleading, because it effectively is a default that doesn't look like one. is mixed&$error or possibly is default&$error would both be clearer to me.
But I'm happy to figure this out as part of the proper ML discussion rather than inside this PR where no one will find it in the future.
It's unclear as of yet whether eliding mixed is useful, especially given the optimizer already supports it. For objects, mixed is useless. For arrays, we need some comparison to check whether the offset is IS_UNDEF, so it can't be fully elided anyway. Hence, forget about this for now.
| bool(true) | ||
| bool(false) | ||
| bool(false) | ||
| bool(false) | ||
| bool(false) | ||
| bool(true) | ||
| bool(true) | ||
| bool(true) | ||
| bool(false) | ||
| bool(false) | ||
| bool(false) | ||
| bool(false) | ||
| bool(false) | ||
| bool(false) | ||
| bool(false) | ||
| bool(true) | ||
| bool(true) | ||
| bool(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help the readability of the test if this would be structured into “these should match” and “these shouldn't”, possibly with a clear marker in-between. The one doesn't need to cross-reference every single line from the expectation and the test code. Or alternatively clearly separated for each of the variables “$foo up next”, “$bar up next”, “null up next”.
https://wiki.php.net/rfc/pattern-matching
Known issues / TODOs:
FETCH_DIM_ISto avoid erroring for accessign non-existent indexes. Because undefined indexes evalute tonull,['a' => 42] is ['b' => null]will evaluate totrueas a side-effect. This may be solved by returningIS_UNDEFfromFETCH_DIM_IS. This may break some expectations, as we currently don't storeIS_UNDEFinVARs. This approach was suggested by @bwoebi.42 is floatshould returntrue. This does not currently work. It's unclear what the behavior for ints outside the range of precisely representable floats should be.$foo is [$bar, $bar].Further technical considerations:
COPY_TMPops. These could be avoided by creating specialized ops that don't free op1. This is the case for:QM_ASSIGNfor bindings. This one may be avoidable by pulling bindings to the tail.TYPE_CHECKfor array patterns. Alternatively we could use the newHAS_TYPEopcode.COUNTfor array patterns.FETCH_DIM_ISfor array patterns. Effectively, we want anISvariant ofFETCH_LIST_*.INSTANCEOFfor object patterns. Alternatively we could use the newHAS_TYPEopcode.FETCH_OBJ_Rfor object patterns.mixedcould be a no-op. This has some some implications on branching, so it might be enough to let the optimizer elide it (which already happens).HAS_TYPEopcode allows performing arbitrary type checks. However, pattern matching doesn't currently combine these types discovered at compile time. I.e.int|floatshould be a simple type mask, but currently compiles to twoHAS_TYPEchecks.JMP[N]Z_EXto immediately redeclareresultwhich it consumes. This is mostly unnecessary, asbools aren't consumed at all. The other side-effect is thatJMP[N]Z_EXisn't a smart branch. This is relevant because smart branches skip assigning toresult, which is necessary here. We may be able to useJMP[N]Zby explicitly removing the smart branch flags, and making sure the optimizer doesn't re-add them.zend_pm_kill_last_op_if_jmp(). Instead, make sure the final jump is not emitted in the first place. This is somewhat annoying since it needs handling in all patterns, sometimes in multiple places (zend_pm_compile_container()).FETCH_DIM_R) may mutate CVs as a side-effect (e.g. ingetthrough references, or globals). IMO, guarding against this is not worth it, as that would require an initial copy and subsequentCOPY_TMPs. It's just worth pointing out.