| 1 | +# Optional Chaining |
| 2 | + |
| 3 | +## Preamble |
| 4 | + |
| 5 | + Author: Breno G. de Oliveira <[email protected]> |
| 6 | + Sponsor: |
| 7 | + ID: |
| 8 | + Status: Draft |
| 9 | + |
| 10 | +## Abstract |
| 11 | + |
| 12 | +This RFC proposes a new operator, `?->`, to indicate optional dereference |
| 13 | +chains that short-circuit to an empty list when the left side is undefined. |
| 14 | + |
| 15 | +## Motivation |
| 16 | + |
| 17 | +Chained dereferencing of nested data structures and objects is quite common |
| 18 | +in Perl programs, and developers often find themselves needing to check |
| 19 | +whether the data is there in the first place before using the arrow notation, |
| 20 | +otherwise the call may trigger a runtime error or modify the original data |
| 21 | +structure due to unwanted autovivification. |
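
Both failure modes named above can be reproduced in current Perl. The snippet below is only an illustrative sketch; the variable names and the `method` call are made up for the demonstration.

```perl
use strict;
use warnings;

my $data = {};

# Merely reading a nested key creates the intermediate hash.
my $val = $data->{deeply}{nested};
print exists $data->{deeply} ? "autovivified\n" : "untouched\n";

# Calling a method on an undefined invocant dies at runtime.
my $obj;
my $ok = eval { $obj->method; 1 };
print "error: $@" unless $ok;
```

The first `print` reports that `$data` was modified by what looked like a read; the second shows the runtime error that the explicit `defined` checks are meant to avoid.
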
| 22 | + |
| 23 | +The current syntax for these verifications can be quite long, hard to write |
| 24 | +and read, and prone to human error - especially in chained methods, as you |
| 25 | +need to be careful not to call the same method twice. |
| 26 | + |
| 27 | +So the idea is to be able to replace this: |
| 28 | + |
| 29 | +```perl |
| 30 | + my $val; |
| 31 | + if ( defined $data |
| 32 | + && defined $data->{deeply} |
| 33 | + && defined $data->{deeply}{nested} |
| 34 | + && defined $data->{deeply}{nested}[0] |
| 35 | + && defined $data->{deeply}{nested}[0]{data} |
| 36 | + ) { |
| 37 | + $val = $data->{deeply}{nested}[0]{data}{value} |
| 38 | + } |
| 39 | +``` |
| 40 | + |
| 41 | +With this: |
| 42 | + |
| 43 | +```perl |
| 44 | + my $val = $data?->{deeply}?->{nested}?->[0]?->{data}?->{value}; |
| 45 | +``` |
| 46 | + |
| 47 | +And be able to replace this: |
| 48 | + |
| 49 | +```perl |
| 50 | + my $val; |
| 51 | + if (defined $obj) { |
| 52 | + my $tmp1 = $obj->this; |
| 53 | + if (defined $tmp1) { |
| 54 | + my $tmp2 = $tmp1->then; |
| 55 | + if (defined $tmp2) { |
| 56 | + $val = $tmp2->that; |
| 57 | + } |
| 58 | + } |
| 59 | + } |
| 60 | +``` |
| 61 | + |
| 62 | +With this: |
| 63 | + |
| 64 | +```perl |
| 65 | + my $val = $obj?->this?->then?->that; |
| 66 | +``` |
| 67 | + |
| 68 | +## Rationale |
| 69 | + |
| 70 | +An "optional chaining" (sometimes referred to as "null safe", "safe call", |
| 71 | +"safe navigation" or "optional path") operator would let developers access |
| 72 | +values located deep within a chain of connected references and fail |
| 73 | +gracefully without having to check that each of them is defined. |
| 74 | + |
| 75 | +This should result in shorter, simpler and more correct expressions whenever |
| 76 | +the path being accessed may be missing, such as when exploring objects and |
| 77 | +data structures without complete guarantees of which branches/methods are |
| 78 | +provided. |
| 79 | + |
| 80 | +Similar solutions have been thoroughly validated by many other popular |
| 81 | +programming languages like JavaScript, Kotlin, C#, Swift, TypeScript, Groovy, |
| 82 | +PHP, Raku, Ruby and Rust, in some cases for over 15 years now [1]. |
| 83 | + |
| 84 | +## Specification |
| 85 | + |
| 86 | +The `?->` operator would behave exactly like the current dereference arrow |
| 87 | +`->`, interacting with the exact same things and with the same precedence, |
| 88 | +being completely interchangeable with it. The only difference would be that, |
| 89 | +whenever the left-hand side of the operator is undefined, it short-circuits |
| 90 | +the whole expression to an empty list `()` (which becomes `undef` in scalar |
| 91 | +context). |
| 92 | + |
| 93 | +One could say that: |
| 94 | + |
| 95 | + EXPR1 ?-> EXPR2 |
| 96 | + |
| 97 | +is equivalent to: |
| 98 | + |
| 99 | + defined EXPR1 ? EXPR1->EXPR2 : () |
| 100 | + |
| 101 | +with the important caveat that EXPR1 is only evaluated once. |
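
As a rough approximation of these semantics in current Perl, one can pass the already-evaluated left side to a helper. The `maybe` function below is hypothetical (it is not part of this RFC or of any module); it only illustrates the single evaluation of EXPR1 and the empty-list result.

```perl
use strict;
use warnings;

# Hypothetical helper approximating `EXPR1 ?-> EXPR2`.
# The caller evaluates the left side once and passes it in as $lhs;
# the right side is a callback that sees that value in $_.
sub maybe {
    my ($lhs, $then) = @_;
    return unless defined $lhs;    # () in list context, undef in scalar context
    local $_ = $lhs;
    return $then->();
}

my $calls = 0;
sub lookup { $calls++; return { name => 'perl' } }    # left side with a side effect

my $name  = maybe(lookup(), sub { $_->{name} });      # lookup() runs exactly once
my @none  = maybe(undef,    sub { $_->{name} });      # empty list, not (undef)
my $undef = maybe(undef,    sub { $_->{name} });      # undef in scalar context

printf "name=%s calls=%d list_size=%d scalar_defined=%d\n",
    $name, $calls, scalar(@none), defined($undef) ? 1 : 0;
```

A plain ternary would have to mention `lookup()` twice, which is exactly the double evaluation the caveat rules out.
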
| 102 | + |
| 103 | +## Backwards Compatibility |
| 104 | + |
| 105 | +All code with `?->` currently yields a compile time syntax error, so there |
| 106 | +are no expected conflicts with any other syntax in Perl 5. |
| 107 | + |
| 108 | +Notable exceptions are string interpolation (where `"$foo->bar"` already |
| 109 | +ignores the arrow and `"$foo?->{bar}"` resolves to showing the reference |
| 110 | +address followed by a literal `?->{bar}`) and regular expressions (where |
| 111 | +`?` acts as a special character). Optional chains should be disallowed in |
| 112 | +those scenarios (but see 'Open Issues' below). |
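
The current interpolation behaviour described above can be checked with a short script. This is a sketch for illustration only; the printed addresses vary between runs.

```perl
use strict;
use warnings;

my $foo = { bar => 42 };

# The arrow before a bare method name is not interpolated.
my $method_like = "$foo->bar";

# The `?` ends the variable name, so the rest is literal text.
my $optional_like = "$foo?->{bar}";

print "$method_like\n";     # e.g. HASH(0x...)->bar
print "$optional_like\n";   # e.g. HASH(0x...)?->{bar}
```

Because such strings are already valid today, giving `?->` a meaning inside them would change the output of existing code.
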
| 113 | + |
| 114 | +Static tooling may confuse the new operator with a syntax error until |
| 115 | +updated. |
| 116 | + |
| 117 | +Earlier versions of Perl could potentially emulate this feature with a CPAN |
| 118 | +module relying on something like XS::Parse::Infix (perl 5.14 onwards), |
| 119 | +though the implementation effort has not been explored. |
| 120 | + |
| 121 | +## Security Implications |
| 122 | + |
| 123 | +None foreseen. |
| 124 | + |
| 125 | +## Examples |
| 126 | + |
| 127 | +Below are a few use case examples and explored edge-cases, with their current |
| 128 | +Perl 5 equivalent in the comments. |
| 129 | + |
| 130 | +Expected common uses: |
| 131 | + |
| 132 | +```perl |
| 133 | + # $val = defined $foo && defined $foo->{bar} ? $foo->{bar}[3] : (); |
| 134 | + $val = $foo?->{bar}?->[3]; |
| 135 | + |
| 136 | + # $tmp = defined $obj ? $obj->m1 : (); $val = defined $tmp ? $tmp->m2 : (); |
| 137 | + $val = $obj?->m1?->m2; |
| 138 | + |
| 139 | + # $val = defined $obj ? $obj->$method : (); |
| 140 | + $val = $obj?->$method; |
| 141 | + |
| 142 | + # $ret = defined $coderef ? $coderef->(@args) : (); |
| 143 | + $ret = $coderef?->(@args); |
| 144 | + |
| 145 | + # $foo->{bar}{baz} = 42 if defined $foo && defined $foo->{bar}; |
| 146 | + $foo?->{bar}?->{baz} = 42; |
| 147 | + |
| 148 | + # %ret = defined $href ? $href->%* : (); |
| 149 | + %ret = $href?->%*; |
| 150 | + |
| 151 | + # $n = defined $aref ? $aref->$#* : (); |
| 152 | + $n = $aref?->$#*; |
| 153 | + |
| 154 | + # foreach my $val (defined $aref ? $aref->@* : ()) { ... } |
| 155 | + foreach my $val ($aref?->@*) { ... } |
| 156 | + |
| 157 | + # @vals = defined $aref ? $aref->@* : (); |
| 158 | + @vals = $aref?->@*; # note that @vals is (), not (undef). |
| 159 | + |
| 160 | + # @vals = defined $aref ? $aref->@[ 3..10 ] : (); |
| 161 | + @vals = $aref?->@[ 3..10 ]; |
| 162 | + |
| 163 | + # @vals = map $_->{foo}, grep defined $_, @aoh; |
| 164 | + @vals = map $_?->{foo}, @aoh; |
| 165 | + |
| 166 | + # \$foo->{bar} if defined $foo; |
| 167 | + \$foo?->{bar}; # as with regular arrow, becomes \($foo?->{bar}) |
| 168 | + |
| 169 | + # my $class = 'SomeClass'; $class->new if defined $class; |
| 170 | + my $class = 'SomeClass'; $class?->new; |
| 171 | + |
| 172 | + # my $obj = %SomeClass:: ? SomeClass->new : (); |
| 173 | + my $obj = SomeClass?->new; # TBD: see 'Future Scope' below. |
| 174 | + |
| 175 | + # my @objs = (%NotValid:: ? NotValid->new : (), %Valid:: ? Valid->new : ()); |
| 176 | + my @objs = ( NotValid?->new, Valid?->new ); # @objs == ( ValidObject ) |
| 177 | +``` |
| 178 | + |
| 179 | +Unusual and edge cases, for comprehension: |
| 180 | + |
| 181 | +```perl |
| 182 | + # $y = (); |
| 183 | + # if (defined $x) { |
| 184 | + # my $tmp = $i++; |
| 185 | + # if (defined $x->{$tmp}) { |
| 186 | + # $y = $x->{$tmp}->[++$i] |
| 187 | + # } |
| 188 | + # } |
| 189 | + $y = $x?->{$i++}?->[++$i]; |
| 190 | + |
| 191 | + # $tmp = ++$foo; $val = defined $tmp ? $tmp->{bar} : (); |
| 192 | + $val = ++$foo?->{bar}; # note that this statement makes no sense. |
| 193 | + |
| 194 | + # my $val = $scalar_ref->$* if defined $scalar_ref; |
| 195 | + my $val = $scalar_ref?->$*; |
| 196 | + |
| 197 | + # $ret = defined $coderef ? $coderef->&* : (); |
| 198 | + $ret = $coderef?->&*; |
| 199 | + |
| 200 | + # $glob = $globref->** if defined $globref; |
| 201 | + $glob = $globref?->**; |
| 202 | +``` |
| 203 | + |
| 204 | +## Prototype Implementation |
| 205 | + |
| 206 | +None. |
| 207 | + |
| 208 | +## Future Scope |
| 209 | + |
| 210 | +Because the idea is to be completely interchangeable with the arrow notation, |
| 211 | +it would be important to cover class methods, where an arrow is used but |
| 212 | +the check cannot be 'defined' because there is no concept of definedness |
| 213 | +on barewords. |
| 214 | + |
| 215 | +```perl |
| 216 | + my $obj = SomeModule?->new; |
| 217 | +``` |
| 218 | + |
| 219 | +The equivalence, in this case, would be: |
| 220 | + |
| 221 | +```perl |
| 222 | + my $obj = %SomeModule:: ? SomeModule->new : (); |
| 223 | +``` |
| 224 | + |
| 225 | +While this is the actual goal, a first version of the operator could ignore |
| 226 | +this and stick with named variables. |
| 227 | + |
| 228 | +## Rejected Ideas |
| 229 | + |
| 230 | +A similar operator has been proposed on and off on the p5p list since |
| 231 | +at least 2010 [2] in various shapes and forms. While generally very well |
| 232 | +received, discussion quickly ended in either feature creep or bikeshedding |
| 233 | +over which symbols to use. Below, I will try to address each question that |
| 234 | +arose in the past, and the design decisions that led to this particular RFC. |
| 235 | + |
| 236 | +* Why not just wrap everything in an eval or try block? |
| 237 | + |
| 238 | +Besides being shorter, the optional chain will NOT silence actual errors |
| 239 | +coming from a method, or from a defined value that is not a ref. |
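
The difference can be sketched in current Perl. The `Account` class and its failing `balance` method are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical class whose method fails for a real reason.
package Account {
    sub new     { return bless {}, shift }
    sub balance { die "database unavailable\n" }
}

my $account = Account->new;

# Blanket eval: the failure is swallowed and looks like "no value".
my $quiet = eval { $account->balance };
print "eval gave: ", (defined $quiet ? $quiet : 'undef'), "\n";

# Definedness check only: the method's own error still propagates.
# (The outer eval is here just so the script can report it.)
my $ok = eval {
    my $balance = defined $account ? $account->balance : undef;
    1;
};
print "checked call died: $@" unless $ok;
```

With the blanket `eval`, a missing account and a broken database are indistinguishable; with the definedness check, only the former is treated as "nothing there".
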
| 240 | + |
| 241 | +* Why is the token `?->` and not "X"? |
| 242 | + |
| 243 | +There have been a lot of different proposals for this operator over the |
| 244 | +years, and even a community poll [3] to decide which one to use. While `~>` |
| 245 | +won, it can be hard to distinguish from `->`. Also, `~` in Perl is already |
| 246 | +associated with negation, regexes, and the infamous smartmatch operator. |
| 247 | +Likewise, the `&` character is used in many different contexts (bitwise, |
| 248 | +logic, prototype bypassing, subroutines) and adding another one seemed |
| 249 | +unnecessary. |
| 250 | + |
| 251 | +`?->` was the runner up in the poll, and its popularity is justified: it |
| 252 | +alludes to a ternary check and, even in regexes, to only proceeding if |
| 253 | +whatever came before is there. It also leaves the arrow intact, indicating |
| 254 | +it is checking something that comes _before_ the dereferencing takes place |
| 255 | +(unlike, for example, `->?` or `->>`). It is also the chosen notation for |
| 256 | +most languages that implement this feature, so why surprise developers with |
| 257 | +another way to express what they are already familiar with? |
| 258 | + |
| 259 | +Finally, `//->` was considered, but `//` is defined-or, whereas this new |
| 260 | +operator is defined-and, so it could be even more confusing to developers. |
| 261 | + |
| 262 | +* Why add this just to arrow dereferencing and not to "X"? |
| 263 | + |
| 264 | +Because the optional chain is trying to solve the specific (and real-world) |
| 265 | +issue of having to add a lot of repetitive and error-prone boilerplate tests |
| 266 | +on an entire chain individually, and nothing else. |
| 267 | + |
| 268 | +While we could (hypothetically) try to expand this notion to other operators |
| 269 | +(e.g. `?=`, `?=~`, etc), the benefit of doing so is unclear at this point, |
| 270 | +and would require a lot of effort picking apart which operators should |
| 271 | +and shouldn't include the "optional" variation [4]. |
| 272 | + |
| 273 | +* Why definedness and not truthfulness? Or 'ref'? Or 'isa'? |
| 274 | + |
| 275 | +Semantically, `undef` means there is nothing there. We still want the code |
| 276 | +to fail loudly if we are dereferencing anything defined, as it would indicate |
| 277 | +something wrong with the code, the underlying logic, or both. It is also |
| 278 | +important to note that Perl allows you to call methods on strings (then |
| 279 | +treated as package/class names), so we cannot reliably test for 'ref' |
| 280 | +without making things really convoluted with assumptions, rules and |
| 281 | +exceptions for each type of data. |
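
A small sketch of why a `ref` test would be too strict: a plain string is a valid invocant for a class method. The `Counter` package is hypothetical and used only for this demonstration.

```perl
use strict;
use warnings;

package Counter {
    sub new { return bless { n => 0 }, shift }
}

my $class = 'Counter';

# A plain string is not a reference...
print "ref(\$class) is '", ref($class), "'\n";

# ...yet it is a perfectly good invocant for a class method.
my $obj = $class->new;
print "ref(\$obj) is '", ref($obj), "'\n";
```

A `ref`-based guard would skip `$class->new` here, while the definedness check proposed by this RFC lets it through.
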
| 282 | + |
| 283 | +* Can we have implicit `?->` after its first use? Have it flip/flop against |
| 284 | +`->`? |
| 285 | + |
| 286 | +The idea is for `?->` to always be explicit, since it is testing the |
| 287 | +left-hand side of the expression. Making the operator change how explicit or |
| 288 | +implicit dereference works in the rest of the expression could lead to |
| 289 | +confusion and bugs. |
| 290 | + |
| 291 | +* Why not other identity values according to context? Like '' when |
| 292 | +concatenating strings, 0 on addition/subtraction and 1 on |
| 293 | +multiplication/division? |
| 294 | + |
| 295 | +While tempting, it would not only produce values that could be confused with |
| 296 | +an actual successful call, it would also mean we'd have to check all types |
| 297 | +of data and agree on their identity value (what is the identity of a sub? |
| 298 | +Or a glob?). |
| 299 | + |
| 300 | +Instead, one could just use the already provided `//` operator to achieve |
| 301 | +the same results, e.g.: `"value is: " . ($ref?->{val} // '')`. |
| 302 | + |
| 303 | +## Open Issues |
| 304 | + |
| 305 | +Optional chains under string interpolation and regular expressions could be |
| 306 | +enabled in the future, hidden behind a feature flag to prevent backwards |
| 307 | +compatibility issues, much like what was done with `postderef_qq`. |
| 308 | + |
| 309 | +## References |
| 310 | + |
| 311 | +1. https://github.com/apache/groovy/commit/f223c9b3322fef890c6db261720f703394c7cf27 |
| 312 | +2. https://www.nntp.perl.org/group/perl.perl5.porters/2010/11/msg165931.html |
| 313 | +3. https://www.perlmonks.org/?node_id=973015 |
| 314 | +4. Python's proposal, and the ongoing discussion around it, are a noteworthy |
| 315 | + external example of the consequences of trying to add that logic to many |
| 316 | + different operators. https://peps.python.org/pep-0505 |
| 317 | + |
| 318 | +## Copyright |
| 319 | + |
| 320 | +Copyright (C) 2022, Breno G. de Oliveira. |
| 321 | + |
| 322 | +This document and code and documentation within it may be used, redistributed |
| 323 | +and/or modified under the same terms as Perl itself. |