---
layout: sip
permalink: /sips/:title.html
stage: pre-sip
status: waiting-for-implementation
title: SIP-NN - Bind variables within alternative patterns
---

**By: Yilin Wei**

## History

| Date          | Version            |
|---------------|--------------------|
| Sep 17th 2023 | Initial Draft      |

## Summary

Pattern matching is one of the most commonly used features in Scala by beginners and experts alike. Most of
the features of pattern matching compose beautifully; for example, a user who learns about bind variables
and guard patterns can mix the two features intuitively.

One of the few outstanding cases where this is untrue is when mixing bind variables and alternative patterns. The part of
the current [specification](https://scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html) which concerns us is section **8.1.12**, copied below with the relevant clause
highlighted.

> … All alternative patterns are type checked with the expected type of the pattern. **They may not bind variables other than wildcards**. The alternative …

We propose that this restriction be lifted and this corner case be eliminated.

Removing the corner case would make the language easier to teach, reduce friction and allow users to express intent in a more natural manner.

## Motivation

### Scenario

The following scenario is shamelessly stolen from [PEP 636](https://peps.python.org/pep-0636), which introduces pattern matching to the
Python language.

Suppose a user is writing a classic text adventure game such as [Zork](https://en.wikipedia.org/wiki/Zork). For readers unfamiliar with
text adventure games, the player typically enters freeform text into the terminal in the form of commands to interact with the game
world. Examples of commands might be `"pick up rabbit"` or `"open door"`.

Typically, the commands are tokenized and parsed. After a parsing stage, we may end up with an encoding similar to the following:

```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])
```

In this encoding, the string `pick up jar` would be parsed as `Command(List(Pick, Up, Item("jar")))`.
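
The parsing stage itself is outside the scope of this proposal. The following is a minimal sketch that assumes commands are whitespace-separated tokens and treats case as insignificant. The `parse` function is hypothetical. It maps each known keyword to its `Word` and treats any other token as an item name. The `Word` and `Command` definitions are repeated so that the block is self-contained.

```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])

// Hypothetical tokenizer (not part of this SIP): split on whitespace, map each
// known keyword to its Word and treat every other token as an item name.
def parse(input: String): Command =
  val tokens = input.trim.toLowerCase.split("\\s+").toList.filter(_.nonEmpty)
  Command(tokens.map {
    case "get"   => Word.Get
    case "north" => Word.North
    case "go"    => Word.Go
    case "pick"  => Word.Pick
    case "up"    => Word.Up
    case other   => Word.Item(other)
  })
```

With this sketch, `parse("pick up jar")` yields `Command(List(Pick, Up, Item("jar")))`, the value the paragraph above describes.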
| 55 | + |
| 56 | +Once the command is parsed, we want to actually *do* something with the command. With this particular encoding, |
| 57 | +we would naturally reach for a pattern match — in the simplest case, we could get away with a single recursive function for |
| 58 | +our whole program. |
| 59 | + |
| 60 | +Suppose we take the simplest example where we want to match on a command like `"north"`. The pattern match consists of |
| 61 | +matching on a single stable identifier, `North` and the code would look like this: |
| 62 | + |
| 63 | +~~~ scala |
| 64 | +import Command.* |
| 65 | + |
| 66 | +def loop(cmd: Command): Unit = |
| 67 | + cmd match |
| 68 | + case Command(North :: Nil) => // Code for going north |
| 69 | +~~~ |
| 70 | + |
| 71 | +However as we begin play-testing the actual text adventure, we observe that users type `"go north"`. We decide |
| 72 | +our program should treat the two distinct commands as synonyms. At this point we would reach for an alternative pattern `|` and |
| 73 | +refactor the code like so: |
| 74 | + |
| 75 | +~~~ scala |
| 76 | + case Command(North :: Nil | Go :: North :: Nil) => // Code for going north |
| 77 | +~~~ |
| 78 | + |
| 79 | +This clearly expresses our intent that the two commands map to the same underlying logic. |

Later, we decide that we want more complex logic in our game, perhaps allowing the user to pick up
items with a command like `pick up jar`. We would then extend our function with another case, binding the variable `name`:

~~~ scala
    case Command(Pick :: Up :: Item(name) :: Nil) => // Code for picking up items
~~~

Again, we might realise through our play-testing that users type `get` as a synonym for `pick up`. After playing around
with alternative patterns, we may reasonably write something like:

~~~ scala
    case Command(Pick :: Up :: Item(name) :: Nil | Get :: Item(name) :: Nil) => // Code for picking up items
~~~

Unfortunately, at this point the compiler stops us in our tracks: the bind variable `name` cannot be used in conjunction with alternative patterns.
We carefully consult the specification and confirm that this is not possible, so we must choose a different encoding.

We can, of course, work around the restriction by hoisting the logic into a helper function in the nearest enclosing scope that allows function definitions:

~~~ scala
def loop(cmd: Command): Unit =
  def pickUp(item: String): Unit =
    ??? // Code for picking up the item
  cmd match
    case Command(Pick :: Up :: Item(name) :: Nil) => pickUp(name)
    case Command(Get :: Item(name) :: Nil)        => pickUp(name)
~~~

We could also choose any number of other encodings. However, all of them are less intuitive and less obvious than the code we originally tried to write.
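
One such alternative encoding moves the synonym handling into custom extractor objects. The sketch below repeats the `Word` and `Command` definitions from earlier unchanged. `GoNorth`, `PickUp` and `describe` are hypothetical names chosen for illustration. The block compiles with current Scala 3 because no alternative pattern in it binds a variable. The `"pick up"` and `"get"` forms each bind `name` in a separate case.

```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])

import Word.*

// Hypothetical Boolean extractor: matches both "north" and "go north".
// The alternative is allowed here because it binds no variables.
object GoNorth:
  def unapply(cmd: Command): Boolean = cmd.words match
    case North :: Nil | Go :: North :: Nil => true
    case _                                 => false

// Hypothetical extractor: matches both "pick up <item>" and "get <item>",
// binding the item name separately in each case.
object PickUp:
  def unapply(cmd: Command): Option[String] = cmd.words match
    case Pick :: Up :: Item(name) :: Nil => Some(name)
    case Get :: Item(name) :: Nil        => Some(name)
    case _                               => None

def describe(cmd: Command): String = cmd match
  case GoNorth()    => "going north"
  case PickUp(name) => s"picking up $name"
  case _            => "unknown command"
```

The call site in `describe` reads almost like the pattern we originally wanted. However, the synonym logic now lives in a separate definition, away from the `match` that uses it.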

### Commentary

Removing the restriction leads to more obvious encodings in the case of alternative patterns. Arguably, the language
would be simpler and easier to teach: we would no longer have to remember that bind patterns and alternatives
do not mix, nor teach newcomers the workarounds.

Of the languages which have pattern matching, a significant number also support this feature. Languages such as [Rust](https://github.com/rust-lang/reference/pull/957) and [Python](https://peps.python.org/pep-0636/#or-patterns) have
supported it for some time. While this alone is not a compelling reason for Scala to do the same, its presence in other languages means that users
are more likely to expect the feature.

A smaller benefit for existing users is that removing the corner case leads to code which is
easier to review. Adding a bind variable within an alternative produces a smaller diff than switching to a different
encoding entirely, and it conveys the intent of such changesets better.

It is acknowledged, however, that cases where alternative branches share the same logic are relatively rare compared to
the usage of pattern matching in general. The current restriction is not too arduous for experienced practitioners to work around, as
can be inferred from the relatively low number of comments on the original [issue](https://github.com/scala/bug/issues/182), first raised in 2007.

To summarize, the main arguments for the proposal are that it makes the language more consistent, simpler and easier to teach. The main argument
against the change is that it will have low impact for the majority of existing users.

## Proposed solution

Removing the alternative restriction means that we need to specify some additional constraints. Intuitively, we
need to consider the restrictions on variable bindings within each alternative branch, as well as the types inferred
for each binding within the scope of the pattern.

### Bindings

The simplest case of mixing an alternative pattern and bind variables is where two `UnApply` patterns are combined in
a single alternative pattern. For now, we specifically only consider the case where each bind variable is of the same
type, like so:

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: Int)

  def fun = this match
    case Bar(z) | Baz(z) => ??? // z: Int
~~~

For the expression to make sense with the current semantics around pattern matches, `z` must be defined in both branches. Otherwise, the
case body would be nonsensical if `z` were referenced within it.

Removing the restriction would also allow nested alternative patterns:

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(x: Int)

enum Qux:
  case Quux(y: Int)
  case Corge(x: Foo)

  def fun = this match
    case Quux(z) | Corge(Foo.Bar(z) | Foo.Baz(z)) => ??? // z: Int
~~~

Using an `Ident` within an `UnApply` is not the only way to introduce a binding within the pattern scope.
We also expect to be able to use an explicit binding with `@`, like this:

~~~ scala
enum Foo:
  case Bar()
  case Baz(bar: Bar)

  def fun = this match
    case Baz(x) | x @ Bar() => ??? // x: Foo.Bar
~~~
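
Taken together, the examples above suggest a structural check: collect the variables bound by each alternative and require the sets to match. The following is a minimal sketch over a hypothetical, simplified pattern AST. `Pat`, its cases and `boundVars` are illustrative names, not the compiler's actual trees or typer code. The function returns the set of bound variables, or an error when the branches of some alternative, at any depth, disagree.

```scala
// Hypothetical, simplified pattern AST used only to illustrate the rule.
enum Pat:
  case Wildcard                              // _
  case Var(name: String)                     // z
  case Bind(name: String, pat: Pat)          // x @ p
  case Extract(fun: String, args: List[Pat]) // Bar(p1, ..., pn)
  case Alt(alts: List[Pat])                  // p1 | ... | pn

import Pat.*

// Collects the variables bound by `p`, or returns an error message if the
// branches of some alternative bind different sets of names.
def boundVars(p: Pat): Either[String, Set[String]] = p match
  case Wildcard      => Right(Set.empty)
  case Var(name)     => Right(Set(name))
  case Bind(name, q) => boundVars(q).map(_ + name)
  case Extract(_, args) =>
    args.map(boundVars).foldLeft[Either[String, Set[String]]](Right(Set.empty)) {
      (acc, next) => acc.flatMap(vs => next.map(ws => vs ++ ws))
    }
  case Alt(alts) =>
    // Check every branch first, then require that they all agree.
    val branches = alts.map(boundVars)
    branches.collectFirst { case Left(err) => err } match
      case Some(err) => Left(err)
      case None =>
        val sets = branches.collect { case Right(vs) => vs }
        if sets.distinct.size <= 1 then Right(sets.headOption.getOrElse(Set.empty))
        else Left(s"alternatives bind different variables: ${sets.mkString(", ")}")
```

Encoded in this AST, `Bar(z) | Baz(z)`, the nested `Quux(z) | Corge(Bar(z) | Baz(z))` and `Baz(x) | x @ Bar()` are all accepted, while `Bar(z) | Baz(_)` is rejected. The sketch only checks names. It silently merges duplicate bindings such as `Bar(x, x)`, which a real implementation would report, and it says nothing about types.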

### Types

We propose that the type of each variable introduced in the scope of the pattern be the least upper bound of the types
inferred for it within each branch.

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: String)

  def fun = this match
    case Bar(x) | Baz(x) => // x: Int | String
~~~
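
For comparison, the closest equivalent in current Scala 3 binds `x` in two separate cases and states the proposed type explicitly. This sketch is illustrative only. `payload` is a hypothetical method. The explicit `Int | String` result type stands in for the type that the proposal would infer for the single shared binding.

```scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: String)

  // Each case binds its own `x` today. The declared result type is the union
  // that the proposal would assign to one `x` bound in `Bar(x) | Baz(x)`.
  def payload: Int | String = this match
    case Bar(x) => x
    case Baz(x) => x
```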

We do not expect any inference to happen between branches. For example, in the case of a GADT, we would expect the second branch of
the following case to match all instances of `Bar`, regardless of the type of `A`.

~~~ scala
enum Foo[A]:
  case Bar(a: A)
  case Baz(i: Int) extends Foo[Int]

  def fun = this match
    case Baz(x) | Bar(x) => // x: Int | A
~~~
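
The intended behaviour can be compared with what current Scala 3 does when the two branches are written as separate cases. In the sketch below, which uses the hypothetical method name `payload`, only the `Baz` case learns that `A` is `Int`. The `Bar` case matches every `Bar` with `x: A`, and no information flows from one case to the other.

```scala
enum Foo[A]:
  case Bar(a: A)
  case Baz(i: Int) extends Foo[Int]

  // Today: one case per constructor. GADT reasoning refines `A` to `Int`
  // only inside the `Baz` case; the `Bar` case keeps `x: A` for any `A`.
  def payload: Int | A = this match
    case Baz(x) => x
    case Bar(x) => x
```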

### Specification

We do not believe any syntax changes are needed, since the current specification already allows the proposed syntax.

We propose that the following clauses be added to the specification:

Let $`p_1 | \ldots | p_n`$ be an alternative pattern at an arbitrary depth within a case pattern,
and let $`\Gamma_j`$ be the scope associated with the alternative $`p_j`$, for $`1 \le j \le n`$.

Let the variables introduced within each alternative, $`p_j`$, be $`x_i \in \Gamma_j`$.

Each $`p_j`$ must introduce the same set of bindings, i.e. for each $`j < n`$, $`\Gamma_j`$ must have the same members as
$`\Gamma_{j+1}`$.

If $`X_{j,i}`$ is the type of the binding $`x_i`$ within an alternative $`p_j`$, then the consequent type, $`X_i`$, of the
variable $`x_i`$ within the pattern scope, $`\Gamma`$, is the least upper bound of all the types $`X_{j,i}`$ associated with
the variable $`x_i`$ within each branch.
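
The two clauses can be written as a single display formula. Here "same members" is read as equality of the sets of bound names, and the notation $`\operatorname{dom}(\Gamma_j)`$ for that set is introduced only for illustration:

```math
\operatorname{dom}(\Gamma_1) = \operatorname{dom}(\Gamma_2) = \cdots = \operatorname{dom}(\Gamma_n),
\qquad
X_i = \operatorname{lub}\left(X_{1,i}, X_{2,i}, \ldots, X_{n,i}\right)
```

For instance, in the `Bar(x) | Baz(x)` example under Types, $`n = 2`$ and there is a single binding $`x_1 = x`$ with $`X_{1,1} = \texttt{Int}`$ and $`X_{2,1} = \texttt{String}`$. This gives $`X_1 = \texttt{Int} \mid \texttt{String}`$, as stated in that example.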

### Compatibility

We believe the changes are backwards compatible.

## Related Work

The language feature exists in multiple languages. Of the more popular languages, Rust added the feature in [2021](https://github.com/rust-lang/reference/pull/957) and
Python added it in [PEP 636](https://peps.python.org/pep-0636/#or-patterns), the pattern matching PEP, in 2020. Of course, Python is dynamically typed and Rust does not have subtyping,
but the semantics they adopt are similar to this proposal.

Within Scala, the [issue](https://github.com/scala/bug/issues/182) was first raised in 2007. The author is also aware of attempts to fix this issue by [Lionel Parreaux](https://github.com/dotty-staging/dotty/compare/main...LPTK:dotty:vars-in-pat-alts), which
were not submitted to the main dotty repository.

## Implementation

The author has an in-progress implementation, focused on the typer, which compiles the examples with the expected types. Interested
parties are welcome to see the WIP [here](https://github.com/lampepfl/dotty/compare/main...yilinwei:dotty:main).

## Acknowledgements

Many thanks to **Zainab Ali** for proof-reading the draft, and to **Nicolas Stucki** and **Guillaume Martres** for their pointers on the dotty
compiler codebase.