Commit 8888174

committed
initial draft of sip.
1 parent 0a7f132 commit 8888174

1 file changed

+248
-0
lines changed


content/alternative-bind-variables.md

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
---
layout: sip
permalink: /sips/:title.html
stage: pre-sip
status: waiting-for-implementation
title: SIP-NN - Bind variables within alternative patterns
---

**By: Yilin Wei**

## History

| Date          | Version            |
|---------------|--------------------|
| Sep 17th 2023 | Initial Draft      |

## Summary

Pattern matching is one of the most commonly used features in Scala by beginners and experts alike. Most of
the features of pattern matching compose beautifully: for example, a user who learns about bind variables
and guard patterns can mix the two features intuitively.

One of the few outstanding cases where this is untrue is the mixing of bind variables and alternative patterns. The part of
the current [specification](https://scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html) which concerns us is section **8.1.12**, copied below with the relevant clause
highlighted.

> … All alternative patterns are type checked with the expected type of the pattern. **They may not bind variables other than wildcards**. The alternative …

We propose that this restriction be lifted and this corner case be eliminated.

Removing the corner case would make the language easier to teach, reduce friction and allow users to express intent in a more natural manner.

## Motivation

## Scenario

The following scenario is shamelessly stolen from [PEP 636](https://peps.python.org/pep-0636), which introduces pattern matching to the
Python language.

Suppose a user is writing a classic text adventure game such as [Zork](https://en.wikipedia.org/wiki/Zork). For readers unfamiliar with
text adventure games, the player typically enters freeform text into the terminal in the form of commands to interact with the game
world. Examples of commands might be `"pick up rabbit"` or `"open door"`.

Typically, the commands are tokenized and parsed. After the parsing stage we may end up with an encoding similar to the following:

```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])
```
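
How the tokenizing and parsing stage works is not specified here. Purely as an illustration, and assuming that commands are whitespace-separated and that any unrecognised token names an item, a minimal sketch might look like the following. The `parse` function is hypothetical, and the block repeats the definitions above so that it is self-contained.

```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])

// Hypothetical parser (an assumption of this sketch, not part of the proposal):
// splits on whitespace, maps known keywords to their `Word` case and treats
// every other token as the name of an item.
def parse(input: String): Command =
  val words = input.trim.toLowerCase.split("\\s+").toList.filter(_.nonEmpty).map {
    case "get"   => Word.Get
    case "north" => Word.North
    case "go"    => Word.Go
    case "pick"  => Word.Pick
    case "up"    => Word.Up
    case other   => Word.Item(other)
  }
  Command(words)
```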

In this encoding, the string `pick up jar` would be parsed as `Command(List(Pick, Up, Item("jar")))`.

Once the command is parsed, we want to actually *do* something with it. With this particular encoding,
we would naturally reach for a pattern match; in the simplest case, we could get away with a single recursive function for
our whole program.

Suppose we take the simplest example, where we want to match on a command like `"north"`. The pattern match consists of
matching on a single stable identifier, `North`, and the code would look like this:

~~~ scala
import Word.*

def loop(cmd: Command): Unit =
  cmd match
    case Command(North :: Nil) => // Code for going north
~~~

However, as we begin play-testing the actual text adventure, we observe that users type `"go north"`. We decide
our program should treat the two distinct commands as synonyms. At this point we would reach for an alternative pattern `|` and
refactor the code like so:

~~~ scala
case Command(North :: Nil | Go :: North :: Nil) => // Code for going north
~~~

This clearly expresses our intent that the two commands map to the same underlying logic.

Later we decide that we want more complex logic in our game, perhaps allowing the user to pick up
items with a command like `pick up jar`. We would then extend our function with another case, binding the variable `name`:

~~~ scala
case Command(Pick :: Up :: Item(name) :: Nil) => // Code for picking up items
~~~

Again, we might realise through our play-testing that users type `get` as a synonym for `pick up`. After playing around
with alternative patterns, we may reasonably write something like:

~~~ scala
case Command(Pick :: Up :: Item(name) :: Nil | Get :: Item(name) :: Nil) => // Code for picking up items
~~~

Unfortunately, at this point we are stopped in our tracks by the compiler: the bind variable `name` cannot be used in conjunction with alternative patterns.
We carefully consult the specification and find that this is not possible, so we must choose a different encoding.

We can, of course, work around it by hoisting the logic into a helper function in the nearest scope which allows function definitions:

~~~ scala
def loop(cmd: Command): Unit =
  def pickUp(item: String): Unit = ??? // Code for picking up item
  cmd match
    case Command(Pick :: Up :: Item(name) :: Nil) => pickUp(name)
    case Command(Get :: Item(name) :: Nil) => pickUp(name)
~~~

We could also choose any number of other encodings. However, all of them are less intuitive and less obvious than the code we tried to write.

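
One such alternative encoding, sketched here under the assumption that a dedicated extractor is acceptable, moves the synonyms into a custom `unapply`. The `PickUp` object and the `describe` function are hypothetical names introduced for this illustration, and the block repeats the `Word` and `Command` definitions so that it compiles on its own in current Scala 3.

~~~ scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])

import Word.*

// Hypothetical extractor (not part of the proposal): succeeds for both
// synonyms and yields the name of the item being picked up.
object PickUp:
  def unapply(cmd: Command): Option[String] =
    cmd.words match
      case Pick :: Up :: Item(name) :: Nil => Some(name)
      case Get :: Item(name) :: Nil        => Some(name)
      case _                               => None

// Hypothetical handler which returns a description rather than changing game state.
def describe(cmd: Command): String =
  cmd match
    case Command(North :: Nil | Go :: North :: Nil) => "going north"
    case PickUp(name)                               => s"picking up $name"
    case _                                          => "unknown command"
~~~

The match site reads well, but the two synonyms are no longer visible where the command is handled, and the name still has to be bound once per synonym inside the extractor.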
## Commentary

Removing the restriction leads to more obvious encodings in the case of alternative patterns. Arguably, the language
would be simpler and easier to teach: we would no longer have to remember that bind patterns and alternatives
do not mix, nor teach newcomers the workarounds.

A significant number of languages which have pattern matching also support this feature. Languages such as [Rust](https://github.com/rust-lang/reference/pull/957) and [Python](https://peps.python.org/pep-0636/#or-patterns) have
supported it for some time. While
this is not a great reason for Scala to do the same, having the feature exist in other languages means that users
are more likely to expect it.

A smaller benefit for existing users is that removing the corner case leads to code which is
easier to review; the diff needed to add a bind variable within an alternative is smaller than the diff needed to switch to a different
encoding entirely, and it conveys the intent of the changeset better.

It is acknowledged, however, that cases where the same logic is shared across alternative branches are relatively rare compared to
the usage of pattern matching in general. The current restriction is not too arduous for experienced practitioners to work around, which
can be inferred from the relatively low number of comments on the original [issue](https://github.com/scala/bug/issues/182), first raised in 2007.

To summarize, the main arguments for the proposal are that it makes the language more consistent, simpler and easier to teach. The argument
against the change is that it will have low impact for the majority of existing users.

## Proposed solution

Removing the alternative restriction means that we need to specify some additional constraints. Intuitively, we
need to consider the restrictions on variable bindings within each alternative branch, as well as the types inferred
for each binding within the scope of the pattern.

## Bindings

The simplest case of mixing an alternative pattern and bind variables is where we have two `UnApply` methods, with
a single alternative pattern. For now, we specifically only consider the case where each bind variable is of the same
type, like so:

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: Int)

  def fun = this match
    case Bar(z) | Baz(z) => ... // z: Int
~~~

For the expression to make sense with the current semantics around pattern matches, `z` must be defined in both branches; otherwise the
case body would be nonsensical if `z` was referenced within it.

Removing the restriction would also allow recursive alternative patterns:

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(x: Int)

import Foo.* // brings Bar and Baz into scope for the nested pattern

enum Qux:
  case Quux(y: Int)
  case Corge(x: Foo)

  def fun = this match
    case Quux(z) | Corge(Bar(z) | Baz(z)) => ... // z: Int
~~~

Using an `Ident` within an `UnApply` is not the only way to introduce a binding within the pattern scope.
We also expect to be able to use an explicit binding using an `@` like this:

~~~ scala
enum Foo:
  case Bar()
  case Baz(bar: Bar)

  def fun = this match
    case Baz(x) | x @ Bar() => ... // x: Foo.Bar
~~~

## Types

We propose that the type of each variable introduced in the scope of the pattern be the least upper bound of the types
inferred within each branch.

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: String)

  def fun = this match
    case Bar(x) | Baz(x) => // x: Int | String
~~~
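
For comparison, and only as a sketch in current Scala 3, the same type already arises when each alternative is written as its own case: each branch binds `x` at its own type and the match as a whole can be given the union of the two. The method name `value` is hypothetical and not part of the proposal.

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: String)

  // Hypothetical helper: the declared result type is the union of the
  // types bound in the two branches.
  def value: Int | String = this match
    case Bar(x) => x // x: Int
    case Baz(x) => x // x: String
~~~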

We do not expect any inference to happen between branches. For example, in the case of a GADT we would expect the second alternative of
the following case to match all instances of `Bar`, regardless of the type of `A`.

~~~ scala
enum Foo[A]:
  case Bar(a: A)
  case Baz(i: Int) extends Foo[Int]

  def fun = this match
    case Baz(x) | Bar(x) => // x: Int | A
~~~

## Specification

We do not believe any syntax changes are required, since the current specification already allows the proposed syntax.

We propose that the following clauses be added to the specification:

Let $`p_1 | \ldots | p_n`$ be an alternative pattern at an arbitrary depth within a case pattern,
and let $`\Gamma_k`$ be the scope associated with the alternative $`p_k`$, for $`1 \le k \le n`$.

Let the variables introduced within each alternative $`p_k`$ be $`x_i \in \Gamma_k`$.

Each $`p_k`$ must introduce the same set of bindings, i.e. for each $`k < n`$, $`\Gamma_k`$ must have the same members as
$`\Gamma_{k+1}`$.

If $`X_{k,i}`$ is the type of the binding $`x_i`$ within an alternative $`p_k`$, then the resulting type $`X_i`$ of the
variable $`x_i`$ within the pattern scope $`\Gamma`$ is the least upper bound of all the types $`X_{k,i}`$ associated with
the variable $`x_i`$ within each branch.

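
As a worked illustration of these clauses (a sketch, not additional specification text), take the pattern `Bar(x) | Baz(x)` from the Types section, where `Bar` wraps an `Int` and `Baz` wraps a `String`. There are two alternatives and a single variable $`x_1`$, namely `x`. The scopes are $`\Gamma_1 = \{x_1\}`$ and $`\Gamma_2 = \{x_1\}`$, which have the same members, so the pattern is accepted. The types within the branches are $`X_{1,1} = \texttt{Int}`$ and $`X_{2,1} = \texttt{String}`$, so the type within the pattern scope is their least upper bound, $`X_1 = \texttt{Int} \mid \texttt{String}`$, which agrees with the comment in that example. By contrast, `Bar(x) | Baz(y)` would be rejected, since $`\Gamma_1 = \{x\}`$ and $`\Gamma_2 = \{y\}`$ do not have the same members.
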
## Compatibility

We believe the changes are backwards compatible, since they only accept patterns which the compiler currently rejects.

## Related Work

The language feature exists in multiple languages. Of the more popular languages, Rust added the feature in [2021](https://github.com/rust-lang/reference/pull/957) and
Python within [PEP 636](https://peps.python.org/pep-0636/#or-patterns), the pattern matching PEP, in 2020. Of course, Python is dynamically typed and Rust does not have sub-typing,
but their semantics are similar to those of this proposal.

Within Scala, the [issue](https://github.com/scala/bug/issues/182) was first raised in 2007. The author is also aware of attempts to fix this issue by [Lionel Parreaux](https://github.com/dotty-staging/dotty/compare/main...LPTK:dotty:vars-in-pat-alts) which
were not submitted to the main dotty repository.

## Implementation

The author has an in-progress implementation focused on the typer which compiles the examples with the expected types. Interested
parties are welcome to see the WIP [here](https://github.com/lampepfl/dotty/compare/main...yilinwei:dotty:main).

## Acknowledgements

Many thanks to **Zainab Ali** for proof-reading the draft, and to **Nicolas Stucki** and **Guillaume Martres** for their pointers on the dotty
compiler codebase.
