Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
375 changes: 375 additions & 0 deletions content/better-fors.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,375 @@
---
layout: sip
permalink: /sips/:title.html
stage: design
status: submitted
title: SIP-62 - For comprehension improvements
---

**By: Kacper Korban (VirtusLab)**

## History

| Date | Version |
|---------------|--------------------|
| June 6th 2023 | Initial Draft |
| Feb 15th 2024 | Reviewed Version |

## Summary

`for`-comprehensions in Scala 3 improved their usability in comparison to Scala 2, but there are still some pain points relating both usability of `for`-comprehensions and simplicity of their desugaring.

This SIP tries to address some of those problems, by changing the specification of `for`-comprehensions. From user perspective, the biggest change is allowing aliases at the start of the `for`-comprehensions. e.g.

```
for {
x = 1
y <- Some(2)
} yield x + y
```

## Motivation

There are some clear pain points related to Scala'3 `for`-comprehensions and those can be divided into two categories:

1. User-facing and code simplicity problems

Specifically, for the following example written in a Haskell-style do-comprehension

```haskell
do
a = largeExpr(arg)
b <- doSth(a)
combineM(a, b)
```
in Scala we would have to write

```scala
val a = largeExpr(b)
for
b <- doSth(a)
x <- combineM(a, b)
yield x
```

Copy link
Contributor

@lihaoyi lihaoyi Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere, we discussed how making the final map optional would be a breaking change. But what if we went the haskell route, and only removed the final map if the final yield is also removed?

    val a = largeExpr(b)
    for
      b <- doSth(a)
      combineM(a, b)
  1. This is currently invalid syntax, so no backwards compat concerns
  2. We get to remove the useless map from the generated code
  3. We also get to remove the useless yield from the source code!
  4. The correspondence between the map and yield is direct: yield means map, no-yield means no-map. No "magic" map-removal depending on the shape of the final yield expression
  5. The final source code is shorter and more concise
  6. The final source code looks more like other languages with similar syntax, e.g. Haskell's do or F#'s return! in computation expressions

Seems like a win-win-win-win-win-win overall

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting possibility. Parsing will be a problem since we don't know anymore whether the next token sequence is a pattern or an expression. I am sure it can be overcome, but don't know how much complexity it would add to the parser.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider this change, but I didn't manage to get it to work in the parser. (I tried going in the reflect-like direction to get better syntax for this case)
But if we were to be able to distinguish a pattern from an expression, then yet another problem is automatically solved -- we can get rid of the unnecessary _ <- with monadic 'unit' expressions.

One workaround from a pre-SIP is to have a yield <- monadicExpr syntax.

Copy link
Contributor

@lihaoyi lihaoyi Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yield <- looks super ugly syntactically IMO, but semantically it's exactly the same as yield from in python and return! in F#. Maybe we can tweak the syntax to make it prettier, but the idea of using a keyword to work around parser ambiguity seems reasonable if a 'direct' syntwx proves difficult to implement

we can get rid of the unnecessary _ <- with monadic 'unit' expressions.

One issue here is we have two things we may want to simplify

  • _ <- too
  • _ = foo

For non-trailing raw expressions in a for block, we can only assign one meaning, and the other meaning will still have to be explicitly spelled out as above. Maybe that's ok, but it's something worth considering

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be controversial to reuse return as in

for x <- List(1,2,3) return List(x, x)

Copy link

@bjornregnell bjornregnell Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be against reviving return for several reasons, including: 1) yet another syntax to learn besides do and yield, but worse: 2) my experience is that many learners have all sorts of connotations on the return keyword with all sorts of misconceptions about how it works in Scala so I think it is best to say "don't use return" and for some it takes a while not to end everything with return if they are used to that...

This complicates the code, even in this simple example.
2. The simplicity of desugared code

The second pain point is that the desugared code of `for`-comprehensions can often be surprisingly complicated.

e.g.
```scala
for
a <- doSth(arg)
b = a
yield a + b
```

Intuition would suggest for the desugared code will be of the form

```scala
doSth(arg).map { a =>
val b = a
a + b
}
```

But because of the possibility of an `if` guard being immediately after the pure alias, the desugared code is of the form

```scala
doSth(arg).map { a =>
val b = a
(a, b)
}.map { case (a, b) =>
a + b
}
```

These unnecessary assignments and additional function calls not only add unnecessary runtime overhead but can also block other optimizations from being performed.

## Proposed solution

This SIP suggests the following changes to `for` comprehensions:

1. Allow `for` comprehensions to start with pure aliases

e.g.
```scala
for
a = 1
b <- Some(2)
c <- doSth(a)
yield b + c
```
2. Simpler conditional desugaring of pure aliases. i.e. whenever a series of pure aliases is not immediately followed by an `if`, use a simpler way of desugaring.

e.g.
```scala
for
a <- doSth(arg)
b = a
yield a + b
```

will be desugared to

```scala
doSth(arg).map { a =>
val b = a
a + b
}
```

but

```scala
for
a <- doSth(arg)
b = a
if b > 1
yield a + b
```

will be desugared to

```scala
Some(1).map { a =>
val b = a
(a, b)
}.withFilter { case (a, b) =>
b > 1
}.map { case (a, b) =>
a + b
}
```

3. Avoiding redundant `map` calls if the yielded value is the same as the last bound value.

e.g.
```scala
for
a <- List(1, 2, 3)
yield a
```

will just be desugared to

```scala
List(1, 2, 3)
```

### Detailed description

#### Ad 1. Allow `for` comprehensions to start with pure aliases

Allowing `for` comprehensions to start with pure aliases is a straightforward change.

The Enumerators syntax will be changed from:

```
Enumerators ::= Generator {semi Enumerator | Guard}
```

to

```
Enumerators ::= {Pattern1 `=' Expr semi} Generator {semi Enumerator | Guard}
```

Which will allow adding 0 or more aliases before the first generator.

When desugaring is concerned, a for comprehension starting with pure aliases will generate a block with those aliases as `val` declarations and the rest of the desugared `for` as an expression. Unless the aliases are followed by a guard, then the desugaring should result in an error.

New desugaring rule will be added:

```scala
For any N:
for (P_1 = E_1; ... P_N = E_N; ...)
==>
{
val x_2 @ P_2 = E_2
...
val x_N @ P_N = E_N
for (...)
}
```

e.g.

```scala
for
a = 1
b <- Some(2)
c <- doSth(a)
yield b + c
```

will desugar to

```scala
{
val a = 1
for
b <- Some(2)
c <- doSth(a)
yield b + c
}
```

#### Ad 2. Simpler conditional desugaring of pure aliases. i.e. whenever a series of pure aliases is not immediately followed by an `if`, use a simpler way of desugaring.

Currently, for consistency, all pure aliases are desugared as if they are followed by an `if` condition. Which makes the desugaring more complicated than expected.

e.g.

The following code:

```scala
for
a <- doSth(arg)
b = a
yield a + b
```

will be desugared to:

```scala
Some(1).map { a =>
val b = a
(a, b)
}.map { case (a, b) =>
a + b
}
```

The proposed change is to introduce a simpler desugaring for common cases, when aliases aren't followed by a guard, and keep the old desugaring method for the other cases.

A new desugaring rules will be introduced for simple desugaring.

```scala
For any N:
for (P <- G; P_1 = E_1; ... P_N = E_N; ...)
==>
G.flatMap (P => for (P_1 = E_1; ... P_N = E_N; ...))

And:

for () yield E ==> E

(Where empty for-comprehensions are excluded by the parser)
```

It delegares desugaring aliases to the newly introduced rule from the previous impreovement. i.e.

```scala
For any N:
for (P_1 = E_1; ... P_N = E_N; ...)
==>
{
val x_2 @ P_2 = E_2
...
val x_N @ P_N = E_N
for (...)
}
```

One other rule also has to be changed, so that the current desugaring method, of passing all the aliases in a tuple with the result, will only be used when desugaring a generator, followed by some aliases, followed by a guard.

```scala
For any N:
for (P <- G; P_1 = E_1; ... P_N = E_N; if E; ...)
==>
for (TupleN(P, P_1, ... P_N) <-
for (x @ P <- G) yield {
val x_1 @ P_1 = E_2
...
val x_N @ P_N = E_N
TupleN(x, x_1, ..., x_N)
}; if E; ...)
```

This changes will make the desugaring work in the following way:

```scala
for
a <- doSth(arg)
b = a
yield a + b
```

will be desugared to

```scala
doSth(arg).map { a =>
val b = a
a + b
}
```

but

```scala
for
a <- doSth(arg)
b = a
if b > 1
yield a + b
```

will be desugared to

```scala
Some(1).map { a =>
val b = a
(a, b)
}.withFilter { case (a, b) =>
b > 1
}.map { case (a, b) =>
a + b
}
```

#### Ad 3. Avoiding redundant `map` calls if the yielded value is the same as the last bound value.

This change is strictly an optimization. This allows for the compiler to get rid of the final `map` call, if the yielded value is the same as the last bound pattern. The pattern can be either a single variable binding or a tuple.

One desugaring rule has to be modified for this purpose.

```scala
for (P <- G) yield P ==> G
If P is a variable or a tuple of variables and G is not a withFilter.

for (P <- G) yield E ==> G.map (P => E)
Otherwise
```

e.g.
```scala
for
a <- List(1, 2, 3)
yield a
```

will just be desugared to

```scala
List(1, 2, 3)
```

### Compatibility

This change is binary and TASTY compatible since for-comprehensions are desugared in the Typer. Thus, both class and TASTY files only ever use the desugared versions of programs.

While this change is forward source compatible, it is not backward compatible, as it accepts more syntax.
Copy link
Contributor

@lihaoyi lihaoyi Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's pretty clear that many of the changes in this proposal can potentially change the behavior of user code. We could argue that such user code is badly written, but nevertheless it is a concern, and we should be rigorous about listing out what kinds of user code would have their behavior changed by this proposal.

Given that we are already choosing to potentially change the behavior of user code via these changes, I wonder if we should take it further. e.g. rather than generating

    Some(1).map { a =>
      val b = a
      (a, b)
    }.withFilter { case (a, b) =>
      b > 1
    }.map { case (a, b) =>
      a + b
    }

we could generate something like

    Some(1).flatMap { a =>
      val b = a
      if (!(b > 1)) Option.empty
      else Some(a + b)
    }

The latter is what people would write by hand, and doesn't have the same surprising behavior around order of evaluation and laziness that the former does (surprising behavior that I have myself been bitten by several times)

If we're going to be changing the desugaring and potentially breaking user code, I feel like it makes sense to change it directly to a desugaring that makes sense, rather than changing it half-way and paying the cost in breakage without reaping the full benefits of a simpler desugaring

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that particular example, how would the compiler know that it should desugar to Option.empty instead of Some.empty? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong, but AFAIK this would require some refactoring around desugaring of fors. Right now, the desugaring is purely syntactic and it deals with untyped trees. So to implement the desugaring based on empty in an elegant way, fors would have to become fully fledged (typed) trees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could provide implicitly[Empty[Some[Int]]] or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this would still require the type info. Implementing empty as a method on the class instead of the module might work, but that seems the opposite of elegant.

I think that this change might be way less disruptive than it may seem. Maybe we could run the open community build with vs without the feature flag and check the results. (It's far from good testing, but better than speculation)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the old desugaring could remain supported under a flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this would still require the type info. Implementing empty as a method on the class instead of the module might work, but that seems the opposite of elegant.

In fact we already have it. There is an empty method on most (all?) stdlib collections. But we also need a singleton method and we don't have that (the Some in the last line of the example)

Some(1).flatMap { a =>
      val b = a
      if (!(b > 1)) this.empty
      else this.singleton(a + b)
    }

Otherwise we'd need the types, and we definitely do not want that! The untyped translation of for expression is one of the major distinguishing features of Scala.

Copy link
Contributor

@lihaoyi lihaoyi Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to go the implicit route, we might be able to do so via an untyped desugaring as well. Something like:

trait Empty[T]{
  def apply(): T
}
object Empty{
  def of[T](v: => T)(implicit e: Empty[T]) = e
}


trait Single[T]{
  def apply(t: T): T
}
object Single{
  def of[T](v: => T)(implicit e: Single[T]) = e
}

Some(1).flatMap { a =>
  val b = a
  if (!(b > 1)) Empty.of(this).apply()
  else Single.of(this).apply(a + b)
}

This could be useful if e.g. we wanted to provide the Single.of value without needing to add a def singleton on every type, allowing us to move forward without modifications to existing standard library classes (e.g. if the std lib was still frozen)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to make this change independent from unfreezing the stdlib would be great. But I'm not sure if it's that straightforward to create those typelcasses. For the sake of inference, we would like the typeclass to be contravariant (we want Empty[Option[?]] <: Empty[Some[?]]). But then IMHO we can't implement it, since it's not the case that ∀(Opt <: Option[?]). None <: Opt. On the other hand, if we make the typeclass invariant, then it will just infer the more specific type. (But maybe I'm missing something obvious)

One additional concern I have here is that this isn't a correct lookup. In this case, it's an owner class/package reference. To get the "monad" lookup, we would have to lift the first GenFrom expression and use it as a lookup afterwards, like so:

val a$lifted = Some(1)
a$lifted.flatMap { a =>
  val b = a
  if (b > 1) a$lifted.singleton(a + b)
  else a$lifted.empty
}

If done correctly, the lifted val should always be immediately before the reference, but might add a lazy to be safe.

Copy link
Member

@bishabosha bishabosha Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid the by name parameter, as long as the value is bound to a variable so you could have something like Empty.of[b.type]


### Other concerns

As far as I know, there are no widely used Scala 3 libraries that depend on the desugaring specification of `for`-comprehensions.

## Links

1. Scala contributors discussion thread (pre-SIP): https://contributors.scala-lang.org/t/pre-sip-improve-for-comprehensions-functionality/3509/51
2. Github issue discussion about for desugaring: https://github.com/lampepfl/dotty/issues/2573
3. Scala 2 implementation of some of the improvements: https://github.com/oleg-py/better-monadic-for
4. Implementation of one of the simplifications: https://github.com/lampepfl/dotty/pull/16703
5. WIP implementation branch: https://github.com/dotty-staging/dotty/tree/improved-fors