Flesh out the proposal

odersky · odersky · commit 383486fd83f4 · 2024-11-16T13:33:56.000+01:00
diff --git a/docs/_docs/internals/specialized-traits.md b/docs/_docs/internals/specialized-traits.md
@@ -1,6 +1,7 @@
-# Specialized Traits
+# Specialized Traits and Classes
 
 Specialization is one of the few remaining desirable features from Scala 2 that are as yet missing in Scala 3. We could try to port the Scala 2 scheme, which would be non-trivial since the implementation is quite complex. But that scheme is problematic enough to suggest that we also look for alternatives. A possible alternative is described here. It is meant to complement the [proposal on inline traits](https://github.com/lampepfl/dotty/issues/15532). That proposal also contains a more detailed critique of Scala 2 specialization.
+The parts in that proposal that mention specialization should be ignored; they are superseded by the proposal here.
 
 The main problem of Scala-2 specialization is code bloat. We have to pro-actively generate up to 11 copies of functions and classes when they have a specialized type parameter, and this grows exponentially with the number of such type parameters. Miniboxing tries to reduce the number under the exponent from ~10 to 3 or 4, but it has problems dealing with arrays.
 
@@ -182,7 +183,7 @@ trait Seq$sp$Int extends Seq[Int], Iterable[Int]:
   def length: Int
   def apply(i: Int): Int
 ```
-Note that these traits repeat the parent types of their corresponding inline traits, for instance `ArrayIterator$sp$Int` extends `ArrayIterator[Int]` as well as its parent `Iterator[Int]`. After erasure, the definition of
+Note that these traits repeat the parent types of their corresponding inline traits. For instance, `ArrayIterator$sp$Int` extends `ArrayIterator[Int]` as well as its parent `Iterator[Int]`. After erasure, the definition of
 `ArrayIterator$sp$Int` becomes
 ```scala
 trait ArrayIterator$sp$Int extends ArrayIterator, Iterator$sp$Int
@@ -262,7 +263,7 @@ This method is generated by Scala 2's function specialization which is also adop
 The example shows that indeed all code is properly specialized with no need for box or unbox operations.
 
 
-## Conclusion
+## Evaluation
 
 The described scheme is surprisingly simple. All the heavy lifting is done by inline traits. Adding specialization on top requires little more than arranging for a cache of specialized instances.
 
@@ -277,7 +278,17 @@ inline trait Vector[+A: Specialized](elems: A*):
 ```
 There's precedent for this in Kotlin where the majority of higher-order collection methods are declared inline, in this case  in order to allow specialization for suspendability. So the restriction does not look like a blocker.
 
-Some flexibility could be gained if we allowed method overloading between specialized inline methods and normal methods with matching type signatures. For instance, the `Vector` implementation above seriously restricts `map` by requiring that its `B` type parameter is also `Specialized`. Thus `map` cannot be used to map a specialized collection to another collection if the result element type is not ground. But we could alleviate the problem by allowing a second, overloaded `map` operation like this:
+## Going Further: Improve Existing Class Hierarchies
+
+We have shown that we can formulate an alternative version of a collection-like class hierarchy that is fully specialized. But can we retrofit this idea even to existing collections? The direct approach would
+clearly not work since an existing collection like `Vector[T]` can be created from anywhere whereas a specialized collection can be created only in a monomorphic context where we know the type instance of `T`. So specialized
+collections come with a tax in expressiveness which pays for their superior performance.
+
+But it turns out we can gain a lot of flexibility with three additional tweaks to the language and compiler.
+
+### 1. Adapt Overloading to Specialization
+
+More flexibility could be gained if we allowed method overloading between specialized inline methods and normal methods with matching type signatures. For instance, the `Vector` implementation above seriously restricts `map` by requiring that its `B` type parameter is also `Specialized`. Thus `map` cannot be used to map a specialized collection to another collection if the result element type is not statically known. But we could alleviate the problem by allowing a second, overloaded `map` operation like this:
 ```scala
   def map[B](f: A => B): collection.immutable.Vector[B] =
     new collection.immutable.Vector[B](elems.map(f))
@@ -286,6 +297,99 @@ The second implementation of `map` will return an unspecialized vector if
 the new element type is not statically known. If overloads like this were allowed, they could be resolved by picking the specialized inline version if
 a `Specialized` instance can be synthesized for the actual type argument, and picking the unspecialized version otherwise.
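+
+The resolution rule can be approximated in today's Scala 3 with `scala.compiletime.summonFrom`. The following is a minimal sketch, not the proposed mechanism: the `Specialized` trait and its two instances are hypothetical stand-ins for the compiler-synthesized type class, and `variantFor` and `generic` are names invented for this illustration.
+
+```scala
+import scala.compiletime.summonFrom
+
+// Hypothetical marker type class standing in for the proposal's `Specialized`.
+trait Specialized[T]
+object Specialized:
+  given Specialized[Int] = new Specialized[Int] {}
+  given Specialized[Double] = new Specialized[Double] {}
+
+// Picks a branch at compile time, depending on whether a `Specialized[T]`
+// instance can be summoned at the inlined call site.
+inline def variantFor[T]: String = summonFrom {
+  case _: Specialized[T] => "specialized"
+  case _                 => "unspecialized"
+}
+
+// In a generic context `T` is not statically known, so no instance is found
+// and the fallback branch is selected.
+def generic[T]: String = variantFor[T]
+
+@main def demo(): Unit =
+  println(variantFor[Int])
+  println(variantFor[String])
+  println(generic[Int])
+```
+
+Under these assumptions, only the first call should select the specialized branch. The third call shows the same behavior the overloading rule would have: inside a generic method the type argument is abstract, so the unspecialized variant is picked even when the caller passes `Int`.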
 
+We can do even better if we allow some additions to the existing collections. In that case, we can add definitions like the inline `map` above to the original collections.
+That means, whenever we have a collection `xs` with a type such as `Vector[A]` and a function `f` with a statically known result type `B`, then `xs.map(f)` returns a specialized collection. So we can get specialized collections out of normal collections as long as the element type of the created collection is statically known.
+
+This can be generalized. In particular, all `apply` methods of `Vector` should be split into methods taking specialized types and unrestricted methods. For instance:
+```scala
+object Vector:
+  def apply[T](xs: T*): Vector[T] = ...
+  inline def apply[T: Specialized](xs: T*): faster.Vector[T] = ...
+```
+The same holds for all collection methods such as `map` that return a new collection of a different element type.
+
+### 2. Automate the Boilerplate with `specializedBy`
+
+The described scheme would entail some amount of code duplication. We could automate this with a new annotation that is put on a class and states that the class has a specialized variant. Example:
+```scala
+@specializedBy[faster.Vector] class Vector[+T] ...
+```
+If a class carries such an annotation, the specialized inline functions described above could be added automatically.
+
+### 3. Optimize Use Sites by Path Splitting
+
+One remaining problem is that specialization is a compile-time operation. Without putting in additional work, we cannot immediately exploit the situation where a runtime type is a specialized collection but the static type is unspecialized. For instance, consider this use of `Vector`:
+
+```scala
+def sumElems(xs: Vector[Int]): Int =
+  var i = 0
+  var sum = 0
+  while i < xs.length do
+    sum += xs(i)
+    i += 1
+  sum
+```
+Here, the problem is that, even though we know that `xs` is a `Vector` of `Int`, we cannot deduce that it has been specialized to a `faster.Vector[Int]`. Therefore, `xs(i)` goes through the `apply` method of `Vector`. If the runtime class of `xs` is indeed specialized, this would box the `Int` element to `Object` in a bridge method and unbox it again to `Int` at the call site. This could lose a lot of performance, unless the JVM manages to optimize the box/unbox pair away (so far, experience shows that the JVM is not very good at this). The performance could be even worse than working with an unspecialized `Vector` where elements are held in boxed form so they don't have to be boxed each time they are accessed.
+
+
+Of course, we can narrow the type of `sumElems` to
+```scala
+def sumElems(xs: faster.Vector[Int]): Int
+```
+but that would make it less generally usable. Another alternative is to optimize `sumElems` by path splitting. We could detect at runtime whether
+`xs` is a `faster.Vector` and optimize the code if it is. For instance, like this:
+```scala
+def sumElems(xs: Vector[Int]): Int =
+  val faster: faster.Vector[Int] | Null = xs match
+    case xs: faster.Vector[_] => xs
+    case _ => null
+  var i = 0
+  var sum = 0
+  while i < xs.length do
+    sum += (if faster != null then faster(i) else xs(i))
+    i += 1
+  sum
+```
+That would avoid the boxing at the cost of a type test in the computation of `faster` and a null test in the call of `apply`. The single type test would be amortized over possibly many calls in the loop. We could do even better by generating a bit more code, splitting the whole loop:
+```scala
+def sumElems(xs: Vector[Int]): Int =
+  val faster: faster.Vector[Int] | Null = xs match
+    case xs: faster.Vector[_] => xs
+    case _ => null
+  var i = 0
+  var sum = 0
+  if faster != null then
+    while i < xs.length do
+      sum += faster(i)
+      i += 1
+  else
+    while i < xs.length do
+      sum += xs(i)
+      i += 1
+  sum
+```
+The example has shown that it is possible to have code over possibly specialized collections that is both general and high performance. But it does require a lot of hand-written boilerplate.
+
+The boilerplate could be generated automatically by an optimization phase in the compiler. Essentially when compiling methods that take parameters whose type is a class
+that's annotated with `specializedBy`, we can do the path splitting automatically in an optimization step. The optimization would first analyze the body of the method to decide which path splitting strategy to use.
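+
+The split-loop shape such an optimization would produce can be tried out in current Scala 3 with ordinary classes. This is a sketch under stated assumptions: `BoxedVector`, `IntVector`, and `applyInt` are hypothetical stand-ins for `Vector`, `faster.Vector[Int]`, and its primitive accessor, and a pattern match replaces the nullable `faster` variable.
+
+```scala
+// Hypothetical stand-in for an unspecialized collection such as `Vector`.
+class BoxedVector[+T](boxed: IndexedSeq[T]):
+  def length: Int = boxed.length
+  def apply(i: Int): T = boxed(i)
+
+// Hypothetical stand-in for `faster.Vector[Int]`.
+final class IntVector(raw: Array[Int]) extends BoxedVector[Int](raw.toIndexedSeq):
+  // Primitive accessor: reads straight from the `Array[Int]`, no boxing.
+  def applyInt(i: Int): Int = raw(i)
+
+def sumElems(xs: BoxedVector[Int]): Int =
+  var i = 0
+  var sum = 0
+  xs match
+    // One type test up front, then a loop over the primitive accessor.
+    case fast: IntVector =>
+      while i < fast.length do
+        sum += fast.applyInt(i)
+        i += 1
+    // Fallback: the generic loop over the erased `apply`.
+    case _ =>
+      while i < xs.length do
+        sum += xs(i)
+        i += 1
+  sum
+```
+
+Both `sumElems(IntVector(Array(1, 2, 3)))` and `sumElems(BoxedVector(Vector(1, 2, 3)))` go through the same entry point; only the first takes the primitive path.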
+
+I believe the three steps I have outlined could overcome most of the performance penalties imposed by existing unspecialized class hierarchies like collections, making their performance comparable to languages that use global monomorphization.
+
+## Going Further: Hand-written Specializations
+
+Additional improvements could be gained if we allowed the programmer to pick their own implementations for specialized class instances. For example,
+we could have a
+```scala
+inline trait HashMap[K: Specialized, +V: Specialized] ...
+```
+and an optimized sub-trait
+```scala
+inline trait IntHashMap[+V: Specialized] extends HashMap[Int, V] ...
+```
+The implementation in `IntHashMap` could exploit the fact that the key type `K` is known to be `Int` to pick a more performant algorithm, for instance.
+
+It would be great if we could use `IntHashMap` each time a specialized HashMap such as `HashMap$sp$Int$String` is referred to or created. In other words, `IntHashMap` should act as a drop-in replacement for `HashMap$sp$Int$String` that is selected automatically. A detailed proposal for this is left for future work.
+