|
| 1 | +--- |
| 2 | +layout: blog |
| 3 | +post-type: blog |
| 4 | +by: Martin Odersky |
| 5 | +title: Scaling DOT to Scala: Soundness |
| 6 | +--- |
| 7 | + |
| 8 | +In my [last |
| 9 | +blog](http://www.scala-lang.org/blog/2016/02/03/essence-of-scala.html) |
| 10 | +I introduced DOT, a minimal calculus that underlies much of Scala. |
| 11 | +DOT is much more than an academic exercise, because it gives us |
| 12 | +guidelines how to design a sound type system for full Scala. As was |
| 13 | +argued in the previous blog post, the danger a path-dependent type |
| 14 | +system like Scala's faces is inconsistent bounds or aliases. For |
| 15 | +instance, you might have a type alias |
| 16 | + |
| 17 | + type T = String |
| 18 | + |
| 19 | +in scope in some part of the program, but in another part the same |
| 20 | +type member `T` is known as |
| 21 | + |
| 22 | + type T = Int |
| 23 | + |
| 24 | +If you connect the two parts, you end up allowing assigning a `String` |
| 25 | +to an `Int` and vice versa, which is unsound - it will crash at |
| 26 | +runtime with a `ClassCastException`. The problem is that there |
| 27 | +is no obvious, practical, compile time analysis for DOT or |
| 28 | +Scala that ensures that all types have good bounds. Types can contain |
| 29 | +abstract type members with bounds that can be refined elsewhere and |
| 30 | +several independent refinements might lead together to a bad bound |
| 31 | +problem. Barring a whole program analysis there is no specific |
| 32 | +point in the program where we can figure this out straightforwardly. |
| 33 | + |
| 34 | +In DOT, the problem is resolved by insisting that every path prefix `p` |
| 35 | +of a type `p.T` is at runtime a concrete value. That way, we only have |
| 36 | +to check for good bounds when objects are _created_ with `new`, and |
| 37 | +that check is easy: When objects are created, we know their class and |
| 38 | +we can insist that all nested types in that class are aliases or |
| 39 | +have consistent bounds. So far so good. |
| 40 | + |
| 41 | +But if we want to scale up the DOT result for full Scala, several |
| 42 | +loopholes open up. These come all down to the fact that the prefix of |
| 43 | +a type selection might _not_ be a value that's constructed with a |
| 44 | +`new` at run time. The loopholes can be classified into three |
| 45 | +categories: |
| 46 | + |
| 47 | + 1. The prefix value might be lazy, and never instantiated to anything, as in: |
| 48 | + |
| 49 | + lazy val p: S = p |
| 50 | + ... p.T ... |
| 51 | + |
| 52 | + Note that trying to access the lazy value `p` would result in an infinite loop. But using `p` in a type does not force its evaluation, so we might never evaluate `p`. Since `p` is not initialized |
| 53 | +with a `new`, bad bounds for `T` would go undetected. |
| 54 | + |
| 55 | + 2. The prefix value might be initialized to `null`, as in |
| 56 | + |
| 57 | + val p: S = null |
| 58 | + ... p.T ... |
| 59 | + |
| 60 | + The problem here is similar to the first one. `p` is not initialized |
| 61 | + with a `new` so we know nothing about the bounds of `T`. |
| 62 | + |
| 63 | + 3. The prefix might be a type `T` in a type projection `T # A`, where `T` |
| 64 | + is not associated with a runtime value. |
| 65 | + |
| 66 | +We can in fact construct soundness issues in all of these cases. Look |
| 67 | +at the discussion for issues [#50](https://github.com/lampepfl/dotty/issues/50) |
| 68 | +and [#1050](https://github.com/lampepfl/dotty/issues/1050) in the |
| 69 | +[dotty](https://github.com/lampepfl/dotty/issues/1050) repository |
| 70 | +on GitHub. All issues work fundamentally in the same way: Construct a type `S` |
| 71 | +which has a type member `T` with bad bounds, say |
| 72 | + |
| 73 | + Any <: T <: Nothing |
| 74 | + |
| 75 | +Then, use the left subtyping to turn an expression of type `Any` into |
| 76 | +an expression of type `T` and use the right subtyping to turn that |
| 77 | +expression into an expression of type `Nothing`: |
| 78 | + |
| 79 | + def f(x: Any): p.T = x |
| 80 | + def g(x: p.T): Nothing = x |
| 81 | + |
| 82 | +Taken together, `g(f(x))` will convert every expression into an |
| 83 | +expression of type `Nothing`. Since `Nothing` is a subtype of every |
| 84 | +other type, this means you can convert an arbitrary expression to have |
| 85 | +any type you choose. Such a feat is an impossible promise, of |
| 86 | +course. The promise is usually broken at run-time by failing with a |
| 87 | +`ClassCastException`. |
| 88 | + |
| 89 | +To get back to soundness we need to plug the loopholes. Some of the |
| 90 | +necessary measures are taken in pull request [#1051](https://github.com/lampepfl/dotty/issues/1051). |
| 91 | +That pull request |
| 92 | + |
| 93 | + - tightens the rules for overrides of lazy values: lazy values |
| 94 | + cannot override or implement non-lazy values, |
| 95 | + - tightens the rules which lazy values can appear in paths: they |
| 96 | + must be final and must have concrete types with known consistent bounds, |
| 97 | + - allows type projections `T # A` only if `T` is a concrete type |
| 98 | + with known consistent bounds. |
| 99 | + |
| 100 | +It looks like this is sufficient to plug soundness problems (1) and |
| 101 | +(3). To plug (2), we need to make the type system track nullability in |
| 102 | +more detail than we do it now. Nullability tracking is a nice feature |
| 103 | +in its own right, but now we have an added incentive for implementing |
| 104 | +it: it would help to ensure type soundness. |
| 105 | + |
| 106 | +There's one sub-case of nullability checking which is much harder to do |
| 107 | +than the others. An object reference `x.f` might be `null` at run time |
| 108 | +because the field `f` is not yet initialized. This can lead to a |
| 109 | +soundness problem, but in a more roundabout way than the other issues |
| 110 | +we have identified. In fact, Scala guarantees that in a program that |
| 111 | +runs to completion without aborting, every field will eventually be |
| 112 | +initialized, so every non-null field will have good bounds. Therefore, |
| 113 | +the only way an initialized field `f` could cause a soundness problem |
| 114 | +is if the program in question would never get to initialize `f`, |
| 115 | +either because it goes into an infinite loop or because it aborts with |
| 116 | +an exception or `System.exit` call before reaching the initialization |
| 117 | +point of `f`. It's a valid question whether type soundness guarantees |
| 118 | +should extend to this class of "strange" programs. We might want to |
| 119 | +draw the line here and resort to runtime checks or exclude "strange" |
| 120 | +programs from any soundness guarantees we can give. |
| 121 | + |
| 122 | +The necessary restrictions on type projection `T # A` are problematic |
| 123 | +because they invalidate a common idiom in type-level programming. To |
| 124 | +ease the transition, we will continue to allow unrestricted type |
| 125 | +projections for a while under a flag, even though they are |
| 126 | +potentially unsound. In the current dotty compiler, that flag is a |
| 127 | +language import `-language:Scala2`, but it could be something |
| 128 | +different for other compilers, e.g. `-unsafe`. Maybe we can find |
| 129 | +rules that are less restrictive than the ones we have now, and are |
| 130 | +still sound. But one aspect should be non-negotiable: Any fundamental |
| 131 | +deviations from the principles laid down by DOT needs to be proven |
| 132 | +mechanically correct just like DOT was. We have achieved a lot with |
| 133 | +the DOT proofs, so we should make sure not to back-slide. And if the |
| 134 | +experience of the past 10 years has taught us one thing, it is that |
| 135 | +the meta theory of type systems has many more surprises in store than |
| 136 | +one might think. That's why mechanical proofs are essential. |
| 137 | + |
| 138 | + |
| 139 | + |
| 140 | + |
| 141 | + |
| 142 | + |
| 143 | + |
0 commit comments