Cut down on exponential growth in checking switch exhaustivity.

jrose-apple · jrose-apple · commit 671944df0ce2 · 2019-01-18T18:36:58.000-08:00
The problem Assume 'A' and 'B' are enums with cases .a1, .a2, etc. and
.b1, .b2, etc. If we try to typecheck this switch:

    switch (a, b) {
    case (.a1, .b1),
         (.a1, .b2):
      break
    ...

The compiler is going to try to perform the following series of
operations:

    &gt; var s = (A, B)
    &gt; s -= (.a1, .b1)
    ((A - .a1, B) | (A, B - .b1))
    &gt; s -= (.a1, .b2)
    (((A - .a1, B) | (A - .a1, B - .b2)) |
     ((A - .a1, B - .b1) | (A, B - .b1 - .b2)))
    ...

As you can see, the disjunction representing the uncovered space is
growing exponentially. Eventually some of these will start
disappearing (for instance, if B only has .b1 and .b2, that last term
can go away), and if the switch is exhaustive they can /all/ start
disappearing. But several of them are also redundant: the second and
third cases are fully covered by the first.

I exaggerated a little: the compiler is already smart enough to know
that the second case is redundant with the first, because it already
knows that (.a1, .b2) isn't a subset of (A - .a1, B). However, the
third and fourth cases are generated separately from the first two,
and so nothing ever checks that the third case is also redundant.

This patch changes the logic for subtracting from a disjunction so
that

1. any resulting disjunctions are flattened, and
2. any later terms that are subspaces of earlier terms are dropped

This is a quadratic algorithm in the worst case (compare every space
against every earlier space), but because it saves us from exponential
growth (or at least brings down the exponent) it's worth it. For the
test case now committed in the repository, we went from 40 seconds
(using -switch-checking-invocation-threshold=20000000 to avoid cutting
off early) to 0.2 seconds.

I'll admit I was only timing this one input, and it's possible that
other complex switches will not see the same benefit, or may even see
a slowdown. But I do think this kind of switch is common in both
hand-written and auto-generated code, and therefore this is likely to
be a benefit in many places.

rdar://problem/47365349
diff --git a/lib/Sema/TypeCheckSwitchStmt.cpp b/lib/Sema/TypeCheckSwitchStmt.cpp
@@ -506,12 +506,35 @@ namespace {
         PAIRCASE (SpaceKind::Disjunct, SpaceKind::UnknownCase): {
           SmallVector<Space, 4> smallSpaces;
           for (auto s : this->getSpaces()) {
-            if (auto diff = s.minus(other, TC, DC, minusCount))
-              smallSpaces.push_back(*diff);
-            else
+            auto diff = s.minus(other, TC, DC, minusCount);
+            if (!diff)
               return None;
+            if (diff->getKind() == SpaceKind::Disjunct) {
+              smallSpaces.append(diff->getSpaces().begin(),
+                                 diff->getSpaces().end());
+            } else {
+              smallSpaces.push_back(*diff);
+            }
           }
-          return Space::forDisjunct(smallSpaces);
+
+          // Remove any of the later spaces that are contained entirely in an
+          // earlier one. Since we're not sorting by size, this isn't
+          // guaranteed to give us a minimal set, but it'll still reduce the
+          // general (A, B, C) - ((.a1, .b1, .c1) | (.a1, .b1, .c2)) problem.
+          // This is a quadratic operation but it saves us a LOT of work
+          // overall.
+          SmallVector<Space, 4> usefulSmallSpaces;
+          for (const Space &space : smallSpaces) {
+            bool alreadyHandled = llvm::any_of(usefulSmallSpaces,
+                                               [&](const Space &previousSpace) {
+              return space.isSubspace(previousSpace, TC, DC);
+            });
+            if (alreadyHandled)
+              continue;
+            usefulSmallSpaces.push_back(space);
+          }
+
+          return Space::forDisjunct(usefulSmallSpaces);
         }
         PAIRCASE (SpaceKind::Constructor, SpaceKind::Type):
           return Space();
diff --git a/validation-test/Sema/large-switch-rdar47365349.swift b/validation-test/Sema/large-switch-rdar47365349.swift
@@ -0,0 +1,212 @@
+// RUN: %target-typecheck-verify-swift
+
+enum NumericBase {
+  case binary
+  case ternary
+  case quaternary
+  case quinary
+  case senary
+  case septary
+  case octal
+  case nonary
+  case decimal
+  case undecimal
+  case duodecimal
+}
+
+enum Direction {
+  case left
+  case right
+}
+
+enum WritingSystem {
+  case logographic
+  case alphabet(kind: Alphabet)
+  case abjad
+  case abugida
+  case syllabary
+  case other
+}
+
+enum Alphabet {
+  case roman
+  case greek
+  case cyrillic
+}
+
+func test(base: NumericBase, direction: Direction, writingSystem: WritingSystem) {
+  switch (base, direction, writingSystem) {
+    case (.binary, .left, .logographic),
+      (.binary, .left, .alphabet),
+      (.binary, .left, .abugida):
+      break
+      
+    case (.binary, .right, .logographic),
+      (.binary, .right, .alphabet),
+      (.binary, .right, .abugida):
+      break
+      
+    case (.binary, _, .abjad):
+      break
+      
+    case (.binary, _, .syllabary):
+      break
+      
+    case (.ternary, .left, .logographic):
+      break
+      
+    case (.ternary, .left, .alphabet),
+      (.ternary, .left, .abugida):
+      break
+      
+    case (.ternary, .right, .logographic),
+      (.ternary, .right, .abugida):
+      break
+      
+    case (.ternary, .right, .alphabet):
+      break
+      
+    case (.ternary, _, .abjad):
+      break
+      
+    case (.ternary, _, .syllabary):
+      break
+      
+    case (.quaternary, .left, .logographic):
+      break
+      
+    case (.quaternary, .left, .alphabet),
+      (.quaternary, .left, .abugida):
+      break
+      
+    case (.quaternary, .right, .logographic),
+      (.quaternary, .right, .abugida):
+      break
+      
+    case (.quaternary, .right, .alphabet):
+      break
+      
+    case (.quaternary, _, .abjad):
+      break
+      
+    case (.quaternary, _, .syllabary):
+      break
+    
+    case (.quinary, .left, .logographic),
+      (.senary, .left, .logographic):
+      break
+      
+    case (.quinary, .left, .alphabet),
+      (.senary, .left, .alphabet),
+      (.quinary, .left, .abugida),
+      (.senary, .left, .abugida):
+      break
+      
+    case (.quinary, .right, .logographic),
+      (.senary, .right, .logographic):
+      break
+      
+    case (.quinary, .right, .alphabet),
+      (.senary, .right, .alphabet),
+      (.quinary, .right, .abugida),
+      (.senary, .right, .abugida):
+      break
+      
+    case (.quinary, _, .abjad),
+      (.senary, _, .abjad):
+      break
+      
+    case (.quinary, _, .syllabary),
+      (.senary, _, .syllabary):
+      break
+      
+    case (.septary, .left, .logographic):
+      break
+      
+    case (.septary, .left, .alphabet),
+      (.septary, .left, .abugida):
+      break
+      
+    case (.septary, .right, .logographic):
+      break
+      
+    case (.septary, .right, .alphabet),
+      (.septary, .right, .abugida):
+      break
+      
+    case (.septary, _, .abjad):
+      break
+      
+    case (.septary, _, .syllabary):
+      break
+      
+    case (.decimal, .left, .logographic):
+      break
+      
+    case (.decimal, .left, .alphabet),
+      (.decimal, .left, .abugida):
+      break
+      
+    case (.decimal, .right, .logographic):
+      break
+      
+    case (.decimal, .right, .alphabet),
+      (.decimal, .right, .abugida):
+      break
+      
+    case (.octal, .left, .logographic),
+      (.nonary, .left, .logographic):
+      break
+      
+    case (.octal, .left, .alphabet),
+      (.nonary, .left, .alphabet),
+      (.octal, .left, .abugida),
+      (.nonary, .left, .abugida):
+      break
+      
+    case (.octal, .right, .logographic),
+      (.nonary, .right, .logographic):
+      break
+      
+    case (.octal, .right, .alphabet),
+      (.nonary, .right, .alphabet),
+      (.octal, .right, .abugida),
+      (.nonary, .right, .abugida):
+      break
+      
+    case (.octal, _, .abjad),
+      (.nonary, _, .abjad),
+      (.decimal, _, .abjad):
+      break
+      
+    case (.octal, _, .syllabary),
+      (.nonary, _, .syllabary),
+      (.decimal, _, .syllabary):
+      break
+      
+    case (.undecimal, .left, .logographic):
+      break
+      
+    case (.undecimal, .left, .alphabet),
+      (.undecimal, .left, .abugida):
+      break
+      
+    case (.undecimal, .right, .logographic):
+      break
+      
+    case (.undecimal, .right, .alphabet),
+      (.undecimal, .right, .abugida):
+      break
+      
+    case (.undecimal, _, .abjad):
+      break
+      
+    case (.undecimal, _, .syllabary):
+      break
+      
+    case (.duodecimal, _, _):
+      break
+    case (_, _, .other):
+      break
+  }
+}