From 69922760b61a08cbd1acc68a7a3bf66c3f7505c8 Mon Sep 17 00:00:00 2001
From: John McCall
Date: Sat, 4 Oct 2025 03:02:21 -0400
Subject: [PATCH 1/2] Document SIL's dominance rules, which apparently we've
 never done.

---
 docs/SIL/SIL.md | 238 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 204 insertions(+), 34 deletions(-)

diff --git a/docs/SIL/SIL.md b/docs/SIL/SIL.md
index d7cc716ca4f16..d82414a639fad 100644
--- a/docs/SIL/SIL.md
+++ b/docs/SIL/SIL.md
@@ -780,6 +780,178 @@ _lexical_ in order to specify this property for all contributing lifetimes.
 For details see [Variable Lifetimes](Ownership.md#variable-lifetimes) in the
 Ownership document.
 
+# Dominance
+
+## Value and instruction dominance
+
+Whenever an instruction uses a [value](#values-and-operands) as an
+operand, the definition of the value must dominate the instruction.
+This is a common concept across all SSA-like representations. SIL
+uses a standard definition of dominance, modified slightly to account
+for SIL's use of basic block arguments rather than phi instructions:
+
+- The value `undef` always dominates an instruction.
+
+- An instruction result `R` dominates an instruction `I` if the
+  instruction that defines `R` dominates `I`.
+
+- An argument of a basic block `B` dominates an instruction `I` if all
+  initial paths passing through `I` must also pass through the start
+  of `B`.
+
+An instruction `D` dominates another instruction `I` if they are
+different instructions and all initial paths passing through `I`
+must also pass through `D`.
+
+See [below](#definition-of-a-path) for the formal definition of an
+initial path.
+
+## Basic block dominance
+
+A basic block `B1` dominates a basic block `B2` if they are different
+blocks and if all initial paths passing through the start of `B2` must
+also pass through the start of `B1`.
+
+This relationship can be thought of as organizing the basic blocks
+into a tree, called the *dominance tree*, in which each block's parent
+is its *immediate dominator*: the dominator of the block that is
+dominated by all of the block's other dominators.
+The dominance tree is not directly represented in SIL; it is just
+an emergent property of the dominance requirement on SIL functions.
+
+## Joint post-dominance
+
+Certain instructions are required to have a *joint post-dominance*
+relationship with certain other instructions. Informally, this means
+that all terminating paths through the first instruction must
+eventually pass through one of the others. This is common for
+instructions that define a scope in the SIL function, such as
+`alloc_stack` and `begin_access`.
+
+The dominating instruction is called the *scope instruction*,
+and the post-dominating instructions are called the *scope-ending
+instructions*. The specific joint post-dominance requirement
+defines the set of instructions that count as scope-ending
+instructions for the scope instruction.
+
+For example, an `alloc_stack` instruction must be jointly
+post-dominated by the set of `dealloc_stack` instructions
+whose operand is the result of the `alloc_stack`. The
+`alloc_stack` is the scope instruction, and the `dealloc_stack`s
+are the scope-ending instructions.
+
+The *scope* of a joint post-dominance relationship is the set
+of all points in the function following the scope instruction
+but prior to a scope-ending instruction. Part of the point of
+the joint post-dominance rules is to make this precisely defined.
+A formal definition is given later.
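+
+For example, in the following fragment (a schematic illustration
+rather than a complete function; the types are arbitrary), the
+`alloc_stack` is jointly post-dominated by the two `dealloc_stack`
+instructions, which are its scope-ending instructions:
+
+```
+  bb0(%cond : $Builtin.Int1):
+    %temp = alloc_stack $Builtin.Int32
+    cond_br %cond, bb1, bb2
+  bb1:
+    dealloc_stack %temp : $*Builtin.Int32
+    br bb3
+  bb2:
+    dealloc_stack %temp : $*Builtin.Int32
+    br bb3
+  bb3:
+    %ret = tuple ()
+    return %ret : $()
+```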
+
+In SIL, if an instruction acts as a scope instruction, it always
+has exactly one set of scope-ending instructions associated
+with it, and so it forms exactly one scope. People will therefore
+often talk about, e.g., the scope of an `alloc_stack`, meaning
+the scope between it and its `dealloc_stack`s. Furthermore,
+there are no instructions in SIL which act as scope-ending
+instructions for multiple scopes.
+
+A scope instruction `I` is jointly post-dominated by its
+scope-ending instructions if:
+
+- All initial paths that pass through a scope-ending instruction
+  of `I` must also pass through `I`. (This is just the normal
+  dominance rule, and it is typically already required by the
+  definition of the joint post-dominance relationship. For example,
+  a `dealloc_stack` must be dominated by its associated
+  `alloc_stack` because it uses its result as an operand.)
+
+- All initial paths that pass through `I` twice must also pass
+  through a scope-ending instruction of `I` in between.
+
+- All initial paths that pass through a scope-ending instruction
+  of `I` twice must also pass through `I` in between.
+
+- All terminating initial paths that pass through `I` must also
+  pass through a scope-ending instruction of `I`.
+
+In other words, all paths must strictly alternate between `I`
+and its scope-ending instructions, starting with `I` and (if
+the path exits) ending with a scope-ending instruction.
+
+Note that a scope-ending instruction does not need to appear on
+a path following a scope instruction if the path doesn't exit
+the function. In fact, a function needn't include any scope-ending
+instructions for a particular scope instruction if all paths from
+that point are non-terminating, such as by ending in `unreachable`
+or containing an infinite loop.
+
+The scope defined by a joint post-dominance relationship for
+a scope instruction `I` is the set of points in the function for
+which there exists an initial path that passes through `I` and
+then visits that point without passing through a scope-ending
+instruction of `I` in between. Note that the point before a
+scope-ending instruction is always within the scope.
+
+## Definition of a path
+
+A *point* in a SIL function is the moment before an instruction.
+Every basic block has an entry point, which is the point before
+its first instruction. The entry point of the entry block is also
+called the entry point of the function.
+
+A path through a SIL function is a path (in the usual graph-theory
+sense) in the underlying directed graph of points, in which:
+
+- every point in the SIL function is a vertex in the graph,
+
+- each non-terminator instruction creates an edge from the point
+  before it to the point after it, and
+
+- each terminator instruction creates edges from the point before
+  the terminator to the initial point of each of its successor blocks.
+
+A path is said to pass through an instruction if it includes
+any of the edges created by that instruction. A path is said to
+pass through the start of a basic block if it visits the entry
+point of that block.
+
+An *initial path* is a path which begins at the entry point of the
+function. A *terminating path* is a path which ends at the point
+before an exiting instruction, such as `return` or `throw`.
+
+Note that the dominance rules generally require only an initial path,
+not a terminating path. A path that simply stops in the middle of a
+block still counts for dominance. Among other things, this ensures
+that dominance holds in blocks that are part of an infinite loop.
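+
+For instance, in the following fragment (an illustrative sketch),
+the definition of `%value` in `bb0` dominates the `builtin` in `bb1`,
+even though every path through `bb1` is non-terminating:
+
+```
+  bb0:
+    %value = integer_literal $Builtin.Int32, 0
+    br bb1
+  bb1:
+    %sum = builtin "add_Int32"(%value : $Builtin.Int32, %value : $Builtin.Int32) : $Builtin.Int32
+    br bb1
+```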
+
+Note also that paths consider successors without regard to the
+nature of the terminator. Paths that are provably impossible because
+of value relationships still count for dominance. For example,
+consider the following function:
+
+```
+  bb0(%cond : $Builtin.Int1):
+    cond_br %cond, bb1, bb2
+  bb1:
+    %value = integer_literal $Builtin.Int32, 0
+    br bb3
+  bb2:
+    br bb3
+  bb3:
+    cond_br %cond, bb4, bb5
+  bb4:
+    %twice_value = builtin "add_Int32"(%value : $Builtin.Int32, %value : $Builtin.Int32) : $Builtin.Int32
+    br bb6
+  bb5:
+    br bb6
+  bb6:
+    return %cond : $Builtin.Int1
+```
+
+Dynamically, it is impossible to reach the `builtin` instruction
+without passing through the definition of `%value`: to reach
+the `builtin`, `%cond` must be `true`, and so the first `cond_br`
+must have branched to `bb1`. This is not taken into consideration
+by dominance, and so this function is ill-formed.
+
 # Debug Information
 
 Each instruction may have a debug location and a SIL scope reference at
@@ -1364,48 +1536,39 @@ stack deallocation instructions. It can even be paired with no
 instructions at all; by the rules below, this can only happen in
 non-terminating functions.
 
-- At any point in a SIL function, there is an ordered list of stack
-  allocation instructions called the *active allocations list*.
+- All stack allocation instructions must be jointly post-dominated
+  by stack deallocation instructions paired with them.
 
-- The active allocations list is defined to be empty at the initial
-  point of the entry block of the function.
+- No path through the function that passes through a stack allocation
+  instruction `B`, having already passed a stack allocation
+  instruction `A`, may subsequently pass through a stack deallocation
+  instruction paired with `A` without first passing through a stack
+  deallocation instruction paired with `B`.
 
-- The active allocations list is required to be the same at the
-  initial point of any successor block as it is at the final point of
-  any predecessor block. Note that this also requires all
-  predecessors/successors of a given block to have the same
-  final/initial active allocations lists.
+These two rules statically enforce that all stack allocations are
+properly nested. In simpler terms:
 
-  In other words, the set of active stack allocations must be the same
-  at a given place in the function no matter how it was reached.
+- At every point in a SIL function, there is an ordered list of stack
+  allocation instructions called the *active allocations list*.
 
-- The active allocations list for the point following a stack
-  allocation instruction is defined to be the result of adding that
-  instruction to the end of the active allocations list for the point
-  preceding the instruction.
+- The active allocations list is empty at the start of the entry block
+  of the function, and it must be empty again whenever an instruction
+  that exits the function is reached, like `return` or `throw`.
 
-- The active allocations list for the point following a stack
-  deallocation instruction is defined to be the result of removing the
-  instruction from the end of the active allocations list for the
-  point preceding the instruction. The active allocations list for the
-  preceding point is required to be non-empty, and the last
-  instruction in it must be paired with the deallocation instruction.
+- Whenever a stack allocation instruction is reached, it is added to
+  the end of the list.
 
-  In other words, all stack allocations must be deallocated in
-  last-in, first-out order, aka stack order.
+- Whenever a stack deallocation instruction is reached, its paired
+  stack allocation instruction must be at the end of the list, which it
+  is then removed from.
 
-- The active allocations list for the point following any other
-  instruction is defined to be the same as the active allocations list
-  for the point preceding the instruction.
+- The active allocations list must always be the same on both sides of
+  a control flow edge. This implies both that all successors of a block
+  must start with the same list and that all predecessors of a block
+  must end with the same list.
 
-- The active allocations list is required to be empty prior to
-  `return` or `throw` instructions.
-
-  In other words, all stack allocations must be deallocated prior to
-  exiting the function.
-
-Note that these rules implicitly prevent an allocation instruction from
-still being active when it is reached.
+Note that these rules implicitly prevent stack allocations from leaking
+or being double-freed.
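+
+For example, the following straight-line fragment (an illustrative
+sketch; the types are arbitrary) maintains the active allocations
+list correctly, deallocating in last-in, first-out order:
+
+```
+  %a = alloc_stack $Builtin.Int32       // active allocations: [%a]
+  %b = alloc_stack $Builtin.Int64       // active allocations: [%a, %b]
+  dealloc_stack %b : $*Builtin.Int64    // active allocations: [%a]
+  dealloc_stack %a : $*Builtin.Int32    // active allocations: []
+```
+
+Swapping the two `dealloc_stack`s would violate these rules, because
+`%a` would not be at the end of the list when its deallocation is
+reached.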
 
 The control-flow rule forbids certain patterns that would theoretically
 be useful, such as conditionally performing an allocation around an
@@ -1414,6 +1577,13 @@
 to use, however, as it is illegal to locally abstract over addresses,
 and therefore a conditional allocation cannot be used in the intermediate
 operation anyway.
 
+There is currently an exception to the stack discipline rules which
+allows the predecessors of a dead-end block (a block from which no
+exit is reachable) to disagree about the state of the stack.
+The current exception is unsound and permits manipulation of the
+stack in ways that may not be valid in all predecessor states. We
+are exploring ways to improve this situation.
+
 # Structural type matching for pack indices
 
 In order to catch type errors in applying pack indices, SIL requires the

From 19b34eee6ecd5efa7b0581fe603d2d49920ace67 Mon Sep 17 00:00:00 2001
From: John McCall
Date: Mon, 6 Oct 2025 23:44:23 -0400
Subject: [PATCH 2/2] Rewrite StackNesting to be a non-iterative single-pass
 algorithm.

The previous algorithm was doing an iterative forward data flow
analysis followed by a reverse data flow analysis. I suspect the
history here is that it was a reverse analysis, and that didn't really
work for infinite loops, and so complexity kind of accumulated.

The new algorithm is quite straightforward and relies on the
allocations being properly jointly post-dominated, just not nested.
We simply walk forward through the blocks in consistent-with-dominance
order, maintaining the stack of active allocations and deferring
deallocations that are improperly nested until we deallocate the
allocations above them.

The reason I'm doing this, besides it just being a simpler, faster
algorithm, is that modeling some of the uses of the async stack
allocator properly requires builtins that cannot just be semantically
reordered. That should be somewhat easier to handle with the new
approach, although really (1) we should not have runtime functions
that need this and (2) we're going to need a conservatively-correct
solution that's different from this anyway because hoisting
allocations is *also* limited in its own way.

The test cases that changed are... I don't think the new output is
wrong under the current rules that are being enforced, but really we
should be enforcing different rules, because it's not really okay to
have broken stack nesting in blocks just because they don't lead to an
exit. But it's broken coming into StackNesting, and I don't think the
rewrite actually makes it worse, so...
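
To recap the core transformation with a sketch (a made-up fragment,
not one of the changed tests): given improperly nested input like

    %a = alloc_stack $Int
    %b = alloc_stack $Int
    dealloc_stack %a : $*Int
    dealloc_stack %b : $*Int

the new pass deletes the early `dealloc_stack %a`, records %a as
pending, and re-emits its deallocation immediately after
`dealloc_stack %b`, restoring last-in, first-out order.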

The thing that concerns me most about the rewritten pass is that it
isn't actually validating joint post-dominance on input, so if you
give it bad input, it might be a little mystifying to debug the
verifier failures.
---
 .../swift/SILOptimizer/Utils/StackNesting.h   | 121 +--
 lib/SILOptimizer/Utils/StackNesting.cpp       | 755 ++++++++++--------
 test/SILOptimizer/allocbox_to_stack.sil       |   9 +-
 test/SILOptimizer/stack_promotion.sil         |  37 +-
 4 files changed, 434 insertions(+), 488 deletions(-)

diff --git a/include/swift/SILOptimizer/Utils/StackNesting.h b/include/swift/SILOptimizer/Utils/StackNesting.h
index 3bd370016f9a6..4c8e4384bc75e 100644
--- a/include/swift/SILOptimizer/Utils/StackNesting.h
+++ b/include/swift/SILOptimizer/Utils/StackNesting.h
@@ -45,8 +45,11 @@ namespace swift {
 ///     dealloc_stack %1
 /// \endcode
 ///
+/// Each allocation must still be properly jointly post-dominated by
+/// its deallocations. StackNesting only fixes the nesting of allocations
+/// and deallocations; it does not insert required deallocations that are
+/// missing entirely.
 class StackNesting {
-
 public:
 
   /// The possible return values of fixNesting().
@@ -61,122 +64,6 @@ class StackNesting {
     CFG
   };
 
-private:
-  typedef SmallBitVector BitVector;
-
-  /// Data stored for each block (actually for each block which is not dead).
-  struct BlockInfo {
-    /// The list of stack allocating/deallocating instructions in the block.
-    llvm::SmallVector<SILInstruction *, 8> StackInsts;
-
-    /// The bit-set of alive stack locations at the block entry.
-    BitVector AliveStackLocsAtEntry;
-
-    /// The bit-set of alive stack locations at the block exit.
-    BitVector AliveStackLocsAtExit;
-
-    /// Used in the setup function to walk over the CFG.
-    bool visited = false;
-
-    /// True for dead-end blocks, i.e. blocks from which there is no path to
-    /// a function exit, e.g. blocks which end with `unreachable` or an
-    /// infinite loop.
-    bool isDeadEnd = false;
-  };
-
-  /// Data stored for each stack location (= allocation).
-  ///
-  /// Each stack location is allocated by a single allocation instruction.
-  struct StackLoc {
-    StackLoc(SILInstruction *Alloc) : Alloc(Alloc) {}
-
-    /// Back-link to the allocation instruction.
-    SILInstruction *Alloc;
-
-    /// Bit-set which represents all alive locations at this allocation.
-    /// It obviously includes this location itself. And it includes all "outer"
-    /// locations which surround this location.
-    BitVector AliveLocs;
-  };
-
-  /// Mapping from stack allocations (= locations) to bit numbers.
-  llvm::DenseMap<SILInstruction *, int> StackLoc2BitNumbers;
-
-  /// The list of stack locations. The index into this array is also the bit
-  /// number in the bit-sets.
-  llvm::SmallVector<StackLoc, 8> StackLocs;
-
-  BasicBlockData<BlockInfo> BlockInfos;
-
-  StackNesting(SILFunction *F) : BlockInfos(F) { }
-
-  /// Performs correction of stack nesting by moving stack-deallocation
-  /// instructions down the control flow.
-  ///
-  /// Returns the status of what changes were made.
-  Changes run();
-
-  /// For debug dumping.
-  void dump() const;
-
-  static void dumpBits(const BitVector &Bits);
-
-  /// Initializes the data structures.
-  void setup();
-
-  /// Solves the dataflow problem.
-  ///
-  /// Returns true if there is a nesting of locations in any way, which can
-  /// potentially be in the wrong order.
-  bool solve();
-
-  bool analyze() {
-    setup();
-    return solve();
-  }
-
-  /// Insert deallocation instructions for all locations which are alive before
-  /// the InsertionPoint (AliveBefore) but not alive after the InsertionPoint
-  /// (AliveAfter).
-  ///
-  /// Returns true if any deallocations were inserted.
-  bool insertDeallocs(const BitVector &AliveBefore, const BitVector &AliveAfter,
-                      SILInstruction *InsertionPoint,
-                      std::optional<SILLocation> Location);
-
-  /// Returns the location bit number for a stack allocation instruction.
-  int bitNumberForAlloc(SILInstruction *AllocInst) {
-    assert(AllocInst->isAllocatingStack());
-    return StackLoc2BitNumbers[AllocInst];
-  }
-
-  /// Returns the location bit number for a stack deallocation instruction.
-  int bitNumberForDealloc(SILInstruction *DeallocInst) {
-    assert(DeallocInst->isDeallocatingStack());
-    auto *AllocInst = getAllocForDealloc(DeallocInst);
-    return bitNumberForAlloc(AllocInst);
-  }
-
-  /// Returns the stack allocation instruction for a stack deallocation
-  /// instruction.
-  SILInstruction *getAllocForDealloc(SILInstruction *Dealloc) const {
-    SILValue op = Dealloc->getOperand(0);
-    while (auto *mvi = dyn_cast<MoveValueInst>(op)) {
-      op = mvi->getOperand();
-    }
-    return op->getDefiningInstruction();
-  }
-
-  /// Insert deallocations at block boundaries.
-  Changes insertDeallocsAtBlockBoundaries();
-
-  /// Modifies the SIL to end up with a correct stack nesting.
-  ///
-  /// Returns the status of what changes were made.
-  Changes adaptDeallocs();
-
-public:
-
   /// Performs correction of stack nesting by moving stack-deallocation
   /// instructions down the control flow.
   ///
diff --git a/lib/SILOptimizer/Utils/StackNesting.cpp b/lib/SILOptimizer/Utils/StackNesting.cpp
index 505539ad494dc..d38946617e3d1 100644
--- a/lib/SILOptimizer/Utils/StackNesting.cpp
+++ b/lib/SILOptimizer/Utils/StackNesting.cpp
@@ -20,400 +20,445 @@
 
 using namespace swift;
 
-void StackNesting::setup() {
-  SmallVector<SILBasicBlock *, 8> WorkList;
-
-  // Start with the function entry block and add blocks while walking down along
-  // the successor edges.
-  // This ensures a correct ordering of stack locations: an inner location has
-  // a higher bit-number than it's outer parent location.
-  // This ordering is only important for inserting multiple deallocation
-  // instructions (see below).
-  auto Entry = BlockInfos.entry();
-  WorkList.push_back(&Entry.block);
-  Entry.data.visited = true;
-
-  while (!WorkList.empty()) {
-    SILBasicBlock *Block = WorkList.pop_back_val();
-    BlockInfo &BI = BlockInfos[Block];
-    for (SILInstruction &I : *Block) {
-      if (I.isAllocatingStack()) {
-        auto Alloc = &I;
-        // Register this stack location.
-        unsigned CurrentBitNumber = StackLocs.size();
-        StackLoc2BitNumbers[Alloc] = CurrentBitNumber;
-        StackLocs.push_back(StackLoc(Alloc));
-
-        BI.StackInsts.push_back(Alloc);
-      } else if (I.isDeallocatingStack()) {
-        auto *AllocInst = getAllocForDealloc(&I);
-        if (!BI.StackInsts.empty() && BI.StackInsts.back() == AllocInst) {
-          // As an optimization, we ignore perfectly nested alloc-dealloc pairs
-          // inside a basic block.
-          // Actually, this catches most of the cases and keeps our bitsets
-          // small.
-          assert(StackLocs.back().Alloc == AllocInst);
-          StackLocs.pop_back();
-          BI.StackInsts.pop_back();
-        } else {
-          // Register the stack deallocation.
-          BI.StackInsts.push_back(&I);
-        }
+/// Run the given function exactly once on each of the reachable blocks in
+/// a SIL function.
+/// Blocks will be visited in an order consistent with
+/// dominance, which is to say, after all dominating blocks but otherwise
+/// in an unspecified order.
+///
+/// The function is passed a state value, which it can freely mutate. The
+/// initial value of the state will be the same as the value left in the
+/// state for an unspecified predecessor (or the initial value passed in,
+/// for the entry block of the function). Since the predecessor choice is
+/// arbitrary, you should only use this if the state is guaranteed to be
+/// the same for all predecessors. The state type must be copyable, but
+/// the algorithm makes a reasonable effort to avoid copying it.
+///
+/// This function assumes you don't change the CFG during its operation.
+template <class State, class Fn>
+void runInDominanceOrder(SILFunction &F, State &&state, const Fn &fn) {
+  // The set of blocks that have ever been enqueued onto the worklist.
+  // (We actually skip the queue in a bunch of cases, but *abstractly*
+  // they're enqueued.)
+  BasicBlockSet visitedBlocks(&F);
+
+  // The next basic block to operate on. We always operate on `state`.
+  SILBasicBlock *curBB = F.getEntryBlock();
+
+  // We need to copy `state` whenever we enqueue a block onto the worklist.
+  // We'll then move-assign it back to `state` when we dequeue it.
+  using StateValue = std::remove_reference_t<State>;
+  SmallVector<std::pair<SILBasicBlock *, StateValue>> worklist;
+
+  while (true) {
+    // Run the function on the current block, updating the current state.
+    fn(curBB, state);
+
+    // Enqueue the successors.
+    SILBasicBlock *nextBB = nullptr;
+    for (SILBasicBlock *succBB : curBB->getSuccessorBlocks()) {
+      // If this insertion returns false, we've already enqueued the
+      // successor block, so we can skip it. This is fast enough because
+      // of BasicBlockSet that there's no point in avoiding it for
+      // single-predecessor blocks.
+      if (!visitedBlocks.insert(succBB))
+        continue;
+
+      // If we haven't found a successor to visit yet, pick this one.
+      if (!nextBB) {
+        nextBB = succBB;
+
+      // Otherwise, add it to the worklist, copying the current state.
+      } else {
+        worklist.emplace_back(succBB, /*copied*/ state);
+      }
+    }
-    for (SILBasicBlock *SuccBB : Block->getSuccessorBlocks()) {
-      BlockInfo &SuccBI = BlockInfos[SuccBB];
-      if (!SuccBI.visited) {
-        // Push the next reachable block onto the WorkList.
-        WorkList.push_back(SuccBB);
-        SuccBI.visited = true;
-      }
-    }
+
+    // If there's a viable direct successor, just continue along this
+    // path, editing the current state in-place.
+    if (nextBB) {
+      curBB = nextBB;
+      continue;
+    }
+
+    // Otherwise, if the worklist is empty, we're done.
+    if (worklist.empty()) {
+      return;
+    }
+
+    // Otherwise, pull the next item off the worklist and overwrite the
+    // current state with the state we saved for it before.
+    auto &nextItem = worklist.back();
+    curBB = nextItem.first;
+    state = std::move(nextItem.second);
+    worklist.pop_back();
   }
+}
 
-  unsigned NumLocs = StackLocs.size();
-  for (unsigned Idx = 0; Idx < NumLocs; ++Idx) {
-    StackLocs[Idx].AliveLocs.resize(NumLocs);
-    // Initially each location gets it's own alive-bit.
-    StackLocs[Idx].AliveLocs.set(Idx);
-  }
-}
+/// Returns the stack allocation instruction for a stack deallocation
+/// instruction.
+static SILInstruction *getAllocForDealloc(SILInstruction *dealloc) {
+  SILValue op = dealloc->getOperand(0);
+  while (auto *mvi = dyn_cast<MoveValueInst>(op)) {
+    op = mvi->getOperand();
+  }
+  return op->getDefiningInstruction();
+}
 
-bool StackNesting::solve() {
-  bool changed = false;
-  bool isNested = false;
-  BitVector Bits(StackLocs.size());
+/// Create a dealloc for a particular allocation.
+///
+/// This is expected to work for all allocations that don't have
+/// properly-nested deallocations. It's fine to have a kind of allocation
+/// that you can't do this for, as long as it's always explicitly
+/// deallocated on all paths. This pass doesn't change any allocations
+/// or deallocations that are properly nested already.
+///
+/// Only allocations whose deallocations return true from canMoveDealloc
+/// need to support this.
+static void createDealloc(SILBuilder &B, SILLocation loc, SILInstruction *alloc) {
+  switch (alloc->getKind()) {
+  case SILInstructionKind::PartialApplyInst:
+  case SILInstructionKind::AllocStackInst:
+    assert((isa<AllocStackInst>(alloc) ||
+            cast<PartialApplyInst>(alloc)->isOnStack()) &&
+           "wrong instruction");
+    B.createDeallocStack(loc, cast<SingleValueInstruction>(alloc));
+    return;
+  case SILInstructionKind::BeginApplyInst: {
+    auto *bai = cast<BeginApplyInst>(alloc);
+    assert(bai->isCalleeAllocated());
+    B.createDeallocStack(loc, bai->getCalleeAllocationResult());
+    return;
+  }
+  case SILInstructionKind::AllocRefDynamicInst:
+  case SILInstructionKind::AllocRefInst:
+    assert(cast<AllocRefInstBase>(alloc)->canAllocOnStack());
+    B.createDeallocStackRef(loc, cast<AllocRefInstBase>(alloc));
+    return;
+  case SILInstructionKind::AllocPackInst:
+    B.createDeallocPack(loc, cast<AllocPackInst>(alloc));
+    return;
+  case SILInstructionKind::BuiltinInst: {
+    auto *bi = cast<BuiltinInst>(alloc);
+    auto &ctx = alloc->getFunction()->getModule().getASTContext();
 
-  StackList<SILBasicBlock *> deadEndWorklist(BlockInfos.getFunction());
+    switch (*bi->getBuiltinKind()) {
+    case BuiltinValueKind::StackAlloc:
+    case BuiltinValueKind::UnprotectedStackAlloc: {
+      auto identifier =
+          ctx.getIdentifier(getBuiltinName(BuiltinValueKind::StackDealloc));
+      B.createBuiltin(loc, identifier,
+                      SILType::getEmptyTupleType(ctx),
+                      SubstitutionMap(), {bi});
+      return;
+    }
+    default:
+      llvm_unreachable("unknown stack allocation builtin");
+    }
+  }
+  case SILInstructionKind::AllocPackMetadataInst:
+    B.createDeallocPackMetadata(loc, cast<AllocPackMetadataInst>(alloc));
+    return;
+  default:
+    llvm_unreachable("unknown stack allocation");
+  }
+}
 
-  // Initialize all bit fields to 1s, expect 0s for the entry block.
-  bool initVal = false;
-  for (auto bd : BlockInfos) {
-    bd.data.AliveStackLocsAtEntry.resize(StackLocs.size(), initVal);
-    initVal = true;
+namespace {
+class ActiveAllocation {
+  llvm::PointerIntPair<SILInstruction *, 1, bool> valueAndIsPending;
 
-    bd.data.isDeadEnd = !bd.block.getTerminator()->isFunctionExiting();
-    if (!bd.data.isDeadEnd)
-      deadEndWorklist.push_back(&bd.block);
+public:
+  ActiveAllocation(SILInstruction *value) : valueAndIsPending(value, false) {}
+
+  SILInstruction *getValue() const {
+    return valueAndIsPending.getPointer();
   }
 
-  // Calculate the isDeadEnd block flags.
-  while (!deadEndWorklist.empty()) {
-    SILBasicBlock *b = deadEndWorklist.pop_back_val();
-    for (SILBasicBlock *pred : b->getPredecessorBlocks()) {
-      BlockInfo &bi = BlockInfos[pred];
-      if (bi.isDeadEnd) {
-        bi.isDeadEnd = false;
-        deadEndWorklist.push_back(pred);
-      }
-    }
+  bool isPending() const {
+    return valueAndIsPending.getInt();
   }
 
-  // First step: do a forward dataflow analysis to get the live stack locations
-  // at the block exits.
-  // This is necessary to get the live locations at dead-end blocks (otherwise
-  // the backward data flow would be sufficient).
-  // The special thing about dead-end blocks is that it's okay to have alive
-  // locations at that point (e.g. at an `unreachable`) i.e. locations which are
-  // never dealloced. We cannot get such locations with a purely backward
-  // dataflow.
-  do {
-    changed = false;
-
-    for (auto bd : BlockInfos) {
-      Bits = bd.data.AliveStackLocsAtEntry;
-      for (SILInstruction *StackInst : bd.data.StackInsts) {
-        if (StackInst->isAllocatingStack()) {
-          Bits.set(bitNumberForAlloc(StackInst));
-        } else if (StackInst->isDeallocatingStack()) {
-          Bits.reset(bitNumberForDealloc(StackInst));
-        }
-      }
-      if (Bits != bd.data.AliveStackLocsAtExit) {
-        bd.data.AliveStackLocsAtExit = Bits;
-        changed = true;
-      }
-      // Merge the bits into the successors.
-      for (SILBasicBlock *SuccBB : bd.block.getSuccessorBlocks()) {
-        BlockInfos[SuccBB].AliveStackLocsAtEntry &= Bits;
-      }
+  void setPending() {
+    assert(!isPending());
+    valueAndIsPending.setInt(true);
+  }
+};
+
+struct State {
+  // The active allocations and whether they're pending deallocation.
+  SmallVector<ActiveAllocation, 8> allocations;
+
+#ifndef NDEBUG
+  SWIFT_ATTRIBUTE_NORETURN
+  void abortForUnknownAllocation(SILInstruction *alloc,
+                                 SILInstruction *dealloc) {
+    llvm::errs() << "fatal error: StackNesting could not find record of "
+                    "allocation for deallocation:\n  "
+                 << *dealloc
+                 << "Allocation might not be jointly post-dominated. "
+                    "Current stack:\n";
+    for (auto i : indices(allocations)) {
+      llvm::errs() << "[" << i << "] "
+                   << (allocations[i].isPending() ? "(pending) " : "")
+                   << *allocations[i].getValue();
     }
-  } while (changed);
+    llvm::errs() << "Complete function:\n";
+    alloc->getFunction()->dump();
+    abort();
+  }
+#endif
+};
 
-  // Second step: do a backward dataflow analysis to extend the lifetimes of
-  // not properly nested allocations.
-  do {
-    changed = false;
+} // end anonymous namespace
 
-    for (auto bd : llvm::reverse(BlockInfos)) {
-      // Collect the alive-bits (at the block exit) from the successor blocks.
-      for (SILBasicBlock *SuccBB : bd.block.getSuccessorBlocks()) {
-        bd.data.AliveStackLocsAtExit |= BlockInfos[SuccBB].AliveStackLocsAtEntry;
-      }
-      Bits = bd.data.AliveStackLocsAtExit;
-      assert(!(bd.data.visited && bd.block.getTerminator()->isFunctionExiting()
-               && Bits.any())
-             && "stack location is missing dealloc");
-
-      if (bd.data.isDeadEnd) {
-        // We treat `unreachable` as an implicit deallocation for all locations
-        // which are still alive at this point. The same is true for dead-end
-        // CFG regions due to an infinite loop.
-        for (int BitNr = Bits.find_first(); BitNr >= 0;
-             BitNr = Bits.find_next(BitNr)) {
-          // For each alive location extend the lifetime of all locations which
-          // are alive at the allocation point. This is the same as we do for
-          // a "real" deallocation instruction (see below).
-          // In dead-end CFG regions we have to do that for all blocks (because
-          // of potential infinite loops), whereas in "normal" CFG regions it's
-          // sufficient to do it at deallocation instructions.
-          Bits |= StackLocs[BitNr].AliveLocs;
-        }
-        bd.data.AliveStackLocsAtExit = Bits;
-      }
-      for (SILInstruction *StackInst : llvm::reverse(bd.data.StackInsts)) {
-        if (StackInst->isAllocatingStack()) {
-          int BitNr = bitNumberForAlloc(StackInst);
-          if (Bits != StackLocs[BitNr].AliveLocs) {
-            // More locations are alive around the StackInst's location.
-            // Update the AliveLocs bitset, which contains all those alive
-            // locations.
-            assert(Bits.test(BitNr) && "no dealloc found for alloc stack");
-            StackLocs[BitNr].AliveLocs = Bits;
-            changed = true;
-            isNested = true;
-          }
-          // The allocation ends the lifetime of it's stack location (in reverse
-          // order)
-          Bits.reset(BitNr);
-        } else if (StackInst->isDeallocatingStack()) {
-          // A stack deallocation begins the lifetime of its location (in
-          // reverse order). And it also begins the lifetime of all other
-          // locations which are alive at the allocation point.
-          Bits |= StackLocs[bitNumberForDealloc(StackInst)].AliveLocs;
-        }
-      }
-      if (Bits != bd.data.AliveStackLocsAtEntry) {
-        bd.data.AliveStackLocsAtEntry = Bits;
-        changed = true;
-      }
+using IndexForAllocationMap = llvm::DenseMap<SILInstruction *, size_t>;
+
+/// Flag that a particular allocation is pending.
+static void setAllocationAsPending(State &state, SILInstruction *alloc,
+                                   SILInstruction *dealloc,
+                                   IndexForAllocationMap &indexForAllocation) {
+  auto stack = MutableArrayRef(state.allocations);
+  assert(!stack.empty());
+
+  // Just ignore the top entry in all of this; we know it doesn't match
+  // the allocation.
+  assert(stack.back().getValue() != alloc);
+  stack = stack.drop_back();
+
+  // Ultimately, we're just calling setPending() on the entry matching
+  // `alloc` in the allocations stack. All the complexity has to do with
+  // trying to avoid super-linear behavior while also trying very hard
+  // to avoid actually using indexForAllocation for simple cases.
+
+  // It's very common for allocations to never be improperly nested,
+  // so we don't want to eagerly add allocations to indexForAllocation
+  // when we encounter them. This means we can't rely on it having
+  // an entry for `alloc` now.
+
+  // `alloc` is very likely to be close to the top of the stack. Just do
+  // a short linear scan there first. This might be slightly slower than
+  // a hash lookup in the worst case, but usually it means we can avoid
+  // adding any entries to indexForAllocation at all. Even for this case
+  // where nesting is broken, that's still worthwhile to do.
+  const size_t linearScanLimit = 8;
+  auto linearScanEntries = stack.take_back(linearScanLimit);
+  for (auto &entry : linearScanEntries) {
+    if (entry.getValue() == alloc) {
+      entry.setPending();
+      return;
     }
-  } while (changed);
-
-  return isNested;
-}
+  }
+
+  // Okay, so much for that, time for the hashtable.
+
+#ifndef NDEBUG
+  if (stack.size() <= linearScanLimit) {
+    state.abortForUnknownAllocation(alloc, dealloc);
+  }
+#endif
 
-static SILInstruction *createDealloc(SILInstruction *Alloc,
-                                     SILInstruction *InsertionPoint,
-                                     SILLocation Location) {
-  SILBuilderWithScope B(InsertionPoint);
-  switch (Alloc->getKind()) {
-  case SILInstructionKind::PartialApplyInst:
-  case SILInstructionKind::AllocStackInst:
-    assert((isa<AllocStackInst>(Alloc) ||
-            cast<PartialApplyInst>(Alloc)->isOnStack()) &&
-           "wrong instruction");
-    return B.createDeallocStack(Location,
-                                cast<SingleValueInstruction>(Alloc));
-  case SILInstructionKind::BeginApplyInst: {
-    auto *bai = cast<BeginApplyInst>(Alloc);
-    assert(bai->isCalleeAllocated());
-    return B.createDeallocStack(Location, bai->getCalleeAllocationResult());
-  }
-  case SILInstructionKind::AllocRefDynamicInst:
-  case SILInstructionKind::AllocRefInst:
-    assert(cast<AllocRefInstBase>(Alloc)->canAllocOnStack());
-    return B.createDeallocStackRef(Location, cast<AllocRefInstBase>(Alloc));
-  case SILInstructionKind::AllocPackInst:
-    return B.createDeallocPack(Location, cast<AllocPackInst>(Alloc));
-  case SILInstructionKind::BuiltinInst: {
-    auto *bi = cast<BuiltinInst>(Alloc);
-    assert(bi->getBuiltinKind() == BuiltinValueKind::StackAlloc ||
-           bi->getBuiltinKind() == BuiltinValueKind::UnprotectedStackAlloc);
-    auto &context = Alloc->getFunction()->getModule().getASTContext();
-    auto identifier =
-        context.getIdentifier(getBuiltinName(BuiltinValueKind::StackDealloc));
-    return B.createBuiltin(Location, identifier,
-                           SILType::getEmptyTupleType(context),
-                           SubstitutionMap(), {bi});
+  // We don't need to consider entries that we've already linearly scanned.
+  stack = stack.drop_back(linearScanLimit);
+
+  // Check if the entry's already in the hashtable.
+  if (auto it = indexForAllocation.find(alloc); it != indexForAllocation.end()) {
+    auto index = it->second;
+    assert(stack[index].getValue() == alloc);
+    stack[index].setPending();
+    return;
+  }
+
+  // Fill in any missing entries in indexForAllocation.
+  //
+  // The invariant we maintain is that there may be allocations at the
+  // top of the stack that aren't hashed, but once we reach a hashed
+  // entry, everything beneath it is hashed. The first half of this
+  // is necessary because we don't eagerly add allocations to the table,
+  // but it's also what makes it okay that we skip the entries we
+  // linearly scanned. The second half of this means that, if we start
+  // adding entries from the top down, we can stop hashing once we find
+  // that the entries we're adding are redundant. That's what keeps this
+  // O(N).
+  //
+  // All of this caching is relying on us (1) never revisiting a block
+  // and (2) never changing the active-allocations stack except via push
+  // and pop.
+
+  // Look for the target allocation index in this loop rather than doing
+  // a hash lookup at the end.
+  std::optional<size_t> foundIndexForAlloc;
+
+  for (size_t onePast = stack.size(); onePast != 0; --onePast) {
+    size_t entryIndex = onePast - 1;
+    auto entryAlloc = stack[entryIndex].getValue();
+
+    // Remember this if it's the allocation we're looking for.
+    if (entryAlloc == alloc) {
+      foundIndexForAlloc = entryIndex;
     }
-  case SILInstructionKind::AllocPackMetadataInst:
-    return B.createDeallocPackMetadata(Location,
-                                       cast<AllocPackMetadataInst>(Alloc));
-  default:
-    llvm_unreachable("unknown stack allocation");
-  }
-}
 
-bool StackNesting::insertDeallocs(const BitVector &AliveBefore,
-                                  const BitVector &AliveAfter,
-                                  SILInstruction *InsertionPoint,
-                                  std::optional<SILLocation> Location) {
-  if (!AliveBefore.test(AliveAfter))
-    return false;
-
-  // The order matters here if we have to insert more than one
-  // deallocation. We already ensured in setup() that the bit numbers
-  // are allocated in the right order.
-  bool changesMade = false;
-  for (int LocNr = AliveBefore.find_first(); LocNr >= 0;
-       LocNr = AliveBefore.find_next(LocNr)) {
-    if (!AliveAfter.test(LocNr)) {
-      auto *Alloc = StackLocs[LocNr].Alloc;
-      InsertionPoint = createDealloc(Alloc, InsertionPoint,
-          Location.has_value() ? Location.value() : Alloc->getLoc());
-      changesMade = true;
+    // Add this entry to the hashtable. Stop hashing as soon as this fails.
+    auto insertResult = indexForAllocation.insert({entryAlloc, entryIndex});
+    if (!insertResult.second) {
+      break;
     }
   }
-  return changesMade;
-}
+
+#ifndef NDEBUG
+  if (!foundIndexForAlloc) {
+    state.abortForUnknownAllocation(alloc, dealloc);
+  }
+#endif
+
+  stack[*foundIndexForAlloc].setPending();
+}
 
+/// Pop and emit deallocations for any allocations on top of the
+/// active allocations stack that are pending deallocation.
+///
+/// This operation is called whenever we pop an allocation; it
+/// restores the invariant that the top of the stack is never in a
+/// pending state.
+static void emitPendingDeallocations(State &state,
+                                     SILInstruction *insertAfterDealloc,
+                                     bool &madeChanges) {
+  std::optional<SILBuilderWithScope> builder;
+
+  while (!state.allocations.empty() &&
+         state.allocations.back().isPending()) {
+    auto entry = state.allocations.pop_back_val();
+    SILInstruction *alloc = entry.getValue();
+
+    // Create a builder that inserts after the initial dealloc, if we
+    // haven't already. Re-using the same builder for subsequent deallocs
+    // means we order them correctly w.r.t. each other, which we wouldn't
+    // if we made a fresh builder after the initial dealloc each time.
+    if (!builder) {
+      // We want to use the location of (and inherit debug scopes from)
+      // the initial dealloc that we're inserting after.
+      builder.emplace(/*insertion point*/
+                      std::next(insertAfterDealloc->getIterator()),
+                      /*inherit scope from*/ insertAfterDealloc);
+    }
+
+    createDealloc(*builder, insertAfterDealloc->getLoc(), alloc);
+    madeChanges = true;
+  }
+}
 
-// Insert deallocations at block boundaries.
-// This can be necessary for unreachable blocks. Example:
-//
-//   %1 = alloc_stack
-//   %2 = alloc_stack
-//   cond_br %c, bb2, bb3
-// bb2: <--- need to insert a dealloc_stack %2 at the begin of bb2
-//   dealloc_stack %1
-//   unreachable
-// bb3:
-//   dealloc_stack %2
-//   dealloc_stack %1
-StackNesting::Changes StackNesting::insertDeallocsAtBlockBoundaries() {
-  Changes changes = Changes::None;
-  for (auto bd : llvm::reverse(BlockInfos)) {
-    // Collect the alive-bits (at the block exit) from the successor blocks.
-    for (auto succAndIdx : llvm::enumerate(bd.block.getSuccessorBlocks())) {
-      BlockInfo &SuccBI = BlockInfos[succAndIdx.value()];
-      if (SuccBI.AliveStackLocsAtEntry == bd.data.AliveStackLocsAtExit)
-        continue;
+/// The main entrypoint for clients.
+StackNesting::Changes StackNesting::fixNesting(SILFunction *F) {
+  bool madeChanges = false;
+
+  // The index in the allocation stack for each allocation. Multiple
+  // allocations can map to the same index, since ultimately it's a stack;
+  // we will only look up an allocation while it's still in use.
+  // This is very lazily filled in, because we don't want to do unnecessary
+  // work if nothing is unscoped. See setAllocationAsPending for invariants.
+  // This function never accesses it directly.
+  IndexForAllocationMap indexForAllocation;
+
+  // Visit each block of the function in an order consistent with dominance.
+  // The state represents the stack of active allocations, so it's appropriate
+  // that it starts with an empty stack. We're not worried about states
+  // potentially being different for different paths to the same block because
+  // that can only happen if deallocations don't properly post-dominate
+  // their allocations.
+  runInDominanceOrder(*F, State(), [&](SILBasicBlock *B, State &state) {
+
+    // We can't use a foreach loop because we sometimes remove the
+    // current instruction or add instructions (that we shouldn't visit)
+    // after it. Advancing the iterator immediately within the loop is
+    // sufficient to protect against both.
+    for (auto II = B->begin(), IE = B->end(); II != IE; ) {
+      SILInstruction *I = &*II++;
+
+      // Invariant: the top of the stack is never pending.
+      assert(state.allocations.empty() ||
+             !state.allocations.back().isPending());
 
-      // Insert deallocations for all locations which are alive at the end of
-      // the current block, but not at the begin of the successor block.
- SILBasicBlock *InsertionBlock = succAndIdx.value(); - if (!InsertionBlock->getSinglePredecessorBlock()) { - // If the current block is not the only predecessor of the successor - // block, we have to insert a new block where we can add the - // deallocations. - InsertionBlock = splitEdge(bd.block.getTerminator(), succAndIdx.index()); - changes = Changes::CFG; + // Push allocations onto the current stack in the non-pending state. + if (I->isAllocatingStack()) { + state.allocations.push_back(I); + continue; } - if (insertDeallocs(bd.data.AliveStackLocsAtExit, - SuccBI.AliveStackLocsAtEntry, &InsertionBlock->front(), - std::nullopt)) { - if (changes == Changes::None) - changes = Changes::Instructions; + + // Ignore instructions other than allocations and deallocations. + if (!I->isDeallocatingStack()) { + continue; } - } - } - return changes; -} -StackNesting::Changes StackNesting::adaptDeallocs() { - bool InstChanged = false; - BitVector Bits(StackLocs.size()); - - // Visit all blocks. Actually the order doesn't matter, but let's to it in - // the same order as in solve(). - for (auto bd : llvm::reverse(BlockInfos)) { - Bits = bd.data.AliveStackLocsAtExit; - - // Insert/remove deallocations inside blocks. - for (SILInstruction *StackInst : llvm::reverse(bd.data.StackInsts)) { - if (StackInst->isAllocatingStack()) { - // For allocations we just update the bit-set. - int BitNr = bitNumberForAlloc(StackInst); - assert(Bits == StackLocs[BitNr].AliveLocs && - "dataflow didn't converge"); - Bits.reset(BitNr); - } else if (StackInst->isDeallocatingStack()) { - // Handle deallocations. - SILLocation Loc = StackInst->getLoc(); - int BitNr = bitNumberForDealloc(StackInst); - SILInstruction *InsertionPoint = &*std::next(StackInst->getIterator()); - if (Bits.test(BitNr)) { - // The location of StackInst is alive after StackInst. So we have to - // remove this deallocation. - StackInst->eraseFromParent(); - InstChanged = true; - } else { - // Avoid inserting another deallocation for BitNr (which is already - // StackInst). - Bits.set(BitNr); - } - - // Insert deallocations for all locations which are not alive after - // StackInst but _are_ alive at the StackInst. - InstChanged |= insertDeallocs(StackLocs[BitNr].AliveLocs, Bits, - InsertionPoint, Loc); - Bits |= StackLocs[BitNr].AliveLocs; + // Get the allocation for the deallocation. + SILInstruction *dealloc = I; + SILInstruction *alloc = getAllocForDealloc(dealloc); + +#ifndef NDEBUG + if (state.allocations.empty()) { + state.abortForUnknownAllocation(alloc, dealloc); } - } - assert(Bits == bd.data.AliveStackLocsAtEntry && "dataflow didn't converge"); - } - return InstChanged ? Changes::Instructions : Changes::None; -} +#endif -StackNesting::Changes StackNesting::fixNesting(SILFunction *F) { - Changes changes = Changes::None; - { - StackNesting SN(F); - if (!SN.analyze()) - return Changes::None; - - // Insert deallocs at block boundaries. This might be necessary in CFG sub - // graphs which don't reach a function exit, but only an unreachable. - changes = SN.insertDeallocsAtBlockBoundaries(); - if (changes == Changes::None) { - // Do the real work: extend lifetimes by moving deallocs. - return SN.adaptDeallocs(); - } - } - { - // Those inserted deallocs make it necessary to re-compute the analysis. - StackNesting SN(F); - SN.analyze(); - // Do the real work: extend lifetimes by moving deallocs. - return std::max(SN.adaptDeallocs(), changes); - } -} + // If the allocation is the top of the allocations stack, we can + // leave it alone. 
+      if (alloc == state.allocations.back().getValue()) {
+        // Pop off our record of the allocation.
+        state.allocations.pop_back();
+
+        // We may need to emit any pending deallocations still on the stack.
+        // Pop and emit them in order.
+        emitPendingDeallocations(state, /*after*/ dealloc, madeChanges);
 
-void StackNesting::dump() const {
-  for (auto bd : BlockInfos) {
-    llvm::dbgs() << "Block " << bd.block.getDebugID();
-    if (bd.data.isDeadEnd)
-      llvm::dbgs() << "(deadend)";
-    llvm::dbgs() << ": entry-bits=";
-    dumpBits(bd.data.AliveStackLocsAtEntry);
-    llvm::dbgs() << ": exit-bits=";
-    dumpBits(bd.data.AliveStackLocsAtExit);
-    llvm::dbgs() << '\n';
-    for (SILInstruction *StackInst : bd.data.StackInsts) {
-      if (StackInst->isAllocatingStack()) {
-        auto AllocInst = StackInst;
-        int BitNr = StackLoc2BitNumbers.lookup(AllocInst);
-        llvm::dbgs() << "  alloc #" << BitNr << ": alive=";
-        dumpBits(StackLocs[BitNr].AliveLocs);
-        llvm::dbgs() << ", " << *StackInst;
-      } else if (StackInst->isDeallocatingStack()) {
-        auto *AllocInst = getAllocForDealloc(StackInst);
-        int BitNr = StackLoc2BitNumbers.lookup(AllocInst);
-        llvm::dbgs() << "  dealloc for #" << BitNr << "\n"
-                        "    " << *StackInst;
-      }
-    }
-  }
-}
+        continue;
+      }
+
+      // Otherwise, just remove the deallocation and set the allocation
+      // as having a pending deallocation on this path.
+      //
+      // When we mark `alloc` as having a pending deallocation, we are
+      // deferring its deallocation on this path to the deallocation
+      // points of some non-pending allocation that's on top of it on the
+      // allocation stack. That may not end up being the current top of
+      // the stack:
+      //
+      //  - the current top may itself get deferred later, e.g.
+      //
+      //      %alloc = alloc_stack $Int
+      //      %top = alloc_stack $Int
+      //      dealloc_stack %alloc   // gets deferred
+      //      %new = alloc_stack $Int
+      //      dealloc_stack %top     // gets deferred
+      //      dealloc_stack %new     // dealloc %alloc and %top after this
+      //
+      //  - there might be some other allocation between `alloc` and the
+      //    current top that "inherits" the deferral after we dealloc the
+      //    current top, e.g.:
+      //
+      //      %alloc = alloc_stack $Int
+      //      %middle = alloc_stack $Int
+      //      %top = alloc_stack $Int
+      //      dealloc_stack %alloc   // gets deferred
+      //      dealloc_stack %top
+      //      dealloc_stack %middle  // dealloc %alloc after this
+      //
+      // The key is that, no matter what happens, there's always a last
+      // thing on the stack above `alloc` that hasn't been deferred yet.
+      // Joint post-dominance means that, on every path we can reach
+      // from this point, we'll either eventually reach a dealloc for
+      // that last undeferred allocation (in which case, we'll deallocate
+      // `alloc` then), or we'll reach a dead-end (in which case it's fine
+      // that we never deallocated `alloc`). And whatever those points
+      // are, replacing this deallocation of `alloc` with those points
+      // will re-establish the joint post-dominance of `alloc` by its
+      // deallocations with respect to this path.
+
+      dealloc->eraseFromParent();
+      madeChanges = true;
+      setAllocationAsPending(state, alloc, dealloc, indexForAllocation);
+    }
+  });
 
-void StackNesting::dumpBits(const BitVector &Bits) {
-  llvm::dbgs() << '<';
-  const char *separator = "";
-  for (int Bit = Bits.find_first(); Bit >= 0; Bit = Bits.find_next(Bit)) {
-    llvm::dbgs() << separator << Bit;
-    separator = ",";
-  }
-  llvm::dbgs() << '>';
+  // We never make changes to the CFG.
+  return (madeChanges ?
Changes::Instructions : Changes::None); } namespace swift::test { diff --git a/test/SILOptimizer/allocbox_to_stack.sil b/test/SILOptimizer/allocbox_to_stack.sil index 26f568cf31549..4ffe484cf2156 100644 --- a/test/SILOptimizer/allocbox_to_stack.sil +++ b/test/SILOptimizer/allocbox_to_stack.sil @@ -1086,6 +1086,9 @@ bb3: %3 = load %as1 : $*Bool unreachable } +// StackNesting used to split the edge to bb3, but that's not really +// necessary. Also, this input very arguably should not be allowed +// because there isn't a consistent stack depth on entry to bb3. // CHECK-LABEL: sil @nesting_and_unreachable_critical_edge // CHECK: bb0(%0 : $Int): // CHECK-NEXT: [[BOX:%[0-9]+]] = alloc_stack $Int @@ -1095,17 +1098,15 @@ bb3: // CHECK-NEXT: [[STACK2:%[0-9]+]] = alloc_stack $Bool // CHECK-NEXT: cond_br // CHECK: bb2: -// CHECK-NEXT: dealloc_stack [[STACK2]] -// CHECK-NEXT: br bb4 -// CHECK: bb3: // CHECK: store {{%[0-9]+}} // CHECK: dealloc_stack [[STACK2]] // CHECK-NEXT: dealloc_stack [[STACK1]] // CHECK-NEXT: dealloc_stack [[BOX]] // CHECK-NEXT: tuple // CHECK-NEXT: return -// CHECK: bb4: +// CHECK: bb3: // CHECK-NEXT: unreachable +// CHECK: // end sil function 'nesting_and_unreachable_critical_edge' sil @nesting_and_unreachable_critical_edge : $(Int) -> () { bb0(%0 : $Int): %1 = alloc_box ${ var Int } diff --git a/test/SILOptimizer/stack_promotion.sil b/test/SILOptimizer/stack_promotion.sil index 40edd69d12807..dfb588229ff91 100644 --- a/test/SILOptimizer/stack_promotion.sil +++ b/test/SILOptimizer/stack_promotion.sil @@ -1058,19 +1058,32 @@ bb0(%0 : $Int, %another: $Array): return %24 : $() } - +// The input to StackNesting seems sketchy here; it isn't really +// obeying joint post-dominance, and is just skating by under the +// laxer rules for blocks that don't exit. // CHECK-LABEL: sil @promote_with_unreachable_block_nest_bug -// CHECK: bb3: -// CHECK: dealloc_stack_ref %{{.*}} -// CHECK: bb4: -// CHECK: br bb5 -// CHECK: bb5: -// CHECK: dealloc_stack_ref %{{.*}} -// CHECK: bb6: -// CHECK: dealloc_stack_ref %{{.*}} -// CHECK: bb11: -// CHECK: alloc_ref [stack] $XX -// CHECK: return +// CHECK: bb3: +// CHECK-NEXT: strong_release [[APPLY_RESULT:%[0-9]+]] +// CHECK-NEXT: dealloc_stack_ref [[ALLOC:%[0-9]+]] +// CHECK-NEXT: br bb6 +// CHECK: bb4: +// CHECK-NEXT: integer_literal +// CHECK-NEXT: struct +// CHECK-NEXT: br bb13 +// CHECK: bb5: +// CHECK-NEXT: strong_release [[APPLY_RESULT]] +// CHECK-NEXT: dealloc_stack_ref [[ALLOC]] +// CHECK-NEXT: br bb6 +// CHECK: bb10: +// CHECK-NEXT: [[ALLOC]] = alloc_ref [stack] $XX +// CHECK-NEXT: // function_ref +// CHECK-NEXT: function_ref +// CHECK-NEXT: [[APPLY_RESULT]] = apply +// CHECK-NEXT: ref_element_addr +// CHECK-NEXT: load +// CHECK-NEXT: cond_br undef, bb2, bb4 +// CHECK: bb13( +// CHECK-NEXT: unreachable sil @promote_with_unreachable_block_nest_bug : $@convention(thin) () -> Int32 { bb0: %0 = alloc_stack $Builtin.Int32 // user: %30