diff --git a/docs/SIL/SIL.md b/docs/SIL/SIL.md
index d7cc716ca4f16..d82414a639fad 100644
--- a/docs/SIL/SIL.md
+++ b/docs/SIL/SIL.md
@@ -780,6 +780,178 @@ _lexical_ in order to specify this property for all contributing lifetimes.
 For details see [Variable Lifetimes](Ownership.md#variable-lifetimes) in the
 Ownership document.
 
+# Dominance
+
+## Value and instruction dominance
+
+Whenever an instruction uses a [value](#values-and-operands) as an
+operand, the definition of the value must dominate the instruction.
+This is a common concept across all SSA-like representations. SIL
+uses a standard definition of dominance, modified slightly to account
+for SIL's use of basic block arguments rather than phi instructions:
+
+- The value `undef` always dominates an instruction.
+
+- An instruction result `R` dominates an instruction `I` if the
+  instruction that defines `R` dominates `I`.
+
+- An argument of a basic block `B` dominates an instruction `I` if all
+  initial paths passing through `I` must also pass through the start
+  of `B`.
+
+An instruction `D` dominates another instruction `I` if they are
+different instructions and all initial paths passing through `I`
+must also pass through `D`.
+
+See [below](#definition-of-a-path) for the formal definition of an
+initial path.
+
+## Basic block dominance
+
+A basic block `B1` dominates a basic block `B2` if they are different
+blocks and if all initial paths passing through the start of `B2` must
+also pass through the start of `B1`.
+
+This relationship between blocks forms a tree of basic blocks, called
+the *dominance tree*: every block other than the entry block has a
+unique closest dominator, which serves as its parent in the tree.
+The dominance tree is not directly represented in SIL; it is just
+an emergent property of the dominance requirement on SIL functions.
+
+## Joint post-dominance
+
+Certain instructions are required to have a *joint post-dominance*
+relationship with certain other instructions. Informally, this means
+that all terminating paths through the first instruction must
+eventually pass through one of the others. This is common for
+instructions that define a scope in the SIL function, such as
+`alloc_stack` and `begin_access`.
+
+The dominating instruction is called the *scope instruction*,
+and the post-dominating instructions are called the *scope-ending
+instructions*. The specific joint post-dominance requirement
+defines the set of instructions that count as scope-ending
+instructions for the begin instruction.
+
+For example, an `alloc_stack` instruction must be jointly
+post-dominated by the set of `dealloc_stack` instructions
+whose operand is the result of the `alloc_stack`. The
+`alloc_stack` is the scope instruction, and the `dealloc_stack`s
+are the scope-ending instructions.
+
+The *scope* of a joint post-dominance relationship is the set
+of all points in the function following the scope instruction
+but prior to a scope-ending instruction. Making this precisely
+defined is part of the point of the joint post-dominance rules.
+A formal definition is given later.
+
+In SIL, if an instruction acts as a scope instruction, it always
+has exactly one set of scope-ending instructions associated
+with it, and so it forms exactly one scope. People will therefore
+often talk about, e.g., the scope of an `alloc_stack`, meaning
+the scope between it and its `dealloc_stack`s. Furthermore,
+there are no instructions in SIL which act as scope-ending
+instructions for multiple scopes.
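+
+For example, in the following function (kept deliberately minimal),
+the `alloc_stack` is the scope instruction, and it is jointly
+post-dominated by its two `dealloc_stack`s: every terminating path
+through the `alloc_stack` passes through exactly one of them.
+
+```
+  bb0(%cond : $Builtin.Int1):
+    %stack = alloc_stack $Builtin.Int32
+    cond_br %cond, bb1, bb2
+  bb1:
+    dealloc_stack %stack : $*Builtin.Int32
+    br bb3
+  bb2:
+    dealloc_stack %stack : $*Builtin.Int32
+    br bb3
+  bb3:
+    %ret = tuple ()
+    return %ret : $()
+```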
+
+A scope instruction `I` is jointly post-dominated by its
+scope-ending instructions if:
+
+- All initial paths that pass through a scope-ending instruction
+  of `I` must also pass through `I`. (This is just the normal
+  dominance rule, and it is typically already required by the
+  definition of the joint post-dominance relationship. For example,
+  a `dealloc_stack` must be dominated by its associated
+  `alloc_stack` because it uses its result as an operand.)
+
+- All initial paths that pass through `I` twice must also pass
+  through a scope-ending instruction of `I` in between.
+
+- All initial paths that pass through a scope-ending instruction
+  of `I` twice must also pass through `I` in between.
+
+- All terminating initial paths that pass through `I` must also
+  pass through a scope-ending instruction of `I`.
+
+In other words, all paths must strictly alternate between `I`
+and its scope-ending instructions, starting with `I` and (if
+the path exits) ending with a scope-ending instruction.
+
+Note that a scope-ending instruction does not need to appear on
+a path following a scope instruction if the path doesn't exit
+the function. In fact, a function needn't include any scope-ending
+instructions for a particular scope instruction if all paths from
+that point are non-terminating, such as by ending in `unreachable`
+or containing an infinite loop.
+
+The scope defined by a joint post-dominance relationship for
+a scope instruction `I` is the set of points in the function for
+which there exists an initial path that visits that point and
+which passes through `I` but which does not pass through a
+scope-ending instruction of `I` under that relationship. Note
+that the point before a scope-ending instruction is always within
+the scope.
+
+## Definition of a path
+
+A *point* in a SIL function is the moment before an instruction.
+Every basic block has an entry point, which is the point before
+its first instruction. The entry point of the entry block is also
+called the entry point of the function.
+
+A path through a SIL function is a path (in the usual graph-theory
+sense) in the underlying directed graph of points, in which:
+
+- every point in the SIL function is a vertex in the graph,
+
+- each non-terminator instruction creates an edge from the point
+  before it to the point after it, and
+
+- each terminator instruction creates edges from the point before
+  the terminator to the initial point of each of its successor blocks.
+
+A path is said to pass through an instruction if it includes
+any of the edges created by that instruction. A path is said to
+pass through the start of a basic block if it visits the entry
+point of that block.
+
+An *initial path* is a path which begins at the entry point of the
+function. A *terminating path* is a path which ends at the point
+before an exiting instruction, such as `return` or `throw`.
+
+Note that the dominance rules generally require only an initial path,
+not a terminating path. A path that simply stops in the middle of a
+block still counts for dominance. Among other things, this ensures that
+dominance holds in blocks that are part of an infinite loop.
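+
+For example, in the following function (a deliberately trivial one),
+no path through `bb1` ever terminates, yet the `builtin` is still
+dominated by the definition of `%v`, because every initial path that
+reaches the `builtin` passes through `bb0` first:
+
+```
+  bb0:
+    %v = integer_literal $Builtin.Int32, 0
+    br bb1
+  bb1:
+    %u = builtin "add_Int32"(%v, %v) : $Builtin.Int32
+    br bb1
+```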
+
+Note also that paths consider successors without regard to the
+nature of the terminator. Paths that are provably impossible because
+of value relationships still count for dominance. For example,
+consider the following function:
+
+```
+  bb0(%cond : $Builtin.Int1):
+    cond_br %cond, bb1, bb2
+  bb1:
+    %value = integer_literal $Builtin.Int32, 0
+    br bb3
+  bb2:
+    br bb3
+  bb3:
+    cond_br %cond, bb4, bb5
+  bb4:
+    %twice_value = builtin "add_Int32"(%value, %value) : $Builtin.Int32
+    br bb6
+  bb5:
+    br bb6
+  bb6:
+    return %cond : $Builtin.Int1
+```
+
+Dynamically, it is impossible to reach the `builtin` instruction
+without passing through the definition of `%value`: to reach
+the `builtin`, `%cond` must be `true`, and so the first `cond_br`
+must have branched to `bb1`. This is not taken into consideration
+by dominance, and so this function is ill-formed.
+
 # Debug Information
 
 Each instruction may have a debug location and a SIL scope reference at
@@ -1364,48 +1536,39 @@ stack deallocation instructions. It can even be paired with no
 instructions at all; by the rules below, this can only happen in
 non-terminating functions.
 
-- At any point in a SIL function, there is an ordered list of stack
-  allocation instructions called the *active allocations list*.
+- All stack allocation instructions must be jointly post-dominated
+  by stack deallocation instructions paired with them.
 
-- The active allocations list is defined to be empty at the initial
-  point of the entry block of the function.
+- No path through the function that passes through a stack allocation
+  instruction `B`, having already passed a stack allocation
+  instruction `A`, may subsequently pass through a stack deallocation
+  instruction paired with `A` without first passing through a stack
+  deallocation instruction paired with `B`.
 
-- The active allocations list is required to be the same at the
-  initial point of any successor block as it is at the final point of
-  any predecessor block. Note that this also requires all
-  predecessors/successors of a given block to have the same
-  final/initial active allocations lists.
+These two rules statically enforce that all stack allocations are
+properly nested. In simpler terms:
 
-  In other words, the set of active stack allocations must be the same
-  at a given place in the function no matter how it was reached.
+- At every point in a SIL function, there is an ordered list of stack
+  allocation instructions called the *active allocations list*.
 
-- The active allocations list for the point following a stack
-  allocation instruction is defined to be the result of adding that
-  instruction to the end of the active allocations list for the point
-  preceding the instruction.
+- The active allocations list is empty at the start of the entry block
+  of the function, and it must be empty again whenever an instruction
+  that exits the function is reached, like `return` or `throw`.
 
-- The active allocations list for the point following a stack
-  deallocation instruction is defined to be the result of removing the
-  instruction from the end of the active allocations list for the
-  point preceding the instruction. The active allocations list for the
-  preceding point is required to be non-empty, and the last
-  instruction in it must be paired with the deallocation instruction.
+- Whenever a stack allocation instruction is reached, it is added to
+  the end of the list.
 
-  In other words, all stack allocations must be deallocated in
-  last-in, first-out order, aka stack order.
+- Whenever a stack deallocation instruction is reached, its paired
+  stack allocation instruction must be at the end of the list, which it
+  is then removed from.
 
-- The active allocations list for the point following any other
-  instruction is defined to be the same as the active allocations list
-  for the point preceding the instruction.
+- The active allocations list must always be the same on both sides of
+  a control flow edge. This implies both that all successors of a block
+  must start with the same list and that all predecessors of a block
+  must end with the same list.
 
-- The active allocations list is required to be empty prior to
-  `return` or `throw` instructions.
-
-  In other words, all stack allocations must be deallocated prior to
-  exiting the function.
-
-Note that these rules implicitly prevent an allocation instruction from
-still being active when it is reached.
+Note that these rules implicitly prevent stack allocations from leaking
+or being double-freed.
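+
+For example, in the following straight-line fragment, the active
+allocations list grows and shrinks in last-in, first-out order and
+is empty again by the time the function exits:
+
+```
+  bb0:
+    %a = alloc_stack $Builtin.Int32     // list: [%a]
+    %b = alloc_stack $Builtin.Int64     // list: [%a, %b]
+    dealloc_stack %b : $*Builtin.Int64  // list: [%a]
+    dealloc_stack %a : $*Builtin.Int32  // list: []
+    %ret = tuple ()
+    return %ret : $()                   // list is empty, as required
+```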
 
 The control-flow rule forbids certain patterns that would theoretically
 be useful, such as conditionally performing an allocation around an
@@ -1414,6 +1577,13 @@ to use, however, as it is illegal to locally abstract over addresses,
 and therefore a conditional allocation cannot be used in the intermediate
 operation anyway.
 
+There is currently an exception to the stack discipline rules which
+allows the predecessors of a dead-end block (a block from which no
+exit is reachable) to disagree about the state of the stack.
+The current exception is unsound and permits manipulation of the
+stack in ways that may not be valid in all predecessor states. We
+are exploring ways to improve this situation.
+
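+For example, the current exception tolerates fragments like the
+following, in which the two predecessors of the dead-end block `bb3`
+reach it with different active allocations lists:
+
+```
+  bb1:
+    %1 = alloc_stack $Builtin.Int32
+    br bb3
+  bb2:
+    br bb3
+  bb3:
+    // dead-end: entered with list [%1] from bb1 but [] from bb2
+    unreachable
+```
+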
 # Structural type matching for pack indices
 
 In order to catch type errors in applying pack indices, SIL requires the
diff --git a/include/swift/SILOptimizer/Utils/StackNesting.h b/include/swift/SILOptimizer/Utils/StackNesting.h
index 3bd370016f9a6..4c8e4384bc75e 100644
--- a/include/swift/SILOptimizer/Utils/StackNesting.h
+++ b/include/swift/SILOptimizer/Utils/StackNesting.h
@@ -45,8 +45,11 @@ namespace swift {
 ///     dealloc_stack %1
 /// \endcode
 ///
+/// Each allocation must still be properly jointly post-dominated by
+/// its deallocations. StackNesting only fixes the nesting of allocations
+/// and deallocations; it does not insert required deallocations that are
+/// missing entirely.
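+///
+/// For example, StackNesting does not repair the following input,
+/// in which the allocation is never deallocated on the exiting path
+/// through bb2:
+/// \code
+///     %1 = alloc_stack
+///     cond_br %c, bb1, bb2
+///   bb1:
+///     dealloc_stack %1
+///     br bb3
+///   bb2:
+///     br bb3      // missing dealloc_stack %1; it will not be inserted
+///   bb3:
+///     ...
+/// \endcode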
 class StackNesting {
-
 public:
 
   /// The possible return values of fixNesting().
@@ -61,122 +64,6 @@ class StackNesting {
     CFG
   };
 
-private:
-  typedef SmallBitVector BitVector;
-
-  /// Data stored for each block (actually for each block which is not dead).
-  struct BlockInfo {
-    /// The list of stack allocating/deallocating instructions in the block.
-    llvm::SmallVector<SILInstruction *, 8> StackInsts;
-
-    /// The bit-set of alive stack locations at the block entry.
-    BitVector AliveStackLocsAtEntry;
-
-    /// The bit-set of alive stack locations at the block exit.
-    BitVector AliveStackLocsAtExit;
-
-    /// Used in the setup function to walk over the CFG.
-    bool visited = false;
-
-    /// True for dead-end blocks, i.e. blocks from which there is no path to
-    /// a function exit, e.g. blocks which end with `unreachable` or an
-    /// infinite loop.
-    bool isDeadEnd = false;
-  };
-
-  /// Data stored for each stack location (= allocation).
-  ///
-  /// Each stack location is allocated by a single allocation instruction.
-  struct StackLoc {
-    StackLoc(SILInstruction *Alloc) : Alloc(Alloc) {}
-
-    /// Back-link to the allocation instruction.
-    SILInstruction *Alloc;
-
-    /// Bit-set which represents all alive locations at this allocation.
-    /// It obviously includes this location itself. And it includes all "outer"
-    /// locations which surround this location.
-    BitVector AliveLocs;
-  };
-
-  /// Mapping from stack allocations (= locations) to bit numbers.
-  llvm::DenseMap<SILInstruction *, unsigned> StackLoc2BitNumbers;
-
-  /// The list of stack locations. The index into this array is also the bit
-  /// number in the bit-sets.
-  llvm::SmallVector<StackLoc, 8> StackLocs;
-
-  BasicBlockData<BlockInfo> BlockInfos;
-
-  StackNesting(SILFunction *F) : BlockInfos(F) { }
-
-  /// Performs correction of stack nesting by moving stack-deallocation
-  /// instructions down the control flow.
-  ///
-  /// Returns the status of what changes were made.
-  Changes run();
-
-  /// For debug dumping.
-  void dump() const;
-
-  static void dumpBits(const BitVector &Bits);
-
-  /// Initializes the data structures.
-  void setup();
-
-  /// Solves the dataflow problem.
-  ///
-  /// Returns true if there is a nesting of locations in any way, which can
-  /// potentially in the wrong order.
-  bool solve();
-
-  bool analyze() {
-    setup();
-    return solve();
-  }
-
-  /// Insert deallocation instructions for all locations which are alive before
-  /// the InsertionPoint (AliveBefore) but not alive after the InsertionPoint
-  /// (AliveAfter).
-  ///
-  /// Returns true if any deallocations were inserted.
-  bool insertDeallocs(const BitVector &AliveBefore, const BitVector &AliveAfter,
-                      SILInstruction *InsertionPoint,
-                      std::optional<SILLocation> Location);
-
-  /// Returns the location bit number for a stack allocation instruction.
-  int bitNumberForAlloc(SILInstruction *AllocInst) {
-    assert(AllocInst->isAllocatingStack());
-    return StackLoc2BitNumbers[AllocInst];
-  }
-
-  /// Returns the location bit number for a stack deallocation instruction.
-  int bitNumberForDealloc(SILInstruction *DeallocInst) {
-    assert(DeallocInst->isDeallocatingStack());
-    auto *AllocInst = getAllocForDealloc(DeallocInst);
-    return bitNumberForAlloc(AllocInst);
-  }
-
-  /// Returns the stack allocation instruction for a stack deallocation
-  /// instruction.
-  SILInstruction *getAllocForDealloc(SILInstruction *Dealloc) const {
-    SILValue op = Dealloc->getOperand(0);
-    while (auto *mvi = dyn_cast<MoveValueInst>(op)) {
-      op = mvi->getOperand();
-    }
-    return op->getDefiningInstruction();
-  }
-
-  /// Insert deallocations at block boundaries.
-  Changes insertDeallocsAtBlockBoundaries();
-
-  /// Modifies the SIL to end up with a correct stack nesting.
-  ///
-  /// Returns the status of what changes were made.
-  Changes adaptDeallocs();
-
-public:
-
   /// Performs correction of stack nesting by moving stack-deallocation
   /// instructions down the control flow.
   ///
diff --git a/lib/SILOptimizer/Utils/StackNesting.cpp b/lib/SILOptimizer/Utils/StackNesting.cpp
index 505539ad494dc..d38946617e3d1 100644
--- a/lib/SILOptimizer/Utils/StackNesting.cpp
+++ b/lib/SILOptimizer/Utils/StackNesting.cpp
@@ -20,400 +20,445 @@
 
 using namespace swift;
 
-void StackNesting::setup() {
-  SmallVector<SILBasicBlock *, 8> WorkList;
-
-  // Start with the function entry block and add blocks while walking down along
-  // the successor edges.
-  // This ensures a correct ordering of stack locations: an inner location has
-  // a higher bit-number than it's outer parent location.
-  // This ordering is only important for inserting multiple deallocation
-  // instructions (see below).
-  auto Entry = BlockInfos.entry();
-  WorkList.push_back(&Entry.block);
-  Entry.data.visited = true;
-
-  while (!WorkList.empty()) {
-    SILBasicBlock *Block = WorkList.pop_back_val();
-    BlockInfo &BI = BlockInfos[Block];
-    for (SILInstruction &I : *Block) {
-      if (I.isAllocatingStack()) {
-        auto Alloc = &I;
-        // Register this stack location.
-        unsigned CurrentBitNumber = StackLocs.size();
-        StackLoc2BitNumbers[Alloc] = CurrentBitNumber;
-        StackLocs.push_back(StackLoc(Alloc));
-
-        BI.StackInsts.push_back(Alloc);
-      } else if (I.isDeallocatingStack()) {
-        auto *AllocInst = getAllocForDealloc(&I);
-        if (!BI.StackInsts.empty() && BI.StackInsts.back() == AllocInst) {
-          // As an optimization, we ignore perfectly nested alloc-dealloc pairs
-          // inside a basic block.
-          // Actually, this catches most of the cases and keeps our bitsets
-          // small.
-          assert(StackLocs.back().Alloc == AllocInst);
-          StackLocs.pop_back();
-          BI.StackInsts.pop_back();
-        } else {
-          // Register the stack deallocation.
-          BI.StackInsts.push_back(&I);
-        }
+/// Run the given function exactly once on each of the reachable blocks in
+/// a SIL function. Blocks will be visited in an order consistent with
+/// dominance, which is to say, after all dominating blocks but otherwise
+/// in an unspecified order.
+///
+/// The function is passed a state value, which it can freely mutate. The
+/// initial value of the state will be the same as the value left in the
+/// state for an unspecified predecessor (or the initial value passed in,
+/// for the entry block of the function). Since the predecessor choice is
+/// arbitrary, you should only use this if the state is guaranteed to be
+/// the same for all predecessors. The state type must be copyable, but
+/// the algorithm makes a reasonable effort to avoid copying it.
+///
+/// This function assumes you don't change the CFG during its operation.
+template <class State, class Fn>
+void runInDominanceOrder(SILFunction &F, State &&state, const Fn &fn) {
+  // The set of blocks that have ever been enqueued onto the worklist.
+  // (We actually skip the queue in a bunch of cases, but *abstractly*
+  // they're enqueued.)
+  BasicBlockSet visitedBlocks(&F);
+
+  // The next basic block to operate on. We always operate on `state`.
+  SILBasicBlock *curBB = F.getEntryBlock();
+
+  // We need to copy `state` whenever we enqueue a block onto the worklist.
+  // We'll then move-assign it back to `state` when we dequeue it.
+  using StateValue = std::remove_reference_t<State>;
+  SmallVector<std::pair<SILBasicBlock *, StateValue>, 4> worklist;
+
+  while (true) {
+    // Run the function on the current block, updating the current state.
+    fn(curBB, state);
+
+    // Enqueue the successors.
+    SILBasicBlock *nextBB = nullptr;
+    for (SILBasicBlock *succBB : curBB->getSuccessorBlocks()) {
+      // If this insertion returns false, we've already enqueued the
+      // successor block, so we can skip it. Thanks to BasicBlockSet,
+      // this is fast enough that there's no point in avoiding it for
+      // single-predecessor blocks.
+      if (!visitedBlocks.insert(succBB))
+        continue;
+
+      // If we haven't found a successor to visit yet, pick this one.
+      if (!nextBB) {
+        nextBB = succBB;
+
+      // Otherwise, add it to the worklist, copying the current state.
+      } else {
+        worklist.emplace_back(succBB, /*copied*/ state);
       }
     }
 
-    for (SILBasicBlock *SuccBB : Block->getSuccessorBlocks()) {
-      BlockInfo &SuccBI = BlockInfos[SuccBB];
-      if (!SuccBI.visited) {
-        // Push the next reachable block onto the WorkList.
-        WorkList.push_back(SuccBB);
-        SuccBI.visited = true;
-      }
-    }
-  }
+
+    // If there's a viable direct successor, just continue along this
+    // path, editing the current state in-place.
+    if (nextBB) {
+      curBB = nextBB;
+      continue;
+    }
+
+    // Otherwise, if the worklist is empty, we're done.
+    if (worklist.empty()) {
+      return;
+    }
+
+    // Otherwise, pull the next item off the worklist and overwrite the
+    // current state with the state we saved for it before.
+    auto &nextItem = worklist.back();
+    curBB = nextItem.first;
+    state = std::move(nextItem.second);
+    worklist.pop_back();
+  }
+}
 
-  unsigned NumLocs = StackLocs.size();
-  for (unsigned Idx = 0; Idx < NumLocs; ++Idx) {
-    StackLocs[Idx].AliveLocs.resize(NumLocs);
-    // Initially each location gets it's own alive-bit.
-    StackLocs[Idx].AliveLocs.set(Idx);
-  }
-}
+/// Returns the stack allocation instruction for a stack deallocation
+/// instruction.
+static SILInstruction *getAllocForDealloc(SILInstruction *dealloc) {
+  SILValue op = dealloc->getOperand(0);
+  while (auto *mvi = dyn_cast<MoveValueInst>(op)) {
+    op = mvi->getOperand();
+  }
+  return op->getDefiningInstruction();
+}
 
-bool StackNesting::solve() {
-  bool changed = false;
-  bool isNested = false;
-  BitVector Bits(StackLocs.size());
+/// Create a dealloc for a particular allocation.
+///
+/// This is expected to work for all allocations that don't have
+/// properly-nested deallocations. It's fine to have a kind of allocation
+/// that you can't do this for, as long as it's always explicitly
+/// deallocated on all paths. This pass doesn't change any allocations
+/// or deallocations that are properly nested already.
+///
+/// Only allocations whose deallocations return true from canMoveDealloc
+/// need to support this.
+static void createDealloc(SILBuilder &B, SILLocation loc, SILInstruction *alloc) {
+  switch (alloc->getKind()) {
+  case SILInstructionKind::PartialApplyInst:
+  case SILInstructionKind::AllocStackInst:
+    assert((isa<AllocStackInst>(alloc) ||
+            cast<PartialApplyInst>(alloc)->isOnStack()) &&
+           "wrong instruction");
+    B.createDeallocStack(loc, cast<SingleValueInstruction>(alloc));
+    return;
+  case SILInstructionKind::BeginApplyInst: {
+    auto *bai = cast<BeginApplyInst>(alloc);
+    assert(bai->isCalleeAllocated());
+    B.createDeallocStack(loc, bai->getCalleeAllocationResult());
+    return;
+  }
+  case SILInstructionKind::AllocRefDynamicInst:
+  case SILInstructionKind::AllocRefInst:
+    assert(cast<AllocRefInstBase>(alloc)->canAllocOnStack());
+    B.createDeallocStackRef(loc, cast<AllocRefInstBase>(alloc));
+    return;
+  case SILInstructionKind::AllocPackInst:
+    B.createDeallocPack(loc, cast<AllocPackInst>(alloc));
+    return;
+  case SILInstructionKind::BuiltinInst: {
+    auto *bi = cast<BuiltinInst>(alloc);
+    auto &ctx = alloc->getFunction()->getModule().getASTContext();
 
-  StackList<SILBasicBlock *> deadEndWorklist(BlockInfos.getFunction());
+    switch (*bi->getBuiltinKind()) {
+    case BuiltinValueKind::StackAlloc:
+    case BuiltinValueKind::UnprotectedStackAlloc: {
+      auto identifier =
+          ctx.getIdentifier(getBuiltinName(BuiltinValueKind::StackDealloc));
+      B.createBuiltin(loc, identifier,
+                      SILType::getEmptyTupleType(ctx),
+                      SubstitutionMap(), {bi});
+      return;
+    }
+    default:
+      llvm_unreachable("unknown stack allocation builtin");
+    }
+  }
+  case SILInstructionKind::AllocPackMetadataInst:
+    B.createDeallocPackMetadata(loc, cast<AllocPackMetadataInst>(alloc));
+    return;
+  default:
+    llvm_unreachable("unknown stack allocation");
+  }
+}
 
-  // Initialize all bit fields to 1s, expect 0s for the entry block.
-  bool initVal = false;
-  for (auto bd : BlockInfos) {
-    bd.data.AliveStackLocsAtEntry.resize(StackLocs.size(), initVal);
-    initVal = true;
+namespace {
+class ActiveAllocation {
+  llvm::PointerIntPair<SILInstruction *, 1, bool> valueAndIsPending;
 
-    bd.data.isDeadEnd = !bd.block.getTerminator()->isFunctionExiting();
-    if (!bd.data.isDeadEnd)
-      deadEndWorklist.push_back(&bd.block);
+public:
+  ActiveAllocation(SILInstruction *value) : valueAndIsPending(value, false) {}
+
+  SILInstruction *getValue() const {
+    return valueAndIsPending.getPointer();
   }
 
-  // Calculate the isDeadEnd block flags.
-  while (!deadEndWorklist.empty()) {
-    SILBasicBlock *b = deadEndWorklist.pop_back_val();
-    for (SILBasicBlock *pred : b->getPredecessorBlocks()) {
-      BlockInfo &bi = BlockInfos[pred];
-      if (bi.isDeadEnd) {
-        bi.isDeadEnd = false;
-        deadEndWorklist.push_back(pred);
-      }
-    }
+  bool isPending() const {
+    return valueAndIsPending.getInt();
   }
 
-  // First step: do a forward dataflow analysis to get the live stack locations
-  // at the block exits.
-  // This is necessary to get the live locations at dead-end blocks (otherwise
-  // the backward data flow would be sufficient).
-  // The special thing about dead-end blocks is that it's okay to have alive
-  // locations at that point (e.g. at an `unreachable`) i.e. locations which are
-  // never dealloced. We cannot get such locations with a purely backward
-  // dataflow.
-  do {
-    changed = false;
-
-    for (auto bd : BlockInfos) {
-      Bits = bd.data.AliveStackLocsAtEntry;
-      for (SILInstruction *StackInst : bd.data.StackInsts) {
-        if (StackInst->isAllocatingStack()) {
-          Bits.set(bitNumberForAlloc(StackInst));
-        } else if (StackInst->isDeallocatingStack()) {
-          Bits.reset(bitNumberForDealloc(StackInst));
-        }
-      }
-      if (Bits != bd.data.AliveStackLocsAtExit) {
-        bd.data.AliveStackLocsAtExit = Bits;
-        changed = true;
-      }
-      // Merge the bits into the successors.
-      for (SILBasicBlock *SuccBB : bd.block.getSuccessorBlocks()) {
-        BlockInfos[SuccBB].AliveStackLocsAtEntry &= Bits;
-      }
+  void setPending() {
+    assert(!isPending());
+    valueAndIsPending.setInt(true);
+  }
+};
+
+struct State {
+  // The active allocations and whether they're pending deallocation.
+  SmallVector<ActiveAllocation, 8> allocations;
+
+#ifndef NDEBUG
+  SWIFT_ATTRIBUTE_NORETURN
+  void abortForUnknownAllocation(SILInstruction *alloc,
+                                 SILInstruction *dealloc) {
+    llvm::errs() << "fatal error: StackNesting could not find record of "
+                    "allocation for deallocation:\n  "
+                 << *dealloc
+                 << "Allocation might not be jointly post-dominated. "
+                    "Current stack:\n";
+    for (auto i : indices(allocations)) {
+      llvm::errs() << "[" << i << "] "
+                   << (allocations[i].isPending() ? "(pending) " : "")
+                   << *allocations[i].getValue();
     }
-  } while (changed);
+    llvm::errs() << "Complete function:\n";
+    alloc->getFunction()->dump();
+    abort();
+  }
+#endif
+};
+
+} // end anonymous namespace
 
-  // Second step: do a backward dataflow analysis to extend the lifetimes of
-  // not properly nested allocations.
-  do {
-    changed = false;
-
-    for (auto bd : llvm::reverse(BlockInfos)) {
-      // Collect the alive-bits (at the block exit) from the successor blocks.
-      for (SILBasicBlock *SuccBB : bd.block.getSuccessorBlocks()) {
-        bd.data.AliveStackLocsAtExit |= BlockInfos[SuccBB].AliveStackLocsAtEntry;
-      }
-      Bits = bd.data.AliveStackLocsAtExit;
-      assert(!(bd.data.visited && bd.block.getTerminator()->isFunctionExiting()
-               && Bits.any())
-             && "stack location is missing dealloc");
-
-      if (bd.data.isDeadEnd) {
-        // We treat `unreachable` as an implicit deallocation for all locations
-        // which are still alive at this point. The same is true for dead-end
-        // CFG regions due to an infinite loop.
-        for (int BitNr = Bits.find_first(); BitNr >= 0;
-             BitNr = Bits.find_next(BitNr)) {
-          // For each alive location extend the lifetime of all locations which
-          // are alive at the allocation point. This is the same as we do for
-          // a "real" deallocation instruction (see below).
-          // In dead-end CFG regions we have to do that for all blocks (because
-          // of potential infinite loops), whereas in "normal" CFG regions it's
-          // sufficient to do it at deallocation instructions.
-          Bits |= StackLocs[BitNr].AliveLocs;
-        }
-        bd.data.AliveStackLocsAtExit = Bits;
-      }
-      for (SILInstruction *StackInst : llvm::reverse(bd.data.StackInsts)) {
-        if (StackInst->isAllocatingStack()) {
-          int BitNr = bitNumberForAlloc(StackInst);
-          if (Bits != StackLocs[BitNr].AliveLocs) {
-            // More locations are alive around the StackInst's location.
-            // Update the AliveLocs bitset, which contains all those alive
-            // locations.
-            assert(Bits.test(BitNr) && "no dealloc found for alloc stack");
-            StackLocs[BitNr].AliveLocs = Bits;
-            changed = true;
-            isNested = true;
-          }
-          // The allocation ends the lifetime of it's stack location (in reverse
-          // order)
-          Bits.reset(BitNr);
-        } else if (StackInst->isDeallocatingStack()) {
-          // A stack deallocation begins the lifetime of its location (in
-          // reverse order). And it also begins the lifetime of all other
-          // locations which are alive at the allocation point.
-          Bits |= StackLocs[bitNumberForDealloc(StackInst)].AliveLocs;
-        }
-      }
-      if (Bits != bd.data.AliveStackLocsAtEntry) {
-        bd.data.AliveStackLocsAtEntry = Bits;
-        changed = true;
-      }
+using IndexForAllocationMap = llvm::DenseMap<SILInstruction *, size_t>;
+
+/// Flag that a particular allocation is pending.
+static void setAllocationAsPending(State &state, SILInstruction *alloc,
+                                   SILInstruction *dealloc,
+                                   IndexForAllocationMap &indexForAllocation) {
+  auto stack = MutableArrayRef(state.allocations);
+  assert(!stack.empty());
+
+  // Just ignore the top entry in all of this; we know it doesn't match
+  // the allocation.
+  assert(stack.back().getValue() != alloc);
+  stack = stack.drop_back();
+
+  // Ultimately, we're just calling setPending() on the entry matching
+  // `alloc` in the allocations stack. All the complexity has to do with
+  // trying to avoid super-linear behavior while also trying very hard
+  // to avoid actually using indexForAllocation for simple cases.
+
+  // It's very common for allocations to never be improperly nested,
+  // so we don't want to eagerly add allocations to indexForAllocation
+  // when we encounter them. This means we can't rely on it having
+  // an entry for `alloc` now.
+
+  // `alloc` is very likely to be close to the top of the stack. Just do
+  // a short linear scan there first. This might be slightly slower than
+  // a hash lookup in the worst case, but usually it means we can avoid
+  // adding any entries to indexForAllocation at all. Even for this case
+  // where nesting is broken, that's still worthwhile to do.
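+  //
+  // As a small illustration (hypothetical input): given the active
+  // stack [%a, %b, %c] built from
+  //
+  //   %a = alloc_stack $Int
+  //   %b = alloc_stack $Int
+  //   %c = alloc_stack $Int
+  //
+  // a `dealloc_stack %a` arrives while %a is not on top, so %a must be
+  // marked pending; the scan below finds it two entries below the top
+  // without ever touching indexForAllocation.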
+  const size_t linearScanLimit = 8;
+  auto linearScanEntries = stack.take_back(linearScanLimit);
+  for (auto &entry : linearScanEntries) {
+    if (entry.getValue() == alloc) {
+      entry.setPending();
+      return;
     }
-  } while (changed);
-
-  return isNested;
-}
+  }
+
+  // Okay, so much for that, time for the hashtable.
+
+#ifndef NDEBUG
+  if (stack.size() <= linearScanLimit) {
+    state.abortForUnknownAllocation(alloc, dealloc);
+  }
+#endif
 
-static SILInstruction *createDealloc(SILInstruction *Alloc,
-                                     SILInstruction *InsertionPoint,
-                                     SILLocation Location) {
-  SILBuilderWithScope B(InsertionPoint);
-  switch (Alloc->getKind()) {
-  case SILInstructionKind::PartialApplyInst:
-  case SILInstructionKind::AllocStackInst:
-    assert((isa<AllocStackInst>(Alloc) ||
-            cast<PartialApplyInst>(Alloc)->isOnStack()) &&
-           "wrong instruction");
-    return B.createDeallocStack(Location,
-                                cast<SingleValueInstruction>(Alloc));
-  case SILInstructionKind::BeginApplyInst: {
-    auto *bai = cast<BeginApplyInst>(Alloc);
-    assert(bai->isCalleeAllocated());
-    return B.createDeallocStack(Location, bai->getCalleeAllocationResult());
-  }
-  case SILInstructionKind::AllocRefDynamicInst:
-  case SILInstructionKind::AllocRefInst:
-    assert(cast<AllocRefInstBase>(Alloc)->canAllocOnStack());
-    return B.createDeallocStackRef(Location, cast<AllocRefInstBase>(Alloc));
-  case SILInstructionKind::AllocPackInst:
-    return B.createDeallocPack(Location, cast<AllocPackInst>(Alloc));
-  case SILInstructionKind::BuiltinInst: {
-    auto *bi = cast<BuiltinInst>(Alloc);
-    assert(bi->getBuiltinKind() == BuiltinValueKind::StackAlloc ||
-           bi->getBuiltinKind() == BuiltinValueKind::UnprotectedStackAlloc);
-    auto &context = Alloc->getFunction()->getModule().getASTContext();
-    auto identifier =
-        context.getIdentifier(getBuiltinName(BuiltinValueKind::StackDealloc));
-    return B.createBuiltin(Location, identifier,
-                           SILType::getEmptyTupleType(context),
-                           SubstitutionMap(), {bi});
-  }
-  case SILInstructionKind::AllocPackMetadataInst:
-    return B.createDeallocPackMetadata(Location,
-                                       cast<AllocPackMetadataInst>(Alloc));
-  default:
-    llvm_unreachable("unknown stack allocation");
-  }
-}
+  // We don't need to consider entries that we've already linearly scanned.
+  stack = stack.drop_back(linearScanLimit);
+
+  // Check if the entry's already in the hashtable.
+  if (auto it = indexForAllocation.find(alloc); it != indexForAllocation.end()) {
+    auto index = it->second;
+    assert(stack[index].getValue() == alloc);
+    stack[index].setPending();
+    return;
+  }
+
+  // Fill in any missing entries in indexForAllocation.
+  //
+  // The invariant we maintain is that there may be allocations at the
+  // top of the stack that aren't hashed, but once we reach a hashed
+  // entry, everything beneath it is hashed. The first half of this
+  // is necessary because we don't eagerly add allocations to the table,
+  // but it's also what makes it okay that we skip the entries we
+  // linearly scanned. The second half of this means that, if we start
+  // adding entries from the top down, we can stop hashing once we find
+  // that the entries we're adding are redundant. That's what keeps this
+  // O(N).
+  //
+  // All of this caching is relying on us (1) never revisiting a block
+  // and (2) never changing the active-allocations stack except via push
+  // and pop.
+
+  // Look for the target allocation index in this loop rather than doing
+  // a hash lookup at the end.
+  std::optional<size_t> foundIndexForAlloc;
+
+  for (size_t onePast = stack.size(); onePast != 0; --onePast) {
+    size_t entryIndex = onePast - 1;
+    auto entryAlloc = stack[entryIndex].getValue();
+
+    // Remember this if it's the allocation we're looking for.
+    if (entryAlloc == alloc) {
+      foundIndexForAlloc = entryIndex;
+    }
+
+    // Add this entry to the hashtable. Stop hashing as soon as this fails.
+    auto insertResult = indexForAllocation.insert({entryAlloc, entryIndex});
+    if (!insertResult.second) {
+      break;
+    }
+  }
+
+#ifndef NDEBUG
+  if (!foundIndexForAlloc) {
+    state.abortForUnknownAllocation(alloc, dealloc);
+  }
+#endif
+
+  stack[*foundIndexForAlloc].setPending();
+}
 
-bool StackNesting::insertDeallocs(const BitVector &AliveBefore,
-                                  const BitVector &AliveAfter,
-                                  SILInstruction *InsertionPoint,
-                                  std::optional<SILLocation> Location) {
-  if (!AliveBefore.test(AliveAfter))
-    return false;
-
-  // The order matters here if we have to insert more than one
-  // deallocation. We already ensured in setup() that the bit numbers
-  // are allocated in the right order.
-  bool changesMade = false;
-  for (int LocNr = AliveBefore.find_first(); LocNr >= 0;
-       LocNr = AliveBefore.find_next(LocNr)) {
-    if (!AliveAfter.test(LocNr)) {
-      auto *Alloc = StackLocs[LocNr].Alloc;
-      InsertionPoint = createDealloc(Alloc, InsertionPoint,
-          Location.has_value() ? Location.value() : Alloc->getLoc());
-      changesMade = true;
+/// Pop and emit deallocations for any allocations on top of the
+/// active allocations stack that are pending deallocation.
+///
+/// This operation is called whenever we pop an allocation; it
+/// restores the invariant that the top of the stack is never in a
+/// pending state.
+static void emitPendingDeallocations(State &state,
+                                     SILInstruction *insertAfterDealloc,
+                                     bool &madeChanges) {
+  std::optional<SILBuilderWithScope> builder;
+
+  while (!state.allocations.empty() &&
+         state.allocations.back().isPending()) {
+    auto entry = state.allocations.pop_back_val();
+    SILInstruction *alloc = entry.getValue();
+
+    // Create a builder that inserts after the initial dealloc, if we
+    // haven't already. Re-using the same builder for subsequent deallocs
+    // means we order them correctly w.r.t. each other, which we wouldn't
+    // if we made a fresh builder after the initial dealloc each time.
+    if (!builder) {
+      // We want to use the location of (and inherit debug scopes from)
+      // the initial dealloc that we're inserting after.
+      builder.emplace(/*insertion point*/
+                      std::next(insertAfterDealloc->getIterator()),
+                      /*inherit scope from*/ insertAfterDealloc);
     }
+
+    createDealloc(*builder, insertAfterDealloc->getLoc(), alloc);
+    madeChanges = true;
   }
-  return changesMade;
 }
 
-// Insert deallocations at block boundaries.
-// This can be necessary for unreachable blocks. Example:
-//
-//   %1 = alloc_stack
-//   %2 = alloc_stack
-//   cond_br %c, bb2, bb3
-// bb2:    <--- need to insert a dealloc_stack %2 at the begin of bb2
-//   dealloc_stack %1
-//   unreachable
-// bb3:
-//   dealloc_stack %2
-//   dealloc_stack %1
-StackNesting::Changes StackNesting::insertDeallocsAtBlockBoundaries() {
-  Changes changes = Changes::None;
-  for (auto bd : llvm::reverse(BlockInfos)) {
-    // Collect the alive-bits (at the block exit) from the successor blocks.
-    for (auto succAndIdx : llvm::enumerate(bd.block.getSuccessorBlocks())) {
-      BlockInfo &SuccBI = BlockInfos[succAndIdx.value()];
-      if (SuccBI.AliveStackLocsAtEntry == bd.data.AliveStackLocsAtExit)
-        continue;
+/// The main entrypoint for clients.
+StackNesting::Changes StackNesting::fixNesting(SILFunction *F) {
+  bool madeChanges = false;
+
+  // The index in the allocation stack for each allocation.
+  // Multiple allocations can map to the same index, since ultimately
+  // it's a stack; we only ever look up an allocation while it's still
+  // live on the stack. This is very lazily filled in, because we don't
+  // want to do unnecessary work if nothing is unscoped. See
+  // setAllocationAsPending for invariants. This function never
+  // accesses it directly.
+  IndexForAllocationMap indexForAllocation;
+
+  // Visit each block of the function in an order consistent with dominance.
+  // The state represents the stack of active allocations, so it's appropriate
+  // that it starts with an empty stack. We're not worried about states
+  // potentially being different for different paths to the same block because
+  // that can only happen if deallocations don't properly post-dominate
+  // their allocations.
+  runInDominanceOrder(*F, State(), [&](SILBasicBlock *B, State &state) {
+
+    // We can't use a foreach loop because we sometimes remove the
+    // current instruction or add instructions (that we shouldn't visit)
+    // after it. Advancing the iterator immediately within the loop is
+    // sufficient to protect against both.
+    for (auto II = B->begin(), IE = B->end(); II != IE; ) {
+      SILInstruction *I = &*II++;
+
+      // Invariant: the top of the stack is never pending.
+      assert(state.allocations.empty() ||
+             !state.allocations.back().isPending());
 
-      // Insert deallocations for all locations which are alive at the end of
-      // the current block, but not at the begin of the successor block.
-      SILBasicBlock *InsertionBlock = succAndIdx.value();
-      if (!InsertionBlock->getSinglePredecessorBlock()) {
-        // If the current block is not the only predecessor of the successor
-        // block, we have to insert a new block where we can add the
-        // deallocations.
-        InsertionBlock = splitEdge(bd.block.getTerminator(), succAndIdx.index());
-        changes = Changes::CFG;
+      // Push allocations onto the current stack in the non-pending state.
+      if (I->isAllocatingStack()) {
+        state.allocations.push_back(I);
+        continue;
       }
-      if (insertDeallocs(bd.data.AliveStackLocsAtExit,
-                         SuccBI.AliveStackLocsAtEntry, &InsertionBlock->front(),
-                         std::nullopt)) {
-        if (changes == Changes::None)
-          changes = Changes::Instructions;
+
+      // Ignore instructions other than allocations and deallocations.
+      if (!I->isDeallocatingStack()) {
+        continue;
       }
-    }
-  }
-  return changes;
-}
 
+      // Get the allocation for the deallocation.
+      SILInstruction *dealloc = I;
+      SILInstruction *alloc = getAllocForDealloc(dealloc);
+
+#ifndef NDEBUG
+      if (state.allocations.empty()) {
+        state.abortForUnknownAllocation(alloc, dealloc);
+      }
+#endif
+
-StackNesting::Changes StackNesting::adaptDeallocs() {
-  bool InstChanged = false;
-  BitVector Bits(StackLocs.size());
-
-  // Visit all blocks. Actually the order doesn't matter, but let's to it in
-  // the same order as in solve().
-  for (auto bd : llvm::reverse(BlockInfos)) {
-    Bits = bd.data.AliveStackLocsAtExit;
-
-    // Insert/remove deallocations inside blocks.
-    for (SILInstruction *StackInst : llvm::reverse(bd.data.StackInsts)) {
-      if (StackInst->isAllocatingStack()) {
-        // For allocations we just update the bit-set.
-        int BitNr = bitNumberForAlloc(StackInst);
-        assert(Bits == StackLocs[BitNr].AliveLocs &&
-               "dataflow didn't converge");
-        Bits.reset(BitNr);
-      } else if (StackInst->isDeallocatingStack()) {
-        // Handle deallocations.
-        SILLocation Loc = StackInst->getLoc();
-        int BitNr = bitNumberForDealloc(StackInst);
-        SILInstruction *InsertionPoint = &*std::next(StackInst->getIterator());
-        if (Bits.test(BitNr)) {
-          // The location of StackInst is alive after StackInst. So we have to
-          // remove this deallocation.
-          StackInst->eraseFromParent();
-          InstChanged = true;
-        } else {
-          // Avoid inserting another deallocation for BitNr (which is already
-          // StackInst).
-          Bits.set(BitNr);
-        }
-
-        // Insert deallocations for all locations which are not alive after
-        // StackInst but _are_ alive at the StackInst.
-        InstChanged |= insertDeallocs(StackLocs[BitNr].AliveLocs, Bits,
-                                      InsertionPoint, Loc);
-        Bits |= StackLocs[BitNr].AliveLocs;
-      }
-    }
-    assert(Bits == bd.data.AliveStackLocsAtEntry && "dataflow didn't converge");
-  }
-  return InstChanged ? Changes::Instructions : Changes::None;
-}
+      // If the allocation is the top of the allocations stack, we can
+      // leave it alone.
+      if (alloc == state.allocations.back().getValue()) {
+        // Pop off our record of the allocation.
+        state.allocations.pop_back();
+
+        // We may need to emit any pending deallocations still on the stack.
+        // Pop and emit them in order.
+        emitPendingDeallocations(state, /*after*/ dealloc, madeChanges);
+
+        continue;
+      }
 
-StackNesting::Changes StackNesting::fixNesting(SILFunction *F) {
-  Changes changes = Changes::None;
-  {
-    StackNesting SN(F);
-    if (!SN.analyze())
-      return Changes::None;
-
-    // Insert deallocs at block boundaries. This might be necessary in CFG sub
-    // graphs which don't reach a function exit, but only an unreachable.
-    changes = SN.insertDeallocsAtBlockBoundaries();
-    if (changes == Changes::None) {
-      // Do the real work: extend lifetimes by moving deallocs.
-      return SN.adaptDeallocs();
-    }
-  }
-  {
-    // Those inserted deallocs make it necessary to re-compute the analysis.
-    StackNesting SN(F);
-    SN.analyze();
-    // Do the real work: extend lifetimes by moving deallocs.
-    return std::max(SN.adaptDeallocs(), changes);
-  }
-}
-
-void StackNesting::dump() const {
-  for (auto bd : BlockInfos) {
-    llvm::dbgs() << "Block " << bd.block.getDebugID();
-    if (bd.data.isDeadEnd)
-      llvm::dbgs() << "(deadend)";
-    llvm::dbgs() << ": entry-bits=";
-    dumpBits(bd.data.AliveStackLocsAtEntry);
-    llvm::dbgs() << ": exit-bits=";
-    dumpBits(bd.data.AliveStackLocsAtExit);
-    llvm::dbgs() << '\n';
-    for (SILInstruction *StackInst : bd.data.StackInsts) {
-      if (StackInst->isAllocatingStack()) {
-        auto AllocInst = StackInst;
-        int BitNr = StackLoc2BitNumbers.lookup(AllocInst);
-        llvm::dbgs() << "  alloc #" << BitNr << ": alive=";
-        dumpBits(StackLocs[BitNr].AliveLocs);
-        llvm::dbgs() << ", " << *StackInst;
-      } else if (StackInst->isDeallocatingStack()) {
-        auto *AllocInst = getAllocForDealloc(StackInst);
-        int BitNr = StackLoc2BitNumbers.lookup(AllocInst);
-        llvm::dbgs() << "  dealloc for #" << BitNr << "\n"
-                        "    " << *StackInst;
-      }
-    }
-  }
-}
+      // Otherwise, just remove the deallocation and set the allocation
+      // as having a pending deallocation on this path.
+      //
+      // When we mark `alloc` as having a pending deallocation, we are
+      // deferring its deallocation on this path to the deallocation
+      // points of some non-pending allocation that's on top of it on the
+      // allocation stack. That may not end up being the current top of
+      // the stack:
+      //
+      // - the current top may itself get deferred later, e.g.
+      //
+      //     %alloc = alloc_stack $Int
+      //     %top = alloc_stack $Int
+      //     dealloc_stack %alloc   // gets deferred
+      //     %new = alloc_stack $Int
+      //     dealloc_stack %top     // gets deferred
+      //     dealloc_stack %new     // dealloc %alloc and %top after this
+      //
+      // - there might be some other allocation between `alloc` and the
+      //   current top that "inherits" the deferral after we dealloc the
+      //   current top, e.g.:
+      //
+      //     %alloc = alloc_stack $Int
+      //     %middle = alloc_stack $Int
+      //     %top = alloc_stack $Int
+      //     dealloc_stack %alloc   // gets deferred
+      //     dealloc_stack %top
+      //     dealloc_stack %middle  // dealloc %alloc after this
+      //
+      // The key is that, no matter what happens, there's always a last
+      // thing on the stack above `alloc` that hasn't been deferred yet.
+      // Joint post-dominance means that, on every path we can reach
+      // from this point, we'll either eventually reach a dealloc for
+      // that last undeferred allocation (in which case, we'll deallocate
+      // `alloc` then), or we'll reach a dead-end (in which case it's fine
+      // that we never deallocated `alloc`). And whatever those points
+      // are, replacing this deallocation of `alloc` with those points
+      // will re-establish the joint post-dominance of `alloc` by its
+      // deallocations with respect to this path.
+
+      dealloc->eraseFromParent();
+      madeChanges = true;
+      setAllocationAsPending(state, alloc, dealloc, indexForAllocation);
+    }
+  });
 
-void StackNesting::dumpBits(const BitVector &Bits) {
-  llvm::dbgs() << '<';
-  const char *separator = "";
-  for (int Bit = Bits.find_first(); Bit >= 0; Bit = Bits.find_next(Bit)) {
-    llvm::dbgs() << separator << Bit;
-    separator = ",";
-  }
-  llvm::dbgs() << '>';
+
+  // We never make changes to the CFG.
+  return (madeChanges ? Changes::Instructions : Changes::None);
 }
 
 namespace swift::test {
diff --git a/test/SILOptimizer/allocbox_to_stack.sil b/test/SILOptimizer/allocbox_to_stack.sil
index 26f568cf31549..4ffe484cf2156 100644
--- a/test/SILOptimizer/allocbox_to_stack.sil
+++ b/test/SILOptimizer/allocbox_to_stack.sil
@@ -1086,6 +1086,9 @@ bb3:
   %3 = load %as1 : $*Bool
   unreachable
 }
+// StackNesting used to split the edge to bb3, but that's not really
+// necessary. Also, this input very arguably should not be allowed
+// because there isn't a consistent stack depth on entry to bb3.
 // CHECK-LABEL: sil @nesting_and_unreachable_critical_edge
 // CHECK:      bb0(%0 : $Int):
 // CHECK-NEXT:   [[BOX:%[0-9]+]] = alloc_stack $Int
@@ -1095,17 +1098,15 @@ bb3:
 // CHECK-NEXT:   [[STACK2:%[0-9]+]] = alloc_stack $Bool
 // CHECK-NEXT:   cond_br
 // CHECK:      bb2:
-// CHECK-NEXT:   dealloc_stack [[STACK2]]
-// CHECK-NEXT:   br bb4
-// CHECK:      bb3:
 // CHECK:        store {{%[0-9]+}}
 // CHECK:        dealloc_stack [[STACK2]]
 // CHECK-NEXT:   dealloc_stack [[STACK1]]
 // CHECK-NEXT:   dealloc_stack [[BOX]]
 // CHECK-NEXT:   tuple
 // CHECK-NEXT:   return
-// CHECK:      bb4:
+// CHECK:      bb3:
 // CHECK-NEXT:   unreachable
+// CHECK: // end sil function 'nesting_and_unreachable_critical_edge'
 sil @nesting_and_unreachable_critical_edge : $(Int) -> () {
 bb0(%0 : $Int):
   %1 = alloc_box ${ var Int }
diff --git a/test/SILOptimizer/stack_promotion.sil b/test/SILOptimizer/stack_promotion.sil
index 40edd69d12807..dfb588229ff91 100644
--- a/test/SILOptimizer/stack_promotion.sil
+++ b/test/SILOptimizer/stack_promotion.sil
@@ -1058,19 +1058,32 @@ bb0(%0 : $Int, %another: $Array):
   return %24 : $()
 }
 
+// The input to StackNesting seems sketchy here; it isn't really
+// obeying joint post-dominance, and is just skating by under the
+// laxer rules for blocks that don't exit.
 // CHECK-LABEL: sil @promote_with_unreachable_block_nest_bug
-// CHECK: bb3:
-// CHECK:   dealloc_stack_ref %{{.*}}
-// CHECK: bb4:
-// CHECK:   br bb5
-// CHECK: bb5:
-// CHECK:   dealloc_stack_ref %{{.*}}
-// CHECK: bb6:
-// CHECK:   dealloc_stack_ref %{{.*}}
-// CHECK: bb11:
-// CHECK:   alloc_ref [stack] $XX
-// CHECK:   return
+// CHECK:      bb3:
+// CHECK-NEXT:   strong_release [[APPLY_RESULT:%[0-9]+]]
+// CHECK-NEXT:   dealloc_stack_ref [[ALLOC:%[0-9]+]]
+// CHECK-NEXT:   br bb6
+// CHECK:      bb4:
+// CHECK-NEXT:   integer_literal
+// CHECK-NEXT:   struct
+// CHECK-NEXT:   br bb13
+// CHECK:      bb5:
+// CHECK-NEXT:   strong_release [[APPLY_RESULT]]
+// CHECK-NEXT:   dealloc_stack_ref [[ALLOC]]
+// CHECK-NEXT:   br bb6
+// CHECK:      bb10:
+// CHECK-NEXT:   [[ALLOC]] = alloc_ref [stack] $XX
+// CHECK-NEXT:   // function_ref
+// CHECK-NEXT:   function_ref
+// CHECK-NEXT:   [[APPLY_RESULT]] = apply
+// CHECK-NEXT:   ref_element_addr
+// CHECK-NEXT:   load
+// CHECK-NEXT:   cond_br undef, bb2, bb4
+// CHECK:      bb13(
+// CHECK-NEXT:   unreachable
 sil @promote_with_unreachable_block_nest_bug : $@convention(thin) () -> Int32 {
 bb0:
   %0 = alloc_stack $Builtin.Int32 // user: %30