diff --git a/docs/SIL/SIL.md b/docs/SIL/SIL.md
index d7cc716ca4f16..d82414a639fad 100644
--- a/docs/SIL/SIL.md
+++ b/docs/SIL/SIL.md
@@ -780,6 +780,178 @@ _lexical_ in order to specify this property for all contributing lifetimes.
 For details see [Variable Lifetimes](Ownership.md#variable-lifetimes) in the
 Ownership document.
 
+# Dominance
+
+## Value and instruction dominance
+
+Whenever an instruction uses a [value](#values-and-operands) as an
+operand, the definition of the value must dominate the instruction.
+This is a common concept across all SSA-like representations. SIL
+uses a standard definition of dominance, modified slightly to account
+for SIL's use of basic block arguments rather than phi instructions:
+
+- The value `undef` always dominates an instruction.
+
+- An instruction result `R` dominates an instruction `I` if the
+  instruction that defines `R` dominates `I`.
+
+- An argument of a basic block `B` dominates an instruction `I` if all
+  initial paths passing through `I` must also pass through the start
+  of `B`.
+
+An instruction `D` dominates another instruction `I` if they are
+different instructions and all initial paths passing through `I`
+must also pass through `D`.
+
+See [below](#definition-of-a-path) for the formal definition of an
+initial path.
+
+## Basic block dominance
+
+A basic block `B1` dominates a basic block `B2` if they are different
+blocks and if all initial paths passing through the start of `B2` must
+also pass through the start of `B1`.
+
+This relationship between blocks forms a tree of basic blocks, called
+the *dominance tree*: every block other than the entry block has a
+unique closest dominator, which serves as its parent in the tree.
+The dominance tree is not directly represented in SIL; it is just
+an emergent property of the dominance requirement on SIL functions.
+
+## Joint post-dominance
+
+Certain instructions are required to have a *joint post-dominance*
+relationship with certain other instructions. Informally, this means
+that all terminating paths through the first instruction must
+eventually pass through one of the others. This is common for
+instructions that define a scope in the SIL function, such as
+`alloc_stack` and `begin_access`.
+
+The dominating instruction is called the *scope instruction*,
+and the post-dominating instructions are called the *scope-ending
+instructions*. The specific joint post-dominance requirement
+defines the set of instructions that count as scope-ending
+instructions for the begin instruction.
+
+For example, an `alloc_stack` instruction must be jointly
+post-dominated by the set of `dealloc_stack` instructions
+whose operand is the result of the `alloc_stack`. The
+`alloc_stack` is the scope instruction, and the `dealloc_stack`s
+are the scope-ending instructions.
+
+The *scope* of a joint post-dominance relationship is the set
+of all points in the function following the scope instruction
+but prior to a scope-ending instruction. Making this precisely
+defined is part of the point of the joint post-dominance rules.
+A formal definition is given later.
+
+In SIL, if an instruction acts as a scope instruction, it always
+has exactly one set of scope-ending instructions associated
+with it, and so it forms exactly one scope. People will therefore
+often talk about, e.g., the scope of an `alloc_stack`, meaning
+the scope between it and its `dealloc_stack`s. Furthermore,
+there are no instructions in SIL which act as scope-ending
+instructions for multiple scopes.
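+
+For example, in the following function (kept deliberately minimal),
+the `alloc_stack` is the scope instruction, and it is jointly
+post-dominated by its two `dealloc_stack`s: every terminating path
+through the `alloc_stack` passes through exactly one of them.
+
+```
+  bb0(%cond : $Builtin.Int1):
+    %stack = alloc_stack $Builtin.Int32
+    cond_br %cond, bb1, bb2
+  bb1:
+    dealloc_stack %stack : $*Builtin.Int32
+    br bb3
+  bb2:
+    dealloc_stack %stack : $*Builtin.Int32
+    br bb3
+  bb3:
+    %ret = tuple ()
+    return %ret : $()
+```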
+
+A scope instruction `I` is jointly post-dominated by its
+scope-ending instructions if:
+
+- All initial paths that pass through a scope-ending instruction
+  of `I` must also pass through `I`. (This is just the normal
+  dominance rule, and it is typically already required by the
+  definition of the joint post-dominance relationship. For example,
+  a `dealloc_stack` must be dominated by its associated
+  `alloc_stack` because it uses its result as an operand.)
+
+- All initial paths that pass through `I` twice must also pass
+  through a scope-ending instruction of `I` in between.
+
+- All initial paths that pass through a scope-ending instruction
+  of `I` twice must also pass through `I` in between.
+
+- All terminating initial paths that pass through `I` must also
+  pass through a scope-ending instruction of `I`.
+
+In other words, all paths must strictly alternate between `I`
+and its scope-ending instructions, starting with `I` and (if
+the path exits) ending with a scope-ending instruction.
+
+Note that a scope-ending instruction does not need to appear on
+a path following a scope instruction if the path doesn't exit
+the function. In fact, a function needn't include any scope-ending
+instructions for a particular scope instruction if all paths from
+that point are non-terminating, such as by ending in `unreachable`
+or containing an infinite loop.
+
+The scope defined by a joint post-dominance relationship for
+a scope instruction `I` is the set of points in the function for
+which there exists an initial path that visits that point and
+which passes through `I` but which does not pass through a
+scope-ending instruction of `I` under that relationship. Note
+that the point before a scope-ending instruction is always within
+the scope.
+
+## Definition of a path
+
+A *point* in a SIL function is the moment before an instruction.
+Every basic block has an entry point, which is the point before
+its first instruction. The entry point of the entry block is also
+called the entry point of the function.
+
+A path through a SIL function is a path (in the usual graph-theory
+sense) in the underlying directed graph of points, in which:
+
+- every point in the SIL function is a vertex in the graph,
+
+- each non-terminator instruction creates an edge from the point
+  before it to the point after it, and
+
+- each terminator instruction creates edges from the point before
+  the terminator to the initial point of each of its successor blocks.
+
+A path is said to pass through an instruction if it includes
+any of the edges created by that instruction. A path is said to
+pass through the start of a basic block if it visits the entry
+point of that block.
+
+An *initial path* is a path which begins at the entry point of the
+function. A *terminating path* is a path which ends at the point
+before an exiting instruction, such as `return` or `throw`.
+
+Note that the dominance rules generally require only an initial path,
+not a terminating path. A path that simply stops in the middle of a
+block still counts for dominance. Among other things, this ensures that
+dominance holds in blocks that are part of an infinite loop.
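+
+For example, in the following function (a deliberately trivial one),
+no path through `bb1` ever terminates, yet the `builtin` is still
+dominated by the definition of `%v`, because every initial path that
+reaches the `builtin` passes through `bb0` first:
+
+```
+  bb0:
+    %v = integer_literal $Builtin.Int32, 0
+    br bb1
+  bb1:
+    %u = builtin "add_Int32"(%v, %v) : $Builtin.Int32
+    br bb1
+```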
+
+Note also that paths consider successors without regard to the
+nature of the terminator. Paths that are provably impossible because
+of value relationships still count for dominance. For example,
+consider the following function:
+
+```
+  bb0(%cond : $Builtin.Int1):
+    cond_br %cond, bb1, bb2
+  bb1:
+    %value = integer_literal $Builtin.Int32, 0
+    br bb3
+  bb2:
+    br bb3
+  bb3:
+    cond_br %cond, bb4, bb5
+  bb4:
+    %twice_value = builtin "add_Int32"(%value, %value) : $Builtin.Int32
+    br bb6
+  bb5:
+    br bb6
+  bb6:
+    return %cond : $Builtin.Int1
+```
+
+Dynamically, it is impossible to reach the `builtin` instruction
+without passing through the definition of `%value`: to reach
+the `builtin`, `%cond` must be `true`, and so the first `cond_br`
+must have branched to `bb1`. This is not taken into consideration
+by dominance, and so this function is ill-formed.
+
 # Debug Information
 
 Each instruction may have a debug location and a SIL scope reference at
@@ -1364,48 +1536,39 @@ stack deallocation instructions. It can even be paired with no
 instructions at all; by the rules below, this can only happen in
 non-terminating functions.
 
-- At any point in a SIL function, there is an ordered list of stack
-  allocation instructions called the *active allocations list*.
+- All stack allocation instructions must be jointly post-dominated
+  by stack deallocation instructions paired with them.
 
-- The active allocations list is defined to be empty at the initial
-  point of the entry block of the function.
+- No path through the function that passes through a stack allocation
+  instruction `B`, having already passed a stack allocation
+  instruction `A`, may subsequently pass through a stack deallocation
+  instruction paired with `A` without first passing through a stack
+  deallocation instruction paired with `B`.
 
-- The active allocations list is required to be the same at the
-  initial point of any successor block as it is at the final point of
-  any predecessor block. Note that this also requires all
-  predecessors/successors of a given block to have the same
-  final/initial active allocations lists.
+These two rules statically enforce that all stack allocations are
+properly nested. In simpler terms:
 
-  In other words, the set of active stack allocations must be the same
-  at a given place in the function no matter how it was reached.
+- At every point in a SIL function, there is an ordered list of stack
+  allocation instructions called the *active allocations list*.
 
-- The active allocations list for the point following a stack
-  allocation instruction is defined to be the result of adding that
-  instruction to the end of the active allocations list for the point
-  preceding the instruction.
+- The active allocations list is empty at the start of the entry block
+  of the function, and it must be empty again whenever an instruction
+  that exits the function is reached, like `return` or `throw`.
 
-- The active allocations list for the point following a stack
-  deallocation instruction is defined to be the result of removing the
-  instruction from the end of the active allocations list for the
-  point preceding the instruction. The active allocations list for the
-  preceding point is required to be non-empty, and the last
-  instruction in it must be paired with the deallocation instruction.
+- Whenever a stack allocation instruction is reached, it is added to
+  the end of the list.
 
-  In other words, all stack allocations must be deallocated in
-  last-in, first-out order, aka stack order.
+- Whenever a stack deallocation instruction is reached, its paired
+  stack allocation instruction must be at the end of the list, which it
+  is then removed from.
 
-- The active allocations list for the point following any other
-  instruction is defined to be the same as the active allocations list
-  for the point preceding the instruction.
+- The active allocations list must always be the same on both sides of
+  a control flow edge. This implies both that all successors of a block
+  must start with the same list and that all predecessors of a block
+  must end with the same list.
 
-- The active allocations list is required to be empty prior to
-  `return` or `throw` instructions.
-
-  In other words, all stack allocations must be deallocated prior to
-  exiting the function.
-
-Note that these rules implicitly prevent an allocation instruction from
-still being active when it is reached.
+Note that these rules implicitly prevent stack allocations from leaking
+or being double-freed.
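+
+For example, in the following straight-line fragment, the active
+allocations list grows and shrinks in last-in, first-out order and
+is empty again by the time the function exits:
+
+```
+  bb0:
+    %a = alloc_stack $Builtin.Int32     // list: [%a]
+    %b = alloc_stack $Builtin.Int64     // list: [%a, %b]
+    dealloc_stack %b : $*Builtin.Int64  // list: [%a]
+    dealloc_stack %a : $*Builtin.Int32  // list: []
+    %ret = tuple ()
+    return %ret : $()                   // list is empty, as required
+```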
 
 The control-flow rule forbids certain patterns that would theoretically
 be useful, such as conditionally performing an allocation around an
@@ -1414,6 +1577,13 @@ to use, however, as it is illegal to locally abstract over addresses,
 and therefore a conditional allocation cannot be used in the intermediate
 operation anyway.
 
+There is currently an exception to the stack discipline rules which
+allows the predecessors of a dead-end block (a block from which no
+exit is reachable) to disagree about the state of the stack.
+The current exception is unsound and permits manipulation of the
+stack in ways that may not be valid in all predecessor states. We
+are exploring ways to improve this situation.
+
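+For example, the current exception tolerates fragments like the
+following, in which the two predecessors of the dead-end block `bb3`
+reach it with different active allocations lists:
+
+```
+  bb1:
+    %1 = alloc_stack $Builtin.Int32
+    br bb3
+  bb2:
+    br bb3
+  bb3:
+    // dead-end: entered with list [%1] from bb1 but [] from bb2
+    unreachable
+```
+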
 # Structural type matching for pack indices
 
 In order to catch type errors in applying pack indices, SIL requires the
diff --git a/include/swift/SILOptimizer/Utils/StackNesting.h b/include/swift/SILOptimizer/Utils/StackNesting.h
index 3bd370016f9a6..4c8e4384bc75e 100644
--- a/include/swift/SILOptimizer/Utils/StackNesting.h
+++ b/include/swift/SILOptimizer/Utils/StackNesting.h
@@ -45,8 +45,11 @@ namespace swift {
 ///     dealloc_stack %1
 /// \endcode
 ///
+/// Each allocation must still be properly jointly post-dominated by
+/// its deallocations. StackNesting only fixes the nesting of allocations
+/// and deallocations; it does not insert required deallocations that are
+/// missing entirely.
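+///
+/// For example, StackNesting does not repair the following input,
+/// in which the allocation is never deallocated on the exiting path
+/// through bb2:
+/// \code
+///     %1 = alloc_stack
+///     cond_br %c, bb1, bb2
+///   bb1:
+///     dealloc_stack %1
+///     br bb3
+///   bb2:
+///     br bb3      // missing dealloc_stack %1; it will not be inserted
+///   bb3:
+///     ...
+/// \endcode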
 class StackNesting {
-
 public:
 
   /// The possible return values of fixNesting().
@@ -61,122 +64,6 @@ class StackNesting {
     CFG
   };
 
-private:
-  typedef SmallBitVector BitVector;
-
-  /// Data stored for each block (actually for each block which is not dead).
-  struct BlockInfo {
-    /// The list of stack allocating/deallocating instructions in the block.
-    llvm::SmallVector<SILInstruction *, 8> StackInsts;
-
-    /// The bit-set of alive stack locations at the block entry.
-    BitVector AliveStackLocsAtEntry;
-
-    /// The bit-set of alive stack locations at the block exit.
-    BitVector AliveStackLocsAtExit;
-
-    /// Used in the setup function to walk over the CFG.
-    bool visited = false;
-
-    /// True for dead-end blocks, i.e. blocks from which there is no path to
-    /// a function exit, e.g. blocks which end with `unreachable` or an
-    /// infinite loop.
-    bool isDeadEnd = false;
-  };
-
-  /// Data stored for each stack location (= allocation).
-  ///
-  /// Each stack location is allocated by a single allocation instruction.
-  struct StackLoc {
-    StackLoc(SILInstruction *Alloc) : Alloc(Alloc) {}
-
-    /// Back-link to the allocation instruction.
-    SILInstruction *Alloc;
-
-    /// Bit-set which represents all alive locations at this allocation.
-    /// It obviously includes this location itself. And it includes all "outer"
-    /// locations which surround this location.
-    BitVector AliveLocs;
-  };
-
-  /// Mapping from stack allocations (= locations) to bit numbers.
-  llvm::DenseMap<SILInstruction *, unsigned> StackLoc2BitNumbers;
-
-  /// The list of stack locations. The index into this array is also the bit
-  /// number in the bit-sets.
-  llvm::SmallVector<StackLoc, 8> StackLocs;
-
-  BasicBlockData<BlockInfo> BlockInfos;
-
-  StackNesting(SILFunction *F) : BlockInfos(F) { }
-
-  /// Performs correction of stack nesting by moving stack-deallocation
-  /// instructions down the control flow.
-  ///
-  /// Returns the status of what changes were made.
-  Changes run();
-
-  /// For debug dumping.
-  void dump() const;
-
-  static void dumpBits(const BitVector &Bits);
-
-  /// Initializes the data structures.
-  void setup();
-
-  /// Solves the dataflow problem.
-  ///
-  /// Returns true if there is a nesting of locations in any way, which can
-  /// potentially in the wrong order.
-  bool solve();
-
-  bool analyze() {
-    setup();
-    return solve();
-  }
-
-  /// Insert deallocation instructions for all locations which are alive before
-  /// the InsertionPoint (AliveBefore) but not alive after the InsertionPoint
-  /// (AliveAfter).
-  ///
-  /// Returns true if any deallocations were inserted.
-  bool insertDeallocs(const BitVector &AliveBefore, const BitVector &AliveAfter,
-                      SILInstruction *InsertionPoint,
-                      std::optional<SILLocation> Location);
-
-  /// Returns the location bit number for a stack allocation instruction.
-  int bitNumberForAlloc(SILInstruction *AllocInst) {
-    assert(AllocInst->isAllocatingStack());
-    return StackLoc2BitNumbers[AllocInst];
-  }
-
-  /// Returns the location bit number for a stack deallocation instruction.
-  int bitNumberForDealloc(SILInstruction *DeallocInst) {
-    assert(DeallocInst->isDeallocatingStack());
-    auto *AllocInst = getAllocForDealloc(DeallocInst);
-    return bitNumberForAlloc(AllocInst);
-  }
-
-  /// Returns the stack allocation instruction for a stack deallocation
-  /// instruction.
-  SILInstruction *getAllocForDealloc(SILInstruction *Dealloc) const {
-    SILValue op = Dealloc->getOperand(0);
-    while (auto *mvi = dyn_cast<MoveValueInst>(op)) {
-      op = mvi->getOperand();
-    }
-    return op->getDefiningInstruction();
-  }
-
-  /// Insert deallocations at block boundaries.
-  Changes insertDeallocsAtBlockBoundaries();
-
-  /// Modifies the SIL to end up with a correct stack nesting.
-  ///
-  /// Returns the status of what changes were made.
-  Changes adaptDeallocs();
-
-public:
-
   /// Performs correction of stack nesting by moving stack-deallocation
   /// instructions down the control flow.
   ///
diff --git a/lib/SILOptimizer/Utils/StackNesting.cpp b/lib/SILOptimizer/Utils/StackNesting.cpp
index 505539ad494dc..d38946617e3d1 100644
--- a/lib/SILOptimizer/Utils/StackNesting.cpp
+++ b/lib/SILOptimizer/Utils/StackNesting.cpp
@@ -20,400 +20,445 @@
 
 using namespace swift;
 
-void StackNesting::setup() {
-  SmallVector<SILBasicBlock *, 8> WorkList;
-
-  // Start with the function entry block and add blocks while walking down along
-  // the successor edges.
-  // This ensures a correct ordering of stack locations: an inner location has
-  // a higher bit-number than it's outer parent location.
-  // This ordering is only important for inserting multiple deallocation
-  // instructions (see below).
-  auto Entry = BlockInfos.entry();
-  WorkList.push_back(&Entry.block);
-  Entry.data.visited = true;
-
-  while (!WorkList.empty()) {
-    SILBasicBlock *Block = WorkList.pop_back_val();
-    BlockInfo &BI = BlockInfos[Block];
-    for (SILInstruction &I : *Block) {
-      if (I.isAllocatingStack()) {
-        auto Alloc = &I;
-        // Register this stack location.
-        unsigned CurrentBitNumber = StackLocs.size();
-        StackLoc2BitNumbers[Alloc] = CurrentBitNumber;
-        StackLocs.push_back(StackLoc(Alloc));
-
-        BI.StackInsts.push_back(Alloc);
-      } else if (I.isDeallocatingStack()) {
-        auto *AllocInst = getAllocForDealloc(&I);
-        if (!BI.StackInsts.empty() && BI.StackInsts.back() == AllocInst) {
-          // As an optimization, we ignore perfectly nested alloc-dealloc pairs
-          // inside a basic block.
-          // Actually, this catches most of the cases and keeps our bitsets
-          // small.
-          assert(StackLocs.back().Alloc == AllocInst);
-          StackLocs.pop_back();
-          BI.StackInsts.pop_back();
-        } else {
-          // Register the stack deallocation.
-          BI.StackInsts.push_back(&I);
-        }
+/// Run the given function exactly once on each of the reachable blocks in
+/// a SIL function. Blocks will be visited in an order consistent with
+/// dominance, which is to say, after all dominating blocks but otherwise
+/// in an unspecified order.
+///
+/// The function is passed a state value, which it can freely mutate. The
+/// initial value of the state will be the same as the value left in the
+/// state for an unspecified predecessor (or the initial value passed in,
+/// for the entry block of the function). Since the predecessor choice is
+/// arbitrary, you should only use this if the state is guaranteed to be
+/// the same for all predecessors. The state type must be copyable, but
+/// the algorithm makes a reasonable effort to avoid copying it.
+///
+/// This function assumes you don't change the CFG during its operation.
+template <class State, class Fn>
+void runInDominanceOrder(SILFunction &F, State &&state, const Fn &fn) {
+  // The set of blocks that have ever been enqueued onto the worklist.
+  // (We actually skip the queue in a bunch of cases, but *abstractly*
+  // they're enqueued.)
+  BasicBlockSet visitedBlocks(&F);
+
+  // The next basic block to operate on. We always operate on `state`.
+  SILBasicBlock *curBB = F.getEntryBlock();
+
+  // We need to copy `state` whenever we enqueue a block onto the worklist.
+  // We'll then move-assign it back to `state` when we dequeue it.
+  using StateValue = std::remove_reference_t<State>;
+  SmallVector<std::pair<SILBasicBlock *, StateValue>, 4> worklist;
+
+  while (true) {
+    // Run the function on the current block, updating the current state.
+    fn(curBB, state);
+
+    // Enqueue the successors.
+    SILBasicBlock *nextBB = nullptr;
+    for (SILBasicBlock *succBB : curBB->getSuccessorBlocks()) {
+      // If this insertion returns false, we've already enqueued the
+      // successor block, so we can skip it. Thanks to BasicBlockSet,
+      // this is fast enough that there's no point in avoiding it for
+      // single-predecessor blocks.
+      if (!visitedBlocks.insert(succBB))
+        continue;
+
+      // If we haven't found a successor to visit yet, pick this one.
+      if (!nextBB) {
+        nextBB = succBB;
+
+      // Otherwise, add it to the worklist, copying the current state.
+      } else {
+        worklist.emplace_back(succBB, /*copied*/ state);
       }
     }
 
-    for (SILBasicBlock *SuccBB : Block->getSuccessorBlocks()) {
-      BlockInfo &SuccBI = BlockInfos[SuccBB];
-      if (!SuccBI.visited) {
-        // Push the next reachable block onto the WorkList.
-        WorkList.push_back(SuccBB);
-        SuccBI.visited = true;
-      }
-    }
-  }
+
+    // If there's a viable direct successor, just continue along this
+    // path, editing the current state in-place.
+    if (nextBB) {
+      curBB = nextBB;
+      continue;
+    }
+
+    // Otherwise, if the worklist is empty, we're done.
+    if (worklist.empty()) {
+      return;
+    }
+
+    // Otherwise, pull the next item off the worklist and overwrite the
+    // current state with the state we saved for it before.
+    auto &nextItem = worklist.back();
+    curBB = nextItem.first;
+    state = std::move(nextItem.second);
+    worklist.pop_back();
+  }
+}
 
-  unsigned NumLocs = StackLocs.size();
-  for (unsigned Idx = 0; Idx < NumLocs; ++Idx) {
-    StackLocs[Idx].AliveLocs.resize(NumLocs);
-    // Initially each location gets it's own alive-bit.
-    StackLocs[Idx].AliveLocs.set(Idx);
-  }
-}
+/// Returns the stack allocation instruction for a stack deallocation
+/// instruction.
+static SILInstruction *getAllocForDealloc(SILInstruction *dealloc) {
+  SILValue op = dealloc->getOperand(0);
+  while (auto *mvi = dyn_cast<MoveValueInst>(op)) {
+    op = mvi->getOperand();
+  }
+  return op->getDefiningInstruction();
+}
 
-bool StackNesting::solve() {
-  bool changed = false;
-  bool isNested = false;
-  BitVector Bits(StackLocs.size());
+/// Create a dealloc for a particular allocation.
+///
+/// This is expected to work for all allocations that don't have
+/// properly-nested deallocations. It's fine to have a kind of allocation
+/// that you can't do this for, as long as it's always explicitly
+/// deallocated on all paths. This pass doesn't change any allocations
+/// or deallocations that are properly nested already.
+///
+/// Only allocations whose deallocations return true from canMoveDealloc
+/// need to support this.
+static void createDealloc(SILBuilder &B, SILLocation loc, SILInstruction *alloc) {
+  switch (alloc->getKind()) {
+  case SILInstructionKind::PartialApplyInst:
+  case SILInstructionKind::AllocStackInst:
+    assert((isa<AllocStackInst>(alloc) ||
+            cast<PartialApplyInst>(alloc)->isOnStack()) &&
+           "wrong instruction");
+    B.createDeallocStack(loc, cast<SingleValueInstruction>(alloc));
+    return;
+  case SILInstructionKind::BeginApplyInst: {
+    auto *bai = cast<BeginApplyInst>(alloc);
+    assert(bai->isCalleeAllocated());
+    B.createDeallocStack(loc, bai->getCalleeAllocationResult());
+    return;
+  }
+  case SILInstructionKind::AllocRefDynamicInst:
+  case SILInstructionKind::AllocRefInst:
+    assert(cast<AllocRefInstBase>(alloc)->canAllocOnStack());
+    B.createDeallocStackRef(loc, cast<AllocRefInstBase>(alloc));
+    return;
+  case SILInstructionKind::AllocPackInst:
+    B.createDeallocPack(loc, cast<AllocPackInst>(alloc));
+    return;
+  case SILInstructionKind::BuiltinInst: {
+    auto *bi = cast<BuiltinInst>(alloc);
+    auto &ctx = alloc->getFunction()->getModule().getASTContext();
 
-  StackList<SILBasicBlock *> deadEndWorklist(BlockInfos.getFunction());
+    switch (*bi->getBuiltinKind()) {
+    case BuiltinValueKind::StackAlloc:
+    case BuiltinValueKind::UnprotectedStackAlloc: {
+      auto identifier =
+          ctx.getIdentifier(getBuiltinName(BuiltinValueKind::StackDealloc));
+      B.createBuiltin(loc, identifier,
+                      SILType::getEmptyTupleType(ctx),
+                      SubstitutionMap(), {bi});
+      return;
+    }
+    default:
+      llvm_unreachable("unknown stack allocation builtin");
+    }
+  }
+  case SILInstructionKind::AllocPackMetadataInst:
+    B.createDeallocPackMetadata(loc, cast<AllocPackMetadataInst>(alloc));
+    return;
+  default:
+    llvm_unreachable("unknown stack allocation");
+  }
+}
 
-  // Initialize all bit fields to 1s, expect 0s for the entry block.
-  bool initVal = false;
-  for (auto bd : BlockInfos) {
-    bd.data.AliveStackLocsAtEntry.resize(StackLocs.size(), initVal);
-    initVal = true;
+namespace {
+class ActiveAllocation {
+  llvm::PointerIntPair<SILInstruction *, 1, bool> valueAndIsPending;
 
-    bd.data.isDeadEnd = !bd.block.getTerminator()->isFunctionExiting();
-    if (!bd.data.isDeadEnd)
-      deadEndWorklist.push_back(&bd.block);
+public:
+  ActiveAllocation(SILInstruction *value) : valueAndIsPending(value, false) {}
+
+  SILInstruction *getValue() const {
+    return valueAndIsPending.getPointer();
   }
 
-  // Calculate the isDeadEnd block flags.
-  while (!deadEndWorklist.empty()) {
-    SILBasicBlock *b = deadEndWorklist.pop_back_val();
-    for (SILBasicBlock *pred : b->getPredecessorBlocks()) {
-      BlockInfo &bi = BlockInfos[pred];
-      if (bi.isDeadEnd) {
-        bi.isDeadEnd = false;
-        deadEndWorklist.push_back(pred);
-      }
-    }
+  bool isPending() const {
+    return valueAndIsPending.getInt();
   }
 
-  // First step: do a forward dataflow analysis to get the live stack locations
-  // at the block exits.
-  // This is necessary to get the live locations at dead-end blocks (otherwise
-  // the backward data flow would be sufficient).
-  // The special thing about dead-end blocks is that it's okay to have alive
-  // locations at that point (e.g. at an `unreachable`) i.e. locations which are
-  // never dealloced. We cannot get such locations with a purely backward
-  // dataflow.
-  do {
-    changed = false;
-
-    for (auto bd : BlockInfos) {
-      Bits = bd.data.AliveStackLocsAtEntry;
-      for (SILInstruction *StackInst : bd.data.StackInsts) {
-        if (StackInst->isAllocatingStack()) {
-          Bits.set(bitNumberForAlloc(StackInst));
-        } else if (StackInst->isDeallocatingStack()) {
-          Bits.reset(bitNumberForDealloc(StackInst));
-        }
-      }
-      if (Bits != bd.data.AliveStackLocsAtExit) {
-        bd.data.AliveStackLocsAtExit = Bits;
-        changed = true;
-      }
-      // Merge the bits into the successors.
-      for (SILBasicBlock *SuccBB : bd.block.getSuccessorBlocks()) {
-        BlockInfos[SuccBB].AliveStackLocsAtEntry &= Bits;
-      }
+  void setPending() {
+    assert(!isPending());
+    valueAndIsPending.setInt(true);
+  }
+};
+
+struct State {
+  // The active allocations and whether they're pending deallocation.
+  SmallVector<ActiveAllocation, 8> allocations;
+
+#ifndef NDEBUG
+  SWIFT_ATTRIBUTE_NORETURN
+  void abortForUnknownAllocation(SILInstruction *alloc,
+                                 SILInstruction *dealloc) {
+    llvm::errs() << "fatal error: StackNesting could not find record of "
+                    "allocation for deallocation:\n  "
+                 << *dealloc
+                 << "Allocation might not be jointly post-dominated. "
+                    "Current stack:\n";
+    for (auto i : indices(allocations)) {
+      llvm::errs() << "[" << i << "] "
+                   << (allocations[i].isPending() ? "(pending) " : "")
+                   << *allocations[i].getValue();
     }
-  } while (changed);
+    llvm::errs() << "Complete function:\n";
+    alloc->getFunction()->dump();
+    abort();
+  }
+#endif
+};
+
+} // end anonymous namespace
 
-  // Second step: do a backward dataflow analysis to extend the lifetimes of
-  // not properly nested allocations.
-  do {
-    changed = false;
-
-    for (auto bd : llvm::reverse(BlockInfos)) {
-      // Collect the alive-bits (at the block exit) from the successor blocks.
-      for (SILBasicBlock *SuccBB : bd.block.getSuccessorBlocks()) {
-        bd.data.AliveStackLocsAtExit |= BlockInfos[SuccBB].AliveStackLocsAtEntry;
-      }
-      Bits = bd.data.AliveStackLocsAtExit;
-      assert(!(bd.data.visited && bd.block.getTerminator()->isFunctionExiting()
-               && Bits.any())
-             && "stack location is missing dealloc");
-
-      if (bd.data.isDeadEnd) {
-        // We treat `unreachable` as an implicit deallocation for all locations
-        // which are still alive at this point. The same is true for dead-end
-        // CFG regions due to an infinite loop.
-        for (int BitNr = Bits.find_first(); BitNr >= 0;
-             BitNr = Bits.find_next(BitNr)) {
-          // For each alive location extend the lifetime of all locations which
-          // are alive at the allocation point. This is the same as we do for
-          // a "real" deallocation instruction (see below).
-          // In dead-end CFG regions we have to do that for all blocks (because
-          // of potential infinite loops), whereas in "normal" CFG regions it's
-          // sufficient to do it at deallocation instructions.
-          Bits |= StackLocs[BitNr].AliveLocs;
-        }
-        bd.data.AliveStackLocsAtExit = Bits;
-      }
-      for (SILInstruction *StackInst : llvm::reverse(bd.data.StackInsts)) {
-        if (StackInst->isAllocatingStack()) {
-          int BitNr = bitNumberForAlloc(StackInst);
-          if (Bits != StackLocs[BitNr].AliveLocs) {
-            // More locations are alive around the StackInst's location.
-            // Update the AliveLocs bitset, which contains all those alive
-            // locations.
-            assert(Bits.test(BitNr) && "no dealloc found for alloc stack");
-            StackLocs[BitNr].AliveLocs = Bits;
-            changed = true;
-            isNested = true;
-          }
-          // The allocation ends the lifetime of it's stack location (in reverse
-          // order)
-          Bits.reset(BitNr);
-        } else if (StackInst->isDeallocatingStack()) {
-          // A stack deallocation begins the lifetime of its location (in
-          // reverse order). And it also begins the lifetime of all other
-          // locations which are alive at the allocation point.
-          Bits |= StackLocs[bitNumberForDealloc(StackInst)].AliveLocs;
-        }
-      }
-      if (Bits != bd.data.AliveStackLocsAtEntry) {
-        bd.data.AliveStackLocsAtEntry = Bits;
-        changed = true;
-      }
+using IndexForAllocationMap = llvm::DenseMap<SILInstruction *, size_t>;
+
+/// Flag that a particular allocation is pending.
+static void setAllocationAsPending(State &state, SILInstruction *alloc,
+                                   SILInstruction *dealloc,
+                                   IndexForAllocationMap &indexForAllocation) {
+  auto stack = MutableArrayRef(state.allocations);
+  assert(!stack.empty());
+
+  // Just ignore the top entry in all of this; we know it doesn't match
+  // the allocation.
+  assert(stack.back().getValue() != alloc);
+  stack = stack.drop_back();
+
+  // Ultimately, we're just calling setPending() on the entry matching
+  // `alloc` in the allocations stack. All the complexity has to do with
+  // trying to avoid super-linear behavior while also trying very hard
+  // to avoid actually using indexForAllocation for simple cases.
+
+  // It's very common for allocations to never be improperly nested,
+  // so we don't want to eagerly add allocations to indexForAllocation
+  // when we encounter them. This means we can't rely on it having
+  // an entry for `alloc` now.
+
+  // `alloc` is very likely to be close to the top of the stack. Just do
+  // a short linear scan there first. This might be slightly slower than
+  // a hash lookup in the worst case, but usually it means we can avoid
+  // adding any entries to indexForAllocation at all. Even for this case
+  // where nesting is broken, that's still worthwhile to do.
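+  //
+  // As a small illustration (hypothetical input): given the active
+  // stack [%a, %b, %c] built from
+  //
+  //   %a = alloc_stack $Int
+  //   %b = alloc_stack $Int
+  //   %c = alloc_stack $Int
+  //
+  // a `dealloc_stack %a` arrives while %a is not on top, so %a must be
+  // marked pending; the scan below finds it two entries below the top
+  // without ever touching indexForAllocation.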
+  const size_t linearScanLimit = 8;
+  auto linearScanEntries = stack.take_back(linearScanLimit);
+  for (auto &entry : linearScanEntries) {
+    if (entry.getValue() == alloc) {
+      entry.setPending();
+      return;
     }
-  } while (changed);
-
-  return isNested;
-}
+  }
+
+  // Okay, so much for that, time for the hashtable.
+
+#ifndef NDEBUG
+  if (stack.size() <= linearScanLimit) {
+    state.abortForUnknownAllocation(alloc, dealloc);
+  }
+#endif
 
-static SILInstruction *createDealloc(SILInstruction *Alloc,
-                                     SILInstruction *InsertionPoint,
-                                     SILLocation Location) {
-  SILBuilderWithScope B(InsertionPoint);
-  switch (Alloc->getKind()) {
-  case SILInstructionKind::PartialApplyInst:
-  case SILInstructionKind::AllocStackInst:
-    assert((isa<AllocStackInst>(Alloc) ||
-            cast<PartialApplyInst>(Alloc)->isOnStack()) &&
-           "wrong instruction");
-    return B.createDeallocStack(Location,
-                                cast<SingleValueInstruction>(Alloc));
-  case SILInstructionKind::BeginApplyInst: {
-    auto *bai = cast<BeginApplyInst>(Alloc);
-    assert(bai->isCalleeAllocated());
-    return B.createDeallocStack(Location, bai->getCalleeAllocationResult());
-  }
-  case SILInstructionKind::AllocRefDynamicInst:
-  case SILInstructionKind::AllocRefInst:
-    assert(cast<AllocRefInstBase>(Alloc)->canAllocOnStack());
-    return B.createDeallocStackRef(Location, cast<AllocRefInstBase>(Alloc));
-  case SILInstructionKind::AllocPackInst:
-    return B.createDeallocPack(Location, cast<AllocPackInst>(Alloc));
-  case SILInstructionKind::BuiltinInst: {
-    auto *bi = cast<BuiltinInst>(Alloc);
-    assert(bi->getBuiltinKind() == BuiltinValueKind::StackAlloc ||
-           bi->getBuiltinKind() == BuiltinValueKind::UnprotectedStackAlloc);
-    auto &context = Alloc->getFunction()->getModule().getASTContext();
-    auto identifier =
-        context.getIdentifier(getBuiltinName(BuiltinValueKind::StackDealloc));
-    return B.createBuiltin(Location, identifier,
-                           SILType::getEmptyTupleType(context),
-                           SubstitutionMap(), {bi});
-  }
-  case SILInstructionKind::AllocPackMetadataInst:
-    return B.createDeallocPackMetadata(Location,
-                                       cast<AllocPackMetadataInst>(Alloc));
-  default:
-    llvm_unreachable("unknown stack allocation");
-  }
-}
+  // We don't need to consider entries that we've already linearly scanned.
+  stack = stack.drop_back(linearScanLimit);
+
+  // Check if the entry's already in the hashtable.
+  if (auto it = indexForAllocation.find(alloc); it != indexForAllocation.end()) {
+    auto index = it->second;
+    assert(stack[index].getValue() == alloc);
+    stack[index].setPending();
+    return;
+  }
+
+  // Fill in any missing entries in indexForAllocation.
+  //
+  // The invariant we maintain is that there may be allocations at the
+  // top of the stack that aren't hashed, but once we reach a hashed
+  // entry, everything beneath it is hashed. The first half of this
+  // is necessary because we don't eagerly add allocations to the table,
+  // but it's also what makes it okay that we skip the entries we
+  // linearly scanned. The second half of this means that, if we start
+  // adding entries from the top down, we can stop hashing once we find
+  // that the entries we're adding are redundant. That's what keeps this
+  // O(N).
+  //
+  // All of this caching is relying on us (1) never revisiting a block
+  // and (2) never changing the active-allocations stack except via push
+  // and pop.
+
+  // Look for the target allocation index in this loop rather than doing
+  // a hash lookup at the end.
+  std::optional<size_t> foundIndexForAlloc;
+
+  for (size_t onePast = stack.size(); onePast != 0; --onePast) {
+    size_t entryIndex = onePast - 1;
+    auto entryAlloc = stack[entryIndex].getValue();
+
+    // Remember this if it's the allocation we're looking for.
+    if (entryAlloc == alloc) {
+      foundIndexForAlloc = entryIndex;
+    }
+
+    // Add this entry to the hashtable. Stop hashing as soon as this fails.
+    auto insertResult = indexForAllocation.insert({entryAlloc, entryIndex});
+    if (!insertResult.second) {
+      break;
+    }
+  }
+
+#ifndef NDEBUG
+  if (!foundIndexForAlloc) {
+    state.abortForUnknownAllocation(alloc, dealloc);
+  }
+#endif
+
+  stack[*foundIndexForAlloc].setPending();
+}
 
-bool StackNesting::insertDeallocs(const BitVector &AliveBefore,
-                                  const BitVector &AliveAfter,
-                                  SILInstruction *InsertionPoint,
-                                  std::optional<SILLocation> Location) {
-  if (!AliveBefore.test(AliveAfter))
-    return false;
-
-  // The order matters here if we have to insert more than one
-  // deallocation. We already ensured in setup() that the bit numbers
-  // are allocated in the right order.
-  bool changesMade = false;
-  for (int LocNr = AliveBefore.find_first(); LocNr >= 0;
-       LocNr = AliveBefore.find_next(LocNr)) {
-    if (!AliveAfter.test(LocNr)) {
-      auto *Alloc = StackLocs[LocNr].Alloc;
-      InsertionPoint = createDealloc(Alloc, InsertionPoint,
-          Location.has_value() ? Location.value() : Alloc->getLoc());
-      changesMade = true;
+/// Pop and emit deallocations for any allocations on top of the
+/// active allocations stack that are pending deallocation.
+///
+/// This operation is called whenever we pop an allocation; it
+/// restores the invariant that the top of the stack is never in a
+/// pending state.
+static void emitPendingDeallocations(State &state,
+                                     SILInstruction *insertAfterDealloc,
+                                     bool &madeChanges) {
+  std::optional<SILBuilderWithScope> builder;
+
+  while (!state.allocations.empty() &&
+         state.allocations.back().isPending()) {
+    auto entry = state.allocations.pop_back_val();
+    SILInstruction *alloc = entry.getValue();
+
+    // Create a builder that inserts after the initial dealloc, if we
+    // haven't already. Re-using the same builder for subsequent deallocs
+    // means we order them correctly w.r.t. each other, which we wouldn't
+    // if we made a fresh builder after the initial dealloc each time.
+    if (!builder) {
+      // We want to use the location of (and inherit debug scopes from)
+      // the initial dealloc that we're inserting after.
+      builder.emplace(/*insertion point*/
+                      std::next(insertAfterDealloc->getIterator()),
+                      /*inherit scope from*/ insertAfterDealloc);
     }
+
+    createDealloc(*builder, insertAfterDealloc->getLoc(), alloc);
+    madeChanges = true;
   }
-  return changesMade;
 }
 
-// Insert deallocations at block boundaries.
-// This can be necessary for unreachable blocks. Example:
-//
-//   %1 = alloc_stack
-//   %2 = alloc_stack
-//   cond_br %c, bb2, bb3
-// bb2:    <--- need to insert a dealloc_stack %2 at the begin of bb2
-//   dealloc_stack %1
-//   unreachable
-// bb3:
-//   dealloc_stack %2
-//   dealloc_stack %1
-StackNesting::Changes StackNesting::insertDeallocsAtBlockBoundaries() {
-  Changes changes = Changes::None;
-  for (auto bd : llvm::reverse(BlockInfos)) {
-    // Collect the alive-bits (at the block exit) from the successor blocks.
-    for (auto succAndIdx : llvm::enumerate(bd.block.getSuccessorBlocks())) {
-      BlockInfo &SuccBI = BlockInfos[succAndIdx.value()];
-      if (SuccBI.AliveStackLocsAtEntry == bd.data.AliveStackLocsAtExit)
-        continue;
+/// The main entrypoint for clients.
+StackNesting::Changes StackNesting::fixNesting(SILFunction *F) {
+  bool madeChanges = false;
+
+  // The index in the allocation stack for each allocation.
+  // Multiple allocations can map to the same index, since ultimately
+  // it's a stack; we only ever look up an allocation while it's still
+  // live on the stack. This is very lazily filled in, because we don't
+  // want to do unnecessary work if nothing is unscoped. See
+  // setAllocationAsPending for invariants. This function never
+  // accesses it directly.
+  IndexForAllocationMap indexForAllocation;
+
+  // Visit each block of the function in an order consistent with dominance.
+  // The state represents the stack of active allocations, so it's appropriate
+  // that it starts with an empty stack. We're not worried about states
+  // potentially being different for different paths to the same block because
+  // that can only happen if deallocations don't properly post-dominate
+  // their allocations.
+  runInDominanceOrder(*F, State(), [&](SILBasicBlock *B, State &state) {
+
+    // We can't use a foreach loop because we sometimes remove the
+    // current instruction or add instructions (that we shouldn't visit)
+    // after it. Advancing the iterator immediately within the loop is
+    // sufficient to protect against both.
+    for (auto II = B->begin(), IE = B->end(); II != IE; ) {
+      SILInstruction *I = &*II++;
+
+      // Invariant: the top of the stack is never pending.
+      assert(state.allocations.empty() ||
+             !state.allocations.back().isPending());
 
-      // Insert deallocations for all locations which are alive at the end of
-      // the current block, but not at the begin of the successor block.
-      SILBasicBlock *InsertionBlock = succAndIdx.value();
-      if (!InsertionBlock->getSinglePredecessorBlock()) {
-        // If the current block is not the only predecessor of the successor
-        // block, we have to insert a new block where we can add the
-        // deallocations.
-        InsertionBlock = splitEdge(bd.block.getTerminator(), succAndIdx.index());
-        changes = Changes::CFG;
+      // Push allocations onto the current stack in the non-pending state.
+      if (I->isAllocatingStack()) {
+        state.allocations.push_back(I);
+        continue;
       }
-      if (insertDeallocs(bd.data.AliveStackLocsAtExit,
-                         SuccBI.AliveStackLocsAtEntry, &InsertionBlock->front(),
-                         std::nullopt)) {
-        if (changes == Changes::None)
-          changes = Changes::Instructions;
+
+      // Ignore instructions other than allocations and deallocations.
+      if (!I->isDeallocatingStack()) {
+        continue;
       }
-    }
-  }
-  return changes;
-}
 
+      // Get the allocation for the deallocation.
+      SILInstruction *dealloc = I;
+      SILInstruction *alloc = getAllocForDealloc(dealloc);
+
+#ifndef NDEBUG
+      if (state.allocations.empty()) {
+        state.abortForUnknownAllocation(alloc, dealloc);
+      }
+#endif
+
-StackNesting::Changes StackNesting::adaptDeallocs() {
-  bool InstChanged = false;
-  BitVector Bits(StackLocs.size());
-
-  // Visit all blocks. Actually the order doesn't matter, but let's to it in
-  // the same order as in solve().
-  for (auto bd : llvm::reverse(BlockInfos)) {
-    Bits = bd.data.AliveStackLocsAtExit;
-
-    // Insert/remove deallocations inside blocks.
-    for (SILInstruction *StackInst : llvm::reverse(bd.data.StackInsts)) {
-      if (StackInst->isAllocatingStack()) {
-        // For allocations we just update the bit-set.
-        int BitNr = bitNumberForAlloc(StackInst);
-        assert(Bits == StackLocs[BitNr].AliveLocs &&
-               "dataflow didn't converge");
-        Bits.reset(BitNr);
-      } else if (StackInst->isDeallocatingStack()) {
-        // Handle deallocations.
-        SILLocation Loc = StackInst->getLoc();
-        int BitNr = bitNumberForDealloc(StackInst);
-        SILInstruction *InsertionPoint = &*std::next(StackInst->getIterator());
-        if (Bits.test(BitNr)) {
-          // The location of StackInst is alive after StackInst. So we have to
-          // remove this deallocation.
-          StackInst->eraseFromParent();
-          InstChanged = true;
-        } else {
-          // Avoid inserting another deallocation for BitNr (which is already
-          // StackInst).
-          Bits.set(BitNr);
-        }
-
-        // Insert deallocations for all locations which are not alive after
-        // StackInst but _are_ alive at the StackInst.
-        InstChanged |= insertDeallocs(StackLocs[BitNr].AliveLocs, Bits,
-                                      InsertionPoint, Loc);
-        Bits |= StackLocs[BitNr].AliveLocs;
-      }
-    }
-    assert(Bits == bd.data.AliveStackLocsAtEntry && "dataflow didn't converge");
-  }
-  return InstChanged ? Changes::Instructions : Changes::None;
-}
+      // If the allocation is the top of the allocations stack, we can
+      // leave it alone.
+      if (alloc == state.allocations.back().getValue()) {
+        // Pop off our record of the allocation.
+        state.allocations.pop_back();
+
+        // We may need to emit any pending deallocations still on the stack.
+        // Pop and emit them in order.
+        emitPendingDeallocations(state, /*after*/ dealloc, madeChanges);
+
+        continue;
+      }
 
-StackNesting::Changes StackNesting::fixNesting(SILFunction *F) {
-  Changes changes = Changes::None;
-  {
-    StackNesting SN(F);
-    if (!SN.analyze())
-      return Changes::None;
-
-    // Insert deallocs at block boundaries. This might be necessary in CFG sub
-    // graphs which don't reach a function exit, but only an unreachable.
-    changes = SN.insertDeallocsAtBlockBoundaries();
-    if (changes == Changes::None) {
-      // Do the real work: extend lifetimes by moving deallocs.
-      return SN.adaptDeallocs();
-    }
-  }
-  {
-    // Those inserted deallocs make it necessary to re-compute the analysis.
-    StackNesting SN(F);
-    SN.analyze();
-    // Do the real work: extend lifetimes by moving deallocs.
-    return std::max(SN.adaptDeallocs(), changes);
-  }
-}
-
-void StackNesting::dump() const {
-  for (auto bd : BlockInfos) {
-    llvm::dbgs() << "Block " << bd.block.getDebugID();
-    if (bd.data.isDeadEnd)
-      llvm::dbgs() << "(deadend)";
-    llvm::dbgs() << ": entry-bits=";
-    dumpBits(bd.data.AliveStackLocsAtEntry);
-    llvm::dbgs() << ": exit-bits=";
-    dumpBits(bd.data.AliveStackLocsAtExit);
-    llvm::dbgs() << '\n';
-    for (SILInstruction *StackInst : bd.data.StackInsts) {
-      if (StackInst->isAllocatingStack()) {
-        auto AllocInst = StackInst;
-        int BitNr = StackLoc2BitNumbers.lookup(AllocInst);
-        llvm::dbgs() << "  alloc #" << BitNr << ": alive=";
-        dumpBits(StackLocs[BitNr].AliveLocs);
-        llvm::dbgs() << ", " << *StackInst;
-      } else if (StackInst->isDeallocatingStack()) {
-        auto *AllocInst = getAllocForDealloc(StackInst);
-        int BitNr = StackLoc2BitNumbers.lookup(AllocInst);
-        llvm::dbgs() << "  dealloc for #" << BitNr << "\n"
-                        "    " << *StackInst;
-      }
-    }
-  }
-}
+      // Otherwise, just remove the deallocation and set the allocation
+      // as having a pending deallocation on this path.
+      //
+      // When we mark `alloc` as having a pending deallocation, we are
+      // deferring its deallocation on this path to the deallocation
+      // points of some non-pending allocation that's on top of it on the
+      // allocation stack. That may not end up being the current top of
+      // the stack:
+      //
+      // - the current top may itself get deferred later, e.g.
+      //
+      //     %alloc = alloc_stack $Int
+      //     %top = alloc_stack $Int
+      //     dealloc_stack %alloc   // gets deferred
+      //     %new = alloc_stack $Int
+      //     dealloc_stack %top     // gets deferred
+      //     dealloc_stack %new     // dealloc %alloc and %top after this
+      //
+      // - there might be some other allocation between `alloc` and the
+      //   current top that "inherits" the deferral after we dealloc the
+      //   current top, e.g.:
+      //
+      //     %alloc = alloc_stack $Int
+      //     %middle = alloc_stack $Int
+      //     %top = alloc_stack $Int
+      //     dealloc_stack %alloc   // gets deferred
+      //     dealloc_stack %top
+      //     dealloc_stack %middle  // dealloc %alloc after this
+      //
+      // The key is that, no matter what happens, there's always a last
+      // thing on the stack above `alloc` that hasn't been deferred yet.
+      // Joint post-dominance means that, on every path we can reach
+      // from this point, we'll either eventually reach a dealloc for
+      // that last undeferred allocation (in which case, we'll deallocate
+      // `alloc` then), or we'll reach a dead-end (in which case it's fine
+      // that we never deallocated `alloc`). And whatever those points
+      // are, replacing this deallocation of `alloc` with those points
+      // will re-establish the joint post-dominance of `alloc` by its
+      // deallocations with respect to this path.
+
+      dealloc->eraseFromParent();
+      madeChanges = true;
+      setAllocationAsPending(state, alloc, dealloc, indexForAllocation);
+    }
+  });
 
-void StackNesting::dumpBits(const BitVector &Bits) {
-  llvm::dbgs() << '<';
-  const char *separator = "";
-  for (int Bit = Bits.find_first(); Bit >= 0; Bit = Bits.find_next(Bit)) {
-    llvm::dbgs() << separator << Bit;
-    separator = ",";
-  }
-  llvm::dbgs() << '>';
+
+  // We never make changes to the CFG.
+  return (madeChanges ? Changes::Instructions : Changes::None);
 }
 
 namespace swift::test {
diff --git a/test/SILOptimizer/allocbox_to_stack.sil b/test/SILOptimizer/allocbox_to_stack.sil
index 26f568cf31549..4ffe484cf2156 100644
--- a/test/SILOptimizer/allocbox_to_stack.sil
+++ b/test/SILOptimizer/allocbox_to_stack.sil
@@ -1086,6 +1086,9 @@ bb3:
   %3 = load %as1 : $*Bool
   unreachable
 }
+// StackNesting used to split the edge to bb3, but that's not really
+// necessary. Also, this input very arguably should not be allowed
+// because there isn't a consistent stack depth on entry to bb3.
 // CHECK-LABEL: sil @nesting_and_unreachable_critical_edge
 // CHECK:      bb0(%0 : $Int):
 // CHECK-NEXT:   [[BOX:%[0-9]+]] = alloc_stack $Int
@@ -1095,17 +1098,15 @@ bb3:
 // CHECK-NEXT:   [[STACK2:%[0-9]+]] = alloc_stack $Bool
 // CHECK-NEXT:   cond_br
 // CHECK:      bb2:
-// CHECK-NEXT:   dealloc_stack [[STACK2]]
-// CHECK-NEXT:   br bb4
-// CHECK:      bb3:
 // CHECK:        store {{%[0-9]+}}
 // CHECK:        dealloc_stack [[STACK2]]
 // CHECK-NEXT:   dealloc_stack [[STACK1]]
 // CHECK-NEXT:   dealloc_stack [[BOX]]
 // CHECK-NEXT:   tuple
 // CHECK-NEXT:   return
-// CHECK:      bb4:
+// CHECK:      bb3:
 // CHECK-NEXT:   unreachable
+// CHECK: // end sil function 'nesting_and_unreachable_critical_edge'
 sil @nesting_and_unreachable_critical_edge : $(Int) -> () {
 bb0(%0 : $Int):
   %1 = alloc_box ${ var Int }
diff --git a/test/SILOptimizer/stack_promotion.sil b/test/SILOptimizer/stack_promotion.sil
index 40edd69d12807..dfb588229ff91 100644
--- a/test/SILOptimizer/stack_promotion.sil
+++ b/test/SILOptimizer/stack_promotion.sil
@@ -1058,19 +1058,32 @@ bb0(%0 : $Int, %another: $Array):
   return %24 : $()
 }
 
+// The input to StackNesting seems sketchy here; it isn't really
+// obeying joint post-dominance, and is just skating by under the
+// laxer rules for blocks that don't exit.
 // CHECK-LABEL: sil @promote_with_unreachable_block_nest_bug
-// CHECK: bb3:
-// CHECK:   dealloc_stack_ref %{{.*}}
-// CHECK: bb4:
-// CHECK:   br bb5
-// CHECK: bb5:
-// CHECK:   dealloc_stack_ref %{{.*}}
-// CHECK: bb6:
-// CHECK:   dealloc_stack_ref %{{.*}}
-// CHECK: bb11:
-// CHECK:   alloc_ref [stack] $XX
-// CHECK:   return
+// CHECK:      bb3:
+// CHECK-NEXT:   strong_release [[APPLY_RESULT:%[0-9]+]]
+// CHECK-NEXT:   dealloc_stack_ref [[ALLOC:%[0-9]+]]
+// CHECK-NEXT:   br bb6
+// CHECK:      bb4:
+// CHECK-NEXT:   integer_literal
+// CHECK-NEXT:   struct
+// CHECK-NEXT:   br bb13
+// CHECK:      bb5:
+// CHECK-NEXT:   strong_release [[APPLY_RESULT]]
+// CHECK-NEXT:   dealloc_stack_ref [[ALLOC]]
+// CHECK-NEXT:   br bb6
+// CHECK:      bb10:
+// CHECK-NEXT:   [[ALLOC]] = alloc_ref [stack] $XX
+// CHECK-NEXT:   // function_ref
+// CHECK-NEXT:   function_ref
+// CHECK-NEXT:   [[APPLY_RESULT]] = apply
+// CHECK-NEXT:   ref_element_addr
+// CHECK-NEXT:   load
+// CHECK-NEXT:   cond_br undef, bb2, bb4
+// CHECK:      bb13(
+// CHECK-NEXT:   unreachable
 sil @promote_with_unreachable_block_nest_bug : $@convention(thin) () -> Int32 {
 bb0:
   %0 = alloc_stack $Builtin.Int32 // user: %30