Commit 1f64871

MemAccessUtils: unify Box/Class/Tail storage for consistency and usability
It was originally convenient for exclusivity optimization to treat boxes specially. We wanted to know that the 'Box' kind was always uniquely identified. But that's not really important. And now that AccessedStorage is being used more generally, the inconsistency is problematic. A consistent model is also much easier to understand and explain. This also makes the implementation of the utility simpler and more powerful. Functional changes: isRCIdentical will look through mark_dependence and mark_uninitialized. findReferenceRoot is used consistently everywhere, increasing analysis precision.
1 parent 88df783 commit 1f64871

File tree

8 files changed

+193
-130
lines changed


docs/SILProgrammersManual.md

Lines changed: 60 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -161,6 +161,9 @@ idioms, it becomes overly burdensome to evolve these APIs over time.
161161

162162
## `AccessedStorage` and `AccessPath`
163163

164+
TODO: move this section to a separate document and refer to it from
165+
SIL.rst.
166+
164167
The `AccessedStorage` and `AccessPath` types formalize memory access
165168
in SIL. Given an address-typed SIL value, it is possible to
166169
reliably identify the storage location of the accessed
@@ -193,17 +196,17 @@ address is immutable for the duration of its access scope
193196

194197
Computing `AccessedStorage` and `AccessPath` for any given SIL address
195198
involves a use-def traversal to determine the origin of the
196-
address. It may traverse operations on address, pointer, box, and
197-
reference types. The logic that formalizes which SIL operations may be
198-
involved in the def-use chain is encapsulated with the
199-
`AccessUseDefChainVisitor`. The traversal can be customized by
200-
implementing this visitor. Customization is not expected to change the
201-
meaning of AccessedStorage or AccessPath. Rather, it is intended for
202-
additional pass-specific book-keeping or for higher-level convenience
203-
APIs that operate on the use-def chain bypassing AccessedStorage
204-
completely.
205-
206-
Access def-use chains are divided by four points: the "root", the
199+
address. It may traverse operations on values of type address,
200+
Builtin.RawPointer, box, and reference. The logic that
201+
formalizes which SIL operations may be involved in the def-use chain
202+
is encapsulated with the `AccessUseDefChainVisitor`. The traversal can
203+
be customized by implementing this visitor. Customization is not
204+
expected to change the meaning of AccessedStorage or
205+
AccessPath. Rather, it is intended for additional pass-specific
206+
book-keeping or for higher-level convenience APIs that operate on the
207+
use-def chain bypassing AccessedStorage completely.
208+
209+
Access def-use chains are divided by four points: the object "root", the
207210
access "base", the outer-most "access" scope, and the "address" of a
208211
memory operation. For example:
209212
```
@@ -222,18 +225,28 @@ memory operation. For example:
222225
end_access %access : $*S
223226
```
224227

228+
OR
229+
230+
```
231+
%root = alloc_box $S
232+
%base = project_box %root : ${ var S }
233+
%access = begin_access [read] [static] %base : $*S
234+
%address = struct_element_addr %access : $*S, #.field
235+
%value = load [trivial] %address : $*Int64
236+
end_access %access : $*S
237+
```
238+
225239
#### Reference root
226240

227241
The first part of the def-use chain computes the formal access base
228-
from the root of the object (e.g. `alloc_ref ->
229-
ref_element_addr`). The reference root might be a locally allocated
230-
object, a function argument, a function result, or a reference loaded
231-
from storage. There is no enforcement on the type of operation that
232-
can produce a reference; however, only reference types or
233-
Builtin.BridgeObject types are only allowed in this part of the
242+
from the root of the object (e.g. `alloc_ref -> ref_element_addr` and
243+
`alloc_box -> project_box`). The reference root might be a locally
244+
allocated object, a function argument, a function result, or a
245+
reference loaded from storage. There is no enforcement on the type of
246+
operation that can produce a reference; however, only reference types, Builtin.BridgeObject types, and box types are allowed in this part of the
234247
def-use chain. The reference root is the greatest common ancestor in
235248
the def-use graph that can identify an object by a single SILValue. If
236-
the root as an `alloc_ref`, then it is *uniquely identified*. The
249+
the root is an `alloc_ref`, then it is *uniquely identified*. The
237250
def-use chain from the root to the base may contain reference casts
238251
(`isRCIdentityPreservingCast`) and phis.
239252

@@ -268,29 +281,45 @@ formal access base. The reference root is only one component of an
268281
`AccessedStorage` location. AccessedStorage also identifies the class
269282
property being accessed within that object.
270283

284+
A reference root may be borrowed, so the use-def path from the base to
285+
the root may cross a borrow scope. This means that uses of one base
286+
may not be replaced with a different base even if it has the same
287+
AccessedStorage because they may not be contained within the same
288+
borrow scope. However, this is the only part of the access path that
289+
may be borrowed. Address uses with the same base can be substituted
290+
without checking the borrow scope.
291+
271292
#### Access base
272293

273-
The access base is the SILValue produced by an instruction that
274-
directly identifies the kind of storage being accessed without further
275-
use-def traversal. Common access bases are `alloc_box`, `alloc_stack`,
276-
`global_addr`, `ref_element_addr`, and function arguments (see
294+
The access base is the address or Builtin.RawPointer type SILValue
295+
produced by an instruction that directly identifies the kind of
296+
storage being accessed without further use-def traversal. Common
297+
access bases are `alloc_stack`, `global_addr`,
298+
`ref_element_addr`, `project_box`, and function arguments (see
277299
`AccessedStorage::Kind`).
278300

279301
The access base is the same as the "root" SILValue for all storage
280-
kinds except global and class storage. Global storage has no root. For
281-
class storage the root is the SILValue that identifies object,
282-
described as the "reference root" above.
302+
kinds except global and reference storage. Reference storage includes
303+
class, tail and box storage. Global storage has no root. For reference
304+
storage the root is the SILValue that identifies the object, described as
305+
the "reference root" above.
283306

284307
"Box" storage is uniquely identified by an `alloc_box`
285308
instruction. Therefore, we consider the `alloc_box` to be the base of
286309
the access. Box storage does not apply to all box types or box
287310
projections, which may instead originate from arguments or indirect
288311
enums for example.
289312

313+
An access scope, identified by a `begin_access` marker, may only occur
314+
on the def-use path between the access base and any address
315+
projections. The def-use path from the root to the base cannot cross
316+
an access scope. Likewise, the def-use between an access projection
317+
and the memory operation cannot cross an access scope.
318+
290319
Typically, the base is the address-type source operand of a
291320
`begin_access`. However, the path from the access base to the
292321
`begin_access` may include *storage casts* (see
293-
`isAccessedStorageCast`). It may involve address, pointer, and box
322+
`isAccessedStorageCast`). It may involve address and pointer
294323
types, and may traverse phis. For some kinds of storage, the base may
295324
itself even be a non-address pointer. For phis that cannot be uniquely
296325
resolved, the base may even be a box type.
@@ -322,9 +351,9 @@ which address storage is always uniquely determined. Currently, if a
322351
(non-address) phi on the access path from `base` to `access` does not
323352
have a common base, then it is considered an invalid access (the
324353
AccessedStorage object is not valid). SIL verification ensures that a
325-
formal access always has valid AccessedStorage (WIP). In other words, the
326-
source of a `begin_access` marker must be a single, non-phi base. In
327-
the future, for further simplicity, we may generally disallow box and
354+
formal access always has valid AccessedStorage (WIP). In other words,
355+
the source of a `begin_access` marker must be a single, non-phi
356+
base. In the future, for further simplicity, we may also disallow
328357
pointer phis unless they have a common base.
329358

330359
Not all SIL memory access is part of a formal access, but the
@@ -334,8 +363,8 @@ the use-def search does not begin at a `begin_access` marker. For
334363
non-formal access, SIL verification is not as strict. An invalid
335364
access is allowed, but handled conservatively. This is safe as long as
336365
those non-formal accesses can never alias with class and global
337-
storage. Class and global access is always guarded by formal access
338-
markers--at least until static markers are stripped from SIL.
366+
storage. Class and global access must always be guarded by formal
367+
access markers--at least until static markers are stripped from SIL.
339368

340369
#### Nested access
341370
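
The four points described in the documentation change above (root, base, access, address) can be illustrated with a small, self-contained C++ sketch. This is a toy model, not the compiler's implementation: `Inst`, `AccessChain`, `isAddressProjection`, `isObjectProjection`, and `splitAccessChain` are hypothetical names invented here, opcodes are plain strings, and the walk assumes a linear use-def chain with no casts or phis between the root and the base.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a SIL instruction: an opcode name plus a single operand.
// This is NOT the real swift::SILInstruction API.
struct Inst {
  std::string opcode;
  const Inst *operand; // nullptr when the instruction has no operand
};

// The four points that divide an access def-use chain.
struct AccessChain {
  const Inst *root = nullptr;    // object root (null for global storage)
  const Inst *base = nullptr;    // access base
  const Inst *access = nullptr;  // outermost begin_access, if any
  const Inst *address = nullptr; // address used by the memory operation
};

// Assumption: only these two opcodes are treated as address projections.
inline bool isAddressProjection(const Inst &i) {
  return i.opcode == "struct_element_addr" || i.opcode == "tuple_element_addr";
}

// Bases that project an address out of an object (reference storage).
inline bool isObjectProjection(const Inst &i) {
  return i.opcode == "project_box" || i.opcode == "ref_element_addr" ||
         i.opcode == "ref_tail_addr";
}

// Walk the use-def chain upward from the address of a memory operation.
inline AccessChain splitAccessChain(const Inst *address) {
  assert(address && "expected a valid address");
  AccessChain chain;
  chain.address = address;
  const Inst *cur = address;
  while (cur->operand &&
         (isAddressProjection(*cur) || cur->opcode == "begin_access")) {
    if (cur->opcode == "begin_access")
      chain.access = cur; // the last scope seen on the way up is the outermost
    cur = cur->operand;
  }
  chain.base = cur;
  if (cur->opcode == "global_addr")
    chain.root = nullptr; // global storage has no root
  else if (isObjectProjection(*cur) && cur->operand)
    chain.root = cur->operand; // reference storage: root is the object
  else
    chain.root = cur; // all other kinds: root is the same as the base
  return chain;
}
```

Applied to the `alloc_box` example from the diff, the sketch reports the `alloc_box` as the root and the `project_box` as the base, which matches the unified treatment of box, class, and tail storage. For an `alloc_stack`, the root and base coincide.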

include/swift/SIL/MemAccessUtils.h

Lines changed: 55 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,7 @@
4444
/// 3. getAccessBase(): Find the ultimate base of any address corresponding to
4545
/// the accessed object, regardless of whether the address is nested within
4646
/// access scopes, and regardless of any storage casts. This returns either an
47-
/// address type, pointer type, or box type, but never a reference type.
47+
/// address or pointer type, but never a reference or box type.
4848
/// Each object's property or its tail storage is separately accessed.
4949
///
5050
/// For better identification of an access base, use
@@ -166,6 +166,11 @@ SILValue getAccessScope(SILValue address);
166166
/// return the same base address, then they must also have the same storage.
167167
SILValue getAccessBase(SILValue address);
168168

169+
/// Find the root of a reference, which may be a non-trivial type, box type, or
170+
/// BridgeObject. This is guaranteed to be consistent with
171+
/// AccessedStorage::getRoot() and AccessPath::getRoot().
172+
SILValue findReferenceRoot(SILValue ref);
173+
169174
/// Return true if \p address points to a let-variable.
170175
///
171176
/// let-variables are only written during let-variable initialization, which is
@@ -237,7 +242,8 @@ enum class AccessUseType { Exact, Inner, Overlapping };
237242
/// represent any arbitrary base address--it must at least have been proven not to
238243
/// correspond to any class or global variable access, unless it's nested within
239244
/// another access to the same object. So, Unidentified can overlap with
240-
/// Class/Global access, but it cannot be the only formal access to that memory.
245+
/// Class/Global access because it was derived from another Class/Global access,
246+
/// but Unidentified can never be the only formal access to that memory.
241247
///
242248
/// An *invalid* AccessedStorage object is Unidentified and associated with an
243249
/// invalid SILValue. This signals that analysis has failed to recognize an
@@ -268,7 +274,7 @@ class AccessedStorage {
268274
Global,
269275
Class,
270276
Tail,
271-
Argument,
277+
Argument, // Address or RawPointer argument
272278
Yield,
273279
Nested,
274280
Unidentified,
@@ -280,22 +286,11 @@ class AccessedStorage {
280286
// Give object tail storage a fake large property index for convenience.
281287
static constexpr unsigned TailIndex = std::numeric_limits<int>::max();
282288

283-
/// Directly create an AccessedStorage for class or tail property access.
284-
static AccessedStorage forClass(SILValue object, unsigned propertyIndex) {
285-
AccessedStorage storage;
286-
if (propertyIndex == TailIndex)
287-
storage.initKind(Tail);
288-
else
289-
storage.initKind(Class, propertyIndex);
290-
storage.value = object;
291-
return storage;
292-
}
293-
294289
/// Return an AccessedStorage value that best identifies a formally accessed
295290
/// variable pointed to by \p sourceAddress, looking through any nested
296291
/// formal accesses to find the underlying storage.
297292
///
298-
/// \p sourceAddress may be an address, pointer, or box type.
293+
/// \p sourceAddress may be an address type or Builtin.RawPointer.
299294
///
300295
/// If \p sourceAddress is within a formal access scope, which does not have
301296
/// "Unsafe" enforcement, then this always returns valid storage.
@@ -308,7 +303,7 @@ class AccessedStorage {
308303
/// Return an AccessedStorage object that identifies formal access scope that
309304
/// immediately encloses \p sourceAddress.
310305
///
311-
/// \p sourceAddress may be an address, pointer, or box type.
306+
/// \p sourceAddress may be an address type or Builtin.RawPointer.
312307
///
313308
/// If \p sourceAddress is within a formal access scope, this always returns a
314309
/// valid "Nested" storage value.
@@ -317,6 +312,14 @@ class AccessedStorage {
317312
/// the formal storage if possible, otherwise returning invalid storage.
318313
static AccessedStorage computeInScope(SILValue sourceAddress);
319314

315+
/// Create storage for the tail elements of \p object.
316+
static AccessedStorage forObjectTail(SILValue object) {
317+
AccessedStorage storage;
318+
storage.initKind(Tail, TailIndex);
319+
storage.value = findReferenceRoot(object);
320+
return storage;
321+
}
322+
320323
protected:
321324
// Checking the storage kind is far more common than other fields. Make sure
322325
// it can be byte load with no shift.
@@ -389,7 +392,7 @@ class AccessedStorage {
389392
SILGlobalVariable *global;
390393
};
391394

392-
void initKind(Kind k, unsigned elementIndex = InvalidElementIndex) {
395+
void initKind(Kind k, unsigned elementIndex) {
393396
Bits.opaqueBits = 0;
394397
Bits.AccessedStorage.kind = k;
395398
Bits.AccessedStorage.elementIndex = elementIndex;
@@ -401,7 +404,7 @@ class AccessedStorage {
401404
}
402405

403406
public:
404-
AccessedStorage() : value() { initKind(Unidentified); }
407+
AccessedStorage() : value() { initKind(Unidentified, InvalidElementIndex); }
405408

406409
AccessedStorage(SILValue base, Kind kind);
407410

@@ -418,6 +421,7 @@ class AccessedStorage {
418421

419422
SILValue getValue() const {
420423
assert(getKind() != Global && getKind() != Class && getKind() != Tail);
424+
assert(value && "Invalid storage has an invalid value");
421425
return value;
422426
}
423427

@@ -436,7 +440,9 @@ class AccessedStorage {
436440
return global;
437441
}
438442

439-
bool isReference() const { return getKind() == Class || getKind() == Tail; }
443+
bool isReference() const {
444+
return getKind() == Box || getKind() == Class || getKind() == Tail;
445+
}
440446

441447
SILValue getObject() const {
442448
assert(isReference());
@@ -447,6 +453,15 @@ class AccessedStorage {
447453
return getElementIndex();
448454
}
449455

456+
/// Return a new AccessedStorage for Class/Tail/Box access based on
457+
/// existing storage and a new object.
458+
AccessedStorage transformReference(SILValue object) const {
459+
AccessedStorage storage;
460+
storage.initKind(getKind(), getElementIndex());
461+
storage.value = findReferenceRoot(object);
462+
return storage;
463+
}
464+
450465
/// Return the address or reference root that the storage was based
451466
/// on. Returns an invalid SILValue for globals or invalid storage.
452467
SILValue getRoot() const {
@@ -457,7 +472,7 @@ class AccessedStorage {
457472
case AccessedStorage::Argument:
458473
case AccessedStorage::Yield:
459474
case AccessedStorage::Unidentified:
460-
return getValue(); // Can be invalid for Unidentified storage.
475+
return getValue();
461476
case AccessedStorage::Global:
462477
return SILValue();
463478
case AccessedStorage::Class:
@@ -511,6 +526,7 @@ class AccessedStorage {
511526
bool isLocal() const {
512527
switch (getKind()) {
513528
case Box:
529+
return isa<AllocBoxInst>(value);
514530
case Stack:
515531
return true;
516532
case Global:
@@ -538,6 +554,7 @@ class AccessedStorage {
538554
bool isUniquelyIdentified() const {
539555
switch (getKind()) {
540556
case Box:
557+
return isa<AllocBoxInst>(value);
541558
case Stack:
542559
case Global:
543560
return true;
@@ -686,11 +703,14 @@ template <> struct DenseMapInfo<swift::AccessedStorage> {
686703

687704
static unsigned getHashValue(swift::AccessedStorage storage) {
688705
switch (storage.getKind()) {
706+
case swift::AccessedStorage::Unidentified:
707+
if (!storage)
708+
return DenseMapInfo<swift::SILValue>::getHashValue(swift::SILValue());
709+
LLVM_FALLTHROUGH;
689710
case swift::AccessedStorage::Box:
690711
case swift::AccessedStorage::Stack:
691712
case swift::AccessedStorage::Nested:
692713
case swift::AccessedStorage::Yield:
693-
case swift::AccessedStorage::Unidentified:
694714
return DenseMapInfo<swift::SILValue>::getHashValue(storage.getValue());
695715
case swift::AccessedStorage::Argument:
696716
return storage.getParamIndex();
@@ -1008,15 +1028,19 @@ class AccessPath {
10081028
// recover the def-use chain for a specific global_addr or ref_element_addr.
10091029
struct AccessPathWithBase {
10101030
AccessPath accessPath;
1011-
// The address-type value that is the base of the formal access. For
1012-
// class storage, it is the ref_element_addr. For global storage it is the
1013-
// global_addr or initializer apply. For other storage, it is the same as
1014-
// accessPath.getRoot().
1031+
// The address-type value that is the base of the formal access. For class
1032+
// storage, it is the ref_element_addr; for box storage, the project_box; for
1033+
// global storage the global_addr or initializer apply. For other
1034+
// storage, it is the same as accessPath.getRoot().
10151035
//
1016-
// Note: base may be invalid for global_addr -> address_to_pointer -> phi
1017-
// patterns, while the accessPath is still valid.
1036+
// Note: base may be invalid for phi patterns, even though the accessPath is
1037+
// valid because we don't currently keep track of multiple bases. Multiple
1038+
// bases for the same storage can happen with global_addr, ref_element_addr,
1039+
// ref_tail_addr, and project_box.
10181040
//
1019-
// FIXME: add a structural requirement to SIL so base is always valid in OSSA.
1041+
// FIXME: add a structural requirement to SIL/OSSA so valid storage has
1042+
// a single base. For most cases, it is as simple as sinking the
1043+
// projection. For index_addr, it may require hoisting ref_tail_addr.
10201044
SILValue base;
10211045

10221046
/// Compute the access path at \p address, and record the access base. This
@@ -1309,14 +1333,7 @@ inline bool isAccessedStorageCast(SingleValueInstruction *svi) {
13091333
case SILInstructionKind::MarkUninitializedInst:
13101334
case SILInstructionKind::UncheckedAddrCastInst:
13111335
case SILInstructionKind::MarkDependenceInst:
1312-
// Look through a project_box to identify the underlying alloc_box as the
1313-
// accesed object. It must be possible to reach either the alloc_box or the
1314-
// containing enum in this loop, only looking through simple value
1315-
// propagation such as copy_value and begin_borrow.
1316-
case SILInstructionKind::ProjectBoxInst:
1317-
case SILInstructionKind::ProjectBlockStorageInst:
13181336
case SILInstructionKind::CopyValueInst:
1319-
case SILInstructionKind::BeginBorrowInst:
13201337
// Casting to RawPointer does not affect the AccessPath. When converting
13211338
// between address types, they must be layout compatible (with truncation).
13221339
case SILInstructionKind::AddressToPointerInst:
@@ -1366,7 +1383,7 @@ class AccessUseDefChainVisitor {
13661383
Result visitArgumentAccess(SILFunctionArgument *arg) {
13671384
return asImpl().visitBase(arg, AccessedStorage::Argument);
13681385
}
1369-
Result visitBoxAccess(AllocBoxInst *box) {
1386+
Result visitBoxAccess(ProjectBoxInst *box) {
13701387
return asImpl().visitBase(box, AccessedStorage::Box);
13711388
}
13721389
/// \p global may be either a GlobalAddrInst or the ApplyInst for a global
@@ -1414,9 +1431,8 @@ Result AccessUseDefChainVisitor<Impl, Result>::visit(SILValue sourceAddr) {
14141431

14151432
// MARK: Handle immediately-identifiable instructions.
14161433

1417-
// An AllocBox is a fully identified memory location.
1418-
case ValueKind::AllocBoxInst:
1419-
return asImpl().visitBoxAccess(cast<AllocBoxInst>(sourceAddr));
1434+
case ValueKind::ProjectBoxInst:
1435+
return asImpl().visitBoxAccess(cast<ProjectBoxInst>(sourceAddr));
14201436

14211437
// An AllocStack is a fully identified memory location, which may occur
14221438
// after inlining code already subjected to stack promotion.
