Commit cccf6db

[𝘀𝗽𝗿] initial version
Created using spr 1.3.8-beta.1
2 parents efd96af + acd5695 commit cccf6db

10 files changed, +645 -82 lines changed

llvm/docs/LangRef.rst

Lines changed: 108 additions & 22 deletions
@@ -660,42 +660,122 @@ Non-Integral Pointer Type
660660
Note: non-integral pointer types are a work in progress, and they should be
661661
considered experimental at this time.
662662

663-
LLVM IR optionally allows the frontend to denote pointers in certain address
664-
spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
665-
Non-integral pointer types represent pointers that have an *unspecified* bitwise
666-
representation; that is, the integral representation may be target dependent or
667-
unstable (not backed by a fixed integer).
663+
For most targets, the pointer representation is a direct mapping from the
664+
bitwise representation to the address of the underlying memory allocation.
665+
Such pointers are considered "integral", and any pointers where the
666+
representation is not just an integer address are called "non-integral".
667+
668+
In most cases pointers with a non-integral representation behave exactly the
669+
same as an integral pointer, the only difference is that it is not possible to
670+
create a pointer just from an address unless all the non-address bits were
671+
also recreated correctly in a target-specific way.
672+
Since the address width of a non-integral pointer is not equal to the bitwise
673+
representation, extracting the address will need to truncate to the index width
674+
of the pointer.
675+
An example of such a non-integral pointer representation is the AMDGPU buffer
676+
descriptor, which is a 128-bit fat pointer and a 32-bit offset.
677+
678+
Additionally, LLVM IR optionally allows the frontend to denote pointers in
679+
certain address spaces as "unstable" or having "external state"
680+
(or combinations of these) via the :ref:`datalayout string<langref_datalayout>`.
681+
682+
The exact implications of these properties are target-specific, but the
683+
following IR semantics and restrictions to optimization passes apply:
684+
685+
Unstable pointer representation
686+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
687+
688+
Pointers in this address space have an *unspecified* bitwise representation
689+
(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
690+
allowed to change in a target-specific way. For example, this could be a pointer
691+
type used with copying garbage collection where the garbage collector could
692+
update the pointer at any time in the collection sweep.
668693

669694
``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
670695
integral (i.e., normal) pointers in that they convert integers to and from
671-
corresponding pointer types, but there are additional implications to be
672-
aware of. Because the bit-representation of a non-integral pointer may
673-
not be stable, two identical casts of the same operand may or may not
696+
corresponding pointer types, but there are additional implications to be aware
697+
of.
698+
699+
For "unstable" pointer representations, the bit-representation of the pointer
700+
may not be stable, so two identical casts of the same operand may or may not
674701
return the same value. Said differently, the conversion to or from the
675-
non-integral type depends on environmental state in an implementation
702+
"unstable" pointer type depends on environmental state in an implementation
676703
defined manner.
677-
678704
If the frontend wishes to observe a *particular* value following a cast, the
679705
generated IR must fence with the underlying environment in an implementation
680706
defined manner. (In practice, this tends to require ``noinline`` routines for
681707
such operations.)
682708

683709
From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
684-
non-integral types are analogous to ones on integral types with one
710+
"unstable" pointer types are analogous to ones on integral types with one
685711
key exception: the optimizer may not, in general, insert new dynamic
686712
occurrences of such casts. If a new cast is inserted, the optimizer would
687713
need to either ensure that a) all possible values are valid, or b)
688714
appropriate fencing is inserted. Since the appropriate fencing is
689715
implementation defined, the optimizer can't do the latter. The former is
690716
challenging as many commonly expected properties, such as
691-
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
717+
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
692718
Similar restrictions apply to intrinsics that might examine the pointer bits,
693719
such as :ref:`llvm.ptrmask<int_ptrmask>`.
694720

695-
The alignment information provided by the frontend for a non-integral pointer
721+
The alignment information provided by the frontend for an "unstable" pointer
696722
(typically using attributes or metadata) must be valid for every possible
697723
representation of the pointer.
698724

725+
Non-integral pointers with external state
726+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
727+
728+
A further special case of non-integral pointers is ones that include external
729+
state (such as bounds information or a type tag) with a target-defined size.
730+
An example of such a type is a CHERI capability, where there is an additional
731+
validity bit that is part of all pointer-typed registers, but is located in
732+
memory at an implementation-defined address separate from the pointer itself.
733+
Another example would be a fat-pointer scheme where pointers remain plain
734+
integers, but the associated bounds are stored in an out-of-band table.
735+
736+
Unless also marked as "unstable", the bit-wise representation of pointers with
737+
external state is stable and ``ptrtoint(x)`` always yields a deterministic
738+
value. This means transformation passes are still permitted to insert new
739+
``ptrtoint`` instructions.
740+
741+
The following restrictions apply to IR level optimization passes:
742+
743+
The ``inttoptr`` instruction does not recreate the external state and therefore
744+
it is target dependent whether it can be used to create a dereferenceable
745+
pointer. In general passes should assume that the result of such an inttoptr
746+
is not dereferenceable. For example, on CHERI targets an ``inttoptr`` will
747+
yield a capability with the external state (the validity tag bit) set to zero,
748+
which will cause any dereference to trap.
749+
The ``ptrtoint`` instruction also only returns the "in-band" state and omits
750+
all external state.
751+
These two properties mean that ``inttoptr(ptrtoint(x))`` cannot be folded to
752+
``x`` since the ``ptrtoint`` operation does not include the external state
753+
needed to reconstruct the original pointer and ``inttoptr`` cannot set it.
754+
755+
When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer
756+
is performed, the external metadata is also stored to an implementation-defined
757+
location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the
758+
external metadata and make it available for all uses of ``%val``.
759+
Similarly, the ``llvm.memcpy`` and ``llvm.memmove`` intrinsics also transfer the
760+
external state. This is essential to allow frontends to efficiently emit copies
761+
of structures containing such pointers, since expanding all these copies as
762+
individual loads and stores would affect compilation speed and inhibit
763+
optimizations.
764+
765+
Notionally, these external bits are part of the pointer, but since
766+
``inttoptr`` / ``ptrtoint`` only operate on the "in-band" bits of the pointer
767+
and the external bits are not explicitly exposed, they are not included in the
768+
size specified in the :ref:`datalayout string<langref_datalayout>`.
769+
770+
When a pointer type has external state, all roundtrips via memory must
771+
be performed as loads and stores of the correct type since stores of other
772+
types may not propagate the external data.
773+
Therefore it is not legal to convert an existing load/store (or a
774+
``llvm.memcpy`` / ``llvm.memmove`` intrinsic) of pointer types with external
775+
state to a load/store of an integer type with same bitwidth, as that may drop
776+
the external state.
777+
778+
699779
.. _globalvars:
700780

701781
Global Variables
@@ -3179,8 +3259,8 @@ as follows:
31793259
``A<address space>``
31803260
Specifies the address space of objects created by '``alloca``'.
31813261
Defaults to the default address space of 0.
3182-
``p[n]:<size>:<abi>[:<pref>[:<idx>]]``
3183-
This specifies the properties of a pointer in address space ``n``.
3262+
``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]``
3263+
This specifies the properties of a pointer in address space ``as``.
31843264
The ``<size>`` parameter specifies the size of the bitwise representation.
31853265
For :ref:`non-integral pointers <nointptrtype>` the representation size may
31863266
be larger than the address width of the underlying address space (e.g. to
@@ -3193,9 +3273,14 @@ as follows:
31933273
default index size is equal to the pointer size.
31943274
The index size also specifies the width of addresses in this address space.
31953275
All sizes are in bits.
3196-
The address space, ``n``, is optional, and if not specified,
3197-
denotes the default address space 0. The value of ``n`` must be
3198-
in the range [1,2^24).
3276+
The address space, ``<as>``, is optional, and if not specified, denotes the
3277+
default address space 0. The value of ``<as>`` must be in the range [1,2^24).
3278+
The optional ``<flags>`` are used to specify properties of pointers in this
3279+
address space: the character ``u`` marks pointers as having an unstable
3280+
representation, ``n`` marks pointers as non-integral (i.e. having
3281+
additional metadata), ``e`` marks pointers having external state
3282+
(``n`` must also be set). See :ref:`Non-Integral Pointer Types <nointptrtype>`.
3283+
31993284
``i<size>:<abi>[:<pref>]``
32003285
This specifies the alignment for an integer type of a given bit
32013286
``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -3248,9 +3333,11 @@ as follows:
32483333
this set are considered to support most general arithmetic operations
32493334
efficiently.
32503335
``ni:<address space0>:<address space1>:<address space2>...``
3251-
This specifies pointer types with the specified address spaces
3252-
as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0``
3253-
address space cannot be specified as non-integral.
3336+
This marks pointer types with the specified address spaces
3337+
as :ref:`non-integral and unstable <nointptrtype>`.
3338+
The ``0`` address space cannot be specified as non-integral.
3339+
This specifier is only supported for backwards compatibility; the flags of the ``p``
3340+
specifier should be used instead for new code.
32543341

32553342
``<abi>`` is a lower bound on what is required for a type to be considered
32563343
aligned. This is used in various places, such as:
@@ -31402,4 +31489,3 @@ Semantics:
3140231489

3140331490
The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
3140431491
as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.
31405-
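The ``p[<flags>][<as>]`` prefix described in the LangRef hunk above can be illustrated with a small standalone sketch. This is not the parser from this commit (it is not shown here); `PointerHead` and `parsePointerHead` are hypothetical names, and the sketch assumes that flags may appear in any order and that an explicit address space of 0 is tolerated. It only encodes what the prose states: `u`, `n`, and `e` are the flag characters, `e` requires `n`, and the address space must be below 2^24.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type for the "p[<flags>][<as>]" head of a pointer spec.
struct PointerHead {
  bool Unstable = false;      // 'u' flag
  bool NonIntegral = false;   // 'n' flag
  bool ExternalState = false; // 'e' flag (requires 'n')
  uint32_t AddrSpace = 0;     // defaults to address space 0
};

// Parses only the first colon-separated component, e.g. "pu1" or "pne200".
// Returns std::nullopt on malformed input.
std::optional<PointerHead> parsePointerHead(std::string_view S) {
  if (S.empty() || S.front() != 'p')
    return std::nullopt;
  S.remove_prefix(1);

  PointerHead H;
  // Consume flag characters (assumption: any order, repeats allowed).
  while (!S.empty() && std::isalpha(static_cast<unsigned char>(S.front()))) {
    switch (S.front()) {
    case 'u': H.Unstable = true; break;
    case 'n': H.NonIntegral = true; break;
    case 'e': H.ExternalState = true; break;
    default: return std::nullopt; // unknown flag
    }
    S.remove_prefix(1);
  }
  // Per the prose: 'e' is only valid together with 'n'.
  if (H.ExternalState && !H.NonIntegral)
    return std::nullopt;

  // The remainder, if any, must be a decimal address space below 2^24.
  uint64_t AS = 0;
  for (char C : S) {
    if (!std::isdigit(static_cast<unsigned char>(C)))
      return std::nullopt;
    AS = AS * 10 + static_cast<uint64_t>(C - '0');
    if (AS >= (1ull << 24))
      return std::nullopt;
  }
  H.AddrSpace = static_cast<uint32_t>(AS);
  return H;
}
```

Under these assumptions, `parsePointerHead("pne200")` yields a spec with both the non-integral and external-state flags set for address space 200, while `parsePointerHead("pe1")` is rejected because `n` is missing.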

llvm/include/llvm/IR/DataLayout.h

Lines changed: 100 additions & 11 deletions
@@ -77,12 +77,21 @@ class DataLayout {
7777
uint32_t BitWidth;
7878
Align ABIAlign;
7979
Align PrefAlign;
80+
/// The index bit width also defines the address size in this address space.
81+
/// If the index width is less than the representation bit width, the
82+
/// pointer is non-integral and bits beyond the index width could be used
83+
/// for additional metadata (e.g. AMDGPU buffer fat pointers with bounds
84+
/// and other flags or CHERI capabilities that contain bounds+permissions).
8085
uint32_t IndexBitWidth;
8186
/// Pointers in this address space don't have a well-defined bitwise
82-
/// representation (e.g. may be relocated by a copying garbage collector).
83-
/// Additionally, they may also be non-integral (i.e. containing additional
84-
/// metadata such as bounds information/permissions).
85-
bool IsNonIntegral;
87+
/// representation (e.g. they may be relocated by a copying garbage
88+
/// collector and thus have different addresses at different times).
89+
bool HasUnstableRepresentation;
90+
/// Pointers in this address space have additional state bits that are
91+
/// located at a target-defined location when stored in memory. An example
92+
/// of this would be CHERI capabilities where the validity bit is stored
93+
/// separately from the pointer address+bounds information.
94+
bool HasExternalState;
8695
LLVM_ABI bool operator==(const PointerSpec &Other) const;
8796
};
8897

@@ -149,7 +158,7 @@ class DataLayout {
149158
/// Sets or updates the specification for pointer in the given address space.
150159
void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
151160
Align PrefAlign, uint32_t IndexBitWidth,
152-
bool IsNonIntegral);
161+
bool HasUnstableRepr, bool HasExternalState);
153162

154163
/// Internal helper to get alignment for integer of given bitwidth.
155164
LLVM_ABI Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -355,30 +364,110 @@ class DataLayout {
355364
/// \sa DataLayout::getAddressSizeInBits
356365
unsigned getAddressSize(unsigned AS) const { return getIndexSize(AS); }
357366

358-
/// Return the address spaces containing non-integral pointers. Pointers in
359-
/// this address space don't have a well-defined bitwise representation.
360-
SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
367+
/// Return the address spaces with special pointer semantics (such as being
368+
/// unstable or non-integral).
369+
SmallVector<unsigned, 8> getNonStandardAddressSpaces() const {
361370
SmallVector<unsigned, 8> AddrSpaces;
362371
for (const PointerSpec &PS : PointerSpecs) {
363-
if (PS.IsNonIntegral)
372+
if (PS.HasUnstableRepresentation || PS.HasExternalState ||
373+
PS.BitWidth != PS.IndexBitWidth)
364374
AddrSpaces.push_back(PS.AddrSpace);
365375
}
366376
return AddrSpaces;
367377
}
368378

379+
/// Returns whether this address space has a non-integral pointer
380+
/// representation, i.e. the pointer is not just an integer address but some
381+
/// other bitwise representation. When true, passes cannot assume that all
382+
/// bits of the representation map directly to the allocation address.
383+
/// NOTE: This also returns true for "unstable" pointers where the
384+
/// representation may be just an address, but this value can change at any
385+
/// given time (e.g. due to copying garbage collection).
386+
/// Examples include AMDGPU buffer descriptors
387+
/// with a 128-bit fat pointer and a 32-bit offset or CHERI capabilities that
388+
/// contain bounds, permissions and an out-of-band validity bit.
389+
///
390+
/// In general, more specialized functions such as shouldAvoidIntToPtr(),
391+
/// shouldAvoidPtrToInt(), or hasExternalState() should be preferred over
392+
/// this one when reasoning about the behavior of IR analysis/transforms.
393+
/// TODO: should remove/deprecate this once all uses have migrated.
369394
bool isNonIntegralAddressSpace(unsigned AddrSpace) const {
370-
return getPointerSpec(AddrSpace).IsNonIntegral;
395+
const auto &PS = getPointerSpec(AddrSpace);
396+
return PS.BitWidth != PS.IndexBitWidth || PS.HasUnstableRepresentation;
397+
}
398+
399+
/// Returns whether this address space has an "unstable" pointer
400+
/// representation. The bitwise pattern of such pointers is allowed to change
401+
/// in a target-specific way. For example, this could be used for copying
402+
/// garbage collection where the garbage collector could update the pointer
403+
/// value as part of the collection sweep.
404+
bool hasUnstableRepresentation(unsigned AddrSpace) const {
405+
return getPointerSpec(AddrSpace).HasUnstableRepresentation;
406+
}
407+
bool hasUnstableRepresentation(Type *Ty) const {
408+
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
409+
return PTy && hasUnstableRepresentation(PTy->getPointerAddressSpace());
410+
}
411+
412+
/// Returns whether this address space has external state (implies having
413+
/// a non-integral pointer representation).
414+
/// These pointer types must be loaded and stored using appropriate
415+
/// instructions and cannot use integer loads/stores as this would not
416+
/// propagate the out-of-band state. An example of such a pointer type is a
417+
/// CHERI capability that contains bounds, permissions and an out-of-band
418+
/// validity bit that is invalidated whenever an integer/FP store is performed
419+
/// to the associated memory location.
420+
bool hasExternalState(unsigned AddrSpace) const {
421+
return getPointerSpec(AddrSpace).HasExternalState;
422+
}
423+
bool hasExternalState(Type *Ty) const {
424+
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
425+
return PTy && hasExternalState(PTy->getPointerAddressSpace());
426+
}
427+
428+
/// Returns whether passes should avoid introducing `inttoptr` instructions
429+
/// for this address space.
430+
///
431+
/// This is currently the case for non-integral pointer representations with
432+
/// external state (hasExternalState()) since `inttoptr` cannot recreate the
433+
/// external state bits.
434+
/// New `inttoptr` instructions should also be avoided for "unstable" bitwise
435+
/// representations (hasUnstableRepresentation()) unless the pass knows it is
436+
/// within a critical section that retains the current representation.
437+
bool shouldAvoidIntToPtr(unsigned AddrSpace) const {
438+
return hasUnstableRepresentation(AddrSpace) || hasExternalState(AddrSpace);
439+
}
440+
441+
/// Returns whether passes should avoid introducing `ptrtoint` instructions
442+
/// for this address space.
443+
///
444+
/// This is currently the case for pointer address spaces that have an
445+
/// "unstable" representation (hasUnstableRepresentation()) since the
446+
/// bitwise pattern of such pointers could change unless the pass knows it is
447+
/// within a critical section that retains the current representation.
448+
bool shouldAvoidPtrToInt(unsigned AddrSpace) const {
449+
return hasUnstableRepresentation(AddrSpace);
371450
}
372451

373452
bool isNonIntegralPointerType(PointerType *PT) const {
374453
return isNonIntegralAddressSpace(PT->getAddressSpace());
375454
}
376455

377456
bool isNonIntegralPointerType(Type *Ty) const {
378-
auto *PTy = dyn_cast<PointerType>(Ty);
457+
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
379458
return PTy && isNonIntegralPointerType(PTy);
380459
}
381460

461+
bool shouldAvoidPtrToInt(Type *Ty) const {
462+
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
463+
return PTy && shouldAvoidPtrToInt(PTy->getPointerAddressSpace());
464+
}
465+
466+
bool shouldAvoidIntToPtr(Type *Ty) const {
467+
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
468+
return PTy && shouldAvoidIntToPtr(PTy->getPointerAddressSpace());
469+
}
470+
382471
/// The size in bits of the pointer representation in a given address space.
383472
/// This is not necessarily the same as the integer address of a pointer (e.g.
384473
/// for fat pointers).
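
To make the relationship between the new predicates easier to see, here is a minimal standalone model of the logic in the DataLayout.h hunks above. `PointerSpecModel` is a hypothetical stand-in for `DataLayout::PointerSpec`, not the LLVM class, and the widths used in the usage note are illustrative assumptions rather than values taken from any target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for DataLayout::PointerSpec.
struct PointerSpecModel {
  uint32_t BitWidth;              // size of the bitwise representation
  uint32_t IndexBitWidth;         // index width, which is also the address width
  bool HasUnstableRepresentation; // bit pattern may change (e.g. copying GC)
  bool HasExternalState;          // out-of-band state (e.g. a validity tag)
};

// Mirrors isNonIntegralAddressSpace() as shown in the diff: extra
// non-address bits, or an unstable representation.
bool isNonIntegral(const PointerSpecModel &PS) {
  return PS.BitWidth != PS.IndexBitWidth || PS.HasUnstableRepresentation;
}

// Mirrors shouldAvoidIntToPtr(): inttoptr cannot recreate external state,
// and an unstable representation may change underneath the cast.
bool shouldAvoidIntToPtr(const PointerSpecModel &PS) {
  return PS.HasUnstableRepresentation || PS.HasExternalState;
}

// Mirrors shouldAvoidPtrToInt(): only unstable representations are affected,
// since ptrtoint on a stable pointer with external state is deterministic.
bool shouldAvoidPtrToInt(const PointerSpecModel &PS) {
  return PS.HasUnstableRepresentation;
}
```

For example, a spec such as `{128, 64, false, true}` (a capability-like pointer with external state) makes `shouldAvoidIntToPtr` return true but leaves `shouldAvoidPtrToInt` false, which matches the LangRef statement that passes may still insert new ``ptrtoint`` instructions for such pointers.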

llvm/lib/Analysis/ConstantFolding.cpp

Lines changed: 7 additions & 6 deletions
@@ -951,21 +951,22 @@ Constant *SymbolicallyEvaluateGEP(const GEPOperator *GEP,
951951

952952
// If the base value for this address is a literal integer value, fold the
953953
// getelementptr to the resulting integer value casted to the pointer type.
954-
APInt BasePtr(DL.getPointerTypeSizeInBits(Ptr->getType()), 0);
954+
APInt BaseIntVal(DL.getPointerTypeSizeInBits(Ptr->getType()), 0);
955955
if (auto *CE = dyn_cast<ConstantExpr>(Ptr)) {
956956
if (CE->getOpcode() == Instruction::IntToPtr) {
957957
if (auto *Base = dyn_cast<ConstantInt>(CE->getOperand(0)))
958-
BasePtr = Base->getValue().zextOrTrunc(BasePtr.getBitWidth());
958+
BaseIntVal = Base->getValue().zextOrTrunc(BaseIntVal.getBitWidth());
959959
}
960960
}
961961

962962
auto *PTy = cast<PointerType>(Ptr->getType());
963-
if ((Ptr->isNullValue() || BasePtr != 0) &&
964-
!DL.isNonIntegralPointerType(PTy)) {
963+
if ((Ptr->isNullValue() || BaseIntVal != 0) &&
964+
!DL.shouldAvoidIntToPtr(Ptr->getType())) {
965+
965966
// If the index size is smaller than the pointer size, add to the low
966967
// bits only.
967-
BasePtr.insertBits(BasePtr.trunc(BitWidth) + Offset, 0);
968-
Constant *C = ConstantInt::get(Ptr->getContext(), BasePtr);
968+
BaseIntVal.insertBits(BaseIntVal.trunc(BitWidth) + Offset, 0);
969+
Constant *C = ConstantInt::get(Ptr->getContext(), BaseIntVal);
969970
return ConstantExpr::getIntToPtr(C, ResTy);
970971
}
971972
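
The "add to the low bits only" step in the hunk above (`insertBits(trunc(BitWidth) + Offset, 0)`) can be sketched with plain 64-bit integers instead of `APInt`. The helper name `addToLowBits` is hypothetical, and the sketch assumes the representation fits in 64 bits: the offset is added modulo 2^IndexBits and the bits above the index width are left untouched.

```cpp
#include <cassert>
#include <cstdint>

// Adds Offset to the low IndexBits bits of Base, wrapping within those bits,
// and keeps all higher bits of Base unchanged.
uint64_t addToLowBits(uint64_t Base, uint64_t Offset, unsigned IndexBits) {
  assert(IndexBits >= 1 && IndexBits <= 64 && "index width out of range");
  // Mask selecting the low IndexBits bits (avoid shifting by 64).
  const uint64_t Mask =
      IndexBits == 64 ? ~0ull : ((1ull << IndexBits) - 1);
  // Truncate the sum to the index width, like trunc(BitWidth) + Offset.
  const uint64_t Low = (Base + Offset) & Mask;
  // Re-insert the low bits at position 0, like insertBits(..., 0).
  return (Base & ~Mask) | Low;
}
```

With a 32-bit index width, adding 1 to `0xAAAA0000FFFFFFFF` wraps the low half to zero and yields `0xAAAA000000000000`, so a carry never reaches the non-address bits.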
