130 changes: 108 additions & 22 deletions llvm/docs/LangRef.rst
@@ -660,42 +660,122 @@ Non-Integral Pointer Type
Note: non-integral pointer types are a work in progress, and they should be
considered experimental at this time.

LLVM IR optionally allows the frontend to denote pointers in certain address
spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
Non-integral pointer types represent pointers that have an *unspecified* bitwise
representation; that is, the integral representation may be target dependent or
unstable (not backed by a fixed integer).
For most targets, the pointer representation is a direct mapping from the
bitwise representation to the address of the underlying memory allocation.
Such pointers are considered "integral", and any pointers where the
representation is not just an integer address are called "non-integral".

In most cases pointers with a non-integral representation behave exactly the
same as integral pointers. The only difference is that it is not possible to
create a pointer just from an address unless all the non-address bits are
also recreated correctly in a target-specific way.
Since the address width of a non-integral pointer is not equal to the width of
its bitwise representation, extracting the address requires truncating to the
index width of the pointer.
An example of such a non-integral pointer representation is the AMDGPU buffer
descriptor, which consists of a 128-bit fat pointer and a 32-bit offset.
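
The truncation described above can be sketched in standalone C++. This is only an illustrative model, not LLVM code: the ``ModelPointerSpec`` struct, the ``extractAddress`` helper, and the 64-bit/32-bit widths are assumptions chosen so the example fits in a ``uint64_t`` (real representations such as the AMDGPU one are wider).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not an LLVM type): a pointer whose bitwise
// representation is wider than its address (index) width.
struct ModelPointerSpec {
  unsigned BitWidth;      // size of the full bitwise representation
  unsigned IndexBitWidth; // width of the address part
};

// Extract the address by truncating the representation to the index width,
// i.e. keeping only the low IndexBitWidth bits.
inline uint64_t extractAddress(uint64_t Repr, const ModelPointerSpec &PS) {
  assert(PS.IndexBitWidth >= 1 && PS.IndexBitWidth <= PS.BitWidth &&
         PS.BitWidth <= 64 && "model only covers representations up to 64 bits");
  if (PS.IndexBitWidth == 64)
    return Repr; // integral pointer: the representation is the address
  return Repr & ((uint64_t{1} << PS.IndexBitWidth) - 1);
}
```

With a 64-bit representation and a 32-bit index width, the upper 32 bits (standing in for target-specific metadata) are dropped and only the low 32 bits remain as the address.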

Additionally, LLVM IR optionally allows the frontend to denote pointers in
certain address spaces as "unstable" or having "external state"
(or combinations of these) via the :ref:`datalayout string<langref_datalayout>`.

The exact implications of these properties are target-specific, but the
following IR semantics and restrictions to optimization passes apply:

Unstable pointer representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pointers in this address space have an *unspecified* bitwise representation
(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
allowed to change in a target-specific way. For example, this could be a pointer
type used with copying garbage collection where the garbage collector could
update the pointer at any time in the collection sweep.

``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
integral (i.e., normal) pointers in that they convert integers to and from
corresponding pointer types, but there are additional implications to be
aware of. Because the bit-representation of a non-integral pointer may
not be stable, two identical casts of the same operand may or may not
corresponding pointer types, but there are additional implications to be aware
of.

For "unstable" pointer representations, the bit-representation of the pointer
may not be stable, so two identical casts of the same operand may or may not
return the same value. Said differently, the conversion to or from the
non-integral type depends on environmental state in an implementation
"unstable" pointer type depends on environmental state in an implementation
defined manner.

If the frontend wishes to observe a *particular* value following a cast, the
generated IR must fence with the underlying environment in an implementation
defined manner. (In practice, this tends to require ``noinline`` routines for
such operations.)

From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
non-integral types are analogous to ones on integral types with one
"unstable" pointer types are analogous to ones on integral types with one
key exception: the optimizer may not, in general, insert new dynamic
occurrences of such casts. If a new cast is inserted, the optimizer would
need to either ensure that a) all possible values are valid, or b)
appropriate fencing is inserted. Since the appropriate fencing is
implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
Similar restrictions apply to intrinsics that might examine the pointer bits,
such as :ref:`llvm.ptrmask<int_ptrmask>`.

The alignment information provided by the frontend for a non-integral pointer
The alignment information provided by the frontend for an "unstable" pointer
(typically using attributes or metadata) must be valid for every possible
representation of the pointer.

Non-integral pointers with external state
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A further special case is non-integral pointers that include external
state (such as bounds information or a type tag) with a target-defined size.
An example of such a type is a CHERI capability, where there is an additional
validity bit that is part of all pointer-typed registers, but is located in
memory at an implementation-defined address separate from the pointer itself.
Another example would be a fat-pointer scheme where pointers remain plain
integers, but the associated bounds are stored in an out-of-band table.

Unless also marked as "unstable", the bit-wise representation of pointers with
external state is stable and ``ptrtoint(x)`` always yields a deterministic
value. This means transformation passes are still permitted to insert new
``ptrtoint`` instructions.

The following restrictions apply to IR level optimization passes:

The ``inttoptr`` instruction does not recreate the external state and therefore
it is target dependent whether it can be used to create a dereferenceable
pointer. In general, passes should assume that the result of such an ``inttoptr``
is not dereferenceable. For example, on CHERI targets an ``inttoptr`` will
yield a capability with the external state (the validity tag bit) set to zero,
which will cause any dereference to trap.
The ``ptrtoint`` instruction also only returns the "in-band" state and omits
all external state.
These two properties mean that ``inttoptr(ptrtoint(x))`` cannot be folded to
``x`` since the ``ptrtoint`` operation does not include the external state
needed to reconstruct the original pointer and ``inttoptr`` cannot set it.
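
The reasoning above can be illustrated with a small standalone C++ model. It is a sketch under stated assumptions, not CHERI or LLVM code: ``ModelCapability``, ``modelPtrToInt``, ``modelIntToPtr`` and ``isDereferenceable`` are hypothetical names, and the external state is reduced to a single validity tag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a pointer with external state: the in-band bits are
// a plain address and a validity tag lives outside the representation.
struct ModelCapability {
  uint64_t Address; // in-band bits, visible to ptrtoint
  bool Tag;         // external state, invisible to ptrtoint/inttoptr
};

// Models ptrtoint: only the in-band bits are returned.
inline uint64_t modelPtrToInt(const ModelCapability &P) { return P.Address; }

// Models inttoptr: the external state cannot be recreated from an integer,
// so the resulting pointer has a cleared tag.
inline ModelCapability modelIntToPtr(uint64_t Bits) {
  return ModelCapability{Bits, false};
}

// In this model a pointer may only be dereferenced while its tag is set
// (the text above says a CHERI target would trap otherwise).
inline bool isDereferenceable(const ModelCapability &P) { return P.Tag; }
```

In this model, ``modelIntToPtr(modelPtrToInt(X))`` yields the same address as ``X`` but a cleared tag, which is why folding the round trip to ``X`` would change behavior.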

When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer
is performed, the external metadata is also stored to an implementation-defined
location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the
external metadata and make it available for all uses of ``%val``.
Similarly, the ``llvm.memcpy`` and ``llvm.memmove`` intrinsics also transfer the
external state. This is essential to allow frontends to efficiently emit copies
of structures containing such pointers, since expanding all these copies as
individual loads and stores would affect compilation speed and inhibit
optimizations.

Notionally, these external bits are part of the pointer, but since
``inttoptr`` / ``ptrtoint`` only operate on the "in-band" bits of the pointer
and the external bits are not explicitly exposed, they are not included in the
size specified in the :ref:`datalayout string<langref_datalayout>`.

When a pointer type has external state, all roundtrips via memory must
be performed as loads and stores of the correct type since stores of other
types may not propagate the external data.
Therefore it is not legal to convert an existing load/store (or a
``llvm.memcpy`` / ``llvm.memmove`` intrinsic) of pointer types with external
state to a load/store of an integer type with the same bitwidth, as that may
drop the external state.
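
Why an integer-typed copy is not equivalent can be sketched with a toy tagged-memory model in standalone C++. Everything here is hypothetical (``ModelTaggedPtr``, ``ModelTaggedMemory``, and the choice that integer stores clear the tag); it only mirrors the behavior the text describes and is not how any particular target implements it.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical pointer value: in-band address plus an out-of-band tag.
struct ModelTaggedPtr {
  uint64_t Address;
  bool Tag;
};

// Toy memory in which every slot has in-band data and a separate tag.
class ModelTaggedMemory {
  std::unordered_map<uint64_t, uint64_t> Data; // in-band bits per slot
  std::unordered_map<uint64_t, bool> Tags;     // external state per slot

public:
  // Pointer-typed store: transfers both the in-band bits and the tag.
  void storePtr(uint64_t Slot, const ModelTaggedPtr &P) {
    Data[Slot] = P.Address;
    Tags[Slot] = P.Tag;
  }
  // Integer-typed store: writes the bits but clears the external state
  // (an assumption of this model).
  void storeInt(uint64_t Slot, uint64_t V) {
    Data[Slot] = V;
    Tags[Slot] = false;
  }
  // Integer-typed load: only the in-band bits are visible.
  uint64_t loadInt(uint64_t Slot) const {
    auto It = Data.find(Slot);
    return It == Data.end() ? 0 : It->second;
  }
  // Pointer-typed load: fetches the in-band bits together with the tag.
  ModelTaggedPtr loadPtr(uint64_t Slot) const {
    auto T = Tags.find(Slot);
    return ModelTaggedPtr{loadInt(Slot), T != Tags.end() && T->second};
  }
};
```

Copying a slot with ``storePtr(Dst, loadPtr(Src))`` keeps the tag, whereas ``storeInt(Dst, loadInt(Src))`` produces the same in-band bits with the tag cleared, so the two copies are not interchangeable.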


.. _globalvars:

Global Variables
@@ -3179,8 +3259,8 @@ as follows:
``A<address space>``
Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.
``p[n]:<size>:<abi>[:<pref>[:<idx>]]``
This specifies the properties of a pointer in address space ``n``.
``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]``
This specifies the properties of a pointer in address space ``as``.
The ``<size>`` parameter specifies the size of the bitwise representation.
For :ref:`non-integral pointers <nointptrtype>` the representation size may
be larger than the address width of the underlying address space (e.g. to
@@ -3193,9 +3273,14 @@ as follows:
default index size is equal to the pointer size.
The index size also specifies the width of addresses in this address space.
All sizes are in bits.
The address space, ``n``, is optional, and if not specified,
denotes the default address space 0. The value of ``n`` must be
in the range [1,2^24).
The address space, ``<as>``, is optional, and if not specified, denotes the
default address space 0. The value of ``<as>`` must be in the range [1,2^24).
The optional ``<flags>`` are used to specify properties of pointers in this
address space: the character ``u`` marks pointers as having an unstable
representation, ``n`` marks pointers as non-integral (i.e. having
additional metadata), ``e`` marks pointers having external state
(``n`` must also be set). See :ref:`Non-Integral Pointer Types <nointptrtype>`.

``i<size>:<abi>[:<pref>]``
This specifies the alignment for an integer type of a given bit
``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -3248,9 +3333,11 @@ as follows:
this set are considered to support most general arithmetic operations
efficiently.
``ni:<address space0>:<address space1>:<address space2>...``
This specifies pointer types with the specified address spaces
as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0``
address space cannot be specified as non-integral.
This marks pointer types with the specified address spaces
as :ref:`non-integral and unstable <nointptrtype>`.
The ``0`` address space cannot be specified as non-integral.
This specifier is only supported for backwards compatibility; new code
should use the flags of the ``p`` specifier instead.
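
As a rough illustration of the ``p[<flags>][<as>]`` prefix described above, the following standalone C++ sketch parses the flags and address space from a pointer specification string. It is not LLVM's parser: ``ModelPointerFlags``, ``parsePointerPrefix`` and the two ``model...`` predicates are hypothetical names, the rule that ``e`` requires ``n`` is taken from the text above, and the predicates simply mirror the ``shouldAvoidIntToPtr`` / ``shouldAvoidPtrToInt`` logic shown in the ``DataLayout.h`` changes of this patch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct ModelPointerFlags {
  bool Valid;         // false if the prefix could not be parsed
  bool Unstable;      // 'u' flag
  bool NonIntegral;   // 'n' flag
  bool ExternalState; // 'e' flag
  unsigned AddrSpace; // 0 when omitted
};

// Parse the "p[<flags>][<as>]" part that precedes the first ':'.
inline ModelPointerFlags parsePointerPrefix(const std::string &Spec) {
  ModelPointerFlags R = {false, false, false, false, 0};
  std::string Prefix = Spec.substr(0, Spec.find(':'));
  if (Prefix.empty() || Prefix[0] != 'p')
    return R;
  std::size_t I = 1;
  // Flag characters come first.
  for (; I < Prefix.size() &&
         std::isalpha(static_cast<unsigned char>(Prefix[I]));
       ++I) {
    switch (Prefix[I]) {
    case 'u': R.Unstable = true; break;
    case 'n': R.NonIntegral = true; break;
    case 'e': R.ExternalState = true; break;
    default: return R; // unknown flag character
    }
  }
  // The optional address space number follows the flags.
  for (; I < Prefix.size(); ++I) {
    if (!std::isdigit(static_cast<unsigned char>(Prefix[I])))
      return R;
    R.AddrSpace = R.AddrSpace * 10 + static_cast<unsigned>(Prefix[I] - '0');
    if (R.AddrSpace >= (1u << 24))
      return R; // outside the documented address space range
  }
  // Per the text above, 'e' is only accepted together with 'n'.
  if (R.ExternalState && !R.NonIntegral)
    return R;
  R.Valid = true;
  return R;
}

// Mirrors the predicates in the DataLayout.h hunk of this patch.
inline bool modelShouldAvoidIntToPtr(const ModelPointerFlags &F) {
  return F.Unstable || F.ExternalState;
}
inline bool modelShouldAvoidPtrToInt(const ModelPointerFlags &F) {
  return F.Unstable;
}
```

For a made-up specification such as ``pne200:128:128:128:64`` the sketch reports address space 200 with the non-integral and external-state flags set, so new ``inttoptr`` casts should be avoided while new ``ptrtoint`` casts remain acceptable.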

``<abi>`` is a lower bound on what is required for a type to be considered
aligned. This is used in various places, such as:
@@ -31402,4 +31489,3 @@ Semantics:

The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.

111 changes: 100 additions & 11 deletions llvm/include/llvm/IR/DataLayout.h
@@ -77,12 +77,21 @@ class DataLayout {
uint32_t BitWidth;
Align ABIAlign;
Align PrefAlign;
/// The index bit width also defines the address size in this address space.
/// If the index width is less than the representation bit width, the
/// pointer is non-integral and bits beyond the index width could be used
/// for additional metadata (e.g. AMDGPU buffer fat pointers with bounds
/// and other flags or CHERI capabilities that contain bounds+permissions).
uint32_t IndexBitWidth;
/// Pointers in this address space don't have a well-defined bitwise
/// representation (e.g. may be relocated by a copying garbage collector).
/// Additionally, they may also be non-integral (i.e. containing additional
/// metadata such as bounds information/permissions).
bool IsNonIntegral;
/// representation (e.g. they may be relocated by a copying garbage
/// collector and thus have different addresses at different times).
bool HasUnstableRepresentation;
/// Pointers in this address space have additional state bits that are
/// located at a target-defined location when stored in memory. An example
/// of this would be CHERI capabilities where the validity bit is stored
/// separately from the pointer address+bounds information.
bool HasExternalState;
LLVM_ABI bool operator==(const PointerSpec &Other) const;
};

@@ -149,7 +158,7 @@ class DataLayout {
/// Sets or updates the specification for pointer in the given address space.
void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
Align PrefAlign, uint32_t IndexBitWidth,
bool IsNonIntegral);
bool HasUnstableRepr, bool HasExternalState);

/// Internal helper to get alignment for integer of given bitwidth.
LLVM_ABI Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -355,30 +364,110 @@ class DataLayout {
/// \sa DataLayout::getAddressSizeInBits
unsigned getAddressSize(unsigned AS) const { return getIndexSize(AS); }

/// Return the address spaces containing non-integral pointers. Pointers in
/// this address space don't have a well-defined bitwise representation.
SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
/// Return the address spaces with special pointer semantics (such as being
/// unstable or non-integral).
SmallVector<unsigned, 8> getNonStandardAddressSpaces() const {
SmallVector<unsigned, 8> AddrSpaces;
for (const PointerSpec &PS : PointerSpecs) {
if (PS.IsNonIntegral)
if (PS.HasUnstableRepresentation || PS.HasExternalState ||
PS.BitWidth != PS.IndexBitWidth)
AddrSpaces.push_back(PS.AddrSpace);
}
return AddrSpaces;
}

/// Returns whether this address space has a non-integral pointer
/// representation, i.e. the pointer is not just an integer address but some
/// other bitwise representation. When true, passes cannot assume that all
/// bits of the representation map directly to the allocation address.
/// NOTE: This also returns true for "unstable" pointers where the
/// representation may be just an address, but this value can change at any
/// given time (e.g. due to copying garbage collection).
/// Examples include AMDGPU buffer descriptors
/// with a 128-bit fat pointer and a 32-bit offset or CHERI capabilities that
/// contain bounds, permissions and an out-of-band validity bit.
///
/// In general, more specialized functions such as shouldAvoidIntToPtr(),
/// shouldAvoidPtrToInt(), or hasExternalState() should be preferred over
/// this one when reasoning about the behavior of IR analysis/transforms.
/// TODO: should remove/deprecate this once all uses have migrated.
bool isNonIntegralAddressSpace(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).IsNonIntegral;
const auto &PS = getPointerSpec(AddrSpace);
return PS.BitWidth != PS.IndexBitWidth || PS.HasUnstableRepresentation;
}

/// Returns whether this address space has an "unstable" pointer
/// representation. The bitwise pattern of such pointers is allowed to change
/// in a target-specific way. For example, this could be used for copying
/// garbage collection where the garbage collector could update the pointer
/// value as part of the collection sweep.
bool hasUnstableRepresentation(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasUnstableRepresentation;
}
bool hasUnstableRepresentation(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && hasUnstableRepresentation(PTy->getPointerAddressSpace());
}

/// Returns whether this address space has external state (implies having
/// a non-integral pointer representation).
/// These pointer types must be loaded and stored using appropriate
/// instructions and cannot use integer loads/stores as this would not
/// propagate the out-of-band state. An example of such a pointer type is a
/// CHERI capability that contains bounds, permissions and an out-of-band
/// validity bit that is invalidated whenever an integer/FP store is performed
/// to the associated memory location.
bool hasExternalState(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasExternalState;
}
bool hasExternalState(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && hasExternalState(PTy->getPointerAddressSpace());
}

/// Returns whether passes should avoid introducing `inttoptr` instructions
/// for this address space.
///
/// This is currently the case for non-integral pointer representations with
/// external state (hasExternalState()) since `inttoptr` cannot recreate the
/// external state bits.
/// New `inttoptr` instructions should also be avoided for "unstable" bitwise
/// representations (hasUnstableRepresentation()) unless the pass knows it is
/// within a critical section that retains the current representation.
bool shouldAvoidIntToPtr(unsigned AddrSpace) const {
return hasUnstableRepresentation(AddrSpace) || hasExternalState(AddrSpace);
}

/// Returns whether passes should avoid introducing `ptrtoint` instructions
/// for this address space.
///
/// This is currently the case for pointer address spaces that have an
/// "unstable" representation (hasUnstableRepresentation()) since the
/// bitwise pattern of such pointers could change unless the pass knows it is
/// within a critical section that retains the current representation.
bool shouldAvoidPtrToInt(unsigned AddrSpace) const {
return hasUnstableRepresentation(AddrSpace);
}

bool isNonIntegralPointerType(PointerType *PT) const {
return isNonIntegralAddressSpace(PT->getAddressSpace());
}

bool isNonIntegralPointerType(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty);
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && isNonIntegralPointerType(PTy);
}

bool shouldAvoidPtrToInt(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && shouldAvoidPtrToInt(PTy->getPointerAddressSpace());
}

bool shouldAvoidIntToPtr(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && shouldAvoidIntToPtr(PTy->getPointerAddressSpace());
}

/// The size in bits of the pointer representation in a given address space.
/// This is not necessarily the same as the integer address of a pointer (e.g.
/// for fat pointers).
13 changes: 7 additions & 6 deletions llvm/lib/Analysis/ConstantFolding.cpp
@@ -951,21 +951,22 @@ Constant *SymbolicallyEvaluateGEP(const GEPOperator *GEP,

// If the base value for this address is a literal integer value, fold the
// getelementptr to the resulting integer value cast to the pointer type.
APInt BasePtr(DL.getPointerTypeSizeInBits(Ptr->getType()), 0);
APInt BaseIntVal(DL.getPointerTypeSizeInBits(Ptr->getType()), 0);
if (auto *CE = dyn_cast<ConstantExpr>(Ptr)) {
if (CE->getOpcode() == Instruction::IntToPtr) {
if (auto *Base = dyn_cast<ConstantInt>(CE->getOperand(0)))
BasePtr = Base->getValue().zextOrTrunc(BasePtr.getBitWidth());
BaseIntVal = Base->getValue().zextOrTrunc(BaseIntVal.getBitWidth());
}
}

auto *PTy = cast<PointerType>(Ptr->getType());
if ((Ptr->isNullValue() || BasePtr != 0) &&
!DL.isNonIntegralPointerType(PTy)) {
if ((Ptr->isNullValue() || BaseIntVal != 0) &&
!DL.shouldAvoidIntToPtr(Ptr->getType())) {

// If the index size is smaller than the pointer size, add to the low
// bits only.
BasePtr.insertBits(BasePtr.trunc(BitWidth) + Offset, 0);
Constant *C = ConstantInt::get(Ptr->getContext(), BasePtr);
BaseIntVal.insertBits(BaseIntVal.trunc(BitWidth) + Offset, 0);
Constant *C = ConstantInt::get(Ptr->getContext(), BaseIntVal);
return ConstantExpr::getIntToPtr(C, ResTy);
}
