[DataLayout][LangRef] Split non-integral and unstable pointer properties #105735

Open
wants to merge 12 commits into
base: users/arichardson/spr/main.datalayoutlangref-split-non-integral-and-unstable-pointer-properties
143 changes: 119 additions & 24 deletions llvm/docs/LangRef.rst
@@ -650,48 +650,136 @@ literal types are uniqued in recent versions of LLVM.

.. _nointptrtype:

Non-Integral Pointer Type
-------------------------
Non-Integral and Unstable Pointer Types
---------------------------------------

Note: non-integral pointer types are a work in progress, and they should be
considered experimental at this time.
Note: non-integral/unstable pointer types are a work in progress, and they
should be considered experimental at this time.

LLVM IR optionally allows the frontend to denote pointers in certain address
spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
Non-integral pointer types represent pointers that have an *unspecified* bitwise
representation; that is, the integral representation may be target dependent or
unstable (not backed by a fixed integer).
spaces as "unstable", "non-integral", or "non-integral with external state"
(or combinations of these) via the :ref:`datalayout string<langref_datalayout>`.

The exact implications of these properties are target-specific, but the
following IR semantics and restrictions to optimization passes apply:

Unstable pointer representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pointers in this address space have an *unspecified* bitwise representation
(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
allowed to change in a target-specific way. For example, this could be a pointer
type used with copying garbage collection where the garbage collector could
update the pointer at any time in the collection sweep.

``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
integral (i.e. normal) pointers in that they convert integers to and from
corresponding pointer types, but there are additional implications to be
aware of. Because the bit-representation of a non-integral pointer may
not be stable, two identical casts of the same operand may or may not
corresponding pointer types, but there are additional implications to be aware
of.

For "unstable" pointer representations, the bit-representation of the pointer
may not be stable, so two identical casts of the same operand may or may not
Contributor

So this applies to only an SSA value of an unstable pointer type? What about an in-memory value with the unstable pointer type?

Member Author

I am not familiar with how GC pointers are used in LLVM, I just tried to split out the existing "copying GC" non-integral pointers properties into a separate property to allow for "fat pointers", CHERI capabilities, etc to use non-integral pointers without incurring all the restrictions imposed by GC pointers.

Not sure who is best to comment on this, probably someone from azul who has worked on it recently.

return the same value. Said differently, the conversion to or from the
non-integral type depends on environmental state in an implementation
"unstable" pointer type depends on environmental state in an implementation
defined manner.

If the frontend wishes to observe a *particular* value following a cast, the
generated IR must fence with the underlying environment in an implementation
defined manner. (In practice, this tends to require ``noinline`` routines for
such operations.)

From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
non-integral types are analogous to ones on integral types with one
"unstable" pointer types are analogous to ones on integral types with one
key exception: the optimizer may not, in general, insert new dynamic
occurrences of such casts. If a new cast is inserted, the optimizer would
need to either ensure that a) all possible values are valid, or b)
appropriate fencing is inserted. Since the appropriate fencing is
implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
Similar restrictions apply to intrinsics that might examine the pointer bits,
such as :ref:`llvm.ptrmask<int_ptrmask>`.

The alignment information provided by the frontend for a non-integral pointer
The alignment information provided by the frontend for an "unstable" pointer
(typically using attributes or metadata) must be valid for every possible
representation of the pointer.
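
The claim that ``ptrtoint(v)-ptrtoint(v) == 0`` need not hold for "unstable" pointers can be illustrated with a toy model. The following is only a sketch under stated assumptions: `RelocatingHeap`, `Handle`, and `castDifference` are hypothetical names invented for illustration and are not part of LLVM, and the "collector" simply shifts every address by a fixed amount.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (not LLVM code): a handle names an object, and a collection may
// move the object, which changes the integer that a ptrtoint-like cast yields.
class RelocatingHeap {
public:
  using Handle = std::size_t;

  // Registers an object that currently lives at a (hypothetical) address.
  Handle allocate(std::uint64_t Address) {
    Addresses.push_back(Address);
    return Addresses.size() - 1;
  }

  // Models `ptrtoint`: observes the bit pattern the pointer has right now.
  std::uint64_t ptrToInt(Handle H) const {
    assert(H < Addresses.size() && "unknown handle");
    return Addresses[H];
  }

  // Models a collection cycle that moves every live object by Delta bytes.
  void relocateAll(std::uint64_t Delta) {
    for (std::uint64_t &A : Addresses)
      A += Delta;
  }

private:
  std::vector<std::uint64_t> Addresses;
};

// Computes ptrtoint(v) - ptrtoint(v), optionally letting a collection run
// between the two casts. With an unstable representation the result need
// not be zero.
std::int64_t castDifference(RelocatingHeap &Heap, RelocatingHeap::Handle V,
                            bool CollectBetweenCasts) {
  std::uint64_t First = Heap.ptrToInt(V);
  if (CollectBetweenCasts)
    Heap.relocateAll(0x1000);
  std::uint64_t Second = Heap.ptrToInt(V);
  return static_cast<std::int64_t>(Second - First);
}
```

In this model an optimizer that duplicated or introduced a cast could observe either address, which is why the text restricts inserting new dynamic occurrences of such casts.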

Collaborator

Is non-integral the right term for something that is more than just an integer?

Member Author

Naming is hard - I kept this pre-existing name since it can also be interpreted as not just an integer, i.e. it can be anything else (such as integer+metadata).

Contributor

So, just to toss out a drive-by name suggestion (though I'm fine with keeping non-integral): how about "annotated" pointers? That is, the pointer does (without unstable) have a fixed representation and point to some address, but there are bits in that representation that "annotate" the address, and so inttoptr(ptrtoint(v) + x) ??= gep i8, v, x

Non-integral pointer representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pointers are not represented as just an address, but may instead include
additional metadata such as bounds information or a temporal identifier.
Examples include AMDGPU buffer descriptors with a 128-bit fat pointer and a
32-bit offset, or CHERI capabilities that contain bounds, permissions and a
type field (as well as an out-of-band validity bit, see next section).

In most cases, pointers with a non-integral representation behave exactly the
same as integral pointers. The only difference is that it is not possible to
create a pointer just from an address unless all the metadata bits are
also recreated correctly.

"Non-integral" pointers also impose restrictions on transformation passes, but
in general these are less restrictive than for "unstable" pointers. The main
difference compared to integral pointers is that the address width of a
non-integral pointer is not equal to the width of its bitwise representation,
so extracting the address requires truncating to the index width of the pointer.

Note: Currently all supported targets require that truncating the ``ptrtoint``
result to address width yields the memory address of the pointer, but this may
not hold for all future targets, so optimizations should not rely on it.

Unlike "unstable" pointers, the bit-wise representation is stable and
``ptrtoint(x)`` always yields a deterministic value.
This means transformation passes are still permitted to insert new ``ptrtoint``
instructions.
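
To make the difference between the representation width and the address width concrete, here is a minimal sketch of a stable but non-integral pointer. Everything in it is an assumption made for illustration: `FatPointerBits`, the 64+64 bit split, and the helper names do not correspond to any real target layout or LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit representation: the low 64 bits are assumed to hold
// the address and the high 64 bits are assumed to hold metadata (bounds).
struct FatPointerBits {
  std::uint64_t Address;
  std::uint64_t Metadata;
};

// Builds a pointer with bounds metadata; Length is an illustrative field.
FatPointerBits makeFatPointer(std::uint64_t Address, std::uint64_t Length) {
  assert(Length != 0 && "sketch assumes non-empty bounds");
  return FatPointerBits{Address, Length};
}

// Models `ptrtoint` to an integer as wide as the whole representation. The
// result is deterministic because the representation is stable.
FatPointerBits ptrToInt(FatPointerBits P) { return P; }

// Models truncating the `ptrtoint` result to the (assumed 64-bit) index
// width, which yields the address in this sketch.
std::uint64_t extractAddress(FatPointerBits Bits) { return Bits.Address; }

// Recreating a pointer from an address alone loses the metadata bits.
FatPointerBits fromAddressOnly(std::uint64_t Address) {
  return FatPointerBits{Address, 0};
}

bool sameBits(FatPointerBits A, FatPointerBits B) {
  return A.Address == B.Address && A.Metadata == B.Metadata;
}
```

Two casts of the same pointer compare equal here, but a pointer rebuilt with `fromAddressOnly` does not compare equal to the original, matching the statement that the metadata must be recreated separately.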

Contributor

... Wait, hold on, I thought one of the firmer outcomes of the big ptrtoint semantics thread is that ptrtoint is definitionally the same as a type-punned store + load

That is

%y = ptrtoint ptr addrspace(N) %x to i[ptrsize(N)]

is exactly

%m = alloca i[ptrmemsize(N)]
store ptr addrspace(N) %x, ptr %m
%y = load i[ptrsize(N)], ptr %m
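
A rough C++ analogue of this store-then-load formulation is sketched below. It is an illustration only: `ptrToIntViaMemory` and `checkEquivalence` are hypothetical names, the sketch assumes a host where pointers and `std::uintptr_t` have the same size (so it does not model the `ptrsize` versus `ptrmemsize` distinction), and the result of `reinterpret_cast` is implementation-defined in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models `ptrtoint` as a pointer store followed by an integer load from the
// same memory. std::memcpy is used because it is the well-defined way to
// reinterpret object bytes in C++.
std::uintptr_t ptrToIntViaMemory(const void *P) {
  static_assert(sizeof(std::uintptr_t) == sizeof(const void *),
                "sketch assumes pointer and integer have the same size");
  unsigned char Slot[sizeof(const void *)]; // plays the role of the alloca
  std::memcpy(Slot, &P, sizeof(P));         // the pointer-typed store
  std::uintptr_t Bits;
  std::memcpy(&Bits, Slot, sizeof(Bits));   // the integer-typed load
  return Bits;
}

// Checks, for one pointer on the host platform, that the cast and the
// memory round trip agree.
void checkEquivalence(const void *P) {
  assert(ptrToIntViaMemory(P) == reinterpret_cast<std::uintptr_t>(P));
}
```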

Member Author

Yes, apologies for the delay here - I need to get around to rebasing my changes on top of the outcome of the discussion. I hope to have something next week.

Contributor

No problem, just wanted to flag that

(Also, re that discussion - it might be good to get your thoughts on the ptrtoaddr - and in particular, ptrtoaddr as inverse of GEP - formulation)

Member Author

Updated, hopefully all issues resolved in the new wording.

Non-integral pointers with external state
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A special case of non-integral pointers is ones that include external state
(such as implicit bounds information or a type tag) with a target-defined size.
An example of such a type is a CHERI capability, where there is an additional
validity bit that is part of all pointer-typed registers, but is located in
memory at an implementation-defined address separate from the pointer itself.
Another example would be a fat-pointer scheme where pointers remain plain
integers, but the associated bounds are stored in an out-of-band table.

The following restrictions apply to IR level optimization passes:

The ``inttoptr`` instruction does not recreate the external state and therefore
it is target dependent whether it can be used to create a dereferenceable
pointer. In general passes should assume that the result of such an inttoptr
is not dereferenceable. For example, on CHERI targets an ``inttoptr`` will
yield a capability with the external state (the validity tag bit) set to zero,
which will cause any dereference to trap.
The ``ptrtoint`` instruction also only returns the "in-band" state and omits
all external state.
These two properties mean that ``inttoptr(ptrtoint(x))`` cannot be folded to
``x`` since the ``ptrtoint`` operation does not include the external state
needed to reconstruct the original pointer and ``inttoptr`` cannot set it.
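
Why ``inttoptr(ptrtoint(x))`` cannot be folded to ``x`` can be shown with a small model of a pointer that carries an out-of-band validity tag. This is a sketch under assumptions: `Capability` and the helper functions are hypothetical, the in-band state is reduced to 64 bits for brevity, and an `assert` stands in for the trap a real target would raise.

```cpp
#include <cassert>
#include <cstdint>

// Toy capability: in-band bits plus an out-of-band validity tag.
struct Capability {
  std::uint64_t InBand; // the only state `ptrtoint` can observe
  bool Tag;             // external state, invisible to `ptrtoint`
};

// Models `ptrtoint`: returns only the in-band state.
std::uint64_t ptrToInt(Capability C) { return C.InBand; }

// Models `inttoptr` on a CHERI-like target: the tag is always cleared.
Capability intToPtr(std::uint64_t Bits) { return Capability{Bits, false}; }

bool isDereferenceable(Capability C) { return C.Tag; }

// Models a dereference; a real target would trap where this sketch asserts.
std::uint64_t addressForAccess(Capability C) {
  assert(C.Tag && "dereference of an untagged capability would trap");
  return C.InBand;
}

// The round trip keeps the in-band bits but loses the tag.
Capability roundTrip(Capability C) { return intToPtr(ptrToInt(C)); }
```

The round-tripped value has the same in-band bits as the original but is no longer dereferenceable, so replacing it with the original pointer would change behavior.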

When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer
is performed, the external metadata is also stored to an implementation-defined
location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the
external metadata and make it available for all uses of ``%val``.
Similarly, the ``llvm.memcpy`` and ``llvm.memmove`` intrinsics also transfer the
external state. This is essential to allow frontends to efficiently emit copies
of structures containing such pointers, since expanding all these copies as
individual loads and stores would affect compilation speed and inhibit
optimizations.

Notionally, these external bits are part of the pointer, but since
``inttoptr`` / ``ptrtoint`` only operate on the "in-band" bits of the pointer
and the external bits are not explicitly exposed, they are not included in the
size specified in the :ref:`datalayout string<langref_datalayout>`.

When a pointer type has external state, all roundtrips via memory must
be performed as loads and stores of the correct type since stores of other
types may not propagate the external data.
Therefore it is not legal to convert an existing load/store (or a ``llvm.memcpy`` /
``llvm.memmove`` intrinsic) of pointer types with external state to a load/store
Contributor

memcpy of pointer types feels a bit weird

Is this basically saying you're not allowed to ever split a copy into smaller copies because it might contain a pointer with external state?

Contributor

... Or at least on an architecture with at least one e pointer (which might mean we'll want a function for .mayHaveExternalPointers() on DataLayout or something)

Collaborator

Yes. Though if you somehow know (whether by analysis or by frontend-provided metadata due to language semantics, like strict aliasing in C) that there are no pointers in a given range then you can still perform that optimisation. Maybe worth adding a throwaway "unless it is known no pointers with external state are present in the source"?

Also you can split into loads and stores of the pointer with external state (if there is just one such type). Just not a type that won't preserve the external state.

Contributor

Got it

I'm just worried that this forbids memcpy() expansion

Like, it's weird to be in a situation where you can't unconditionally pessimize memcpy(i8* dst, i8* src, usize len) to

for(usize i = 0; i < len; ++i) { 
  *(dst++) = *(src++);
}

Member Author

Yeah this is somewhat annoying. In the downstream CHERI LLVM forks, we included an attribute on memcpy/memmove that says "this copy does not contain capabilities" and then it can be expanded to integer loads/stores in IR/the backend.

of an integer type with same bitwidth, as that may drop the external state.
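
The restriction on rewriting pointer-typed copies as integer copies can be sketched with a toy tagged memory. The model below is an assumption-laden illustration, not a description of any real hardware or of LLVM: `TaggedMemory` and its methods are hypothetical, memory is divided into 8-byte slots, and each slot has a single out-of-band tag bit that integer stores clear.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy memory of 8-byte slots with one out-of-band tag bit per slot.
class TaggedMemory {
public:
  explicit TaggedMemory(std::size_t Slots)
      : Data(Slots, 0), Tags(Slots, false) {}

  // Pointer-typed store: writes the in-band bits and the external tag.
  void storePointer(std::size_t Slot, std::uint64_t Bits, bool Tag) {
    assert(Slot < Data.size() && "slot out of range");
    Data[Slot] = Bits;
    Tags[Slot] = Tag;
  }

  // Integer-typed store: writes the bits and invalidates the tag.
  void storeInteger(std::size_t Slot, std::uint64_t Bits) {
    assert(Slot < Data.size() && "slot out of range");
    Data[Slot] = Bits;
    Tags[Slot] = false;
  }

  std::uint64_t loadBits(std::size_t Slot) const {
    assert(Slot < Data.size() && "slot out of range");
    return Data[Slot];
  }

  bool loadTag(std::size_t Slot) const {
    assert(Slot < Tags.size() && "slot out of range");
    return Tags[Slot];
  }

  // Copy that transfers external state, as the text describes for
  // pointer-typed loads/stores and for llvm.memcpy / llvm.memmove.
  void copyPreservingTags(std::size_t Dst, std::size_t Src) {
    storePointer(Dst, loadBits(Src), loadTag(Src));
  }

  // The same copy expanded into an integer load/store pair: the bits
  // arrive, but the tag is dropped.
  void copyAsInteger(std::size_t Dst, std::size_t Src) {
    storeInteger(Dst, loadBits(Src));
  }

private:
  std::vector<std::uint64_t> Data;
  std::vector<bool> Tags;
};
```

Both copies move identical in-band bits, yet only the tag-preserving one leaves a usable pointer at the destination. This is also why the byte-loop expansion raised in the review is only safe when the copied range is known to contain no pointers with external state.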


.. _globalvars:

Global Variables
@@ -3167,8 +3255,8 @@ as follows:
``A<address space>``
Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.
``p[n]:<size>:<abi>[:<pref>[:<idx>]]``
This specifies the properties of a pointer in address space ``n``.
``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]``
This specifies the properties of a pointer in address space ``as``.
The ``<size>`` parameter specifies the size of the bitwise representation.
For :ref:`non-integral pointers <nointptrtype>` the representation size may
be larger than the address width of the underlying address space (e.g. to
@@ -3181,9 +3269,14 @@ as follows:
default index size is equal to the pointer size.
The index size also specifies the width of addresses in this address space.
All sizes are in bits.
The address space, ``n``, is optional, and if not specified,
denotes the default address space 0. The value of ``n`` must be
in the range [1,2^24).
The address space, ``<as>``, is optional, and if not specified, denotes the
default address space 0. The value of ``<as>`` must be in the range [1,2^24).
The optional ``<flags>`` are used to specify properties of pointers in this
address space: the character ``u`` marks pointers as having an unstable
representation, ``n`` marks pointers as non-integral (i.e. having
additional metadata), ``e`` marks pointers having external state
(``n`` must also be set). See :ref:`Non-Integral Pointer Types <nointptrtype>`.
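
As an illustration of the proposed ``p[<flags>][<as>]`` prefix, the following sketch parses the part of a pointer specification that precedes the first ``:``. It is not LLVM's parser: `PointerFlags` and `parsePointerSpecHead` are hypothetical names, and the sketch only enforces the rules stated in the text above (the three flag characters, the rule that ``e`` requires ``n``, and the upper bound of 2^24 on the address space).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type; the field names are illustrative only.
struct PointerFlags {
  bool Unstable = false;      // set by 'u'
  bool NonIntegral = false;   // set by 'n'
  bool ExternalState = false; // set by 'e'
  std::uint32_t AddrSpace = 0;
  bool Valid = false;         // false if the head could not be parsed
};

// Parses the text before the first ':' of a pointer spec, e.g. "pne200".
PointerFlags parsePointerSpecHead(const std::string &Head) {
  PointerFlags Result;
  if (Head.empty() || Head[0] != 'p')
    return Result;

  std::size_t I = 1;
  // Flag characters come first, up to the first digit.
  for (; I < Head.size() &&
         !std::isdigit(static_cast<unsigned char>(Head[I]));
       ++I) {
    switch (Head[I]) {
    case 'u': Result.Unstable = true; break;
    case 'n': Result.NonIntegral = true; break;
    case 'e': Result.ExternalState = true; break;
    default: return Result; // unknown flag character
    }
  }

  // Optional address space; it stays 0 when no digits are present.
  std::uint64_t AS = 0;
  for (; I < Head.size(); ++I) {
    if (!std::isdigit(static_cast<unsigned char>(Head[I])))
      return Result; // flags are not allowed after the address space
    AS = AS * 10 + static_cast<std::uint64_t>(Head[I] - '0');
    if (AS >= (1ull << 24))
      return Result; // outside the documented range
  }

  // 'e' is only accepted together with 'n'.
  if (Result.ExternalState && !Result.NonIntegral)
    return Result;

  Result.AddrSpace = static_cast<std::uint32_t>(AS);
  assert(Result.AddrSpace < (1u << 24));
  Result.Valid = true;
  return Result;
}
```

Under these assumptions, `parsePointerSpecHead("pne200")` describes a non-integral pointer with external state in address space 200, while `"pe7"` is rejected because ``e`` appears without ``n``.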

``i<size>:<abi>[:<pref>]``
This specifies the alignment for an integer type of a given bit
``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -3236,9 +3329,11 @@ as follows:
this set are considered to support most general arithmetic operations
efficiently.
``ni:<address space0>:<address space1>:<address space2>...``
This specifies pointer types with the specified address spaces
as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0``
address space cannot be specified as non-integral.
This marks pointer types with the specified address spaces
as :ref:`non-integral and unstable <nointptrtype>`.
The ``0`` address space cannot be specified as non-integral.
It is only supported for backwards compatibility; new code should use the
flags of the ``p`` specifier instead.

``<abi>`` is a lower bound on what is required for a type to be considered
aligned. This is used in various places, such as:
103 changes: 92 additions & 11 deletions llvm/include/llvm/IR/DataLayout.h
@@ -79,10 +79,19 @@ class DataLayout {
Align PrefAlign;
uint32_t IndexBitWidth;
/// Pointers in this address space don't have a well-defined bitwise
/// representation (e.g. may be relocated by a copying garbage collector).
/// Additionally, they may also be non-integral (i.e. containing additional
/// metadata such as bounds information/permissions).
bool IsNonIntegral;
/// representation (e.g. they may be relocated by a copying garbage
/// collector and thus have different addresses at different times).
bool HasUnstableRepresentation;
/// Pointers in this address space are non-integral, i.e. don't have an
/// integer representation that simply maps to the address. Examples include
/// AMDGPU buffer fat pointers with bounds information and various flags, or
/// CHERI capabilities that contain bounds+permissions.
bool HasNonIntegralRepresentation;
/// Pointers in this address space have additional state bits that are
/// located at a target-defined location when stored in memory. An example
/// of this would be CHERI capabilities where the validity bit is stored
/// separately from the pointer address+bounds information.
bool HasExternalState;
LLVM_ABI bool operator==(const PointerSpec &Other) const;
};

@@ -149,7 +158,8 @@ class DataLayout {
/// Sets or updates the specification for pointer in the given address space.
void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
Align PrefAlign, uint32_t IndexBitWidth,
bool IsNonIntegral);
bool HasUnstableRepr, bool HasNonIntegralRepr,
bool HasExternalState);

/// Internal helper to get alignment for integer of given bitwidth.
LLVM_ABI Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -357,30 +367,101 @@ class DataLayout {
/// \sa DataLayout::getAddressSizeInBits
unsigned getAddressSize(unsigned AS) const { return getIndexSize(AS); }

/// Return the address spaces containing non-integral pointers. Pointers in
/// this address space don't have a well-defined bitwise representation.
SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
/// Return the address spaces with special pointer semantics (such as being
/// unstable or non-integral).
SmallVector<unsigned, 8> getNonStandardAddressSpaces() const {
SmallVector<unsigned, 8> AddrSpaces;
for (const PointerSpec &PS : PointerSpecs) {
if (PS.IsNonIntegral)
if (PS.HasNonIntegralRepresentation || PS.HasUnstableRepresentation ||
PS.HasExternalState)
AddrSpaces.push_back(PS.AddrSpace);
}
return AddrSpaces;
}

/// Returns whether this address space is "non-integral" or "unstable".
/// This means that passes should not introduce inttoptr or ptrtoint
/// instructions operating on pointers of this address space.
/// TODO: remove this function after migrating to finer-grained properties.
bool isNonIntegralAddressSpace(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).IsNonIntegral;
return hasUnstableRepresentation(AddrSpace) ||
hasNonIntegralRepresentation(AddrSpace);
}

/// Returns whether this address space has an "unstable" pointer
/// representation. The bitwise pattern of such pointers is allowed to change
/// in a target-specific way. For example, this could be used for copying
/// garbage collection where the garbage collector could update the pointer
/// value as part of the collection sweep.
bool hasUnstableRepresentation(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasUnstableRepresentation;
}

/// Returns whether this address space has a non-integral pointer
/// representation, i.e. the pointer is not just an integer address but some
/// other bitwise representation. Examples include AMDGPU buffer descriptors
/// with a 128-bit fat pointer and a 32-bit offset or CHERI capabilities that
/// contain bounds, permissions and an out-of-band validity bit. In general,
/// these pointers cannot be re-created from just an integer value.
bool hasNonIntegralRepresentation(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasNonIntegralRepresentation;
}

/// Returns whether this address space has external state (implies being
/// a non-integral pointer representation).
/// These pointer types must be loaded and stored using appropriate
/// instructions and cannot use integer loads/stores as this would not
/// propagate the out-of-band state. An example of such a pointer type is a
/// CHERI capability that contains bounds, permissions and an out-of-band
/// validity bit that is invalidated whenever an integer/FP store is performed
/// to the associated memory location.
bool hasExternalState(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasExternalState;
}

/// Returns whether passes should avoid introducing `inttoptr` instructions
/// for this address space.
///
/// This is currently the case for non-integral pointer representations with
/// external state (hasExternalState()) since `inttoptr` cannot recreate the
/// external state bits.
/// New `inttoptr` instructions should also be avoided for "unstable" bitwise
/// representations (hasUnstableRepresentation()) unless the pass knows it is
/// within a critical section that retains the current representation.
bool shouldAvoidIntToPtr(unsigned AddrSpace) const {
return hasUnstableRepresentation(AddrSpace) || hasExternalState(AddrSpace);
}

/// Returns whether passes should avoid introducing `ptrtoint` instructions
/// for this address space.
///
/// This is currently the case for pointer address spaces that have an
/// "unstable" representation (hasUnstableRepresentation()) since the
/// bitwise pattern of such pointers could change unless the pass knows it is
/// within a critical section that retains the current representation.
bool shouldAvoidPtrToInt(unsigned AddrSpace) const {
return hasUnstableRepresentation(AddrSpace);
}

bool isNonIntegralPointerType(PointerType *PT) const {
return isNonIntegralAddressSpace(PT->getAddressSpace());
}

bool isNonIntegralPointerType(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty);
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && isNonIntegralPointerType(PTy);
}

bool shouldAvoidPtrToInt(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && shouldAvoidPtrToInt(PTy->getPointerAddressSpace());
}

bool shouldAvoidIntToPtr(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && shouldAvoidIntToPtr(PTy->getPointerAddressSpace());
}
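
How the three properties combine into the two `shouldAvoid*` predicates can be summarized in a standalone sketch. It mirrors the logic of the accessors in this diff but is not the LLVM class: `PointerProperties`, `makeProperties`, and `isLegacyNonIntegral` are hypothetical stand-ins, and the three example configurations in the usage note are assumptions chosen to resemble the cases the documentation mentions.

```cpp
#include <cassert>

// Standalone mirror of the three proposed per-address-space properties.
struct PointerProperties {
  bool HasUnstableRepresentation;
  bool HasNonIntegralRepresentation;
  bool HasExternalState;
};

// Builds a property set, checking the documented rule that external state
// implies a non-integral representation.
PointerProperties makeProperties(bool Unstable, bool NonIntegral,
                                 bool ExternalState) {
  assert((!ExternalState || NonIntegral) &&
         "external state requires a non-integral representation");
  return PointerProperties{Unstable, NonIntegral, ExternalState};
}

// New `inttoptr` is avoided when the bits may change or when external state
// cannot be recreated from an integer.
bool shouldAvoidIntToPtr(PointerProperties P) {
  return P.HasUnstableRepresentation || P.HasExternalState;
}

// New `ptrtoint` is avoided only when the bit pattern may change.
bool shouldAvoidPtrToInt(PointerProperties P) {
  return P.HasUnstableRepresentation;
}

// Mirrors the transitional isNonIntegralAddressSpace() query in the diff.
bool isLegacyNonIntegral(PointerProperties P) {
  return P.HasUnstableRepresentation || P.HasNonIntegralRepresentation;
}
```

For example, `makeProperties(true, false, false)` (a relocating-GC style pointer) avoids both casts, `makeProperties(false, true, false)` (a fat pointer with stable bits) avoids neither, and `makeProperties(false, true, true)` (a CHERI-like capability) avoids only new `inttoptr`.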

/// The size in bits of the pointer representation in a given address space.
/// This is not necessarily the same as the integer address of a pointer (e.g.
/// for fat pointers).