
[DataLayout][LangRef] Split non-integral and unstable pointer properties #105735

Open
wants to merge 12 commits into
base: users/arichardson/spr/main.datalayoutlangref-split-non-integral-and-unstable-pointer-properties
106 changes: 83 additions & 23 deletions llvm/docs/LangRef.rst
@@ -649,48 +649,99 @@ literal types are uniqued in recent versions of LLVM.

.. _nointptrtype:

Non-Integral Pointer Type
-------------------------
Non-Integral and Unstable Pointer Types
---------------------------------------

Note: non-integral pointer types are a work in progress, and they should be
considered experimental at this time.
Note: non-integral/unstable pointer types are a work in progress, and they
should be considered experimental at this time.

LLVM IR optionally allows the frontend to denote pointers in certain address
spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
Non-integral pointer types represent pointers that have an *unspecified* bitwise
representation; that is, the integral representation may be target dependent or
unstable (not backed by a fixed integer).
spaces as "non-integral" or "unstable" (or both "non-integral" and "unstable")
via the :ref:`datalayout string<langref_datalayout>`.

The exact implications of these properties are target-specific, but the
following IR semantics and restrictions to optimization passes apply:

Unstable pointer representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pointers in this address space have an *unspecified* bitwise representation
(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
allowed to change in a target-specific way. For example, this could be a pointer
type used with copying garbage collection where the garbage collector could
update the pointer at any time in the collection sweep.

``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
integral (i.e. normal) pointers in that they convert integers to and from
corresponding pointer types, but there are additional implications to be
aware of. Because the bit-representation of a non-integral pointer may
not be stable, two identical casts of the same operand may or may not
corresponding pointer types, but there are additional implications to be aware
of.

For "unstable" pointer representations, the bit-representation of the pointer
may not be stable, so two identical casts of the same operand may or may not
return the same value. Said differently, the conversion to or from the
Contributor

So this applies to only an SSA value of an unstable pointer type? What about an in-memory value with the unstable pointer type?

Member Author

I am not familiar with how GC pointers are used in LLVM, I just tried to split out the existing "copying GC" non-integral pointers properties into a separate property to allow for "fat pointers", CHERI capabilities, etc to use non-integral pointers without incurring all the restrictions imposed by GC pointers.

Not sure who is best to comment on this, probably someone from azul who has worked on it recently.

non-integral type depends on environmental state in an implementation
"unstable" pointer type depends on environmental state in an implementation
defined manner.

If the frontend wishes to observe a *particular* value following a cast, the
generated IR must fence with the underlying environment in an implementation
defined manner. (In practice, this tends to require ``noinline`` routines for
such operations.)

From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
non-integral types are analogous to ones on integral types with one
"unstable" pointer types are analogous to ones on integral types with one
key exception: the optimizer may not, in general, insert new dynamic
occurrences of such casts. If a new cast is inserted, the optimizer would
need to either ensure that a) all possible values are valid, or b)
appropriate fencing is inserted. Since the appropriate fencing is
implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
Similar restrictions apply to intrinsics that might examine the pointer bits,
such as :ref:`llvm.ptrmask<int_ptrmask>`.

The alignment information provided by the frontend for a non-integral pointer
The alignment information provided by the frontend for an "unstable" pointer
(typically using attributes or metadata) must be valid for every possible
representation of the pointer.
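
To make the ``ptrtoint(v)-ptrtoint(v) == 0`` caveat concrete, the following is a minimal C++ sketch of an "unstable" representation, assuming a toy copying collector. ``ToyHeap``, ``collect`` and ``castDifferenceAcrossCollection`` are hypothetical names invented for this illustration and are not LLVM APIs; the sketch only models the idea that a cast observes environmental state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a copying collector; not an LLVM API.
struct ToyHeap {
  // Current address of each object, indexed by an opaque handle.
  std::vector<std::uint64_t> Address;

  // Records an object at the given address and returns its handle.
  std::size_t allocate(std::uint64_t Addr) {
    Address.push_back(Addr);
    return Address.size() - 1;
  }

  // Models `ptrtoint`: observes whatever address the object has right now.
  std::uint64_t ptrToInt(std::size_t Handle) const { return Address[Handle]; }

  // Models a collection cycle that moves every object by Delta bytes.
  void collect(std::uint64_t Delta) {
    for (std::uint64_t &A : Address)
      A += Delta;
  }
};

// Computes ptrtoint(v) - ptrtoint(v) with a collection between the two casts.
std::int64_t castDifferenceAcrossCollection(ToyHeap &Heap, std::size_t V) {
  std::uint64_t First = Heap.ptrToInt(V);
  Heap.collect(0x1000); // the environmental state changes here
  std::uint64_t Second = Heap.ptrToInt(V);
  return static_cast<std::int64_t>(Second - First);
}
```

In this model two back-to-back casts agree, but a cast pair that straddles a collection differs by the relocation distance, which is why an optimizer cannot assume the difference is zero when it inserts a new cast.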

Non-integral pointer representation
Collaborator

Is non-integral the right term for something that is more than just an integer?

Member Author

Naming is hard - I kept this pre-existing name since it can also be interpreted as not just an integer, i.e. it can be anything else (such as integer+metadata).

Contributor

So, just to toss out a drive-by name suggestion (though I'm fine with keeping non-integral): how about "annotated" pointers? That is, the pointer does (without unstable) have a fixed representation and point to some address, but there are bits in that representation that "annotate" the address, and so inttoptr(ptrtoint(v) + x) ??= gep i8, v, x

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pointers are not represented as just an address, but may instead include
additional metadata such as bounds information or a temporal identifier.
Examples include AMDGPU buffer descriptors with a 128-bit fat pointer and a
32-bit offset, or CHERI capabilities that contain bounds, permissions and an
out-of-band validity bit. In general, valid non-integral pointers cannot be
created from just an integer value: while ``inttoptr`` yields a deterministic
bitwise pattern, the resulting value is not guaranteed to be a valid
dereferenceable pointer.

In most cases, pointers with a non-integral representation behave exactly like
integral pointers; the only difference is that it is not possible to create a
pointer just from an address.

"Non-integral" pointers also impose restrictions on transformation passes, but
in general these are less restrictive than for "unstable" pointers. The main
difference compared to integral pointers is that ``inttoptr`` instructions
should not be inserted by passes as they may not be able to create a valid
pointer. This property also means that ``inttoptr(ptrtoint(x))`` cannot be
folded to ``x`` as the ``ptrtoint`` operation may destroy the necessary metadata
to reconstruct the pointer.
Additionally, since there could be out-of-band state, it is also not legal to
convert a load/store of a non-integral pointer type to a load/store of an
integer type with same bitwidth, as that may not copy all the state.
However, it is legal to use appropriately-aligned ``llvm.memcpy`` and
``llvm.memmove`` for copies of non-integral pointers.
NOTE: Lowering of ``llvm.memcpy`` containing non-integral pointer types must use
appropriately-aligned and sized types instead of smaller integer types.
Contributor

... Wait, hold on, I thought one of the firmer outcomes of the big ptrtoint semantics thread is that ptrtoint is definitionally the same as a type-punned store + load

That is

%y = ptrtoint ptr addrspace(N) %x to i[ptrsize(N)]

is exactly

%m = alloca i[ptrmemsize(N)]
store ptr addrspace(N) %x, ptr %m
%y = load i[ptrsize(N)], ptr %m

Member Author

Yes, apologies for the delay here - I need to get around to rebasing my changes on top of the outcome of the discussion. I hope to have something next week.

Contributor

No problem, just wanted to flag that

(Also, re that discussion - it might be good to get your thoughts on the ptrtoaddr - and in particular, ptrtoaddr as inverse of GEP - formulation)

Member Author

Updated, hopefully all issues resolved in the new wording.


Unlike "unstable" pointers, the bit-wise representation is stable and
``ptrtoint(x)`` always yields a deterministic value.
This means transformation passes are still permitted to insert new ``ptrtoint``
instructions.
However, it is important to note that ``ptrtoint`` may not yield the same value
as storing the pointer via memory and reading it back as an integer, even if the
bitwidth of the two types matches (since ptrtoint could involve some form of
arithmetic or strip parts of the non-integral pointer representation).
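
As an illustration of why ``inttoptr(ptrtoint(x))`` cannot be folded to ``x`` for "non-integral" pointers, here is a minimal C++ sketch of a capability-like fat pointer. The ``FatPtr`` layout and the three helper functions are assumptions made up for this example; they do not describe the actual CHERI or AMDGPU encodings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability-like pointer; field names are illustrative only.
struct FatPtr {
  std::uint64_t Address;
  std::uint64_t Base;
  std::uint64_t Length;
  bool Valid; // models an out-of-band validity bit
};

// Models `ptrtoint`: deterministic, but keeps only the address.
std::uint64_t ptrToInt(const FatPtr &P) { return P.Address; }

// Models `inttoptr`: deterministic bit pattern, but no bounds and not valid.
FatPtr intToPtr(std::uint64_t Addr) { return FatPtr{Addr, 0, 0, false}; }

// A dereference is allowed only for valid, in-bounds pointers.
bool isDereferenceable(const FatPtr &P) {
  return P.Valid && P.Address >= P.Base && P.Address - P.Base < P.Length;
}
```

Under these assumptions, ``ptrToInt`` is stable and may be repeated freely, while ``intToPtr(ptrToInt(X))`` has the same address as ``X`` but has lost the bounds and the validity bit, so it is not a replacement for ``X``.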

.. _globalvars:

Global Variables
@@ -3082,16 +3133,21 @@ as follows:
``A<address space>``
Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.
``p[n]:<size>:<abi>[:<pref>][:<idx>]``
``p[<flags>][<address space>]:<size>:<abi>[:<pref>][:<idx>]``
This specifies the *size* of a pointer and its ``<abi>`` and
``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional
and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the
index that is used for address calculation, which must be less than or equal
to the pointer size. If not
specified, the default index size is equal to the pointer size. All sizes
are in bits. The address space, ``n``, is optional, and if not specified,
denotes the default address space 0. The value of ``n`` must be
in the range [1,2^24).
are in bits. The ``<address space>`` is optional, and if not specified,
denotes the default address space 0. The value of ``<address space>`` must
be in the range [1,2^24).
The optional ``<flags>`` are used to specify properties of pointers in this
address space: the character ``u`` marks pointers as having an unstable
representation and ``n`` marks pointers as non-integral (i.e. having
additional metadata). See :ref:`Non-Integral Pointer Types <nointptrtype>`.

``i<size>:<abi>[:<pref>]``
This specifies the alignment for an integer type of a given bit
``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -3146,9 +3202,11 @@ as follows:
this set are considered to support most general arithmetic operations
efficiently.
``ni:<address space0>:<address space1>:<address space2>...``
This specifies pointer types with the specified address spaces
as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0``
address space cannot be specified as non-integral.
This marks pointer types with the specified address spaces
as :ref:`non-integral and unstable <nointptrtype>`.
The ``0`` address space cannot be specified as non-integral.
It is only supported for backwards compatibility; new code should use the
flags of the ``p`` specifier instead.

On every specification that takes a ``<abi>:<pref>``, specifying the
``<pref>`` alignment is optional. If omitted, the preceding ``:``
@@ -12135,6 +12193,8 @@ If ``value`` is smaller than ``ty2`` then a zero extension is done. If
``value`` is larger than ``ty2`` then a truncation is done. If they are
the same size, then nothing is done (*no-op cast*) other than a type
change.
For :ref:`non-integral pointers <nointptrtype>` the ``ptrtoint`` instruction
may involve additional transformations beyond truncation or extension.

Example:
""""""""
83 changes: 73 additions & 10 deletions llvm/include/llvm/IR/DataLayout.h
@@ -79,10 +79,14 @@ class DataLayout {
Align PrefAlign;
uint32_t IndexBitWidth;
/// Pointers in this address space don't have a well-defined bitwise
/// representation (e.g. may be relocated by a copying garbage collector).
/// Additionally, they may also be non-integral (i.e. containing additional
/// metadata such as bounds information/permissions).
bool IsNonIntegral;
/// representation (e.g. they may be relocated by a copying garbage
/// collector and thus have different addresses at different times).
bool HasUnstableRepresentation;
/// Pointers in this address space are non-integral, i.e. don't have an
/// integer representation that simply maps to the address. An example of
/// this would be fat pointers with bounds information or CHERI capabilities
/// that include metadata as well as one out-of-band validity bit.
bool HasNonIntegralRepresentation;
bool operator==(const PointerSpec &Other) const;
};

@@ -148,7 +152,7 @@ class DataLayout {
/// Sets or updates the specification for pointer in the given address space.
void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
Align PrefAlign, uint32_t IndexBitWidth,
bool IsNonIntegral);
bool HasUnstableRepr, bool HasNonIntegralRepr);

/// Internal helper to get alignment for integer of given bitwidth.
Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -337,19 +341,68 @@ class DataLayout {
/// rounded up to a whole number of bytes.
unsigned getIndexSize(unsigned AS) const;

/// Return the address spaces containing non-integral pointers. Pointers in
/// this address space don't have a well-defined bitwise representation.
SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
/// Return the address spaces with special pointer semantics (such as being
/// unstable or non-integral).
SmallVector<unsigned, 8> getNonStandardAddressSpaces() const {
SmallVector<unsigned, 8> AddrSpaces;
for (const PointerSpec &PS : PointerSpecs) {
if (PS.IsNonIntegral)
if (PS.HasNonIntegralRepresentation || PS.HasUnstableRepresentation)
AddrSpaces.push_back(PS.AddrSpace);
}
return AddrSpaces;
}

/// Returns whether this address space is "non-integral" or "unstable".
/// This means that passes should not introduce inttoptr or ptrtoint
/// instructions operating on pointers of this address space.
/// TODO: remove this function after migrating to finer-grained properties.
bool isNonIntegralAddressSpace(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).IsNonIntegral;
return hasUnstableRepresentation(AddrSpace) ||
hasNonIntegralRepresentation(AddrSpace);
}

/// Returns whether this address space has an "unstable" pointer
/// representation. The bitwise pattern of such pointers is allowed to change
/// in a target-specific way. For example, this could be used for copying
/// garbage collection where the garbage collector could update the pointer
/// value as part of the collection sweep.
bool hasUnstableRepresentation(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasUnstableRepresentation;
}

/// Returns whether this address space has a non-integral pointer
/// representation, i.e. the pointer is not just an integer address but some
/// other bitwise representation. Examples include AMDGPU buffer descriptors
/// with a 128-bit fat pointer and a 32-bit offset or CHERI capabilities that
/// contain bounds, permissions and an out-of-band validity bit. In general,
/// these pointers cannot be re-created from just an integer value.
bool hasNonIntegralRepresentation(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasNonIntegralRepresentation;
}

/// Returns whether passes should avoid introducing `inttoptr` instructions
/// for this address space.
///
/// This is currently the case for "non-integral" pointer representations
/// (hasNonIntegralRepresentation()) since such pointers generally require
/// additional metadata beyond just an address.
/// New `inttoptr` instructions should also be avoided for "unstable" bitwise
/// representations (hasUnstableRepresentation()) unless the pass knows it is
/// within a critical section that retains the current representation.
bool shouldAvoidIntToPtr(unsigned AddrSpace) const {
return hasUnstableRepresentation(AddrSpace) ||
hasNonIntegralRepresentation(AddrSpace);
}

/// Returns whether passes should avoid introducing `ptrtoint` instructions
/// for this address space.
///
/// This is currently the case for pointer address spaces that have an
/// "unstable" representation (hasUnstableRepresentation()) since the
/// bitwise pattern of such pointers could change unless the pass knows it is
/// within a critical section that retains the current representation.
bool shouldAvoidPtrToInt(unsigned AddrSpace) const {
return hasUnstableRepresentation(AddrSpace);
}

bool isNonIntegralPointerType(PointerType *PT) const {
@@ -361,6 +414,16 @@ class DataLayout {
return PTy && isNonIntegralPointerType(PTy);
}

bool shouldAvoidPtrToInt(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty);
return PTy && shouldAvoidPtrToInt(PTy->getPointerAddressSpace());
}

bool shouldAvoidIntToPtr(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty);
return PTy && shouldAvoidIntToPtr(PTy->getPointerAddressSpace());
}
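
The two `shouldAvoid*` predicates above reduce to a small truth table over the two properties. The following standalone C++ sketch restates that table outside of LLVM; `PointerProps` is a hypothetical stand-in for the two flags in `PointerSpec`, and the free functions only mirror the logic shown in this diff.

```cpp
#include <cassert>

// Hypothetical stand-in for the two per-address-space flags.
struct PointerProps {
  bool Unstable;
  bool NonIntegral;
};

// Avoid inserting new `inttoptr` if either property is set.
constexpr bool shouldAvoidIntToPtr(PointerProps P) {
  return P.Unstable || P.NonIntegral;
}

// Avoid inserting new `ptrtoint` only for unstable representations.
constexpr bool shouldAvoidPtrToInt(PointerProps P) { return P.Unstable; }

// Ordinary integral pointers: both casts may be inserted.
static_assert(!shouldAvoidIntToPtr(PointerProps{false, false}), "");
static_assert(!shouldAvoidPtrToInt(PointerProps{false, false}), "");
// Non-integral but stable (e.g. fat pointers): `ptrtoint` is still allowed.
static_assert(shouldAvoidIntToPtr(PointerProps{false, true}), "");
static_assert(!shouldAvoidPtrToInt(PointerProps{false, true}), "");
```

The asymmetry is the point of the split: a stable non-integral pointer keeps `ptrtoint` available to passes, whereas an unstable pointer rules out both directions.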

/// Layout pointer size, in bits
/// FIXME: The defaults need to be removed once all of
/// the backends/clients are updated.
38 changes: 29 additions & 9 deletions llvm/lib/IR/DataLayout.cpp
@@ -152,7 +152,8 @@ bool DataLayout::PointerSpec::operator==(const PointerSpec &Other) const {
return AddrSpace == Other.AddrSpace && BitWidth == Other.BitWidth &&
ABIAlign == Other.ABIAlign && PrefAlign == Other.PrefAlign &&
IndexBitWidth == Other.IndexBitWidth &&
IsNonIntegral == Other.IsNonIntegral;
HasUnstableRepresentation == Other.HasUnstableRepresentation &&
HasNonIntegralRepresentation == Other.HasNonIntegralRepresentation;
}

namespace {
@@ -208,7 +209,7 @@ constexpr DataLayout::PrimitiveSpec DefaultVectorSpecs[] = {
// Default pointer type specifications.
constexpr DataLayout::PointerSpec DefaultPointerSpecs[] = {
// p0:64:64:64:64
{0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false},
{0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false, false},
};

DataLayout::DataLayout()
@@ -419,9 +420,25 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {

// Address space. Optional, defaults to 0.
unsigned AddrSpace = 0;
if (!Components[0].empty())
if (Error Err = parseAddrSpace(Components[0], AddrSpace))
bool UnstableRepr = false;
bool NonIntegralRepr = false;
StringRef AddrSpaceStr = Components[0].drop_while([&](char C) {
if (C == 'n') {
NonIntegralRepr = true;
return true;
} else if (C == 'u') {
UnstableRepr = true;
return true;
}
return false;
});
if (!AddrSpaceStr.empty()) {
if (Error Err = parseAddrSpace(AddrSpaceStr, AddrSpace))
return Err;
}
if (AddrSpace == 0 && (NonIntegralRepr || UnstableRepr))
return createStringError(
"address space 0 cannot be non-integral or unstable");

// Size. Required, cannot be zero.
unsigned BitWidth;
@@ -455,7 +472,7 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
"index size cannot be larger than the pointer size");

setPointerSpec(AddrSpace, BitWidth, ABIAlign, PrefAlign, IndexBitWidth,
false);
UnstableRepr, NonIntegralRepr);
return Error::success();
}
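
The flag handling in `parsePointerSpec` can be exercised in isolation with a standalone sketch such as the one below. It is not the LLVM implementation: `ParsedPrefix` and `parsePointerPrefix` are hypothetical names, it uses `std::string` instead of `StringRef`, and the address-space check is an assumption based on the documented range below 2^24. The input is the text between the leading `p` and the first `:`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type; not part of LLVM.
struct ParsedPrefix {
  bool Unstable = false;
  bool NonIntegral = false;
  std::uint32_t AddrSpace = 0;
};

// Parses e.g. "nu1". Returns false where the real parser would report an error.
bool parsePointerPrefix(const std::string &Prefix, ParsedPrefix &Out) {
  ParsedPrefix Result;
  std::size_t Pos = 0;
  // Consume leading flag characters, in any order (repeats are accepted).
  while (Pos < Prefix.size() && (Prefix[Pos] == 'n' || Prefix[Pos] == 'u')) {
    if (Prefix[Pos] == 'n')
      Result.NonIntegral = true;
    else
      Result.Unstable = true;
    ++Pos;
  }
  // The remainder, if any, must be a decimal address space below 2^24.
  std::uint64_t Value = 0;
  for (; Pos < Prefix.size(); ++Pos) {
    if (Prefix[Pos] < '0' || Prefix[Pos] > '9')
      return false;
    Value = Value * 10 + static_cast<std::uint64_t>(Prefix[Pos] - '0');
    if (Value >= (1u << 24))
      return false;
  }
  Result.AddrSpace = static_cast<std::uint32_t>(Value);
  // Address space 0 may not carry either flag.
  if (Result.AddrSpace == 0 && (Result.Unstable || Result.NonIntegral))
    return false;
  Out = Result;
  return true;
}
```

In this sketch, `"u1"` and `"nu3"` are accepted, while `"n"` (flags on address space 0) and `"1u"` (flag after the number) are rejected, matching the order the `drop_while` call imposes: flags first, then the address space.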

@@ -631,7 +648,7 @@ Error DataLayout::parseLayoutString(StringRef LayoutString) {
// the spec for AS0, and we then update that to mark it non-integral.
const PointerSpec &PS = getPointerSpec(AS);
setPointerSpec(AS, PS.BitWidth, PS.ABIAlign, PS.PrefAlign, PS.IndexBitWidth,
true);
true, true);
}

return Error::success();
@@ -679,17 +696,20 @@ DataLayout::getPointerSpec(uint32_t AddrSpace) const {

void DataLayout::setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth,
Align ABIAlign, Align PrefAlign,
uint32_t IndexBitWidth, bool IsNonIntegral) {
uint32_t IndexBitWidth, bool HasUnstableRepr,
bool HasNonIntegralRepr) {
auto I = lower_bound(PointerSpecs, AddrSpace, LessPointerAddrSpace());
if (I == PointerSpecs.end() || I->AddrSpace != AddrSpace) {
PointerSpecs.insert(I, PointerSpec{AddrSpace, BitWidth, ABIAlign, PrefAlign,
IndexBitWidth, IsNonIntegral});
IndexBitWidth, HasUnstableRepr,
HasNonIntegralRepr});
} else {
I->BitWidth = BitWidth;
I->ABIAlign = ABIAlign;
I->PrefAlign = PrefAlign;
I->IndexBitWidth = IndexBitWidth;
I->IsNonIntegral = IsNonIntegral;
I->HasUnstableRepresentation = HasUnstableRepr;
I->HasNonIntegralRepresentation = HasNonIntegralRepr;
}
}
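
`setPointerSpec` keeps the specs sorted by address space and either inserts a new entry or overwrites an existing one. A minimal standalone sketch of that insert-or-update pattern, assuming a simplified `Spec` struct and `std::vector` in place of LLVM's containers (both hypothetical here), might look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified version of a pointer spec.
struct Spec {
  std::uint32_t AddrSpace;
  std::uint32_t BitWidth;
  bool Unstable;
  bool NonIntegral;
};

// Inserts New at its sorted position, or overwrites the entry with the same
// address space if one already exists.
void setSpec(std::vector<Spec> &Specs, const Spec &New) {
  auto I = std::lower_bound(
      Specs.begin(), Specs.end(), New.AddrSpace,
      [](const Spec &S, std::uint32_t AS) { return S.AddrSpace < AS; });
  if (I == Specs.end() || I->AddrSpace != New.AddrSpace)
    Specs.insert(I, New);
  else
    *I = New;
}
```

Re-setting an existing address space with both flags set to `true` corresponds to what the legacy `ni` handling in this diff does after the `p` specs have been parsed.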
