[DataLayout][LangRef] Split non-integral and unstable pointer properties #105735
base: users/arichardson/spr/main.datalayoutlangref-split-non-integral-and-unstable-pointer-properties
Changes from all commits
90068f1
e4bd118
35afb97
db97145
94ecfa3
278ce21
df9bdfe
7615db9
142a3ff
bdb6acc
de449dd
2c49735
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -650,48 +650,136 @@ literal types are uniqued in recent versions of LLVM. | |
|
||
.. _nointptrtype: | ||
|
||
Non-Integral Pointer Type | ||
------------------------- | ||
Non-Integral and Unstable Pointer Types | ||
--------------------------------------- | ||
|
||
Note: non-integral pointer types are a work in progress, and they should be | ||
considered experimental at this time. | ||
Note: non-integral/unstable pointer types are a work in progress, and they | ||
should be considered experimental at this time. | ||
|
||
LLVM IR optionally allows the frontend to denote pointers in certain address | ||
spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`. | ||
Non-integral pointer types represent pointers that have an *unspecified* bitwise | ||
representation; that is, the integral representation may be target dependent or | ||
unstable (not backed by a fixed integer). | ||
spaces as "unstable", "non-integral", or "non-integral with external state" | ||
(or combinations of these) via the :ref:`datalayout string<langref_datalayout>`. | ||
|
||
The exact implications of these properties are target-specific, but the | ||
following IR semantics and restrictions to optimization passes apply: | ||
|
||
Unstable pointer representation | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Pointers in this address space have an *unspecified* bitwise representation | ||
(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is | ||
allowed to change in a target-specific way. For example, this could be a pointer | ||
|
||
type used with copying garbage collection where the garbage collector could | ||
update the pointer at any time in the collection sweep. | ||
|
||
``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for | ||
integral (i.e. normal) pointers in that they convert integers to and from | ||
corresponding pointer types, but there are additional implications to be | ||
aware of. Because the bit-representation of a non-integral pointer may | ||
not be stable, two identical casts of the same operand may or may not | ||
corresponding pointer types, but there are additional implications to be aware | ||
of. | ||
|
||
For "unstable" pointer representations, the bit-representation of the pointer | ||
may not be stable, so two identical casts of the same operand may or may not | ||
Review comment: So this applies to only an SSA value of an unstable pointer type? What about an in-memory value with the unstable pointer type?
Review comment: I am not familiar with how GC pointers are used in LLVM, I just tried to split out the existing "copying GC" non-integral pointers properties into a separate property to allow for "fat pointers", CHERI capabilities, etc to use non-integral pointers without incurring all the restrictions imposed by GC pointers. Not sure who is best to comment on this, probably someone from azul who has worked on it recently.
||
return the same value. Said differently, the conversion to or from the | ||
non-integral type depends on environmental state in an implementation | ||
"unstable" pointer type depends on environmental state in an implementation | ||
defined manner. | ||
|
||
If the frontend wishes to observe a *particular* value following a cast, the | ||
generated IR must fence with the underlying environment in an implementation | ||
defined manner. (In practice, this tends to require ``noinline`` routines for | ||
such operations.) | ||
|
||
From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for | ||
non-integral types are analogous to ones on integral types with one | ||
"unstable" pointer types are analogous to ones on integral types with one | ||
key exception: the optimizer may not, in general, insert new dynamic | ||
occurrences of such casts. If a new cast is inserted, the optimizer would | ||
need to either ensure that a) all possible values are valid, or b) | ||
appropriate fencing is inserted. Since the appropriate fencing is | ||
implementation defined, the optimizer can't do the latter. The former is | ||
challenging as many commonly expected properties, such as | ||
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types. | ||
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types. | ||
Similar restrictions apply to intrinsics that might examine the pointer bits, | ||
such as :ref:`llvm.ptrmask<int_ptrmask>`. | ||
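To make the failing identity concrete, here is a minimal Python sketch of an "unstable" pointer under a copying collector. The `MovingHeap` class and its methods are hypothetical names invented for illustration; they model the idea only and are not part of LLVM or any GC runtime.

```python
class MovingHeap:
    """Toy model of a copying collector that may relocate objects."""

    def __init__(self):
        self._addr = {}       # object id -> current address
        self._next = 0x1000   # next free address (arbitrary start)

    def alloc(self, obj_id):
        self._addr[obj_id] = self._next
        self._next += 0x10

    def collect(self):
        # Relocate every live object, as a copying collection might.
        for obj_id in self._addr:
            self._addr[obj_id] += 0x100000

    def ptrtoint(self, obj_id):
        # The integer observed depends on when the cast executes.
        return self._addr[obj_id]


heap = MovingHeap()
heap.alloc("v")
first = heap.ptrtoint("v")
heap.collect()                # may run between any two instructions
second = heap.ptrtoint("v")
difference = second - first   # nonzero: the two casts disagree
```

In this model, a pass that duplicates a cast introduces a second observation point, which is why new dynamic occurrences of such casts are not generally safe to insert.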
|
||
The alignment information provided by the frontend for a non-integral pointer | ||
The alignment information provided by the frontend for an "unstable" pointer | ||
(typically using attributes or metadata) must be valid for every possible | ||
representation of the pointer. | ||
|
||
Review comment: Is non-integral the right term for something that is more than just an integer?
Review comment: Naming is hard - I kept this pre-existing name since it can also be interpreted as not just an integer, i.e. it can be anything else (such as integer+metadata).
Review comment: So, just to toss out a drive-by name suggestion (though I'm fine with keeping non-integral): how about "annotated" pointers? That is, the pointer does (without unstable) have a fixed representation and point to some address, but there are bits in that representation that "annotate" the address, and so
||
Non-integral pointer representation | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
|
||
Pointers are not represented as just an address, but may instead include | ||
additional metadata such as bounds information or a temporal identifier. | ||
Examples include AMDGPU buffer descriptors with a 128-bit fat pointer and a | ||
|
||
32-bit offset, or CHERI capabilities that contain bounds, permissions and a | ||
type field (as well as an out-of-band validity bit, see next section). | ||
|
||
|
||
In most cases pointers with a non-integral representation behave exactly the | |
same as integral pointers. The only difference is that it is not possible to | |
create a pointer just from an address unless all the metadata bits are | |
also recreated correctly. | |
|
||
"Non-integral" pointers also impose restrictions on transformation passes, but | ||
in general these are less restrictive than for "unstable" pointers. The main | ||
difference compared to integral pointers is that the address width of a | ||
non-integral pointer is not equal to the bitwise representation, so extracting | ||
the address needs to truncate to the index width of the pointer. | ||
|
||
Note: Currently all supported targets require that truncating the ``ptrtoint`` | ||
|
||
result to address width yields the memory address of the pointer but this may | ||
not hold for all future targets so optimizations should not rely on this. | ||
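As a rough illustration of the truncation described above, the following Python sketch extracts an address from a wider bitwise representation. The 128-bit layout (metadata in the high 64 bits, address in the low 64 bits) and the `address_of` helper are assumptions made up for this example, and the sketch relies on exactly the current-target behavior that the note says optimizations should not depend on.

```python
def address_of(ptrtoint_result: int, index_width: int) -> int:
    """Keep only the low index_width bits of a ptrtoint result."""
    return ptrtoint_result & ((1 << index_width) - 1)


# Hypothetical layout: 64 bits of metadata above a 64-bit address.
metadata = 0xABCD
address = 0x7FFF_0000_1000
raw = (metadata << 64) | address   # full 128-bit representation

extracted = address_of(raw, 64)    # drops the metadata bits
```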
|
||
|
||
|
||
Unlike "unstable" pointers, the bit-wise representation is stable and | ||
``ptrtoint(x)`` always yields a deterministic value. | ||
|
||
This means transformation passes are still permitted to insert new ``ptrtoint`` | ||
instructions. | ||
|
||
Review comment: ... Wait, hold on, I thought one of the firmer outcomes of the big ptrtoint semantics thread is that That is
is exactly
Review comment: Yes, apologies for the delay here - I need to get around to rebasing my changes on top of the outcome of the discussion. I hope to have something next week.
Review comment: No problem, just wanted to flag that (Also, re that discussion - it might be good to get your thoughts on the
Review comment: Updated, hopefully all issues resolved in the new wording.
||
Non-integral pointers with external state | ||
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
A special case of non-integral pointers is ones that include external state | ||
(such as implicit bounds information or a type tag) with a target-defined size. | ||
|
||
An example of such a type is a CHERI capability, where there is an additional | ||
validity bit that is part of all pointer-typed registers, but is located in | ||
memory at an implementation-defined address separate from the pointer itself. | ||
Another example would be a fat-pointer scheme where pointers remain plain | ||
integers, but the associated bounds are stored in an out-of-band table. | ||
|
||
The following restrictions apply to IR level optimization passes: | ||
|
||
The ``inttoptr`` instruction does not recreate the external state and therefore | ||
it is target dependent whether it can be used to create a dereferenceable | ||
pointer. In general, passes should assume that the result of such an ``inttoptr`` | |
is not dereferenceable. For example, on CHERI targets an ``inttoptr`` will | ||
yield a capability with the external state (the validity tag bit) set to zero, | ||
|
||
which will cause any dereference to trap. | ||
The ``ptrtoint`` instruction also only returns the "in-band" state and omits | ||
|
||
all external state. | ||
These two properties mean that ``inttoptr(ptrtoint(x))`` cannot be folded to | ||
``x`` since the ``ptrtoint`` operation does not include the external state | ||
needed to reconstruct the original pointer and ``inttoptr`` cannot set it. | ||
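The round-trip argument can be sketched in Python with a toy capability model. The `Capability` type and the `ptrtoint`, `inttoptr`, and `dereference` functions below are hypothetical stand-ins for the IR operations; they assume a single validity tag as the only external state, loosely following the CHERI example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    bits: int   # in-band state: everything ptrtoint can observe
    tag: bool   # external state: validity bit, not part of bits


def ptrtoint(cap: Capability) -> int:
    # Returns only the in-band state and omits the tag.
    return cap.bits


def inttoptr(value: int) -> Capability:
    # Cannot recreate the external state, so the tag is cleared.
    return Capability(bits=value, tag=False)


def dereference(cap: Capability) -> int:
    if not cap.tag:
        raise RuntimeError("trap: capability tag is not set")
    return cap.bits


x = Capability(bits=0x1000, tag=True)
roundtrip = inttoptr(ptrtoint(x))   # same bits as x, but tag is lost
```

In this model `roundtrip` and `x` agree on every in-band bit yet differ in dereferenceability, so replacing `inttoptr(ptrtoint(x))` with `x` would change behavior.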
|
||
When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer | ||
|
||
is performed, the external metadata is also stored to an implementation-defined | ||
|
||
location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the | ||
external metadata and make it available for all uses of ``%val``. | ||
Similarly, the ``llvm.memcpy`` and ``llvm.memmove`` intrinsics also transfer the | ||
external state. This is essential to allow frontends to efficiently emit copies | ||
|
||
of structures containing such pointers, since expanding all these copies as | ||
individual loads and stores would affect compilation speed and inhibit | ||
optimizations. | ||
|
||
Notionally, these external bits are part of the pointer, but since | ||
``inttoptr`` / ``ptrtoint`` only operate on the "in-band" bits of the pointer | |
and the external bits are not explicitly exposed, they are not included in the | ||
|
||
size specified in the :ref:`datalayout string<langref_datalayout>`. | ||
|
||
When a pointer type has external state, all roundtrips via memory must | ||
be performed as loads and stores of the correct type since stores of other | ||
types may not propagate the external data. | ||
Therefore it is not legal to convert an existing load/store (or a ``llvm.memcpy`` / | ||
``llvm.memmove`` intrinsic) of pointer types with external state to a load/store | ||
Review comment: Is this basically saying you're not allowed to ever split a copy into smaller copies because it might contain a pointer with external state?
Review comment: ... Or at least on an architecture with at least one
Review comment: Yes. Though if you somehow know (whether by analysis or by frontend-provided metadata due to language semantics, like strict aliasing in C) that there are no pointers in a given range then you can still perform that optimisation. Maybe worth adding a throwaway "unless it is known no pointers with external state are present in the source"? Also you can split into loads and stores of the pointer with external state (if there is just one such type). Just not a type that won't preserve the external state.
Review comment: Got it. I'm just worried that this forbids memcpy() expansion. Like, it's weird to be in a situation where you can't unconditionally pessimize memcpy(it* dst, i8* src, usize len) to
Review comment: Yeah this is somewhat annoying. In the downstream CHERI LLVM forks, we included an attribute on memcpy/memmove that says "this copy does not contain capabilities" and then it can be expanded to integer loads/stores in IR/the backend.
||
of an integer type with same bitwidth, as that may drop the external state. | ||
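The restriction on retyping loads and stores can be illustrated with a small Python model of memory that keeps external state in a separate table. `TaggedMemory` and its methods are hypothetical and assume, for illustration only, that an integer store clears any tag at the destination; real targets define their own behavior.

```python
class TaggedMemory:
    """Toy memory with an out-of-band tag table."""

    def __init__(self):
        self.data = {}   # address -> in-band bits
        self.tags = {}   # address -> external validity bit

    def store_ptr(self, addr, bits, tag):
        # A pointer-typed store also writes the external state.
        self.data[addr] = bits
        self.tags[addr] = tag

    def load_ptr(self, addr):
        # A pointer-typed load also fetches the external state.
        return self.data[addr], self.tags.get(addr, False)

    def store_int(self, addr, bits):
        # An integer store writes the bits only; the tag is cleared.
        self.data[addr] = bits
        self.tags[addr] = False

    def load_int(self, addr):
        return self.data[addr]


mem = TaggedMemory()
mem.store_ptr(0x100, 0x1000, True)

# Copy performed with pointer-typed load/store: tag travels along.
mem.store_ptr(0x200, *mem.load_ptr(0x100))
copied_as_ptr = mem.load_ptr(0x200)

# Same copy rewritten as integer load/store: bits match, tag is lost.
mem.store_int(0x300, mem.load_int(0x100))
copied_as_int = mem.load_ptr(0x300)
```

Both copies carry identical in-band bits, but only the pointer-typed copy yields a value that is still valid in this model.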
|
||
|
||
.. _globalvars: | ||
|
||
Global Variables | ||
|
@@ -3167,8 +3255,8 @@ as follows: | |
``A<address space>`` | ||
Specifies the address space of objects created by '``alloca``'. | ||
Defaults to the default address space of 0. | ||
``p[n]:<size>:<abi>[:<pref>[:<idx>]]`` | ||
This specifies the properties of a pointer in address space ``n``. | ||
``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]`` | ||
This specifies the properties of a pointer in address space ``as``. | ||
The ``<size>`` parameter specifies the size of the bitwise representation. | ||
For :ref:`non-integral pointers <nointptrtype>` the representation size may | ||
be larger than the address width of the underlying address space (e.g. to | ||
|
@@ -3181,9 +3269,14 @@ as follows: | |
default index size is equal to the pointer size. | ||
The index size also specifies the width of addresses in this address space. | ||
All sizes are in bits. | ||
The address space, ``n``, is optional, and if not specified, | ||
denotes the default address space 0. The value of ``n`` must be | ||
in the range [1,2^24). | ||
The address space, ``<as>``, is optional, and if not specified, denotes the | ||
default address space 0. The value of ``<as>`` must be in the range [1,2^24). | ||
The optional ``<flags>`` are used to specify properties of pointers in this | ||
address space: the character ``u`` marks pointers as having an unstable | ||
representation, ``n`` marks pointers as non-integral (i.e. having | ||
additional metadata), ``e`` marks pointers having external state | ||
(``n`` must also be set). See :ref:`Non-Integral Pointer Types <nointptrtype>`. | ||
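For readers who want to see how the proposed ``p`` specifier decomposes, here is a hedged Python sketch of a parser for the grammar ``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]``. It is not LLVM's parser: `parse_pointer_spec`, the returned field names, and the example string `pne200:128:128:128:64` are all invented for illustration, and the sketch leaves `pref` unset when omitted rather than assuming a default.

```python
import re

# flags are any of u/n/e, followed by an optional decimal address space.
_P_SPEC = re.compile(
    r"p(?P<flags>[une]*)(?P<addrspace>\d*)"
    r":(?P<size>\d+):(?P<abi>\d+)"
    r"(?::(?P<pref>\d+)(?::(?P<idx>\d+))?)?"
)


def parse_pointer_spec(spec: str) -> dict:
    m = _P_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"malformed pointer spec: {spec!r}")
    flags = set(m["flags"])
    if "e" in flags and "n" not in flags:
        # Per the text, 'e' requires 'n' to be set as well.
        raise ValueError("'e' flag requires 'n'")
    size = int(m["size"])
    return {
        "unstable": "u" in flags,
        "non_integral": "n" in flags,
        "external_state": "e" in flags,
        # Omitted address space denotes the default address space 0.
        "address_space": int(m["addrspace"]) if m["addrspace"] else 0,
        "size": size,
        "abi": int(m["abi"]),
        "pref": int(m["pref"]) if m["pref"] else None,
        # The default index size is equal to the pointer size.
        "index": int(m["idx"]) if m["idx"] else size,
    }


cheri_like = parse_pointer_spec("pne200:128:128:128:64")
default_as = parse_pointer_spec("p:64:64")
```

The made-up string above would describe address space 200 with a 128-bit representation and a 64-bit index width, marked non-integral with external state.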
|
||
``i<size>:<abi>[:<pref>]`` | ||
This specifies the alignment for an integer type of a given bit | ||
``<size>``. The value of ``<size>`` must be in the range [1,2^24). | ||
|
@@ -3236,9 +3329,11 @@ as follows: | |
this set are considered to support most general arithmetic operations | ||
efficiently. | ||
``ni:<address space0>:<address space1>:<address space2>...`` | ||
This specifies pointer types with the specified address spaces | ||
as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0`` | ||
address space cannot be specified as non-integral. | ||
This marks pointer types with the specified address spaces | ||
as :ref:`non-integral and unstable <nointptrtype>`. | ||
The ``0`` address space cannot be specified as non-integral. | ||
It is only supported for backwards compatibility; the flags of the ``p`` | |
specifier should be used instead for new code. | |
|
||
``<abi>`` is a lower bound on what is required for a type to be considered | ||
aligned. This is used in various places, such as: | ||
|