@@ -660,19 +660,60 @@ Non-Integral Pointer Type
660660Note: non-integral pointer types are a work in progress, and they should be
661661considered experimental at this time.
662662
663- LLVM IR optionally allows the frontend to denote pointers in certain address
664- spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
665- Non-integral pointer types represent pointers that have an *unspecified* bitwise
666- representation; that is, the integral representation may be target dependent or
667- unstable (not backed by a fixed integer).
663+ For most targets, the pointer representation is a direct mapping from the
664+ bitwise representation to the address of the underlying memory location.
665+ Such pointers are considered "integral", and any pointers where the
666+ representation is not just an integer address are called "non-integral".
667+ 
668+ Non-integral pointers have at least one of the following three properties:
669+ 
670+ * the pointer representation contains non-address bits
671+ * the pointer representation is unstable (may change at any time in a
672+   target-specific way)
673+ * the pointer representation has external state
674+ 
675+ These properties (or combinations thereof) can be applied to pointers via the
676+ :ref:`datalayout string<langref_datalayout>`.
677+ 
678+ The exact implications of these properties are target-specific. The following
679+ subsections describe the IR semantics and restrictions on optimization passes
680+ for each of these properties.
681+ 
682+ Pointers with non-address bits
683+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
684+ 
685+ Pointers in this address space have a bitwise representation that not only
686+ has address bits, but also some other target-specific metadata.
687+ In most cases, pointers with non-address bits behave exactly the same as
688+ integral pointers. The only difference is that it is not possible to create a
689+ pointer just from an address unless all the non-address bits are also recreated
690+ correctly in a target-specific way.
691+ 
692+ One example of pointers with non-address bits is AMDGPU buffer descriptors,
693+ which are 160 bits: a 128-bit fat pointer and a 32-bit offset.
694+ Similarly, CHERI capabilities contain a 32-bit or 64-bit address as well as the
695+ same number of metadata bits, but unlike the AMDGPU buffer descriptors they have
696+ external state in addition to non-address bits.
697+ 
698+ 
699+ Unstable pointer representation
700+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
701+ 
702+ Pointers in this address space have an *unspecified* bitwise representation
703+ (i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
704+ allowed to change in a target-specific way. For example, this could be a pointer
705+ type used with copying garbage collection where the garbage collector could
706+ update the pointer at any time in the collection sweep.
668707
669708``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
670709integral (i.e., normal) pointers in that they convert integers to and from
671- corresponding pointer types, but there are additional implications to be
672- aware of.  Because the bit-representation of a non-integral pointer may
673- not be stable, two identical casts of the same operand may or may not
710+ corresponding pointer types, but there are additional implications to be aware
711+ of.
712+ 
713+ For "unstable" pointer representations, the bit-representation of the pointer
714+ may not be stable, so two identical casts of the same operand may or may not
674715return the same value.  Said differently, the conversion to or from the
675- non-integral  type depends on environmental state in an implementation
716+ "unstable" pointer type depends on environmental state in an implementation
676717defined manner.
677718
678719If the frontend wishes to observe a *particular* value following a cast, the
@@ -681,21 +722,72 @@ defined manner. (In practice, this tends to require ``noinline`` routines for
681722such operations.)
682723
683724From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
684- non-integral  types are analogous to ones on integral types with one
725+ "unstable" pointer types are analogous to ones on integral types with one
685726key exception: the optimizer may not, in general, insert new dynamic
686727occurrences of such casts.  If a new cast is inserted, the optimizer would
687728need to either ensure that a) all possible values are valid, or b)
688729appropriate fencing is inserted.  Since the appropriate fencing is
689730implementation defined, the optimizer can't do the latter.  The former is
690731challenging as many commonly expected properties, such as
691- ``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral  types.
732+ ``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
692733Similar restrictions apply to intrinsics that might examine the pointer bits,
693734such as :ref:`llvm.ptrmask<int_ptrmask>`.
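
The restriction on new casts can be illustrated with a minimal sketch. It
assumes that address space 1 has been marked as unstable, here through the
legacy ``ni`` specifier; the function name ``@diff`` is hypothetical.

.. code-block:: llvm

    ; Assumption: address space 1 uses an "unstable" pointer representation.
    target datalayout = "ni:1"

    define i64 @diff(ptr addrspace(1) %v) {
      ; Each cast may observe a different bit pattern for %v.
      %a = ptrtoint ptr addrspace(1) %v to i64
      %b = ptrtoint ptr addrspace(1) %v to i64
      ; Not guaranteed to be 0, so this must not be folded to "ret i64 0".
      %d = sub i64 %a, %b
      ret i64 %d
    }

For an integral pointer the two casts would be interchangeable and the
subtraction could be folded away; for an "unstable" pointer neither
assumption is safe.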
694735
695- The alignment information provided by the frontend for a non-integral  pointer
736+ The alignment information provided by the frontend for an "unstable" pointer
696737(typically using attributes or metadata) must be valid for every possible
697738representation of the pointer.
698739
740+ Pointers with external state
741+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
742+ 
743+ A further special case is non-integral pointers that include external
744+ state (such as bounds information or a type tag) with a target-defined size.
745+ An example of such a type is a CHERI capability, where there is an additional
746+ validity bit that is part of all pointer-typed registers, but is located in
747+ memory at an implementation-defined address separate from the pointer itself.
748+ Another example would be a fat-pointer scheme where pointers remain plain
749+ integers, but the associated bounds are stored in an out-of-band table.
750+ 
751+ Unless also marked as "unstable", the bitwise representation of pointers with
752+ external state is stable and ``ptrtoint(x)`` always yields a deterministic
753+ value. This means transformation passes are still permitted to insert new
754+ ``ptrtoint`` instructions.
755+ 
756+ The following restrictions apply to IR-level optimization passes:
757+ 
758+ The ``inttoptr`` instruction does not recreate the external state and therefore
759+ it is target dependent whether it can be used to create a dereferenceable
760+ pointer. In general, passes should assume that the result of such an ``inttoptr``
761+ is not dereferenceable. For example, on CHERI targets an ``inttoptr`` will
762+ yield a capability with the external state (the validity tag bit) set to zero,
763+ which will cause any dereference to trap.
764+ The ``ptrtoint`` instruction also only returns the "in-band" state and omits
765+ all external state.
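
As a rough sketch of the round-trip problem, assume that address space 200
has been marked as having external state (the address space number and the
function name ``@roundtrip`` are illustrative only):

.. code-block:: llvm

    ; Assumption: pointers in address space 200 carry external state.
    define i32 @roundtrip(ptr addrspace(200) %p) {
      ; Only the in-band bits are returned; the external state is omitted.
      %bits = ptrtoint ptr addrspace(200) %p to i64
      ; The external state is not recreated, so %q may not be dereferenceable
      ; even though it has the same in-band bits as %p.
      %q = inttoptr i64 %bits to ptr addrspace(200)
      ; On a CHERI-like target this load would trap.
      %v = load i32, ptr addrspace(200) %q
      ret i32 %v
    }

A pass should therefore not rewrite a use of ``%p`` into such a
``ptrtoint`` / ``inttoptr`` round trip.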
766+ 
767+ When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer
768+ is performed, the external metadata is also stored to an implementation-defined
769+ location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the
770+ external metadata and make it available for all uses of ``%val``.
771+ The ``llvm.memcpy`` and ``llvm.memmove`` intrinsics also transfer the
772+ external state. This is essential to allow frontends to efficiently emit copies
773+ of structures containing such pointers, since expanding all these copies as
774+ individual loads and stores would affect compilation speed and inhibit
775+ optimizations.
776+ 
777+ Notionally, these external bits are part of the pointer, but since
778+ ``inttoptr`` / ``ptrtoint`` only operate on the "in-band" bits of the pointer
779+ and the external bits are not explicitly exposed, they are not included in the
780+ size specified in the :ref:`datalayout string<langref_datalayout>`.
781+ 
782+ When a pointer type has external state, all roundtrips via memory must
783+ be performed as loads and stores of the correct type since stores of other
784+ types may not propagate the external data.
785+ Therefore it is not legal to convert an existing load/store (or a
786+ ``llvm.memcpy`` / ``llvm.memmove`` intrinsic) of pointer types with external
787+ state to a load/store of an integer type with the same bit width, as that may drop
788+ the external state.
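
The following sketch contrasts a pointer-typed copy with the integer-typed
rewrite that this rule forbids. It again assumes that address space 200 has
external state; both function names are hypothetical.

.. code-block:: llvm

    ; Pointer-typed load and store: the external state travels with %val.
    define void @copy(ptr %dst, ptr %src) {
      %val = load ptr addrspace(200), ptr %src
      store ptr addrspace(200) %val, ptr %dst
      ret void
    }

    ; Not a legal replacement for @copy: the integer-typed accesses may drop
    ; the external state of the pointer stored at %src.
    define void @copy_as_int(ptr %dst, ptr %src) {
      %bits = load i64, ptr %src
      store i64 %bits, ptr %dst
      ret void
    }

Both functions move the same in-band bits, which is why the rewrite would be
acceptable for integral pointers but not here.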
789+ 
790+ 
699791.. _globalvars:
700792
701793Global Variables
@@ -3179,8 +3271,8 @@ as follows:
31793271``A<address space>``
31803272    Specifies the address space of objects created by '``alloca``'.
31813273    Defaults to the default address space of 0.
3182- ``p[n]:<size>:<abi>[:<pref>[:<idx>]]``
3183-     This specifies the properties of a pointer in address space ``n``.
3274+ ``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]``
3275+     This specifies the properties of a pointer in address space ``as``.
31843276    The ``<size>`` parameter specifies the size of the bitwise representation.
31853277    For :ref:`non-integral pointers <nointptrtype>` the representation size may
31863278    be larger than the address width of the underlying address space (e.g. to
@@ -3193,9 +3285,13 @@ as follows:
31933285    default index size is equal to the pointer size.
31943286    The index size also specifies the width of addresses in this address space.
31953287    All sizes are in bits.
3196-     The address space, ``n``, is optional, and if not specified,
3197-     denotes the default address space 0. The value of ``n`` must be
3198-     in the range [1,2^24).
3288+     The address space, ``<as>``, is optional, and if not specified, denotes the
3289+     default address space 0. The value of ``<as>`` must be in the range [1,2^24).
3290+     The optional ``<flags>`` are used to specify properties of pointers in this
3291+     address space: the character ``u`` marks pointers as having an unstable
3292+     representation, and ``e`` marks pointers as having external state. See
3293+     :ref:`Non-Integral Pointer Types <nointptrtype>`.
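
    As a sketch of this syntax, a hypothetical target might use address space 1
    for unstable pointers and address space 200 for pointers with external
    state. The address space numbers and sizes below are assumptions chosen
    for illustration, and the string only parses with a toolchain that
    implements the flags described here.

    .. code-block:: llvm

        ; "pu1":   address space 1 has an unstable representation
        ;          (64-bit size, 64-bit ABI alignment).
        ; "pe200": address space 200 has external state (128-bit size and
        ;          alignment, 64-bit index and address width).
        target datalayout = "pu1:64:64-pe200:128:128:128:64"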
3294+ 
31993295``i<size>:<abi>[:<pref>]``
32003296    This specifies the alignment for an integer type of a given bit
32013297    ``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -3248,9 +3344,11 @@ as follows:
32483344    this set are considered to support most general arithmetic operations
32493345    efficiently.
32503346``ni:<address space0>:<address space1>:<address space2>...``
3251-     This specifies pointer types with the specified address spaces
3252-     as :ref:`Non-Integral Pointer Type <nointptrtype>` s.  The ``0``
3253-     address space cannot be specified as non-integral.
3347+     This marks pointer types with the specified address spaces
3348+     as :ref:`unstable <nointptrtype>`.
3349+     The ``0`` address space cannot be specified as non-integral.
3350+     The ``ni`` specifier is only supported for backwards compatibility; new code
3351+     should use the flags of the ``p`` specifier instead.
32543352
32553353``<abi>`` is a lower bound on what is required for a type to be considered
32563354aligned. This is used in various places, such as:
@@ -31402,4 +31500,3 @@ Semantics:
3140231500
3140331501The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
3140431502as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.
31405- 