@@ -318,39 +318,43 @@ Examples:
318318 %ptr = call ptr @llvm.dx.resource.getpointer.p0.tdx.TypedBuffer_v4f32_0_0_0t(
319319 target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
320320
321- 16-byte Loads, Samples, and Gathers
322- -----------------------------------
323-
324- *relevant types: TypedBuffer, CBuffer, and Textures*
325-
326- TypedBuffer, CBuffer, and Texture loads, as well as samples and gathers, can
327- return 1 to 4 elements from the given resource, to a maximum of 16 bytes of
328- data. DXIL's modeling of this is influenced by DirectX and DXBC's history and
329- it generally treats these operations as returning 4 32-bit values. For 16-bit
330- elements the values are 16-bit values, and for 64-bit values the operations
331- return 4 32-bit integers and emit further code to construct the double.
332-
333- In DXIL, these operations return `ResRet`_ and `CBufRet`_ values, are structs
334- containing 4 elements of the same type, and in the case of `ResRet` a 5th
335- element that is used by the `CheckAccessFullyMapped`_ operation.
336-
337- In LLVM IR the intrinsics will return the contained type of the resource
338- instead. That is, ``llvm.dx.resource.load.typedbuffer`` from a
339- ``Buffer<float>`` would return a single float, from ``Buffer<float4>`` a vector
340- of 4 floats, and from ``Buffer<double2>`` a vector of two doubles, etc. The
341- operations are then expanded out to match DXIL's format during lowering.
342-
343- In order to support ``CheckAccessFullyMapped``, we need these intrinsics to
344- return an anonymous struct with element-0 being the contained type, and
345- element-1 being the ``i1`` result of a ``CheckAccessFullyMapped`` call. We
346- don't have a separate call to ``CheckAccessFullyMapped`` at all, since that's
347- the only operation that can possibly be done on this value. In practice this
348- may mean we insert a DXIL operation for the check when this was missing in the
349- HLSL source, but this actually matches DXC's behaviour in practice.
321+ Loads, Samples, and Gathers
322+ ---------------------------
323+
324+ *relevant types: Buffers, CBuffers, and Textures*
325+
326+ All load, sample, and gather operations in DXIL return a `ResRet`_ type, and
327+ CBuffer loads return a similar `CBufRet`_ type. These types are structs
328+ containing 4 elements of some basic type, and in the case of `ResRet` a 5th
329+ element that is used by the `CheckAccessFullyMapped`_ operation. Some of these
330+ operations, like `RawBufferLoad`_, include a mask and/or alignment that tell us
331+ how to interpret those four values.
332+
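For reference, the following sketch shows the 32-bit float variants of these return types as we read them in DXC's DXIL documentation. The type names are taken from that documentation; the comments are our summary of it.

.. code-block:: llvm

   ; Module-level type definitions (sketch).
   ; ResRet: four data elements followed by an i32 status word, which is the
   ; value CheckAccessFullyMapped consumes.
   %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
   ; CBufRet: four data elements and no status word.
   %dx.types.CBufRet.f32 = type { float, float, float, float }
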
333+ In the LLVM IR representations of these operations we instead return scalars or
334+ vectors, but we keep the requirement that we only return up to 4 elements of a
335+ basic type. This avoids some unnecessary casting and structure manipulation in
336+ the intermediate format while also keeping lowering to DXIL straightforward.
337+
338+ LLVM intrinsics that map to operations returning `ResRet` return an anonymous
339+ struct with element-0 being the scalar or vector type, and element-1 being the
340+ ``i1`` result of a ``CheckAccessFullyMapped`` call. We don't have a separate
341+ call to ``CheckAccessFullyMapped`` at all, since that's the only operation that
342+ can be performed on the status value. This may mean we insert a DXIL operation
343+ for the check even when it is missing from the HLSL source, but this matches
344+ DXC's behaviour.
345+
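The following sketch shows how IR might consume such a struct under the convention just described. ``%buffer`` and ``%index`` are assumed to be defined earlier. The mangled intrinsic name is illustrative and follows the ``TypedBuffer_v4f32_0_0_0t`` pattern used elsewhere in this document.

.. code-block:: llvm

   %ret = call {<4 x float>, i1}
       @llvm.dx.resource.load.typedbuffer.v4f32.tdx.TypedBuffer_v4f32_0_0_0t(
           target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
   ; Element 0 holds the loaded data.
   %data = extractvalue {<4 x float>, i1} %ret, 0
   ; Element 1 holds the CheckAccessFullyMapped result.
   %mapped = extractvalue {<4 x float>, i1} %ret, 1
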
346+ For TypedBuffer and Texture, we map directly from the contained type of the
347+ resource to the return value of the intrinsic. Since these resources are
348+ constrained to contain only scalars and vectors of up to 4 elements, the
349+ lowering to DXIL ops is generally straightforward. The one exception is
350+ ``double`` elements: these are allowed in the LLVM intrinsics, but are
351+ lowered to pairs of ``i32`` values followed by ``MakeDouble`` operations
352+ in DXIL.
350353
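As a rough sketch of the ``double`` case, loading a scalar from a ``Buffer<double>`` might lower to a ``BufferLoad`` of ``i32`` values whose first two elements feed ``MakeDouble``. ``%handle`` and ``%index`` are assumed. The opcodes 68 (``BufferLoad``) and 101 (``MakeDouble``) are taken from the DXIL specification. We also assume element 0 holds the low word.

.. code-block:: llvm

   ; Load the two 32-bit halves of the double as integers.
   %ret = call %dx.types.ResRet.i32 @dx.op.bufferLoad.i32(
       i32 68, %dx.types.Handle %handle, i32 %index, i32 undef)
   %lo = extractvalue %dx.types.ResRet.i32 %ret, 0
   %hi = extractvalue %dx.types.ResRet.i32 %ret, 1
   ; Reassemble the double from its low and high words.
   %val = call double @dx.op.makeDouble.f64(i32 101, i32 %lo, i32 %hi)
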
351354 .. _ResRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#resource-operation-return-types
352355 .. _CBufRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#cbufferloadlegacy
353356 .. _CheckAccessFullyMapped: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/checkaccessfullymapped
357+ .. _RawBufferLoad: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#rawbufferload
354358
355359 .. list-table:: ``@llvm.dx.resource.load.typedbuffer``
356360 :header-rows: 1
@@ -392,6 +396,101 @@ Examples:
392396 @llvm.dx.resource.load.typedbuffer.v2f64.tdx.TypedBuffer_v2f64_0_0t(
393397 target("dx.TypedBuffer", <2 x double>, 0, 0, 0) %buffer, i32 %index)
394398
399+ For RawBuffer, an HLSL load operation may return an arbitrarily sized result,
400+ but we still constrain the LLVM intrinsic to return only up to 4 elements of a
401+ basic type. This means that larger loads are represented as a series of loads,
402+ which matches DXIL. Unlike the `RawBufferLoad`_ operation, we do not need
403+ arguments for the mask/type size and alignment, since we can calculate these
404+ from the return type of the load during lowering.
405+
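Consider a load of a 32-byte ``struct { float4 a; float4 b; }`` from a ``ByteAddressBuffer``. It exceeds the four-element limit, so it might be expressed as two loads. This is a sketch: the struct, the value names, and the ``add`` that computes the second offset are illustrative assumptions. The intrinsic names follow the ``i8`` byte-address examples in this section.

.. code-block:: llvm

   ; First 16 bytes: the float4 member "a".
   %a = call {<4 x float>, i1}
       @llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_i8_0_0t(
           target("dx.RawBuffer", i8, 0, 0) %buffer,
           i32 %byte_offset,
           i32 0)
   ; Next 16 bytes: the float4 member "b".
   %byte_offset.b = add i32 %byte_offset, 16
   %b = call {<4 x float>, i1}
       @llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_i8_0_0t(
           target("dx.RawBuffer", i8, 0, 0) %buffer,
           i32 %byte_offset.b,
           i32 0)
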
408+ .. list-table:: ``@llvm.dx.resource.load.rawbuffer``
409+ :header-rows: 1
410+
411+ * - Argument
412+ -
413+ - Type
414+ - Description
415+ * - Return value
416+ -
417+ - A structure of a scalar or vector and the check bit
418+ - The data loaded from the buffer and the check bit
419+ * - ``%buffer``
420+ - 0
421+ - ``target(dx.RawBuffer, ...)``
422+ - The buffer to load from
423+ * - ``%index``
424+ - 1
425+ - ``i32``
426+ - Index into the buffer (a byte offset for ``i8`` buffers)
427+ * - ``%offset``
428+ - 2
429+ - ``i32``
430+ - Byte offset into the structure at the given index
431+
432+ Examples:
433+
434+ .. code-block:: llvm
435+
436+ ; float
437+ %ret = call {float, i1}
438+ @llvm.dx.resource.load.rawbuffer.f32.tdx.RawBuffer_f32_0_0t(
439+ target("dx.RawBuffer", float, 0, 0) %buffer,
440+ i32 %index,
441+ i32 0)
442+ %ret = call {float, i1}
443+ @llvm.dx.resource.load.rawbuffer.f32.tdx.RawBuffer_i8_0_0t(
444+ target("dx.RawBuffer", i8, 0, 0) %buffer,
445+ i32 %byte_offset,
446+ i32 0)
447+
448+ ; float4
449+ %ret = call {<4 x float>, i1}
450+ @llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_v4f32_0_0t(
451+ target("dx.RawBuffer", <4 x float>, 0, 0) %buffer,
452+ i32 %index,
453+ i32 0)
454+ %ret = call {<4 x float>, i1}
455+ @llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_i8_0_0t(
456+ target("dx.RawBuffer", i8, 0, 0) %buffer,
457+ i32 %byte_offset,
458+ i32 0)
459+
460+ ; struct S0 { float4 f; int4 i; };
461+ %ret = call {<4 x float>, i1}
462+ @llvm.dx.resource.load.rawbuffer.v4f32.tdx.RawBuffer_sl_v4f32v4i32s_0_0t(
463+ target("dx.RawBuffer", {<4 x float>, <4 x i32>}, 0, 0) %buffer,
464+ i32 %index,
465+ i32 0)
466+ %ret = call {<4 x i32>, i1}
467+ @llvm.dx.resource.load.rawbuffer.v4i32.tdx.RawBuffer_sl_v4f32v4i32s_0_0t(
468+ target("dx.RawBuffer", {<4 x float>, <4 x i32>}, 0, 0) %buffer,
469+ i32 %index,
470+ i32 16)
471+
472+ ; struct Q { float4 f; int3 i; };
473+ ; struct R { int z; Q x; };
474+ %ret = call {i32, i1}
475+ @llvm.dx.resource.load.rawbuffer.i32(
476+ target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0)
477+ %buffer, i32 %index, i32 0)
478+ %ret = call {<4 x float>, i1}
479+ @llvm.dx.resource.load.rawbuffer.v4f32(
480+ target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0)
481+ %buffer, i32 %index, i32 4)
482+ %ret = call {<3 x i32>, i1}
483+ @llvm.dx.resource.load.rawbuffer.v3i32(
484+ target("dx.RawBuffer", {i32, {<4 x float>, <3 x i32>}}, 0, 0)
485+ %buffer, i32 %index, i32 20)
486+
487+ ; byteaddressbuf.Load<int64_t4>
488+ %ret = call {<4 x i64>, i1}
489+ @llvm.dx.resource.load.rawbuffer.v4i64.tdx.RawBuffer_i8_0_0t(
490+ target("dx.RawBuffer", i8, 0, 0) %buffer,
491+ i32 %byte_offset,
492+ i32 0)
493+
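The intrinsic leaves the mask and alignment implicit. To illustrate them, a ``{<4 x float>, i1}`` raw buffer load might lower to DXIL roughly as follows. This is a sketch: ``%handle`` and ``%index`` are assumed. The opcodes 139 (``RawBufferLoad``) and 71 (``CheckAccessFullyMapped``) are taken from the DXIL specification. The alignment of 4, the size of a ``float``, is our assumption about what lowering would choose.

.. code-block:: llvm

   ; Mask 15 (0b1111) marks all four components as used; a <3 x float>
   ; load would use 7 (0b0111) instead. The alignment of 4 is assumed.
   %r = call %dx.types.ResRet.f32 @dx.op.rawBufferLoad.f32(
       i32 139, %dx.types.Handle %handle, i32 %index, i32 0,
       i8 15, i32 4)
   %x = extractvalue %dx.types.ResRet.f32 %r, 0
   %y = extractvalue %dx.types.ResRet.f32 %r, 1
   %z = extractvalue %dx.types.ResRet.f32 %r, 2
   %w = extractvalue %dx.types.ResRet.f32 %r, 3
   ; The fifth element is the status word for CheckAccessFullyMapped.
   %status = extractvalue %dx.types.ResRet.f32 %r, 4
   %mapped = call i1 @dx.op.checkAccessFullyMapped.i32(i32 71, i32 %status)
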
395494 Texture and Typed Buffer Stores
396495-------------------------------
397496