groupshared improvement discussion

This is a general issue I wanted to file to encourage open discussion about future improvements to `groupshared` LDS/shmem usage and allocation.

## Problem Discussion

This list is unlikely to be exhaustive, but here are the problems that users of `groupshared` memory are likely to run into (I will use `groupshared`, LDS, and shmem interchangeably).

- LDS allocation usually scales linearly with the number of waves in a thread group. However, because `WaveGetLaneCount` is not a compile-time constant, the wave size cannot readily be used in a `groupshared` declaration. The current workaround is either to make (bad) assumptions about the wave size, or to produce multiple specializations of each shader and dispatch the one with a matching `WaveSize` at runtime. Neither option is ideal: the former results in brittle hardware-specific code, and the latter adds build and runtime complexity.
- Currently, `groupshared` data _must_ be declared in the global declaration context. This impedes code composability. For example, including a header just to use one function defined in it may hurt occupancy by inadvertently dragging along that header's `groupshared` declarations. This is a real footgun in larger (and sometimes smaller) codebases, and it is difficult to detect when it occurs (or at least, it takes some work to understand why occupancy is lower than expected).
- In the same vein of code composability, code that uses LDS memory must interact directly with the `groupshared` variable as declared, due to the lack of a user-accessible `ref` function parameter qualifier. A function that operates over some input data and exports output data cannot always rely on "copy-in" and "copy-out" semantics, because the `in` and `out` qualifiers do not permit usage of the various `Interlocked*` functions (which internally are modeled using `ref`-qualified parameters).
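
To make the first problem concrete, here is a minimal sketch of the "assume a wave size" workaround. The kernel, resource names, and `MIN_WAVE_SIZE` are illustrative assumptions, not code from a real project. It also assumes waves are packed in `SV_GroupIndex` order, which is common for 1D thread groups but not something the language guarantees.

```hlsl
// Hypothetical per-group sum reduction; all names here are illustrative.
#define GROUP_SIZE 256

// Assumption: the smallest wave size the shader may ever run with.
// WaveGetLaneCount() cannot appear here because array bounds must be
// compile-time constants, so the array is sized for the worst case.
#define MIN_WAVE_SIZE 4
#define MAX_WAVES_PER_GROUP (GROUP_SIZE / MIN_WAVE_SIZE)

groupshared float g_wavePartials[MAX_WAVES_PER_GROUP]; // 64 slots

StructuredBuffer<float>   g_input  : register(t0);
RWStructuredBuffer<float> g_output : register(u0);

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID,
            uint3 gid  : SV_GroupID,
            uint  gi   : SV_GroupIndex)
{
    // Reduce within the wave first; no LDS needed for this step.
    float waveSum = WaveActiveSum(g_input[dtid.x]);

    // Assumption: lanes of a wave cover a contiguous SV_GroupIndex range.
    uint laneCount = WaveGetLaneCount();
    uint waveIndex = gi / laneCount;

    // One partial result per wave goes to LDS.
    if (WaveIsFirstLane())
    {
        g_wavePartials[waveIndex] = waveSum;
    }

    GroupMemoryBarrierWithGroupSync();

    // A single thread combines the per-wave partials.
    if (gi == 0)
    {
        uint  waveCount = (GROUP_SIZE + laneCount - 1) / laneCount;
        float total = 0.0f;
        for (uint i = 0; i < waveCount; ++i)
        {
            total += g_wavePartials[i];
        }
        g_output[gid.x] = total;
    }
}
```

With a wave size of 32, only 8 of the 64 declared slots are ever touched. The cost is small in this toy case, but it grows with the per-wave payload, and raising `MIN_WAVE_SIZE` to reclaim the space is exactly the brittle hardware assumption described above.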

## Possible Solutions

If possible, I think there are a few things that would immediately improve quality-of-life for compute shader authors. These suggestions are written from the perspective of an ISV (the shader writer), and it's understood that other solutions may end up being preferable due to practicality, performance, ease-of-implementation, or all of the above from the perspective of a hardware vendor or DXC compiler implementer.

1. Permit the use of `WaveGetLaneCount` in the declaration type for `groupshared` storage.
2. Permit local variables to be declared as `groupshared`.
3. Add a `ref` keyword that would permit use of `Interlocked*` intrinsics on `ref`-qualified parameters.

The first item would allow developers to conceptually treat `WaveGetLaneCount` as a `constexpr` function, whose value is realized only when a PSO is actually created at runtime. This has implications beyond LDS allocation, but would be a very useful tool in the toolbox for other use cases.

For the second item, because all function calls are currently fully inlined, the total storage needed per thread group for a given compute shader should still be statically known, although DXIL may require modifications to properly alias types allocated from the virtual shmem pool. The idea here is that a static analysis pass would determine the amount of LDS memory needed in the "middle swell" of the program (its peak live usage), accounting for all possible branches taken where `groupshared` variables are declared.
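
As a rough illustration of the "middle swell" idea, the sketch below is valid HLSL today and uses two global `groupshared` arrays whose uses never overlap in time. The kernel and its names are hypothetical, and the comments assume the two arrays are allocated side by side; whether a particular compiler or driver already overlaps them is not something I can state.

```hlsl
// Hypothetical two-phase kernel; names and sizes are illustrative.
#define GROUP_SIZE 64

groupshared float g_phaseA[GROUP_SIZE]; // 256 bytes, used only in phase A
groupshared uint  g_phaseB[GROUP_SIZE]; // 256 bytes, used only in phase B

RWStructuredBuffer<float> g_data : register(u0);

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    // Phase A: reverse the group's values through LDS.
    // With item 2, g_phaseA could be declared inside this scope.
    {
        g_phaseA[gi] = g_data[dtid.x];
        GroupMemoryBarrierWithGroupSync();
        g_data[dtid.x] = g_phaseA[GROUP_SIZE - 1 - gi];
    }

    // All reads of g_phaseA are complete past this barrier.
    GroupMemoryBarrierWithGroupSync();

    // Phase B: mask each value by a flag computed by its neighbor.
    // With item 2, g_phaseB could be declared inside this scope.
    {
        g_phaseB[gi] = (g_data[dtid.x] > 0.0f) ? 1u : 0u;
        GroupMemoryBarrierWithGroupSync();
        uint neighborFlag = g_phaseB[(gi + 1) % GROUP_SIZE];
        g_data[dtid.x] *= (float)neighborFlag;
    }
}
```

Declared globally, the footprint is the sum of both arrays (512 bytes under the side-by-side assumption). If each array were scoped to its phase, an analysis pass could report the maximum over the scopes (256 bytes) instead.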

The counterargument to the second item is that relying on a statically known LDS footprint would not carry over to a future version of HLSL in which true function calls are possible. At that point, one option would be to permit functions to allocate LDS dynamically (similar to `alloca`) using the same semantics as locally declared `groupshared` variables. The driver would need to be able to suspend thread groups if LDS isn't available, or possibly demote allocated LDS to slower VRAM (possibly from a fixed-size pool of reserved memory).

The last item addresses the ability to perform operations on memory in LDS, regardless of where or how that LDS memory was allocated.
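
A minimal sketch of the limitation this item targets, using a hypothetical histogram kernel (all names are illustrative). The helper compiles today only because it names the global `groupshared` array directly, so it cannot be reused for a different LDS array.

```hlsl
// Hypothetical histogram kernel; names and sizes are illustrative.
#define NUM_BINS 64

groupshared uint g_histogram[NUM_BINS];

StructuredBuffer<uint>   g_samples : register(t0);
RWStructuredBuffer<uint> g_result  : register(u0);

// Works today, but is hard-wired to g_histogram. Per the discussion
// above, a generic version taking the bins as an `inout` parameter is
// not an option, because copy-in/copy-out parameters do not permit
// Interlocked* operations on the caller's LDS.
void AddSample(uint bin)
{
    InterlockedAdd(g_histogram[bin], 1u);
}

[numthreads(NUM_BINS, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    // Each thread clears one bin.
    g_histogram[gi] = 0;
    GroupMemoryBarrierWithGroupSync();

    // Each thread contributes one sample to the group-local histogram.
    AddSample(g_samples[dtid.x] % NUM_BINS);
    GroupMemoryBarrierWithGroupSync();

    // Merge the group-local histogram into the global result.
    InterlockedAdd(g_result[gi], g_histogram[gi]);
}
```

With a `ref`-style qualifier, `AddSample` could instead receive the target bins as a parameter and be shared across shaders that own differently named or differently sized LDS arrays.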

All that said, my main goal is to encourage discussion, not to be overly prescriptive about the solutions. I think starting from a well-defined problem statement is likely step one.

groupshared improvement discussion #83
