Support loadAligned on CUDA backend. by csyonghe · Pull Request #10098 · shader-slang/slang

csyonghe · 2026-02-20T01:09:58Z

This change adds proper code generation for loadAligned calls when emitting CUDA code.

This is implemented by extending the existing lowerImmutableBufferLoadForCUDA to lowerImmutableOrAlignedBufferLoadForCUDA.

In the pass, when we see a load(ptr:T*, aligned(16)), we produce a struct T_aligned16 { T value; } type that wraps a T, with an [Alignment(16)] decoration on the wrapper struct type. We then rewrite the load to load(bit_cast<T_aligned16*>(ptr)).value. The CUDA backend is extended to recognize the Alignment decoration and emit it as an __align__(16) attribute in the resulting CUDA code.
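The rewrite described above can be sketched on the host side in standard C++. This is an illustration under stated assumptions, not the code the compiler emits: `Float4`, `Float4_aligned16`, and `loadAligned16` are hypothetical names, and `alignas(16)` stands in for the `__align__(16)` attribute that the CUDA backend produces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical payload type standing in for T; its natural alignment is 4.
struct Float4 { float x, y, z, w; };

// Wrapper in the spirit of T_aligned16: a single field of type T, with the
// alignment of the struct raised to 16. In emitted CUDA this role is played by
// __align__(16); alignas(16) is the standard C++ analogue used for this sketch.
struct alignas(16) Float4_aligned16 { Float4 value; };

// Rewritten load: view the pointer as a pointer to the wrapper, then read the
// wrapper's only field. The caller must guarantee that ptr is 16-byte aligned.
inline Float4 loadAligned16(const Float4* ptr) {
    const auto* wrapped = reinterpret_cast<const Float4_aligned16*>(ptr);
    return wrapped->value;
}
```

In this sketch the pointer passed in should refer to storage that really is a `Float4_aligned16` (for example `&storage.value`), which keeps the cast well-defined in ISO C++. The stronger alignment on the wrapper is what tells a compiler that a wider load is permissible.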

coderabbitai · 2026-02-20T01:10:17Z

Walkthrough

Aligned-load support was added across the IR, core API, CUDA lowering, and emitter: the IR gains an alignment decoration and builder helpers, the core loadAligned signatures now accept pointer-with-access types, the CUDA lowering wraps and unwraps types for aligned loads, and the CUDA emitter emits `__align__(N)` when the decoration is present.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core API: `source/slang/core.meta.slang` | Changed `__load_aligned` and `loadAligned` signatures to accept `Ptr<T, access, AddressSpace.Device>` with an `Access` template parameter instead of raw `T*`. |
| IR: decorations & stable names: `source/slang/slang-ir-insts.lua`, `source/slang/slang-ir-insts-stable-names.lua` | Added `AlignmentDecoration` (integer operand) to IR decorations and a stable-name entry for `Decoration.AlignmentDecoration`. |
| IR: APIs: `source/slang/slang-ir-insts.h` | Added `IRLoad::getPtrOperand()`, `IRBuilder::getPtrType(..., oldPtrType)` overload, and `IRBuilder::addAlignmentDecoration(...)` helper (duplicate insertion for availability). |
| CUDA lowering: `source/slang/slang-ir-cuda-immutable-load.h`, `source/slang/slang-ir-cuda-immutable-load.cpp`, `source/slang/slang-emit.cpp` | Renamed lowering to `lowerImmutableOrAlignedBufferLoadForCUDA`; added aligned-wrapper key/cache, `getOrCreateAlignedWrapper`, pointer bitcast to wrapper, load-through-wrapper lowering, and unwrap/extract-and-rewire behavior. |
| CUDA emitter: `source/slang/slang-emit-cuda.h`, `source/slang/slang-emit-cuda.cpp` | Declared/implemented `emitPostKeywordTypeAttributesImpl(IRInst*)`, which emits `__align__(N)` when an `AlignmentDecoration` is present. |
| Tests: `tests/spirv/aligned-load-store.slang` | Added `uniform ImmutablePtr<C> data3;` and a `loadAligned<16>(data3)`/`storeAligned<16>(...)` sequence; updated expectations for additional PTX load/store paths. |
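The aligned-wrapper key/cache mentioned for the CUDA lowering can be pictured as a map from (inner type, alignment) to a wrapper type, so that repeated loads of the same type at the same alignment reuse one struct. The sketch below is a toy model with hypothetical names (`WrapperType`, `AlignedWrapperCache`); the real pass operates on Slang IR types rather than strings, and its key layout is not shown in this PR page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an IR struct type.
struct WrapperType {
    std::string name;       // e.g. "T_aligned16"
    std::string innerType;  // the wrapped type T
    int alignment;          // value carried by the alignment decoration
};

class AlignedWrapperCache {
public:
    // Returns the wrapper for (innerType, alignment), creating it on first use.
    const WrapperType& getOrCreate(const std::string& innerType, int alignment) {
        auto key = std::make_pair(innerType, alignment);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            WrapperType w{innerType + "_aligned" + std::to_string(alignment),
                          innerType, alignment};
            it = cache_.emplace(key, std::move(w)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    // std::map keeps references to its elements stable across insertions.
    std::map<std::pair<std::string, int>, WrapperType> cache_;
};
```

Keying on both the type and the alignment means `loadAligned<8>` and `loadAligned<16>` on the same `T` get distinct wrappers, while two 16-byte loads of `T` share one.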

Sequence Diagram(s)

sequenceDiagram
    participant User as User Code
    participant CoreAPI as Core API
    participant IRBuilder as IR Builder
    participant LowerPass as CUDA Lowering Pass
    participant CUDAEmitter as CUDA Emitter

    User->>CoreAPI: call loadAligned<16>(ptr)
    CoreAPI->>IRBuilder: emit IRLoad + AlignmentDecoration
    IRBuilder->>LowerPass: provide IR with alignment metadata
    LowerPass->>LowerPass: getOrCreate aligned-wrapper type
    LowerPass->>LowerPass: bitcast ptr -> wrapped ptr and load wrapped struct
    LowerPass->>LowerPass: extract field, replace uses (unwrap)
    LowerPass->>CUDAEmitter: emit lowered IR
    CUDAEmitter->>CUDAEmitter: detect AlignmentDecoration and emit __align__(N)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.52%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Support loadAligned on CUDA backend' accurately and concisely describes the main objective of the pull request, which is to add code generation for loadAligned calls in CUDA. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing a clear technical explanation of how loadAligned support is implemented in the CUDA backend through wrapper structs and alignment decorations. |

Copilot

Pull request overview

This PR adds support for generating CUDA code with proper alignment attributes for loadAligned calls. The implementation extends the existing immutable buffer load lowering pass to handle aligned loads by creating wrapper struct types with alignment decorations.

Changes:

- Extends lowerImmutableBufferLoadForCUDA to lowerImmutableOrAlignedBufferLoadForCUDA to handle both immutable and aligned buffer loads for CUDA targets
- Adds AlignmentDecoration to the IR instruction system to represent alignment requirements on struct types
- Implements wrapper type creation that adds __align__ attributes in generated CUDA code
- Updates loadAligned signature to accept Ptr<T, access, AddressSpace.Device> for better type flexibility
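To make the emitter side concrete, here is a small sketch of how an alignment value attached to a struct type could turn into an `__align__(N)` attribute placed after the `struct` keyword. `StructDecl` and `emitStruct` are hypothetical helpers written for this illustration; the actual emitter hook named in the PR is `emitPostKeywordTypeAttributesImpl`, whose body is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a struct type about to be emitted.
// alignment == 0 means "no alignment decoration present".
struct StructDecl {
    std::string name;
    std::string body;
    int alignment;
};

// Emits a one-line CUDA struct definition, inserting __align__(N) right after
// the `struct` keyword when an alignment was requested.
inline std::string emitStruct(const StructDecl& decl) {
    std::string out = "struct ";
    if (decl.alignment > 0)
        out += "__align__(" + std::to_string(decl.alignment) + ") ";
    out += decl.name + " { " + decl.body + " };";
    return out;
}
```

For a wrapper with alignment 16 this yields `struct __align__(16) T_aligned16 { T value; };`, while a type without the decoration is emitted unchanged.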

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/spirv/aligned-load-store.slang | Adds PTX target test expectations for aligned loads on CUDA backend |
| source/slang/slang-ir-insts.lua | Defines new AlignmentDecoration IR instruction |
| source/slang/slang-ir-insts.h | Adds getPtrOperand helper method to IRLoad and getPtrType overload to IRBuilder |
| source/slang/slang-ir-insts-stable-names.lua | Assigns stable ID (728) to AlignmentDecoration |
| source/slang/slang-ir-cuda-immutable-load.h | Renames function to reflect expanded functionality |
| source/slang/slang-ir-cuda-immutable-load.cpp | Implements aligned wrapper type creation and load rewriting logic |
| source/slang/slang-emit.cpp | Updates pass invocation to use renamed function |
| source/slang/slang-emit-cuda.h | Declares emitPostKeywordTypeAttributesImpl override |
| source/slang/slang-emit-cuda.cpp | Implements `__align__` attribute emission for types with AlignmentDecoration; includes minor formatting cleanup |
| source/slang/core.meta.slang | Updates __load_aligned and loadAligned signatures to support Ptr with access qualifiers |

source/slang/slang-ir-cuda-immutable-load.cpp

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/slang/slang-emit-cuda.cpp`:
- Around line 1275-1277: The three-line block assigning rowCount, colCount, and
matrixUse from coopMatType has formatting drift; reformat this block to match
the project's clang-format style (run the repo's formatting script or
clang-format) so spacing/indentation and casting align with surrounding code.
Locate the block that uses coopMatType and IRIntLit with getRowCount(),
getColumnCount(), and getMatrixUse() and re-run the formatter so the lines
assigning uint32_t rowCount, uint32_t colCount, and uint32_t matrixUse conform
to the project's style.

In `@source/slang/slang-ir-cuda-immutable-load.cpp`:
- Around line 349-404: The FieldExtract is currently emitted before the load
because builder.setInsertBefore(load) is used; when needUnwrap is true we must
emit the extract after the load has been produced to avoid use-before-def.
Change the insertion point right before calling emitFieldExtract (e.g., call
builder.setInsertAfter(loadedValue) or builder.setInsertAfter(load) once
loadedValue is available) so the builder emits the extract after the load, then
perform the replaceOperand loop and/or inst replacement/remove as before
(symbols: builder, load, needUnwrap, loadedValue, emitFieldExtract, inst,
replaceUsesWith, removeAndDeallocate).
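The ordering concern in this comment (an extract emitted before the load it reads from) can be illustrated with a toy straight-line block and a def-before-use check. Everything here is hypothetical scaffolding (`ToyInst`, `defsPrecedeUses`); it models only the ordering rule, not Slang's IR or its builder API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction: one result name plus the names it reads.
struct ToyInst {
    std::string result;
    std::vector<std::string> operands;
};

// Returns true when every operand is defined (or live-in) before its use.
inline bool defsPrecedeUses(const std::vector<ToyInst>& block,
                            const std::set<std::string>& liveIn) {
    std::set<std::string> defined = liveIn;
    for (const auto& inst : block) {
        for (const auto& op : inst.operands) {
            if (defined.count(op) == 0)
                return false;  // operand used before its definition
        }
        defined.insert(inst.result);
    }
    return true;
}
```

With `ptr` live-in, a block ordered as extract-then-load fails the check, while load-then-extract passes, which is the effect of moving the insertion point to after the load.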

source/slang/slang-emit-cuda.cpp

slangbot · 2026-02-20T03:50:19Z

⚠️ IR Instruction Files Changed

This PR modifies IR instruction definition files. Please review if you need to update the following constants in source/slang/slang-ir.h:

k_minSupportedModuleVersion: Should be incremented if you're removing instructions or making breaking changes
k_maxSupportedModuleVersion: Should be incremented when adding new instructions

These version numbers help ensure compatibility between different versions of compiled modules.

csyonghe · 2026-02-20T19:30:53Z

/format

slangbot · 2026-02-20T19:31:43Z

🌈 Formatted, please merge the changes from this PR

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/slang/slang-ir-cuda-immutable-load.cpp`:
- Around line 349-405: Cache the root address of the load's pointer before you
insert the alignment bitcast so the immutable-location check runs against the
original root; specifically, compute auto rootAddr = getRootAddr(load->getPtr())
before calling getOrCreateAlignedWrapper/ builder.replaceOperand, then use
isPointerToImmutableLocation(rootAddr) (and, if needed, pass rootAddr into the
immutable-load path) instead of calling getRootAddr(load->getPtr()) after the
bitcast so emitImmutableLoad/ isPointerToImmutableLocation see the unmodified
root.

source/slang/slang-ir-cuda-immutable-load.cpp

Format code for PR shader-slang#10098

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/slang/slang-ir-cuda-immutable-load.cpp`:
- Around line 82-85: The call to getNaturalSizeAndAlignment currently ignores
its Result return value causing naturalSizeAlignment to be potentially
uninitialized; modify the code around getNaturalSizeAndAlignment (and the local
IRSizeAndAlignment naturalSizeAlignment) to check the Result and handle failure:
if the call fails, do not use naturalSizeAlignment—either propagate the failure
up (return an error/Result from this function) or return a safe fallback (e.g.,
the original innerType) depending on the surrounding API contract, and adjust
callers if you choose to propagate the error.
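The pattern the comment asks for, checking a status result before reading an out-parameter, looks roughly like the sketch below. The names `Status`, `SizeAndAlignment`, `queryLayout`, and `needsWrapper` are hypothetical stand-ins, and the comparison against the natural alignment is an assumption made for illustration rather than the pass's confirmed logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a result code and a layout record.
enum class Status { Ok, Fail };

struct SizeAndAlignment {
    std::int64_t size;
    int alignment;
};

// Toy layout query: fails for negative type ids and leaves *out untouched,
// which is the case where an unchecked caller would read garbage.
inline Status queryLayout(int typeId, SizeAndAlignment* out) {
    if (typeId < 0)
        return Status::Fail;
    out->size = 16;
    out->alignment = 4;
    return Status::Ok;
}

// Decides whether a wrapper is worth creating. On failure it falls back to
// "no wrapper", i.e. keep using the original inner type.
inline bool needsWrapper(int typeId, int requestedAlignment) {
    SizeAndAlignment layout{};  // value-initialized, never left indeterminate
    if (queryLayout(typeId, &layout) != Status::Ok)
        return false;
    return requestedAlignment > layout.alignment;
}
```

Returning the fallback keeps the function's signature unchanged; propagating the failure instead would require the callers to handle a result type, as the comment notes.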

source/slang/slang-ir-cuda-immutable-load.cpp

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/slang/slang-ir-cuda-immutable-load.cpp`:
- Around line 382-405: The code in the needUnwrap branch replaces all recorded
uses with the extracted field value but never removes the original load
instruction (inst), leaving an orphaned IR node; after you call
builder.replaceOperand for the saved uses (inside the needUnwrap block where you
set loadedValue = builder.emitFieldExtract(...)), check whether loadedValue !=
inst and if so call inst->replaceUsesWith(loadedValue) or simply
inst->removeAndDeallocate() as appropriate to fully remove the original load;
update the needUnwrap branch to mirror the cleanup behavior of the else-if that
handles loadedValue != inst so inst is deallocated when the unwrap path produced
a new value.

source/slang/slang-ir-cuda-immutable-load.cpp

csyonghe · 2026-02-20T20:56:57Z

/format

slangbot · 2026-02-20T20:57:44Z

🌈 Formatted, please merge the changes from this PR

Format code for PR shader-slang#10098

coderabbitai

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@source/slang/slang-ir-cuda-immutable-load.cpp`:
- Around line 352-407: The unwrap branch currently replaces uses by iterating
uses and calling builder.replaceOperand but never removes the original inst,
leaving an orphaned IR node; update the needUnwrap branch after extracting the
field so that if the immutable lowering actually produced a different value
(i.e., loadedValue != inst) you call inst->replaceUsesWith(loadedValue) (or
ensure all uses are replaced) and then inst->removeAndDeallocate(); in short,
after the builder.emitFieldExtract and the loop that replaces uses, check
loadedValue != inst and call inst->replaceUsesWith(loadedValue) if needed and
then inst->removeAndDeallocate() so the original instruction is cleaned up
(referencing symbols: needUnwrap, emitImmutableLoad, loadedValue, inst,
builder.emitFieldExtract, builder.replaceOperand, replaceUsesWith,
removeAndDeallocate).

Support loadAligned on CUDA backend.

2c9c08d

Copilot AI review requested due to automatic review settings February 20, 2026 01:09

csyonghe requested a review from a team as a code owner February 20, 2026 01:09

csyonghe requested review from bmillsNV and removed request for a team February 20, 2026 01:09

Copilot started reviewing on behalf of csyonghe February 20, 2026 01:10

Copilot AI reviewed Feb 20, 2026

coderabbitai bot reviewed Feb 20, 2026

source/slang/slang-emit-cuda.cpp (outdated)

csyonghe added 2 commits February 20, 2026 11:28

Address review comments.

9cae4dd

Address comment.

87c161f

format code

c2c6efa

slangbot mentioned this pull request Feb 20, 2026

Format code for PR #10098 csyonghe/slang#45

Merged

coderabbitai bot reviewed Feb 20, 2026

source/slang/slang-ir-cuda-immutable-load.cpp

csyonghe added 2 commits February 20, 2026 12:12

address review.

08c9d5e

Merge pull request #45 from slangbot/format-10098-loadaligned-cuda

83035b2

Format code for PR shader-slang#10098

csyonghe added the pr: non-breaking PRs without breaking changes label Feb 20, 2026

coderabbitai bot reviewed Feb 20, 2026

source/slang/slang-ir-cuda-immutable-load.cpp

Update source/slang/slang-ir-cuda-immutable-load.cpp

b1d860e

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai bot reviewed Feb 20, 2026

source/slang/slang-ir-cuda-immutable-load.cpp

format code

cc4d549

slangbot mentioned this pull request Feb 20, 2026

Format code for PR #10098 csyonghe/slang#46

Merged

Merge pull request #46 from slangbot/format-10098-loadaligned-cuda

6e7e3ff

Format code for PR shader-slang#10098

coderabbitai bot reviewed Feb 20, 2026