
Add compilation benchmark and call optixModuleCreate in parallel#662

Open
ksavoie-nv wants to merge 17 commits into shader-slang:main from ksavoie-nv:add-parallel-compilation

Conversation


@ksavoie-nv ksavoie-nv commented Feb 12, 2026

Compile Benchmarking Tool

Add a tool to test how well compilation scales and performs. The tool is in the benchmarks/benchmark-compile directory, and contains its own README. The intent is to make it easy to iterate on compilation performance.

Here are the current results (without parallel optixModuleCreate calls), taken on a system with an AMD Ryzen 9 7950X processor:

| Device Type | # Mods | Size | Frontend (ms) | Codegen (ms) | Downstrm (ms) | Driver (ms) | Total (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vulkan | 1 | complex | 64.72 | 40.02 | 103.99 | 8266.07 | 8474.81 |
| D3D12 | 1 | complex | 57.70 | 40.72 | 14829.35 | 7158.51 | 22086.28 |
| CUDA | 1 | complex | 67.85 | 57.59 | 23345.34 | 8598.85 | 32069.62 |

| Device Type | # Mods | Size | Frontend (ms) | Codegen (ms) | Downstrm (ms) | Driver (ms) | Total (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vulkan | 16 | simple | 40.65 | 30.06 | 2.24 | 74.97 | 147.91 |
| D3D12 | 16 | simple | 39.48 | 53.37 | 27.73 | 71.51 | 192.09 |
| CUDA | 16 | simple | 39.71 | 92.38 | 1967.69 | 158.26 | 2258.04 |
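
The per-phase columns above are wall-clock measurements. For illustration, phase timing of this kind can be collected with a small RAII helper; `ScopedTimer` and `PhaseTimings` below are hypothetical names for this sketch, not code from the tool:

```cpp
#include <chrono>

// Accumulates elapsed wall-clock milliseconds into a named slot on scope exit.
struct ScopedTimer
{
    double& slotMs;
    std::chrono::steady_clock::time_point start;

    explicit ScopedTimer(double& slot)
        : slotMs(slot)
        , start(std::chrono::steady_clock::now())
    {
    }

    ~ScopedTimer()
    {
        auto end = std::chrono::steady_clock::now();
        slotMs += std::chrono::duration<double, std::milli>(end - start).count();
    }
};

// One slot per compilation phase, mirroring the table columns.
struct PhaseTimings
{
    double frontendMs = 0.0;
    double codegenMs = 0.0;
    double downstreamMs = 0.0;
    double driverMs = 0.0;

    double totalMs() const { return frontendMs + codegenMs + downstreamMs + driverMs; }
};
```

A phase is then timed by scoping the work, e.g. `{ ScopedTimer t(timings.driverMs); /* createPipeline(...) */ }`, and iterations naturally accumulate into the same slots.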

Questions

  • Parallel thread pool implementation: The tool has a concurrent task pool implementing the ITaskPool interface, located in the benchmarks/benchmark-compile directory. Would it be worth moving that implementation to src/core/task-pool.cpp next to the existing BlockingTaskPool implementation?
  • Parallelism in tests: If we decide to move the concurrent task pool into slang-rhi's core, should we add a way to enable it in slang-rhi's tests?
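
For context, a concurrent task pool along these lines is typically a shared queue drained by worker threads. The sketch below is a generic illustration only; it is not the actual ITaskPool interface or the benchmark's implementation:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal concurrent task pool: workers pull tasks from a shared queue.
// The destructor drains remaining tasks before joining the workers.
class SimpleTaskPool
{
public:
    explicit SimpleTaskPool(size_t threadCount)
    {
        for (size_t i = 0; i < threadCount; ++i)
            m_workers.emplace_back([this] { workerLoop(); });
    }

    ~SimpleTaskPool()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_all();
        for (auto& worker : m_workers)
            worker.join();
    }

    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(task));
        }
        m_cv.notify_one();
    }

private:
    void workerLoop()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_done || !m_tasks.empty(); });
                if (m_done && m_tasks.empty())
                    return; // shut down only once the queue is empty
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task(); // run outside the lock so tasks execute concurrently
        }
    }

    std::vector<std::thread> m_workers;
    std::queue<std::function<void()>> m_tasks;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_done = false;
};
```

A real ITaskPool implementation would additionally need waitable task handles and error propagation, which this sketch omits.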

Parallelize optixModuleCreate

optixModuleCreate is currently called serially in a loop in createPipeline. That function is thread-safe, so an easy, early win is to call it in parallel via the global task pool. Running the benchmark, we can see that this improves the time we spend in driver compilation:

| Device Type | Threads | # Mods | Size | Frontend (ms) | Codegen (ms) | Downstrm (ms) | Driver (ms) | Total (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CUDA | serial | 32 | simple | 74.77 | 200.13 | 3465.18 | 300.58 | 4040.66 |
| CUDA | 1 | 32 | simple | 70.37 | 199.46 | 3434.02 | 295.23 | 3999.08 |
| CUDA | 2 | 32 | simple | 70.32 | 201.86 | 3433.03 | 135.93 | 3841.14 |
| CUDA | 4 | 32 | simple | 69.74 | 198.29 | 3433.53 | 80.49 | 3782.05 |
| CUDA | 8 | 32 | simple | 70.00 | 195.79 | 3410.63 | 52.47 | 3728.89 |
| CUDA | 16 | 32 | simple | 70.36 | 193.81 | 3389.02 | 47.98 | 3701.16 |
| CUDA | 32 | 32 | simple | 73.22 | 200.19 | 3408.34 | 53.59 | 3735.34 |

Note how driver compile time improves with the number of threads.
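
The change follows a fan-out/fan-in pattern: launch one task per module, then wait for all of them before the sequential program-group and pipeline creation. The sketch below illustrates that pattern only; `compileOne`, `ModuleResult`, and `compileAll` are hypothetical names, `compileOne` stands in for the thread-safe `optixModuleCreate` call, and `std::async` replaces the global task pool used in the actual change:

```cpp
#include <future>
#include <string>
#include <vector>

// Per-module outcome, collected so all failures can be reported, not just
// the first one encountered.
struct ModuleResult
{
    bool ok;
    std::string log;
};

// Stand-in for the real per-module compile, i.e. the optixModuleCreate call.
ModuleResult compileOne(const std::string& ptx)
{
    return {!ptx.empty(), "compiled " + ptx};
}

std::vector<ModuleResult> compileAll(const std::vector<std::string>& ptxModules)
{
    // Fan out: one task per module. The calls are independent and
    // thread-safe, so they may run concurrently.
    std::vector<std::future<ModuleResult>> futures;
    futures.reserve(ptxModules.size());
    for (const auto& ptx : ptxModules)
        futures.push_back(std::async(std::launch::async, compileOne, ptx));

    // Fan in: block until every module is done before moving on to the
    // sequential program-group / pipeline creation steps.
    std::vector<ModuleResult> results;
    results.reserve(futures.size());
    for (auto& f : futures)
        results.push_back(f.get());
    return results;
}
```

Keeping the fan-in before pipeline creation preserves the original ordering guarantees: only the independent module compiles run in parallel.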

Next steps

  • Most compile time is spent in the downstream compiler (DXC/NVRTC). Investigate dispatching those compilers in parallel in Slang.
  • NVRTC seems excessively slow, especially for small modules. Check whether the small modules contain unnecessary code or whether NVRTC is simply slower.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced benchmark-compile tool to measure shader compilation performance across Vulkan, D3D12, and CUDA backends with detailed timing breakdowns (Frontend, Codegen, Downstream, Driver)
    • Added synthetic module generation supporting configurable module counts and complexity levels for flexible benchmarking scenarios
  • Documentation

    • Added comprehensive README documenting benchmark-compile usage, CLI options, output interpretation, and caching strategies

@ksavoie-nv
Collaborator Author

@skallweitNV do you think it would be worth pulling this tool into slang-rhi's repo, or should we keep it separate?

@skallweitNV
Collaborator

@ksavoie-nv I think adding benchmarks to slang-rhi is a good thing. I've opened #674 to add a threaded task pool implementation that I've already had in a dev branch. Maybe take a look at that and see if it fits the purpose.


coderabbitai bot commented Feb 23, 2026

📝 Walkthrough


This PR introduces a comprehensive benchmarking suite for ray tracing shader compilation, including CMake build configuration, synthetic shader module generation with configurable complexity, a detailed benchmark harness measuring compilation phases across backends, parallelized OptiX module compilation via task pools, and proper task pool lifecycle management in tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Build Configuration**<br>`CMakeLists.txt` | Added SLANG_RHI_BUILD_BENCHMARKS build option and benchmark-compile executable target with C++20, linking against slang-rhi and slang libraries. |
| **Synthetic Module Generation**<br>`benchmarks/benchmark-compile/synthetic-modules.h`, `benchmarks/benchmark-compile/synthetic-modules.cpp` | Introduced public API (generateSyntheticModules, SizeLevel enum, module descriptor structs) to generate ray-tracing shader modules with configurable complexity; simple and complex ClosestHit variants with 20 vs. 5000+ lines. |
| **Benchmark Harness**<br>`benchmarks/benchmark-compile/main.cpp` | Implemented comprehensive benchmark orchestration with CLI parsing, device/session/thread-pool lifecycle management, module compilation, ray-tracing pipeline creation, timing measurement across frontend/codegen/downstream/driver phases, driver cache clearing, and formatted result table reporting. |
| **Benchmark Documentation**<br>`benchmarks/benchmark-compile/README.md` | Comprehensive documentation covering tool purpose, results tables, build/run instructions, CLI reference, timing breakdown phases, cache defeat strategies, synthetic module composition, and thread pool behavior. |
| **OptiX Module Compilation Parallelization**<br>`src/cuda/optix-api-impl.cpp` | Parallelized optixModuleCreate calls via task pool while preserving sequential program group creation; introduced ModuleCompileTask struct, per-module payload handling, and module index mapping to decouple module compilation from program group creation. |
| **Test Infrastructure**<br>`tests/main.cpp` | Added explicit global task pool cleanup (setGlobalTaskPool(nullptr)) after test execution to prevent resource leaks and ensure proper lifecycle management. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI Arguments
    participant Harness as Benchmark Harness
    participant SynMod as SyntheticModules
    participant Compiler as Slang Compiler
    participant RHI as RHI Device
    participant Driver as GPU Driver
    participant TaskPool as Task Pool

    CLI->>Harness: Parse device, module count, size, threads
    Harness->>Harness: For each iteration
    Harness->>SynMod: generateSyntheticModules(seed, size, count)
    SynMod-->>Harness: Shader source + entry points

    Harness->>Compiler: compileModules (Start Frontend timing)
    Compiler->>Compiler: Load & compile each module
    Compiler-->>Harness: Compiled program (End Frontend timing)

    Harness->>Compiler: Access codegen/downstream elapsed times
    Harness->>RHI: createRayTracingPipeline (Start Driver timing)
    RHI->>TaskPool: Submit parallel module creation
    TaskPool->>Driver: optixModuleCreate (parallel)
    Driver-->>TaskPool: Module handles
    TaskPool-->>RHI: All modules ready
    RHI->>Driver: Create program groups & pipeline (sequential)
    Driver-->>RHI: Pipeline handle (End Driver timing)

    Harness->>Harness: Aggregate timings across iterations
    Harness-->>CLI: Print results table with breakdowns
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Add threaded task pool #674 – Implements ThreadedTaskPool, global task pool init/set APIs, and ITaskPool semantics that the benchmark harness directly depends on for parallel module compilation and test cleanup.

Suggested reviewers

  • bmillsNV
  • skallweitNV

Poem

🐰 Hopping through benchmarks with glee,
Threads now compile in harmony,
From raygen to hit, we measure each bit,
CUDA runs fast, OptiX won't quit,
Performance revealed, compile time healed!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the two main changes: adding a compilation benchmark tool and parallelizing optixModuleCreate calls, which aligns with the substantial changeset including new benchmark infrastructure and parallel module creation. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@ksavoie-nv ksavoie-nv marked this pull request as ready for review February 23, 2026 23:50
@ksavoie-nv ksavoie-nv requested a review from a team as a code owner February 23, 2026 23:50
@ksavoie-nv ksavoie-nv requested review from bmillsNV and removed request for a team February 23, 2026 23:50
@ksavoie-nv
Collaborator Author

@skallweitNV I switched to the new threaded task pool, and it works great.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
CMakeLists.txt (1)

964-973: Consider linking warning flags for benchmark code quality.

The test target (line 936) links slang-rhi-warnings and slang-rhi-warnings-as-errors, but the benchmark target does not. While examples also omit these, the benchmark is first-party code that would benefit from the same warning discipline.

♻️ Suggested change

```diff
     target_compile_definitions(benchmark-compile PRIVATE NOMINMAX)
     target_link_libraries(benchmark-compile PRIVATE slang-rhi slang)
+    target_link_libraries(benchmark-compile PRIVATE slang-rhi-warnings)
     target_include_directories(benchmark-compile PRIVATE include src)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CMakeLists.txt` around lines 964 - 973, The benchmark target
benchmark-compile currently omits the project warning flag libraries; update the
benchmark-compile target to use the same warning discipline as the test target
by linking the slang-rhi-warnings and slang-rhi-warnings-as-errors targets (via
target_link_libraries for benchmark-compile) so the benchmark builds with the
same warning and warnings-as-errors settings used by the test target.
benchmarks/benchmark-compile/synthetic-modules.h (1)

16-28: Struct member variables lack the m_ prefix.

Per coding guidelines, member variables should start with m_ prefix and be in camelCase (e.g., m_source, m_entryPointName, m_stage, m_moduleCount, m_sizeLevel, m_seed). These are benchmark-only structs so this is low priority, but worth noting for consistency with the rest of the codebase.

As per coding guidelines: "Member variables should start with 'm_' prefix and be in camelCase".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/synthetic-modules.h` around lines 16 - 28,
Rename the struct member variables to follow the m_ camelCase convention: in
SyntheticModuleDesc change source -> m_source, entryPointName ->
m_entryPointName, stage -> m_stage; in SyntheticModuleParams change moduleCount
-> m_moduleCount, sizeLevel -> m_sizeLevel, seed -> m_seed; update all
references/usages of these symbols (constructors, initializers, assignments, and
access sites) to use the new names and keep existing default values and types
unchanged.
benchmarks/benchmark-compile/main.cpp (1)

829-832: Prefer setenv over putenv with const_cast on string literals.

putenv(const_cast<char*>("...")) passes a pointer to read-only memory. While putenv typically doesn't modify the string, the const_cast is technically undefined behavior since the underlying data is a string literal. On POSIX, setenv is the safer alternative:

♻️ Suggested change

```diff
 #else
-    putenv(const_cast<char*>("OPTIX_CACHE_MAXSIZE=0"));
-    putenv(const_cast<char*>("__GL_SHADER_DISK_CACHE=0"));
+    setenv("OPTIX_CACHE_MAXSIZE", "0", 1);
+    setenv("__GL_SHADER_DISK_CACHE", "0", 1);
 #endif
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/main.cpp` around lines 829 - 832, Replace the
unsafe putenv(const_cast<char*>("...")) calls with POSIX setenv to avoid casting
string literals to mutable char*; specifically change the putenv usages for
"OPTIX_CACHE_MAXSIZE=0" and "__GL_SHADER_DISK_CACHE=0" to calls like
setenv("OPTIX_CACHE_MAXSIZE","0",1) and setenv("__GL_SHADER_DISK_CACHE","0",1)
(or the equivalent wrapper in your codebase), removing the const_cast and
ensuring the overwrite flag is set.
benchmarks/benchmark-compile/README.md (1)

49-51: Add language identifiers to fenced code blocks.

Static analysis (markdownlint MD040) flags several fenced code blocks without language specifiers (lines 49, 97, 149, 191, 207). Adding text or an appropriate identifier would satisfy the linter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/README.md` around lines 49 - 51, Update the
README fenced code blocks that currently use plain ``` (for example the block
showing "build/<config>/benchmark-compile[.exe]") to include a language
identifier such as `text` (i.e., change ``` to ```text) for all flagged blocks
(around the occurrences at lines shown by the linter) so markdownlint MD040 is
satisfied.
src/cuda/optix-api-impl.cpp (1)

669-681: Use the m_ prefix for ModuleCompileTask fields.

These are member variables of a struct, so they should follow the member naming convention (e.g., m_deviceContext, m_moduleOptions, etc.).
As per coding guidelines, Member variables should start with 'm_' prefix and be in camelCase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuda/optix-api-impl.cpp` around lines 669 - 681, The struct
ModuleCompileTask's fields must use the member naming convention: rename each
field to start with m_ in camelCase (e.g., deviceContext -> m_deviceContext,
moduleOptions -> m_moduleOptions, pipelineOptions -> m_pipelineOptions, ptxCode
-> m_ptxCode, ptxSize -> m_ptxSize, outModule -> m_outModule, result ->
m_result, moduleIndex -> m_moduleIndex, entryPointName -> m_entryPointName) and
update all call sites that access these symbols (references to
ModuleCompileTask::deviceContext, .moduleOptions, .ptxCode, .moduleIndex, etc.)
to the new names to keep consistency across functions that create and consume
ModuleCompileTask instances.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4ea8c1 and 6b970c9.

📒 Files selected for processing (7)
  • CMakeLists.txt
  • benchmarks/benchmark-compile/README.md
  • benchmarks/benchmark-compile/main.cpp
  • benchmarks/benchmark-compile/synthetic-modules.cpp
  • benchmarks/benchmark-compile/synthetic-modules.h
  • src/cuda/optix-api-impl.cpp
  • tests/main.cpp

Comment on lines +222 to +237

```cpp
std::vector<HitGroupDesc> hitGroups;
std::vector<std::string> hitGroupNames; // keep strings alive

for (const auto& mod : modules)
{
    if (mod.stage == SLANG_STAGE_CLOSEST_HIT)
    {
        std::string name = "hitgroup_" + mod.entryPointName;
        hitGroupNames.push_back(name);

        HitGroupDesc hg = {};
        hg.hitGroupName = hitGroupNames.back().c_str();
        hg.closestHitEntryPoint = mod.entryPointName.c_str();
        hitGroups.push_back(hg);
    }
}
```

⚠️ Potential issue | 🔴 Critical

Dangling c_str() pointers: hitGroupNames reallocation invalidates previously stored pointers.

Each hitGroupNames.push_back(name) may reallocate the vector, invalidating all c_str() pointers already stored in hitGroups. With multiple closesthit modules, this causes hitGroups[i].hitGroupName to become a dangling pointer.

Fix by reserving capacity upfront so no reallocation occurs:

🐛 Proposed fix

```diff
     std::vector<HitGroupDesc> hitGroups;
     std::vector<std::string> hitGroupNames; // keep strings alive

+    // Reserve to prevent reallocation (which would invalidate c_str() pointers).
+    hitGroupNames.reserve(modules.size());
+
     for (const auto& mod : modules)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/main.cpp` around lines 222 - 237, The
hitGroupNames vector's c_str() pointers are stored into HitGroupDesc entries
(hitGroups), but push_back(name) can reallocate and invalidate those pointers;
to fix, compute how many SLANG_STAGE_CLOSEST_HIT modules exist, reserve that
capacity on hitGroupNames (or otherwise ensure stable storage) before the loop,
then create names, push them into hitGroupNames, and only then assign
hg.hitGroupName = hitGroupNames.back().c_str() and push hg into hitGroups so the
pointers remain valid; refer to hitGroupNames, hitGroups, HitGroupDesc, and the
for (const auto& mod : modules) loop to locate where to apply the
reserve/stable-storage change.

Comment on lines +184 to +197

```cpp
int topLayer = numLayers - 1;
for (int f = 0; f < functionsPerLayer; ++f)
{
    ss << " acc = acc + layer" << topLayer << "_func" << f << "_" << index << "_s" << seed << "(result, float3("
       << (f + 1) << ", " << (f + 2) << ", " << (f + 3) << "), acc);\n";
}

// Also call a few mid-layer functions to prevent dead-code elimination.
int midLayer = numLayers / 2;
for (int f = 0; f < functionsPerLayer; ++f)
{
    ss << " acc = acc + layer" << midLayer << "_func" << f << "_" << index << "_s" << seed
       << "(acc, result, float3(" << (f + 1) << ", 0, 0));\n";
}
```

⚠️ Potential issue | 🟡 Minor

midLayer equals topLayer when numLayers = 2, making the "mid-layer" calls redundant.

With numLayers = 2: topLayer = 1 and midLayer = numLayers / 2 = 1. Both loops (lines 185-189 and 193-197) call the same layer1_* functions, just with different arguments. The comment on line 191 says "call a few mid-layer functions to prevent dead-code elimination," but there's no distinct mid layer here.

This doesn't break anything — it still stresses the compiler — but the duplicated calls may not match the intent. Consider either bumping numLayers to 3+ or using midLayer = 0 to actually exercise the bottom layer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/synthetic-modules.cpp` around lines 184 - 197,
The midLayer calculation can equal topLayer when numLayers == 2, causing
duplicate calls; change the midLayer assignment so it picks a distinct layer
when numLayers is small (e.g., if numLayers > 2 keep midLayer = numLayers/2,
otherwise set midLayer = 0) so the second loop calls a different layer than
topLayer; update the midLayer variable used in the loop (reference: topLayer,
midLayer, numLayers, functionsPerLayer) accordingly and keep the rest of the
loop logic unchanged.

Comment on lines +731 to +747

```cpp
// Check for compilation errors.
// TODO: destroy successfully created OptixModules on error
for (size_t i = 0; i < moduleCount; ++i)
{
    if (isOptixError(taskPayloads[i].result))
    {
        char errorMsg[512];
        snprintf(
            errorMsg,
            sizeof(errorMsg),
            "optixModuleCreate failed for module %zu ('%s')",
            taskPayloads[i].moduleIndex,
            taskPayloads[i].entryPointName
        );
        reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
        return SLANG_FAIL;
    }
```

⚠️ Potential issue | 🟠 Major

Clean up created modules when any compilation task fails.

The function returns without destroying successfully created modules on error, leaking device resources. The modules are created as raw OptiX handles and need explicit destruction via optixModuleDestroy() before returning; the std::vector destructor does not call it automatically. On the success path, modules are transferred to PipelineImpl which handles cleanup in its destructor, but the error path bypasses this.

Collect all errors in the loop, then destroy non-null modules before returning:

🧹 Suggested cleanup on failure

```diff
-        for (size_t i = 0; i < moduleCount; ++i)
-        {
-            if (isOptixError(taskPayloads[i].result))
-            {
-                char errorMsg[512];
-                snprintf(
-                    errorMsg,
-                    sizeof(errorMsg),
-                    "optixModuleCreate failed for module %zu ('%s')",
-                    taskPayloads[i].moduleIndex,
-                    taskPayloads[i].entryPointName
-                );
-                reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
-                return SLANG_FAIL;
-            }
-        }
+        bool hadError = false;
+        for (size_t i = 0; i < moduleCount; ++i)
+        {
+            if (isOptixError(taskPayloads[i].result))
+            {
+                char errorMsg[512];
+                snprintf(
+                    errorMsg,
+                    sizeof(errorMsg),
+                    "optixModuleCreate failed for module %zu ('%s')",
+                    taskPayloads[i].moduleIndex,
+                    taskPayloads[i].entryPointName
+                );
+                reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
+                hadError = true;
+            }
+        }
+        if (hadError)
+        {
+            for (OptixModule module : optixModules)
+            {
+                if (module)
+                    SLANG_OPTIX_ASSERT_ON_FAIL(optixModuleDestroy(module));
+            }
+            return SLANG_FAIL;
+        }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuda/optix-api-impl.cpp` around lines 731 - 747, The loop checking
compilation errors returns immediately on first failure without freeing
already-created OptiX modules, leaking device resources; modify the error path
in the block that iterates moduleCount and inspects taskPayloads (using
isOptixError and reportOptixError) to first iterate over all taskPayloads up to
moduleCount and call optixModuleDestroy() for any non-null module handle (the
created OptixModule objects) before returning SLANG_FAIL, ensuring created
modules are cleaned up when compilation fails (the success path still transfers
modules to PipelineImpl).
