
Add compilation benchmark and call optixModuleCreate in parallel#662

Open
ksavoie-nv wants to merge 17 commits into shader-slang:main from ksavoie-nv:add-parallel-compilation

Conversation


@ksavoie-nv ksavoie-nv commented Feb 12, 2026

Compile Benchmarking Tool

Add a tool to test how well compilation scales and performs. The tool is in the benchmarks/benchmark-compile directory, and contains its own README. The intent is to make it easy to iterate on compilation performance.

Here are the current results (without parallel optixModuleCreate calls), taken on a system with an AMD Ryzen 9 7950X processor:

| Device Type | # Mods | Size | Frontend (ms) | Codegen (ms) | Downstrm (ms) | Driver (ms) | Total (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vulkan | 1 | complex | 64.72 | 40.02 | 103.99 | 8266.07 | 8474.81 |
| D3D12 | 1 | complex | 57.70 | 40.72 | 14829.35 | 7158.51 | 22086.28 |
| CUDA | 1 | complex | 67.85 | 57.59 | 23345.34 | 8598.85 | 32069.62 |

| Device Type | # Mods | Size | Frontend (ms) | Codegen (ms) | Downstrm (ms) | Driver (ms) | Total (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vulkan | 16 | simple | 40.65 | 30.06 | 2.24 | 74.97 | 147.91 |
| D3D12 | 16 | simple | 39.48 | 53.37 | 27.73 | 71.51 | 192.09 |
| CUDA | 16 | simple | 39.71 | 92.38 | 1967.69 | 158.26 | 2258.04 |
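
The per-phase columns above are wall-clock measurements. For illustration, phase timing of this kind can be collected with a small RAII helper; `ScopedTimer` and `PhaseTimings` below are hypothetical names for this sketch, not code from the tool:

```cpp
#include <chrono>

// Accumulates elapsed wall-clock milliseconds into a named slot on scope exit.
struct ScopedTimer
{
    double& slotMs;
    std::chrono::steady_clock::time_point start;

    explicit ScopedTimer(double& slot)
        : slotMs(slot)
        , start(std::chrono::steady_clock::now())
    {
    }

    ~ScopedTimer()
    {
        auto end = std::chrono::steady_clock::now();
        slotMs += std::chrono::duration<double, std::milli>(end - start).count();
    }
};

// One slot per compilation phase, mirroring the table columns.
struct PhaseTimings
{
    double frontendMs = 0.0;
    double codegenMs = 0.0;
    double downstreamMs = 0.0;
    double driverMs = 0.0;

    double totalMs() const { return frontendMs + codegenMs + downstreamMs + driverMs; }
};
```

A phase is then timed by scoping the work, e.g. `{ ScopedTimer t(timings.driverMs); /* createPipeline(...) */ }`, and iterations naturally accumulate into the same slots.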

Questions

  • Parallel thread pool implementation: The tool has a concurrent task pool implementing the ITaskPool interface, located in the benchmarks/benchmark-compile directory. Would it be worth moving that implementation to src/core/task-pool.cpp next to the existing BlockingTaskPool implementation?
  • Parallelism in tests: If we decide to move the concurrent task pool into slang-rhi's core, should we add a way to enable it in slang-rhi's tests?
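
For context, a concurrent task pool along these lines is typically a shared queue drained by worker threads. The sketch below is a generic illustration only; it is not the actual ITaskPool interface or the benchmark's implementation:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal concurrent task pool: workers pull tasks from a shared queue.
// The destructor drains remaining tasks before joining the workers.
class SimpleTaskPool
{
public:
    explicit SimpleTaskPool(size_t threadCount)
    {
        for (size_t i = 0; i < threadCount; ++i)
            m_workers.emplace_back([this] { workerLoop(); });
    }

    ~SimpleTaskPool()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_all();
        for (auto& worker : m_workers)
            worker.join();
    }

    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(task));
        }
        m_cv.notify_one();
    }

private:
    void workerLoop()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_done || !m_tasks.empty(); });
                if (m_done && m_tasks.empty())
                    return; // shut down only once the queue is empty
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task(); // run outside the lock so tasks execute concurrently
        }
    }

    std::vector<std::thread> m_workers;
    std::queue<std::function<void()>> m_tasks;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_done = false;
};
```

A real ITaskPool implementation would additionally need waitable task handles and error propagation, which this sketch omits.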

Parallelize optixModuleCreate

optixModuleCreate is currently called serially in a loop in createPipeline. That function is thread-safe, so an easy, early win is to call it in parallel via the global task pool. Running the benchmark, we can see that this improves the time we spend in driver compilation:

| Device Type | Threads | # Mods | Size | Frontend (ms) | Codegen (ms) | Downstrm (ms) | Driver (ms) | Total (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CUDA | serial | 32 | simple | 74.77 | 200.13 | 3465.18 | 300.58 | 4040.66 |
| CUDA | 1 | 32 | simple | 70.37 | 199.46 | 3434.02 | 295.23 | 3999.08 |
| CUDA | 2 | 32 | simple | 70.32 | 201.86 | 3433.03 | 135.93 | 3841.14 |
| CUDA | 4 | 32 | simple | 69.74 | 198.29 | 3433.53 | 80.49 | 3782.05 |
| CUDA | 8 | 32 | simple | 70.00 | 195.79 | 3410.63 | 52.47 | 3728.89 |
| CUDA | 16 | 32 | simple | 70.36 | 193.81 | 3389.02 | 47.98 | 3701.16 |
| CUDA | 32 | 32 | simple | 73.22 | 200.19 | 3408.34 | 53.59 | 3735.34 |

Note how driver compile time improves with the number of threads.
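
The change follows a fan-out/fan-in pattern: launch one task per module, then wait for all of them before the sequential program-group and pipeline creation. The sketch below illustrates that pattern only; `compileOne`, `ModuleResult`, and `compileAll` are hypothetical names, `compileOne` stands in for the thread-safe `optixModuleCreate` call, and `std::async` replaces the global task pool used in the actual change:

```cpp
#include <future>
#include <string>
#include <vector>

// Per-module outcome, collected so all failures can be reported, not just
// the first one encountered.
struct ModuleResult
{
    bool ok;
    std::string log;
};

// Stand-in for the real per-module compile, i.e. the optixModuleCreate call.
ModuleResult compileOne(const std::string& ptx)
{
    return {!ptx.empty(), "compiled " + ptx};
}

std::vector<ModuleResult> compileAll(const std::vector<std::string>& ptxModules)
{
    // Fan out: one task per module. The calls are independent and
    // thread-safe, so they may run concurrently.
    std::vector<std::future<ModuleResult>> futures;
    futures.reserve(ptxModules.size());
    for (const auto& ptx : ptxModules)
        futures.push_back(std::async(std::launch::async, compileOne, ptx));

    // Fan in: block until every module is done before moving on to the
    // sequential program-group / pipeline creation steps.
    std::vector<ModuleResult> results;
    results.reserve(futures.size());
    for (auto& f : futures)
        results.push_back(f.get());
    return results;
}
```

Keeping the fan-in before pipeline creation preserves the original ordering guarantees: only the independent module compiles run in parallel.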

Next steps

  • Most compile time is spent in the downstream compiler (DXC/NVRTC). Investigate dispatching those compilers in parallel in Slang.
  • NVRTC seems excessively slow, especially for small modules. Check whether the small modules contain unnecessary code or whether NVRTC is simply slower.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced benchmark-compile tool to measure shader compilation performance across Vulkan, D3D12, and CUDA backends with detailed timing breakdowns (Frontend, Codegen, Downstream, Driver)
    • Added synthetic module generation supporting configurable module counts and complexity levels for flexible benchmarking scenarios
  • Documentation

    • Added comprehensive README documenting benchmark-compile usage, CLI options, output interpretation, and caching strategies

@ksavoie-nv
Collaborator Author

@skallweitNV do you think it would be worth pulling this tool into slang-rhi's repo, or should we keep it separate?

@skallweitNV
Collaborator

@ksavoie-nv I think adding benchmarks to slang-rhi is a good thing. I've opened #674 to add a threaded task pool implementation that I've already had in a dev branch. Maybe take a look at that and see if it fits the purpose.


coderabbitai bot commented Feb 23, 2026

📝 Walkthrough


This PR introduces a comprehensive benchmarking suite for ray tracing shader compilation, including CMake build configuration, synthetic shader module generation with configurable complexity, a detailed benchmark harness measuring compilation phases across backends, parallelized OptiX module compilation via task pools, and proper task pool lifecycle management in tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Build Configuration**<br>`CMakeLists.txt` | Added SLANG_RHI_BUILD_BENCHMARKS build option and benchmark-compile executable target with C++20, linking against slang-rhi and slang libraries. |
| **Synthetic Module Generation**<br>`benchmarks/benchmark-compile/synthetic-modules.h`, `benchmarks/benchmark-compile/synthetic-modules.cpp` | Introduced public API (generateSyntheticModules, SizeLevel enum, module descriptor structs) to generate ray-tracing shader modules with configurable complexity; simple and complex ClosestHit variants with 20 vs. 5000+ lines. |
| **Benchmark Harness**<br>`benchmarks/benchmark-compile/main.cpp` | Implemented comprehensive benchmark orchestration with CLI parsing, device/session/thread-pool lifecycle management, module compilation, ray-tracing pipeline creation, timing measurement across frontend/codegen/downstream/driver phases, driver cache clearing, and formatted result table reporting. |
| **Benchmark Documentation**<br>`benchmarks/benchmark-compile/README.md` | Comprehensive documentation covering tool purpose, results tables, build/run instructions, CLI reference, timing breakdown phases, cache defeat strategies, synthetic module composition, and thread pool behavior. |
| **OptiX Module Compilation Parallelization**<br>`src/cuda/optix-api-impl.cpp` | Parallelized optixModuleCreate calls via task pool while preserving sequential program group creation; introduced ModuleCompileTask struct, per-module payload handling, and module index mapping to decouple module compilation from program group creation. |
| **Test Infrastructure**<br>`tests/main.cpp` | Added explicit global task pool cleanup (setGlobalTaskPool(nullptr)) after test execution to prevent resource leaks and ensure proper lifecycle management. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI Arguments
    participant Harness as Benchmark Harness
    participant SynMod as SyntheticModules
    participant Compiler as Slang Compiler
    participant RHI as RHI Device
    participant Driver as GPU Driver
    participant TaskPool as Task Pool

    CLI->>Harness: Parse device, module count, size, threads
    Harness->>Harness: For each iteration
    Harness->>SynMod: generateSyntheticModules(seed, size, count)
    SynMod-->>Harness: Shader source + entry points

    Harness->>Compiler: compileModules (Start Frontend timing)
    Compiler->>Compiler: Load & compile each module
    Compiler-->>Harness: Compiled program (End Frontend timing)

    Harness->>Compiler: Access codegen/downstream elapsed times
    Harness->>RHI: createRayTracingPipeline (Start Driver timing)
    RHI->>TaskPool: Submit parallel module creation
    TaskPool->>Driver: optixModuleCreate (parallel)
    Driver-->>TaskPool: Module handles
    TaskPool-->>RHI: All modules ready
    RHI->>Driver: Create program groups & pipeline (sequential)
    Driver-->>RHI: Pipeline handle (End Driver timing)

    Harness->>Harness: Aggregate timings across iterations
    Harness-->>CLI: Print results table with breakdowns
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Add threaded task pool #674 – Implements ThreadedTaskPool, global task pool init/set APIs, and ITaskPool semantics that the benchmark harness directly depends on for parallel module compilation and test cleanup.

Suggested reviewers

  • bmillsNV
  • skallweitNV

Poem

🐰 Hopping through benchmarks with glee,
Threads now compile in harmony,
From raygen to hit, we measure each bit,
CUDA runs fast, OptiX won't quit,
Performance revealed, compile time healed!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the two main changes: adding a compilation benchmark tool and parallelizing optixModuleCreate calls, which aligns with the substantial changeset including new benchmark infrastructure and parallel module creation. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@ksavoie-nv ksavoie-nv marked this pull request as ready for review February 23, 2026 23:50
@ksavoie-nv ksavoie-nv requested a review from a team as a code owner February 23, 2026 23:50
@ksavoie-nv ksavoie-nv requested review from bmillsNV and removed request for a team February 23, 2026 23:50
@ksavoie-nv
Collaborator Author

@skallweitNV I switched to the new threaded task pool, and it works great.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
CMakeLists.txt (1)

964-973: Consider linking warning flags for benchmark code quality.

The test target (line 936) links slang-rhi-warnings and slang-rhi-warnings-as-errors, but the benchmark target does not. While examples also omit these, the benchmark is first-party code that would benefit from the same warning discipline.

♻️ Suggested change

```diff
     target_compile_definitions(benchmark-compile PRIVATE NOMINMAX)
     target_link_libraries(benchmark-compile PRIVATE slang-rhi slang)
+    target_link_libraries(benchmark-compile PRIVATE slang-rhi-warnings)
     target_include_directories(benchmark-compile PRIVATE include src)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CMakeLists.txt` around lines 964 - 973, The benchmark target
benchmark-compile currently omits the project warning flag libraries; update the
benchmark-compile target to use the same warning discipline as the test target
by linking the slang-rhi-warnings and slang-rhi-warnings-as-errors targets (via
target_link_libraries for benchmark-compile) so the benchmark builds with the
same warning and warnings-as-errors settings used by the test target.
benchmarks/benchmark-compile/synthetic-modules.h (1)

16-28: Struct member variables lack the m_ prefix.

Per coding guidelines, member variables should start with m_ prefix and be in camelCase (e.g., m_source, m_entryPointName, m_stage, m_moduleCount, m_sizeLevel, m_seed). These are benchmark-only structs so this is low priority, but worth noting for consistency with the rest of the codebase.

As per coding guidelines: "Member variables should start with 'm_' prefix and be in camelCase".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/synthetic-modules.h` around lines 16 - 28,
Rename the struct member variables to follow the m_ camelCase convention: in
SyntheticModuleDesc change source -> m_source, entryPointName ->
m_entryPointName, stage -> m_stage; in SyntheticModuleParams change moduleCount
-> m_moduleCount, sizeLevel -> m_sizeLevel, seed -> m_seed; update all
references/usages of these symbols (constructors, initializers, assignments, and
access sites) to use the new names and keep existing default values and types
unchanged.
benchmarks/benchmark-compile/main.cpp (1)

829-832: Prefer setenv over putenv with const_cast on string literals.

putenv(const_cast<char*>("...")) passes a pointer to read-only memory. While putenv typically doesn't modify the string, the const_cast is technically undefined behavior since the underlying data is a string literal. On POSIX, setenv is the safer alternative:

♻️ Suggested change

```diff
 #else
-    putenv(const_cast<char*>("OPTIX_CACHE_MAXSIZE=0"));
-    putenv(const_cast<char*>("__GL_SHADER_DISK_CACHE=0"));
+    setenv("OPTIX_CACHE_MAXSIZE", "0", 1);
+    setenv("__GL_SHADER_DISK_CACHE", "0", 1);
 #endif
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/main.cpp` around lines 829 - 832, Replace the
unsafe putenv(const_cast<char*>("...")) calls with POSIX setenv to avoid casting
string literals to mutable char*; specifically change the putenv usages for
"OPTIX_CACHE_MAXSIZE=0" and "__GL_SHADER_DISK_CACHE=0" to calls like
setenv("OPTIX_CACHE_MAXSIZE","0",1) and setenv("__GL_SHADER_DISK_CACHE","0",1)
(or the equivalent wrapper in your codebase), removing the const_cast and
ensuring the overwrite flag is set.
benchmarks/benchmark-compile/README.md (1)

49-51: Add language identifiers to fenced code blocks.

Static analysis (markdownlint MD040) flags several fenced code blocks without language specifiers (lines 49, 97, 149, 191, 207). Adding text or an appropriate identifier would satisfy the linter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/README.md` around lines 49 - 51, Update the
README fenced code blocks that currently use plain ``` (for example the block
showing "build/<config>/benchmark-compile[.exe]") to include a language
identifier such as `text` (i.e., change ``` to ```text) for all flagged blocks
(around the occurrences at lines shown by the linter) so markdownlint MD040 is
satisfied.
src/cuda/optix-api-impl.cpp (1)

669-681: Use the m_ prefix for ModuleCompileTask fields.

These are member variables of a struct, so they should follow the member naming convention (e.g., m_deviceContext, m_moduleOptions, etc.).
As per coding guidelines, Member variables should start with 'm_' prefix and be in camelCase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuda/optix-api-impl.cpp` around lines 669 - 681, The struct
ModuleCompileTask's fields must use the member naming convention: rename each
field to start with m_ in camelCase (e.g., deviceContext -> m_deviceContext,
moduleOptions -> m_moduleOptions, pipelineOptions -> m_pipelineOptions, ptxCode
-> m_ptxCode, ptxSize -> m_ptxSize, outModule -> m_outModule, result ->
m_result, moduleIndex -> m_moduleIndex, entryPointName -> m_entryPointName) and
update all call sites that access these symbols (references to
ModuleCompileTask::deviceContext, .moduleOptions, .ptxCode, .moduleIndex, etc.)
to the new names to keep consistency across functions that create and consume
ModuleCompileTask instances.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4ea8c1 and 6b970c9.

📒 Files selected for processing (7)
  • CMakeLists.txt
  • benchmarks/benchmark-compile/README.md
  • benchmarks/benchmark-compile/main.cpp
  • benchmarks/benchmark-compile/synthetic-modules.cpp
  • benchmarks/benchmark-compile/synthetic-modules.h
  • src/cuda/optix-api-impl.cpp
  • tests/main.cpp

Comment on lines +222 to +237

```cpp
std::vector<HitGroupDesc> hitGroups;
std::vector<std::string> hitGroupNames; // keep strings alive

for (const auto& mod : modules)
{
    if (mod.stage == SLANG_STAGE_CLOSEST_HIT)
    {
        std::string name = "hitgroup_" + mod.entryPointName;
        hitGroupNames.push_back(name);

        HitGroupDesc hg = {};
        hg.hitGroupName = hitGroupNames.back().c_str();
        hg.closestHitEntryPoint = mod.entryPointName.c_str();
        hitGroups.push_back(hg);
    }
}
```

⚠️ Potential issue | 🔴 Critical

Dangling c_str() pointers: hitGroupNames reallocation invalidates previously stored pointers.

Each hitGroupNames.push_back(name) may reallocate the vector, invalidating all c_str() pointers already stored in hitGroups. With multiple closesthit modules, this causes hitGroups[i].hitGroupName to become a dangling pointer.

Fix by reserving capacity upfront so no reallocation occurs:

🐛 Proposed fix

```diff
     std::vector<HitGroupDesc> hitGroups;
     std::vector<std::string> hitGroupNames; // keep strings alive

+    // Reserve to prevent reallocation (which would invalidate c_str() pointers).
+    hitGroupNames.reserve(modules.size());
+
     for (const auto& mod : modules)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/main.cpp` around lines 222 - 237, The
hitGroupNames vector's c_str() pointers are stored into HitGroupDesc entries
(hitGroups), but push_back(name) can reallocate and invalidate those pointers;
to fix, compute how many SLANG_STAGE_CLOSEST_HIT modules exist, reserve that
capacity on hitGroupNames (or otherwise ensure stable storage) before the loop,
then create names, push them into hitGroupNames, and only then assign
hg.hitGroupName = hitGroupNames.back().c_str() and push hg into hitGroups so the
pointers remain valid; refer to hitGroupNames, hitGroups, HitGroupDesc, and the
for (const auto& mod : modules) loop to locate where to apply the
reserve/stable-storage change.

Comment on lines +184 to +197

```cpp
int topLayer = numLayers - 1;
for (int f = 0; f < functionsPerLayer; ++f)
{
    ss << " acc = acc + layer" << topLayer << "_func" << f << "_" << index << "_s" << seed << "(result, float3("
       << (f + 1) << ", " << (f + 2) << ", " << (f + 3) << "), acc);\n";
}

// Also call a few mid-layer functions to prevent dead-code elimination.
int midLayer = numLayers / 2;
for (int f = 0; f < functionsPerLayer; ++f)
{
    ss << " acc = acc + layer" << midLayer << "_func" << f << "_" << index << "_s" << seed
       << "(acc, result, float3(" << (f + 1) << ", 0, 0));\n";
}
```

⚠️ Potential issue | 🟡 Minor

midLayer equals topLayer when numLayers = 2, making the "mid-layer" calls redundant.

With numLayers = 2: topLayer = 1 and midLayer = numLayers / 2 = 1. Both loops (lines 185-189 and 193-197) call the same layer1_* functions, just with different arguments. The comment on line 191 says "call a few mid-layer functions to prevent dead-code elimination," but there's no distinct mid layer here.

This doesn't break anything — it still stresses the compiler — but the duplicated calls may not match the intent. Consider either bumping numLayers to 3+ or using midLayer = 0 to actually exercise the bottom layer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark-compile/synthetic-modules.cpp` around lines 184 - 197,
The midLayer calculation can equal topLayer when numLayers == 2, causing
duplicate calls; change the midLayer assignment so it picks a distinct layer
when numLayers is small (e.g., if numLayers > 2 keep midLayer = numLayers/2,
otherwise set midLayer = 0) so the second loop calls a different layer than
topLayer; update the midLayer variable used in the loop (reference: topLayer,
midLayer, numLayers, functionsPerLayer) accordingly and keep the rest of the
loop logic unchanged.

Comment on lines +731 to +747

```cpp
// Check for compilation errors.
// TODO: destroy successfully created OptixModules on error
for (size_t i = 0; i < moduleCount; ++i)
{
    if (isOptixError(taskPayloads[i].result))
    {
        char errorMsg[512];
        snprintf(
            errorMsg,
            sizeof(errorMsg),
            "optixModuleCreate failed for module %zu ('%s')",
            taskPayloads[i].moduleIndex,
            taskPayloads[i].entryPointName
        );
        reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
        return SLANG_FAIL;
    }
```

⚠️ Potential issue | 🟠 Major

Clean up created modules when any compilation task fails.

The function returns without destroying successfully created modules on error, leaking device resources. The modules are created as raw OptiX handles and need explicit destruction via optixModuleDestroy() before returning; the std::vector destructor does not call it automatically. On the success path, modules are transferred to PipelineImpl which handles cleanup in its destructor, but the error path bypasses this.

Collect all errors in the loop, then destroy non-null modules before returning:

🧹 Suggested cleanup on failure

```diff
-        for (size_t i = 0; i < moduleCount; ++i)
-        {
-            if (isOptixError(taskPayloads[i].result))
-            {
-                char errorMsg[512];
-                snprintf(
-                    errorMsg,
-                    sizeof(errorMsg),
-                    "optixModuleCreate failed for module %zu ('%s')",
-                    taskPayloads[i].moduleIndex,
-                    taskPayloads[i].entryPointName
-                );
-                reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
-                return SLANG_FAIL;
-            }
-        }
+        bool hadError = false;
+        for (size_t i = 0; i < moduleCount; ++i)
+        {
+            if (isOptixError(taskPayloads[i].result))
+            {
+                char errorMsg[512];
+                snprintf(
+                    errorMsg,
+                    sizeof(errorMsg),
+                    "optixModuleCreate failed for module %zu ('%s')",
+                    taskPayloads[i].moduleIndex,
+                    taskPayloads[i].entryPointName
+                );
+                reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
+                hadError = true;
+            }
+        }
+        if (hadError)
+        {
+            for (OptixModule module : optixModules)
+            {
+                if (module)
+                    SLANG_OPTIX_ASSERT_ON_FAIL(optixModuleDestroy(module));
+            }
+            return SLANG_FAIL;
+        }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuda/optix-api-impl.cpp` around lines 731 - 747, The loop checking
compilation errors returns immediately on first failure without freeing
already-created OptiX modules, leaking device resources; modify the error path
in the block that iterates moduleCount and inspects taskPayloads (using
isOptixError and reportOptixError) to first iterate over all taskPayloads up to
moduleCount and call optixModuleDestroy() for any non-null module handle (the
created OptixModule objects) before returning SLANG_FAIL, ensuring created
modules are cleaned up when compilation fails (the success path still transfers
modules to PipelineImpl).
