Add compilation benchmark and call optixModuleCreate in parallel #662
ksavoie-nv wants to merge 17 commits into shader-slang:main from
Conversation
@skallweitNV do you think it would be worth pulling this tool into slang-rhi's repo, or should we keep it separate?

@ksavoie-nv I think adding benchmarks to slang-rhi is a good thing. I've opened #674 to add a threaded task pool implementation that I already had in a dev branch. Maybe take a look at that and see if it fits the purpose.
📝 Walkthrough

This PR introduces a comprehensive benchmarking suite for ray tracing shader compilation, including CMake build configuration, synthetic shader module generation with configurable complexity, a detailed benchmark harness measuring compilation phases across backends, parallelized OptiX module compilation via task pools, and proper task pool lifecycle management in tests.

Changes
Sequence Diagram(s)

sequenceDiagram
participant CLI as CLI Arguments
participant Harness as Benchmark Harness
participant SynMod as SyntheticModules
participant Compiler as Slang Compiler
participant RHI as RHI Device
participant Driver as GPU Driver
participant TaskPool as Task Pool
CLI->>Harness: Parse device, module count, size, threads
Harness->>Harness: For each iteration
Harness->>SynMod: generateSyntheticModules(seed, size, count)
SynMod-->>Harness: Shader source + entry points
Harness->>Compiler: compileModules (Start Frontend timing)
Compiler->>Compiler: Load & compile each module
Compiler-->>Harness: Compiled program (End Frontend timing)
Harness->>Compiler: Access codegen/downstream elapsed times
Harness->>RHI: createRayTracingPipeline (Start Driver timing)
RHI->>TaskPool: Submit parallel module creation
TaskPool->>Driver: optixModuleCreate (parallel)
Driver-->>TaskPool: Module handles
TaskPool-->>RHI: All modules ready
RHI->>Driver: Create program groups & pipeline (sequential)
Driver-->>RHI: Pipeline handle (End Driver timing)
Harness->>Harness: Aggregate timings across iterations
Harness-->>CLI: Print results table with breakdowns
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ❌ 1 failed (1 warning), ✅ 2 passed
@skallweitNV I switched to the new threaded task pool, and it works great. |
Actionable comments posted: 3
🧹 Nitpick comments (5)
CMakeLists.txt (1)

964-973: Consider linking warning flags for benchmark code quality. The test target (line 936) links slang-rhi-warnings slang-rhi-warnings-as-errors, but the benchmark target does not. While examples also omit this, the benchmark is first-party code that would benefit from the same warning discipline.

♻️ Suggested change

```diff
 target_compile_definitions(benchmark-compile PRIVATE NOMINMAX)
 target_link_libraries(benchmark-compile PRIVATE slang-rhi slang)
+target_link_libraries(benchmark-compile PRIVATE slang-rhi-warnings)
 target_include_directories(benchmark-compile PRIVATE include src)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `CMakeLists.txt` around lines 964-973, the benchmark target benchmark-compile currently omits the project warning flag libraries; update the benchmark-compile target to use the same warning discipline as the test target by linking the slang-rhi-warnings and slang-rhi-warnings-as-errors targets (via target_link_libraries for benchmark-compile) so the benchmark builds with the same warning and warnings-as-errors settings used by the test target.

benchmarks/benchmark-compile/synthetic-modules.h (1)
16-28: Struct member variables lack the m_ prefix. Per coding guidelines, member variables should start with the m_ prefix and be in camelCase (e.g., m_source, m_entryPointName, m_stage, m_moduleCount, m_sizeLevel, m_seed). These are benchmark-only structs so this is low priority, but worth noting for consistency with the rest of the codebase.

As per coding guidelines: "Member variables should start with 'm_' prefix and be in camelCase".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `benchmarks/benchmark-compile/synthetic-modules.h` around lines 16-28, rename the struct member variables to follow the m_ camelCase convention: in SyntheticModuleDesc change source -> m_source, entryPointName -> m_entryPointName, stage -> m_stage; in SyntheticModuleParams change moduleCount -> m_moduleCount, sizeLevel -> m_sizeLevel, seed -> m_seed; update all references/usages of these symbols (constructors, initializers, assignments, and access sites) to use the new names and keep existing default values and types unchanged.

benchmarks/benchmark-compile/main.cpp (1)
829-832: Prefer setenv over putenv with const_cast on string literals. putenv(const_cast<char*>("...")) passes a pointer to read-only memory. While putenv typically doesn't modify the string, the const_cast invites undefined behavior: any write through it would modify a string literal. On POSIX, setenv is the safer alternative:

♻️ Suggested change

```diff
 #else
-    putenv(const_cast<char*>("OPTIX_CACHE_MAXSIZE=0"));
-    putenv(const_cast<char*>("__GL_SHADER_DISK_CACHE=0"));
+    setenv("OPTIX_CACHE_MAXSIZE", "0", 1);
+    setenv("__GL_SHADER_DISK_CACHE", "0", 1);
 #endif
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `benchmarks/benchmark-compile/main.cpp` around lines 829-832, replace the unsafe putenv(const_cast<char*>("...")) calls with POSIX setenv to avoid casting string literals to mutable char*; specifically change the putenv usages for "OPTIX_CACHE_MAXSIZE=0" and "__GL_SHADER_DISK_CACHE=0" to calls like setenv("OPTIX_CACHE_MAXSIZE","0",1) and setenv("__GL_SHADER_DISK_CACHE","0",1) (or the equivalent wrapper in your codebase), removing the const_cast and ensuring the overwrite flag is set.

benchmarks/benchmark-compile/README.md (1)
49-51: Add language identifiers to fenced code blocks. Static analysis (markdownlint MD040) flags several fenced code blocks without language specifiers (lines 49, 97, 149, 191, 207). Adding text or an appropriate identifier would satisfy the linter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `benchmarks/benchmark-compile/README.md` around lines 49-51, update the README fenced code blocks that currently use plain ``` (for example the block showing "build/<config>/benchmark-compile[.exe]") to include a language identifier such as `text` (i.e., change ``` to ```text) for all flagged blocks (around the occurrences at lines shown by the linter) so markdownlint MD040 is satisfied.

src/cuda/optix-api-impl.cpp (1)
669-681: Use the m_ prefix for ModuleCompileTask fields. These are member variables of a struct, so they should follow the member naming convention (e.g., m_deviceContext, m_moduleOptions, etc.).

As per coding guidelines, member variables should start with the 'm_' prefix and be in camelCase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/cuda/optix-api-impl.cpp` around lines 669-681, the struct ModuleCompileTask's fields must use the member naming convention: rename each field to start with m_ in camelCase (e.g., deviceContext -> m_deviceContext, moduleOptions -> m_moduleOptions, pipelineOptions -> m_pipelineOptions, ptxCode -> m_ptxCode, ptxSize -> m_ptxSize, outModule -> m_outModule, result -> m_result, moduleIndex -> m_moduleIndex, entryPointName -> m_entryPointName) and update all call sites that access these symbols (references to ModuleCompileTask::deviceContext, .moduleOptions, .ptxCode, .moduleIndex, etc.) to the new names to keep consistency across functions that create and consume ModuleCompileTask instances.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)

- CMakeLists.txt
- benchmarks/benchmark-compile/README.md
- benchmarks/benchmark-compile/main.cpp
- benchmarks/benchmark-compile/synthetic-modules.cpp
- benchmarks/benchmark-compile/synthetic-modules.h
- src/cuda/optix-api-impl.cpp
- tests/main.cpp
```cpp
std::vector<HitGroupDesc> hitGroups;
std::vector<std::string> hitGroupNames; // keep strings alive

for (const auto& mod : modules)
{
    if (mod.stage == SLANG_STAGE_CLOSEST_HIT)
    {
        std::string name = "hitgroup_" + mod.entryPointName;
        hitGroupNames.push_back(name);

        HitGroupDesc hg = {};
        hg.hitGroupName = hitGroupNames.back().c_str();
        hg.closestHitEntryPoint = mod.entryPointName.c_str();
        hitGroups.push_back(hg);
    }
}
```
Dangling c_str() pointers: hitGroupNames reallocation invalidates previously stored pointers.
Each hitGroupNames.push_back(name) may reallocate the vector, invalidating all c_str() pointers already stored in hitGroups. With multiple closest-hit modules, this causes hitGroups[i].hitGroupName to become a dangling pointer.
Fix by reserving capacity upfront so no reallocation occurs:
🐛 Proposed fix

```diff
 std::vector<HitGroupDesc> hitGroups;
 std::vector<std::string> hitGroupNames; // keep strings alive
+// Reserve to prevent reallocation (which would invalidate c_str() pointers).
+hitGroupNames.reserve(modules.size());
+
 for (const auto& mod : modules)
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
std::vector<HitGroupDesc> hitGroups;
std::vector<std::string> hitGroupNames; // keep strings alive
// Reserve to prevent reallocation (which would invalidate c_str() pointers).
hitGroupNames.reserve(modules.size());

for (const auto& mod : modules)
{
    if (mod.stage == SLANG_STAGE_CLOSEST_HIT)
    {
        std::string name = "hitgroup_" + mod.entryPointName;
        hitGroupNames.push_back(name);

        HitGroupDesc hg = {};
        hg.hitGroupName = hitGroupNames.back().c_str();
        hg.closestHitEntryPoint = mod.entryPointName.c_str();
        hitGroups.push_back(hg);
    }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/benchmark-compile/main.cpp` around lines 222 - 237, The
hitGroupNames vector's c_str() pointers are stored into HitGroupDesc entries
(hitGroups), but push_back(name) can reallocate and invalidate those pointers;
to fix, compute how many SLANG_STAGE_CLOSEST_HIT modules exist, reserve that
capacity on hitGroupNames (or otherwise ensure stable storage) before the loop,
then create names, push them into hitGroupNames, and only then assign
hg.hitGroupName = hitGroupNames.back().c_str() and push hg into hitGroups so the
pointers remain valid; refer to hitGroupNames, hitGroups, HitGroupDesc, and the
for (const auto& mod : modules) loop to locate where to apply the
reserve/stable-storage change.
```cpp
int topLayer = numLayers - 1;
for (int f = 0; f < functionsPerLayer; ++f)
{
    ss << "    acc = acc + layer" << topLayer << "_func" << f << "_" << index << "_s" << seed << "(result, float3("
       << (f + 1) << ", " << (f + 2) << ", " << (f + 3) << "), acc);\n";
}

// Also call a few mid-layer functions to prevent dead-code elimination.
int midLayer = numLayers / 2;
for (int f = 0; f < functionsPerLayer; ++f)
{
    ss << "    acc = acc + layer" << midLayer << "_func" << f << "_" << index << "_s" << seed
       << "(acc, result, float3(" << (f + 1) << ", 0, 0));\n";
}
```
midLayer equals topLayer when numLayers = 2, making the "mid-layer" calls redundant.
With numLayers = 2: topLayer = 1 and midLayer = numLayers / 2 = 1. Both loops (lines 185-189 and 193-197) call the same layer1_* functions, just with different arguments. The comment on line 191 says "call a few mid-layer functions to prevent dead-code elimination," but there's no distinct mid layer here.
This doesn't break anything — it still stresses the compiler — but the duplicated calls may not match the intent. Consider either bumping numLayers to 3+ or using midLayer = 0 to actually exercise the bottom layer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/benchmark-compile/synthetic-modules.cpp` around lines 184 - 197,
The midLayer calculation can equal topLayer when numLayers == 2, causing
duplicate calls; change the midLayer assignment so it picks a distinct layer
when numLayers is small (e.g., if numLayers > 2 keep midLayer = numLayers/2,
otherwise set midLayer = 0) so the second loop calls a different layer than
topLayer; update the midLayer variable used in the loop (reference: topLayer,
midLayer, numLayers, functionsPerLayer) accordingly and keep the rest of the
loop logic unchanged.
```cpp
// Check for compilation errors.
// TODO: destroy successfully created OptixModules on error
for (size_t i = 0; i < moduleCount; ++i)
{
    if (isOptixError(taskPayloads[i].result))
    {
        char errorMsg[512];
        snprintf(
            errorMsg,
            sizeof(errorMsg),
            "optixModuleCreate failed for module %zu ('%s')",
            taskPayloads[i].moduleIndex,
            taskPayloads[i].entryPointName
        );
        reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
        return SLANG_FAIL;
    }
```
Clean up created modules when any compilation task fails.
The function returns without destroying successfully created modules on error, leaking device resources. The modules are created as raw OptiX handles and need explicit destruction via optixModuleDestroy() before returning; the std::vector destructor does not call it automatically. On the success path, modules are transferred to PipelineImpl which handles cleanup in its destructor, but the error path bypasses this.
Collect all errors in the loop, then destroy non-null modules before returning:
🧹 Suggested cleanup on failure

```diff
-for (size_t i = 0; i < moduleCount; ++i)
-{
-    if (isOptixError(taskPayloads[i].result))
-    {
-        char errorMsg[512];
-        snprintf(
-            errorMsg,
-            sizeof(errorMsg),
-            "optixModuleCreate failed for module %zu ('%s')",
-            taskPayloads[i].moduleIndex,
-            taskPayloads[i].entryPointName
-        );
-        reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
-        return SLANG_FAIL;
-    }
-}
+bool hadError = false;
+for (size_t i = 0; i < moduleCount; ++i)
+{
+    if (isOptixError(taskPayloads[i].result))
+    {
+        char errorMsg[512];
+        snprintf(
+            errorMsg,
+            sizeof(errorMsg),
+            "optixModuleCreate failed for module %zu ('%s')",
+            taskPayloads[i].moduleIndex,
+            taskPayloads[i].entryPointName
+        );
+        reportOptixError(taskPayloads[i].result, errorMsg, __FILE__, __LINE__, m_device);
+        hadError = true;
+    }
+}
+if (hadError)
+{
+    for (OptixModule module : optixModules)
+    {
+        if (module)
+            SLANG_OPTIX_ASSERT_ON_FAIL(optixModuleDestroy(module));
+    }
+    return SLANG_FAIL;
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cuda/optix-api-impl.cpp` around lines 731 - 747, The loop checking
compilation errors returns immediately on first failure without freeing
already-created OptiX modules, leaking device resources; modify the error path
in the block that iterates moduleCount and inspects taskPayloads (using
isOptixError and reportOptixError) to first iterate over all taskPayloads up to
moduleCount and call optixModuleDestroy() for any non-null module handle (the
created OptixModule objects) before returning SLANG_FAIL, ensuring created
modules are cleaned up when compilation fails (the success path still transfers
modules to PipelineImpl).
Compile Benchmarking Tool

Add a tool to test how well compilation scales and performs. The tool is in the benchmarks/benchmark-compile directory and contains its own README. The intent is to make it easy to iterate on compilation performance.

Here are the current results (without parallel optixModuleCreate calls), taken on a system with an AMD Ryzen 9 7950X processor:
Questions

ITaskPool interface, located in the benchmarks/benchmark-compile directory. Would it be worth moving that implementation to src/core/task-pool.cpp next to the existing BlockingTaskPool implementation?

Parallelize optixModuleCreate
optixModuleCreate is currently called serially in a loop in createPipeline. That function is thread-safe, so an easy, early win is to call it in parallel via the global task pool. Running the benchmark, we can see that this improves the time we spend in driver compilation:

Note how driver compile time improves with the number of threads.
Next steps
Summary by CodeRabbit
Release Notes
New Features
Documentation