
Speed up HLSL preprocessing and prepared SPIR-V hot paths#1029

Open
AnastaZIuk wants to merge 46 commits into master from unroll

Conversation

@AnastaZIuk (Member) commented Mar 24, 2026

Summary

  • reduce Wave include overhead in the hot HLSL path with an explicit per-session include cache using separate read and write session caches
  • classify builtin and generated include roots by provenance instead of guessing from include spelling
  • teach source-built nsc to accept -isystem so toolchain include roots can be registered explicitly in builtins-off flows
  • keep the prepared-SPIR-V hot-path improvements: a single-entrypoint trim fast path and validation once per unique content hash
  • thread one IGPUPipelineCache through compute, resolve, ImGui, and fullscreen present in the paired EX31 flow
  • update the Examples pointer to the paired Devsh-Graphics-Programming/Nabla-Examples-and-Tests#262
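The "separate read and write session caches" idea from the first bullet can be sketched as follows. This is an illustrative, hypothetical shape (the names IncludeSessionCache, readCache, writeCache, and record are not Nabla's real API): lookups consult an immutable snapshot from a previous session plus the current session's own resolutions, and fresh resolutions are recorded so they can seed the next session's read cache.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical sketch of a per-preprocess-session include cache with
// separate read and write sides. Names are illustrative only.
struct IncludeSessionCache {
    using Map = std::unordered_map<std::string, std::string>; // include name -> resolved contents

    const Map* readCache = nullptr; // snapshot from a prior session, never mutated here
    Map writeCache;                 // filled as this session resolves includes

    std::optional<std::string> lookup(const std::string& name) const {
        if (auto it = writeCache.find(name); it != writeCache.end())
            return it->second;
        if (readCache)
            if (auto it = readCache->find(name); it != readCache->end())
                return it->second;
        return std::nullopt; // caller must resolve from disk/generators, then record()
    }

    void record(const std::string& name, std::string contents) {
        writeCache.emplace(name, std::move(contents));
    }
};
```

Keeping the read side immutable during a session means repeated lookups of the same header never race with insertions, and a miss on both sides is the only case that pays for a real filesystem or generator resolution.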

Root cause

Three costs were stacking on top of each other.

First, the preprocess part comes from avoidable HLSL include debt in the hot path:

  • path_tracing/concepts.hlsl on the base branch pulls bxdf/common.hlsl only to synthesize a placeholder interaction for Ray::setInteraction; that edge comes from 4d186db76f
  • member_test_macros.hlsl on the base branch uses the umbrella boost/preprocessor.hpp even though this header only needs a narrow subset; that comes from 72972a9d6e
  • the custom Wave include bridge on this path was introduced in 12afd3d42d, which added the custom Boost.Wave context and include-path classes for the HLSL preprocessor; dxc_compile_flags pragma bookkeeping was layered on later in ae4386064cf. Later merges, cleanup, depfile plumbing, and backports carried the same path forward, but they are not the semantic origin of the extra per-include work

Second, the base include-loader path paid redundant work before preprocessing reached DXC. The current disk-backed include body load path in IShaderCompiler.cpp comes from 5ac3b55552 and later loader reshapes like cc37325f28c. Per-lookup content hashing on that path was added in cf9a866623. The hot include bridge also lacked an explicit notion of builtin/generated include roots, which made toolchain headers harder to classify and cache cleanly.
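The per-lookup hashing cost mentioned above can be avoided by deferring the hash until a caller actually asks for it. The sketch below is hypothetical (IncludeResult and contentHash are illustrative names, and FNV-1a stands in for whatever hash the real loader uses); the point is that session-cache hits never pay for hashing the body.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Illustrative include-lookup result that computes its content hash lazily,
// so hot lookups that never need the hash skip the work entirely.
struct IncludeResult {
    std::string contents;
    mutable std::optional<uint64_t> hash; // computed on first demand

    uint64_t contentHash() const {
        if (!hash) {
            uint64_t h = 1469598103934665603ull; // FNV-1a offset basis (hash choice is illustrative)
            for (unsigned char c : contents) { h ^= c; h *= 1099511628211ull; }
            hash = h;
        }
        return *hash;
    }
};
```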

Third, the pre-fast-path trimmer always validated and walked the incoming module before it could know whether the requested entrypoint set already matched the prepared shader. The old flow is visible in ISPIRVEntryPointTrimmer.cpp#L104-L246. That shape comes from cfb4bd1da6 and 9f3f823124.
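The fast path this PR adds can be reduced to a cheap set comparison before any module walk. The helper below is a simplified sketch, not ISPIRVEntryPointTrimmer's real signature: if the prepared module is already a single-entrypoint shader and that entrypoint set equals the requested one, trimming is a no-op and the validate-and-walk can be skipped.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical pre-check (names illustrative): returns true when trimming
// would change nothing, i.e. the module already exports exactly the one
// requested entrypoint.
inline bool canSkipTrim(const std::set<std::string>& moduleEntryPoints,
                        const std::set<std::string>& requested) {
    return moduleEntryPoints.size() == 1 && moduleEntryPoints == requested;
}
```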

The fullscreen-present helper was introduced in 2b08a15064. In that shape CFullScreenTriangle.cpp#L120 did not yet thread an external pipeline cache, so compute and present could not populate the same cache blob.
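The pipeline-cache threading fix can be pictured with a toy model (this is not Nabla's IGPUPipelineCache API; PipelineCache and createPipeline are stand-ins): once every pipeline creation call receives the same cache object, compiled state from the compute pass becomes visible to resolve, ImGui, and present instead of each helper building against its own private or null cache.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a driver pipeline cache: entries accumulate as pipelines
// are created against it, so later creations can reuse earlier work.
struct PipelineCache {
    std::vector<std::string> entries; // stand-in for cached pipeline blobs
};

struct Pipeline { std::string name; };

// Every creation site takes the shared cache; a real driver would look up
// and merge compiled binaries here rather than just appending a name.
inline Pipeline createPipeline(PipelineCache* cache, const std::string& name) {
    if (cache)
        cache->entries.push_back(name);
    return Pipeline{name};
}
```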

What this changes

  • cache and reuse include resolution results explicitly per preprocess session through separate read and write session caches
  • classify builtin and generated roots when they are registered instead of inferring special treatment from include spelling
  • let nsc accept -isystem and map those roots to system-classified include search paths in source-built flows
  • keep toolchain and generated headers on the fast path without changing the normal "" versus <> search semantics
  • trim token bookkeeping in CWaveStringResolver
  • replace the umbrella Boost include in member_test_macros.hlsl with the narrow Boost headers it actually uses
  • remove redundant public HLSL includes from hot headers and stop pulling bxdf/common.hlsl into path_tracing/concepts.hlsl
  • short-circuit ISPIRVEntryPointTrimmer when the incoming module is already a prepared single-entrypoint shader
  • cache successful validation per unique SPIR-V blob so hot paths keep validation without paying for it again
  • thread an external pipeline cache through FullScreenTriangle so EX31 can share one cache object across compute and present
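The "cache successful validation per unique SPIR-V blob" bullet can be sketched like this. The shape is hypothetical (ValidationCache and validateOnce are illustrative names, and FNV-1a merely stands in for the real content hash): the validator runs once per unique blob content, successes are remembered by hash, and failures are deliberately not cached so a bad blob is re-reported every time.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Illustrative content hash; the real code may use a different function.
inline uint64_t fnv1a(const std::vector<uint8_t>& blob) {
    uint64_t h = 1469598103934665603ull;
    for (uint8_t b : blob) { h ^= b; h *= 1099511628211ull; }
    return h;
}

struct ValidationCache {
    std::unordered_set<uint64_t> validated;

    // `validator` stands in for the real SPIR-V validation call.
    template<typename F>
    bool validateOnce(const std::vector<uint8_t>& blob, F&& validator) {
        const uint64_t h = fnv1a(blob);
        if (validated.count(h))
            return true;  // identical content already validated this run
        if (!validator(blob))
            return false; // failures are not cached
        validated.insert(h);
        return true;
    }
};
```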

Validation

Validation was run on AMD Ryzen 5 5600G with Radeon Graphics (6C/12T).

Local Release sweeps of source-built nsc with -P over the current EX31 scene rules, taken from the generated build commands, show:

  • 9 heavy scene rules total
  • min 2.424 s
  • avg 2.503 s
  • max 2.632 s

Local source-built nsc preprocess profiles on the current EX31 heavy sphere rule show:

  • builtins OFF: include_requests=586, include_lookups=316, resolution_cache_skips=270, session_lookup_found=0
  • builtins ON: include_requests=586, include_lookups=234, resolution_cache_skips=352, session_lookup_found=44

The paired EX31 branch builds and runs in RelWithDebInfo with both builtins modes. Current warm-cache validation on the paired branch is:

  • builtins OFF: first_render_submit_ms=1533
  • builtins ON: first_render_submit_ms=1850

Prepared-shader and pipeline-cache validation on the paired EX31 branch is recorded in Devsh-Graphics-Programming/Nabla-Examples-and-Tests#262.

@AnastaZIuk AnastaZIuk changed the title Support EX31 precompiled path tracer fast paths on unroll Reduce HLSL preprocess overhead and speed up prepared SPIR-V hot paths Mar 24, 2026
@AnastaZIuk AnastaZIuk changed the title Reduce HLSL preprocess overhead and speed up prepared SPIR-V hot paths Speed up HLSL preprocessing and prepared SPIR-V hot paths Mar 24, 2026
Comment on lines -683 to +749
-if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName))
-    retVal = std::move(contents);
-else retVal = std::move(trySearchPaths(lookupName));
+if (asset::detail::isGloballyResolvedIncludeName(lookupName))
+{
+    if (auto contents = tryIncludeGenerators(lookupName))
+        retVal = std::move(contents);
+    else if (auto contents = trySearchPaths(lookupName, needHash))
+        retVal = std::move(contents);
+    else
+        retVal = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash);
+}
+else
+{
+    if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash))
+        retVal = std::move(contents);
+    else if (auto contents = tryIncludeGenerators(lookupName))
+        retVal = std::move(contents);
+    else
+        retVal = std::move(trySearchPaths(lookupName, needHash));
+}

explain the reason for this change

you shouldn't try different include generators, the include generators should only be reachable with #include <> and not #include ""

Also why should the precedence of a search path and default include loaders change depending on the path ?

