Speed up HLSL preprocessing and prepared SPIR-V hot paths#1029
Open
AnastaZIuk wants to merge 46 commits intomasterfrom
Open
Speed up HLSL preprocessing and prepared SPIR-V hot paths#1029AnastaZIuk wants to merge 46 commits intomasterfrom
AnastaZIuk wants to merge 46 commits intomasterfrom
Conversation
Comment on lines
-683
to
+749
| if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName)) | ||
| retVal = std::move(contents); | ||
| else retVal = std::move(trySearchPaths(lookupName)); | ||
| if (asset::detail::isGloballyResolvedIncludeName(lookupName)) | ||
| { | ||
| if (auto contents = tryIncludeGenerators(lookupName)) | ||
| retVal = std::move(contents); | ||
| else if (auto contents = trySearchPaths(lookupName, needHash)) | ||
| retVal = std::move(contents); | ||
| else retVal = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash); | ||
| } | ||
| else | ||
| { | ||
| if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash)) | ||
| retVal = std::move(contents); | ||
| else if (auto contents = tryIncludeGenerators(lookupName)) | ||
| retVal = std::move(contents); | ||
| else retVal = std::move(trySearchPaths(lookupName, needHash)); | ||
| } |
There was a problem hiding this comment.
explain the reason for this change
There was a problem hiding this comment.
you shouldn't try different include generators, the include generators should only be reachable with #include <> a and not #include ""
Also why should the precedence of a search path and default include loaders change depending on the path ?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
nscto accept-isystemso toolchain include roots can be registered explicitly in builtins-off flowsIGPUPipelineCachethrough compute, resolve, ImGui, and fullscreen present in the paired EX31 flowRoot cause
Three costs were stacking on top of each other.
First, the preprocess part comes from avoidable HLSL include debt in the hot path:
path_tracing/concepts.hlslon the base branch pullsbxdf/common.hlslonly to synthesize a placeholder interaction forRay::setInteraction; that edge comes from4d186db76fmember_test_macros.hlslon the base branch uses the umbrellaboost/preprocessor.hppeven though this header only needs a narrow subset; that comes from72972a9d6e12afd3d42d, which added the custom Boost.Wave context and include-path classes for the HLSL preprocessor;dxc_compile_flagspragma bookkeeping was later layered on inae4386064cf; later merges, cleanup, depfile plumbing, and backports carried the same path forward but are not the semantic origin of the extra per-include workSecond, the base include-loader path paid redundant work before preprocessing reached DXC. The current disk-backed include body load path in
IShaderCompiler.cppcomes from5ac3b55552and later loader reshapes likecc37325f28c. Per-lookup content hashing on that path was added incf9a866623. The hot include bridge also lacked an explicit notion of builtin/generated include roots, which made toolchain headers harder to classify and cache cleanly.Third, the pre-fast-path trimmer always validated and walked the incoming module before it could know whether the requested entrypoint set already matched the prepared shader. The old flow is visible in
ISPIRVEntryPointTrimmer.cpp#L104-L246. That shape comes fromcfb4bd1da6and9f3f823124.The fullscreen-present helper was introduced in
2b08a15064. In that shapeCFullScreenTriangle.cpp#L120did not yet thread an external pipeline cache, so compute and present could not populate the same cache blob.What this changes
nscaccept-isystemand map those roots to system-classified include search paths in source-built flows""versus<>search semanticsCWaveStringResolvermember_test_macros.hlslwith the narrow Boost headers it actually usesbxdf/common.hlslintopath_tracing/concepts.hlslISPIRVEntryPointTrimmerwhen the incoming module is already a prepared single-entrypoint shaderFullScreenTriangleso EX31 can share one cache object across compute and presentValidation
Validation was run on AMD Ryzen 5 5600G with Radeon Graphics (6C/12T).
Exact local source-built
nscRelease -Psweeps on the current EX31 scene rules taken from the generated build commands show:9heavy scene rules total2.424 s2.503 s2.632 sLocal source-built
nscpreprocess profiles on the current EX31 heavy sphere rule show:OFF:include_requests=586,include_lookups=316,resolution_cache_skips=270,session_lookup_found=0ON:include_requests=586,include_lookups=234,resolution_cache_skips=352,session_lookup_found=44The paired EX31 branch builds and runs in
RelWithDebInfowith both builtins modes. Current warm-cache validation on the paired branch is:OFF:first_render_submit_ms=1533ON:first_render_submit_ms=1850Prepared-shader and pipeline-cache validation on the paired EX31 branch is recorded in Devsh-Graphics-Programming/Nabla-Examples-and-Tests#262.