# Sampler Feedback Streaming
## Introduction
This repository contains an [MIT licensed](LICENSE) demo of _DirectX12 Sampler Feedback Streaming_, a technique using [DirectX12 Sampler Feedback](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) to guide continuous loading and eviction of small portions (tiles) of assets. Sampler Feedback Streaming allows scenes consisting of 100s of gigabytes of resources to be drawn on GPUs containing much less physical memory. The scene below uses just ~200MB of a 1GB heap, despite over 350GB of total texture resources.

The demo requires ***Windows 10 20H1 (aka May 2020 Update, build 19041)*** or later and a GPU with Sampler Feedback support, such as Intel Iris Xe Graphics, as found in 11th Generation Intel® Core™ processors and discrete GPUs (driver version **[30.0.100.9667](https://downloadcenter.intel.com/product/80939/Graphics) or later**).

This repository will be updated when DirectStorage for Windows® becomes available.

See also:

* [GDC 2021 video](https://software.intel.com/content/www/us/en/develop/events/gdc.html?videoid=6264595860001) [(alternate link)](https://www.youtube.com/watch?v=VDDbrfZucpQ), which provides an overview of Sampler Feedback and discusses this sample starting at about 15:30.
* [GDC 2021 presentation](https://software.intel.com/content/dam/develop/external/us/en/documents/pdf/july-gdc-2021-sampler-feedback-texture-space-shading-direct-storage.pdf) in PDF form

Textures derived from [Hubble Images](https://www.nasa.gov/mission_pages/hubble/multimedia/index.html); see the [Hubble Copyright](https://hubblesite.org/copyright).

Note that the textures shown above, which total over 13GB, are not part of the repository. A few 16k x 16k textures are available as a [release](https://github.com/GameTechDev/SamplerFeedbackStreaming/releases/tag/1).

Test textures are provided, as is a mechanism to convert from BCx format DDS files into the custom .XET format.
## Build Instructions
Download the source. Build the solution file [SamplerFeedbackStreaming.sln](SamplerFeedbackStreaming.sln) (tested with Visual Studio 2019).

All executables, scripts, configurations, and media files will be found in the x64/Release or x64/Debug directories.

To run within Visual Studio, change the working directory to $(TargetDir) under Properties/Debugging.

Or cd to the build directory (x64/Release or x64/Debug) and run `expanse.exe` from the command line.

By default (no command line options) there will be a single object, "terrain", which allows for exploring sampler feedback streaming. In the top right are two windows: on the left is the raw GPU min mip feedback, and on the right is the min mip map ("residency map") generated by the application. Across the bottom are the mips of the texture, with mip 0 in the bottom left. Left-click and drag the terrain to see sampler feedback streaming in action.

The high-resolution textures in the first "release" package, [hubble-16k.zip](https://github.com/GameTechDev/SamplerFeedbackStreaming/releases/tag/1), work with "demo-hubble.bat" and include a sky and an earth. Make sure the mediadir in the batch file is set properly, or override it on the command line.

By default, the application loads [config.json](config/config.json).

However, performance has been observed to decay over time on earlier NVIDIA devices/drivers, as the tiles in the heap become fragmented relative to resources. Specifically, the CPU time for [UpdateTileMappings](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12commandqueue-updatetilemappings) limits the system throughput.

If you observe this issue (most obvious in demo mode or with [stress.bat](scripts/stress.bat)), add `-config nvidia.json` to the command line.

The main difference in [nvidia.json](configs/nvidia.json) is that it distributes textures across 127 heaps, each sized 32MB (512 tiles * 64KB per tile), for a total of ~4GB of GPU physical memory. Since this implementation restricts each resource to a single heap, the small heaps can fill up, resulting in visual artifacts. However, mapping and unmapping of small heaps appears to be significantly faster on some GPUs.

    "heapSizeTiles": 512, // size for each heap. 64KB per tile * 512 tiles -> 32MB heap
    "numHeaps": 127, // number of heaps. objects will be distributed among heaps
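
The heap sizes quoted above follow directly from the 64KB tile size of D3D12 reserved resources. The short C++ sketch below checks that arithmetic at compile time; the helper names `HeapBytes` and `TotalHeapBytes` are hypothetical and not part of the sample, and only the values of `heapSizeTiles` and `numHeaps` come from the configuration.

```cpp
#include <cassert>  // for runtime checks in callers
#include <cstdint>

// D3D12 reserved (tiled) resources are mapped in 64KB tiles.
constexpr std::uint64_t kTileSizeBytes = 64 * 1024;
constexpr std::uint64_t kMiB = 1024 * 1024;

// Hypothetical helper: physical memory backing one heap of `heapSizeTiles` tiles.
constexpr std::uint64_t HeapBytes(std::uint64_t heapSizeTiles) {
    return heapSizeTiles * kTileSizeBytes;
}

// Hypothetical helper: total physical memory across `numHeaps` equally sized heaps.
constexpr std::uint64_t TotalHeapBytes(std::uint64_t heapSizeTiles, std::uint64_t numHeaps) {
    return HeapBytes(heapSizeTiles) * numHeaps;
}

// nvidia.json: 512 tiles per heap -> 32MB per heap; 127 heaps -> 4064MB (~4GB).
static_assert(HeapBytes(512) == 32 * kMiB, "512 tiles * 64KB = 32MB");
static_assert(TotalHeapBytes(512, 127) == 4064 * kMiB, "127 * 32MB = 4064MB");

// A single 1GB heap corresponds to 16384 tiles.
static_assert(HeapBytes(16384) == 1024 * kMiB, "16384 tiles * 64KB = 1GB");
```

Because the checks are `static_assert`s, a mistake in the arithmetic fails at compile time rather than at run time.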
## Keyboard controls
There are a lot of keyboard controls - a function of giving many demos:
## JSON configuration files and command lines
For a full list of command line options, pass "?" on the command line, e.g.:

    c:> expanse.exe ?

Most of the detailed controls for the system can be found in a *json* file. The options in the json have corresponding command lines, e.g.:

The executable `DdsToXet.exe` converts BCn DDS textures to the custom XET format.

    c:> ddstoxet.exe -in myfile.dds -out myfile.xet

The batch file [convert.bat](scripts/convert.bat) will read all the DDS files in one directory and write XET files to a second directory. The output directory must exist.

    c:> convert c:\myDdsFiles c:\myXetFiles

This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation). The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or when copies and memory mapping have completed.
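
The "poll, then pause when idle" behavior described above can be sketched with standard C++ threading. In this minimal sketch a `std::atomic<std::uint64_t>` stands in for the value a D3D12 fence would report through `ID3D12Fence::GetCompletedValue()`; the `PendingWork` and `FenceWorker` names are hypothetical, and the sample's own threads in TileUpdateManager are more involved.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical work item: ready once the (simulated) GPU fence reaches fenceValue.
struct PendingWork {
    std::uint64_t fenceValue;
    int id;
};

// Hypothetical worker: a CPU thread that sleeps while idle and otherwise polls the
// fence's completed value, processing work items in submission order.
class FenceWorker {
public:
    explicit FenceWorker(const std::atomic<std::uint64_t>& fence) : m_fence(fence) {
        m_thread = std::thread([this] { Run(); });
    }
    ~FenceWorker() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_one();
        m_thread.join();
    }
    void Enqueue(PendingWork work) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pending.push_back(work);
        }
        m_cv.notify_one();  // wake the thread if it is paused
    }
    std::vector<int> Processed() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_processed;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            // Pause (no CPU use) while there is nothing to do.
            m_cv.wait(lock, [this] { return m_stop || !m_pending.empty(); });
            if (m_stop) return;
            // Poll: handle every item whose fence value the GPU has reached.
            const std::uint64_t completed = m_fence.load(std::memory_order_acquire);
            while (!m_pending.empty() && m_pending.front().fenceValue <= completed) {
                m_processed.push_back(m_pending.front().id);
                m_pending.pop_front();
            }
            if (!m_pending.empty()) {
                // GPU not there yet: drop the lock briefly, then poll again.
                lock.unlock();
                std::this_thread::sleep_for(std::chrono::microseconds(100));
                lock.lock();
            }
        }
    }

    const std::atomic<std::uint64_t>& m_fence;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<PendingWork> m_pending;
    std::vector<int> m_processed;
    bool m_stop = false;
    std::thread m_thread;
};
```

The rendering thread never waits on this worker; it only advances the fence, which mirrors the decoupling between frame rate and copy completion described above.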

All the magic can be found in the **TileUpdateManager** library (see [TileUpdateManager.h](TileUpdateManager/TileUpdateManager.h)), which abstracts the creation of StreamingResources and heaps while internally managing feedback resources, file I/O, and GPU memory mapping.

The technique works as follows:

To reduce GPU memory, a single combined buffer contains all the residency maps for all the resources. The pixel shader samples the corresponding residency map to clamp the sampling function to the minimum mip level available, thereby avoiding sampling tiles that have not been mapped.
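
The clamp can be illustrated on the CPU. This minimal sketch assumes that each residency-map entry holds the finest mip level for which every tile from that mip down to the coarsest is resident (consistent with tiles being loaded from the bottom up), and that the requested level of detail is clamped to that value. `MinResidentMip` and `ClampLod` are hypothetical names; in the sample, the residency map is produced by UpdateMinMipMap and consumed in the pixel shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// resident[m] is true if the tile covering this region at mip m is mapped and loaded.
// Mip 0 is the finest level; resident.size() is the number of mips.
// Returns the finest usable mip, or resident.size() if nothing is resident.
std::uint8_t MinResidentMip(const std::vector<bool>& resident) {
    assert(!resident.empty() && resident.size() < 256);
    int mip = static_cast<int>(resident.size());
    // Walk from the coarsest mip toward mip 0 and stop at the first gap: a finer
    // mip is only usable if every coarser mip is resident as well.
    for (int m = static_cast<int>(resident.size()) - 1; m >= 0; --m) {
        if (!resident[static_cast<std::size_t>(m)]) break;
        mip = m;
    }
    return static_cast<std::uint8_t>(mip);
}

// Clamp the level of detail the sampler would pick so that it never selects a
// mip finer than what is resident (a larger LOD means a coarser mip).
float ClampLod(float requestedLod, std::uint8_t minResidentMip) {
    return std::max(requestedLod, static_cast<float>(minResidentMip));
}
```

Sampling with the clamped LOD means a region whose fine tiles are still streaming in is drawn with coarser data instead of touching unmapped memory.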

We can see this in the shader [terrainPS.hlsl](src/shaders/terrainPS.hlsl). Resources are defined at the top of the shader, including the reserved texture, the residency resource, and the sampler:
```cpp
Texture2D g_streamingTexture : register(t0);
// ... (remaining declarations not shown)
```

The sampling operation is clamped to the minimum mip resident (mipLevel).
### 4. Draw Objects While Recording Feedback
For expanse, there is a "normal" non-feedback shader named [terrainPS.hlsl](src/shaders/terrainPS.hlsl) and a "feedback-enabled" version of the same shader, [terrainPS-FB.hlsl](src/shaders/terrainPS-FB.hlsl). The latter simply writes feedback using the [WriteSamplerFeedback](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) HLSL intrinsic, with the same sampler and texture coordinates, then calls the prior shader. Compare the WriteSamplerFeedback() call below to the Sample() call above.

To add feedback to an existing shader:

1. Include the original shader HLSL file.
2. Add a binding for the paired feedback resource.
3. Call the WriteSamplerFeedback intrinsic with the resource and sampler defined in the original shader.

As an optimization, Expanse tells streaming resources to evict all tiles if they are behind the camera. This could potentially be improved to include any object not in the view frustum.

You can find the time limit estimation, the eviction optimization, and the request to gather sampler feedback by searching [Scene.cpp](src/Scene.cpp) for the following:

- **DetermineMaxNumFeedbackResolves** determines how many resources to gather feedback for
- **QueueEviction** tells the runtime to evict tiles for this resource (as soon as possible)
- **SetFeedbackEnabled** results in 2 actions:
    1. tell the runtime to collect feedback for this object via TileUpdateManager::QueueFeedback(), which results in clearing and resolving this object's feedback resource for this frame
    2. use the feedback-enabled pixel shader for this object
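
A per-frame feedback budget can be combined with a simple rotation so that every visible object eventually gets feedback. This is a minimal sketch of one plausible approach, not the logic in Scene.cpp: the time budget, the measured average resolve cost, and the names `MaxResolvesForBudget` and `FeedbackScheduler` are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical estimate of how many feedback resolves fit into a per-frame time
// budget, given a measured average cost per resolve. Never returns 0, so feedback
// keeps flowing even when the budget is very small.
std::size_t MaxResolvesForBudget(double budgetMs, double avgResolveMs) {
    assert(budgetMs >= 0.0 && avgResolveMs > 0.0);
    const std::size_t n = static_cast<std::size_t>(budgetMs / avgResolveMs);
    return std::max<std::size_t>(n, 1);
}

// Hypothetical round-robin selection: each frame, pick up to `count` visible objects,
// continuing where the previous frame stopped. Objects that are not visible are
// skipped (in Expanse, those behind the camera have their tiles evicted instead).
class FeedbackScheduler {
public:
    explicit FeedbackScheduler(std::size_t numObjects) : m_numObjects(numObjects) {}

    std::vector<std::size_t> SelectForFrame(std::size_t count, const std::vector<bool>& visible) {
        assert(visible.size() == m_numObjects);
        std::vector<std::size_t> selected;
        for (std::size_t tried = 0; tried < m_numObjects && selected.size() < count; ++tried) {
            const std::size_t index = m_next;
            m_next = (m_next + 1) % m_numObjects;
            if (visible[index]) selected.push_back(index);
        }
        return selected;
    }

private:
    std::size_t m_numObjects;
    std::size_t m_next = 0;
};
```

Rotating through objects trades feedback latency for a bounded per-frame cost: with many objects, each one is refreshed only every few frames.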
### 5. Determine Which Tiles to Load & Evict
Once the draw command is complete, the feedback is ready to read on the CPU - either by copying the feedback to a readback resource, or by resolving directly to a readback resource.

Min mip feedback tells us the minimum mip tile that should be loaded. The min mip feedback is traversed, updating an internal reference count for each tile. If a tile previously was unused (ref count = 0), it is queued for loading from the bottom (highest mip) up. If a tile is not needed for a particular region, its ref count is decreased (from the top down). When its ref count reaches 0, it might be ready to evict.
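
The reference counting can be sketched in a simplified one-dimensional form. This minimal sketch assumes that mip m has `numRegions >> m` tiles and that feedback region r is covered by tile `r >> m` at mip m (a real texture is two-dimensional); the `TileRefCounts` class is hypothetical, and the sample's actual bookkeeping lives in TileMappingState.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical 1D model: mip m has (numRegions >> m) tiles, and feedback region r
// is covered by tile (r >> m) at mip m. Mip 0 is the finest level.
class TileRefCounts {
public:
    TileRefCounts(int numRegions, int numMips)
        : m_refCount(static_cast<std::size_t>(numMips)),
          m_regionMinMip(static_cast<std::size_t>(numRegions), numMips) {
        assert(numRegions > 0 && (numRegions & (numRegions - 1)) == 0);  // power of two
        for (int m = 0; m < numMips; ++m) {
            m_refCount[m].assign(static_cast<std::size_t>(std::max(numRegions >> m, 1)), 0);
        }
    }

    // Apply one min-mip feedback value for `region`.
    void Update(int region, int desiredMinMip) {
        assert(desiredMinMip >= 0 && desiredMinMip <= static_cast<int>(m_refCount.size()));
        int& current = m_regionMinMip[region];  // numMips means "no tiles referenced"
        // More detail needed: add references from the bottom (coarsest) up.
        for (int m = current - 1; m >= desiredMinMip; --m) {
            if (m_refCount[m][region >> m]++ == 0) loads.emplace_back(m, region >> m);
        }
        // Less detail needed: remove references from the top (finest) down.
        for (int m = current; m < desiredMinMip; ++m) {
            if (--m_refCount[m][region >> m] == 0) evictionCandidates.emplace_back(m, region >> m);
        }
        current = desiredMinMip;
    }

    std::vector<std::pair<int, int>> loads;               // (mip, tile) queued for loading
    std::vector<std::pair<int, int>> evictionCandidates;  // (mip, tile) whose count hit 0

private:
    std::vector<std::vector<std::uint32_t>> m_refCount;  // [mip][tile]
    std::vector<int> m_regionMinMip;                      // per-region current min mip
};
```

A coarse tile shared by several regions is loaded once and only becomes an eviction candidate after every region that referenced it has moved to a coarser mip.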

Data structures for tracking reference count, residency state, and heap usage can be found in [StreamingResource.cpp](TileUpdateManager/StreamingResource.cpp) and [StreamingResource.h](TileUpdateManager/StreamingResource.h); look for TileMappingState. This class also has methods for interpreting the feedback buffer (ProcessFeedback) and updating the residency map (UpdateMinMipMap), which execute concurrently in separate CPU threads.
```cpp
class TileMappingState
{
    // ... (remainder of class not shown)
};
```

Tiles can only be evicted if there are no lower-mip-level tiles that depend on them.

A tile also cannot be evicted if it is being used by an outstanding draw command. We prevent this by delaying evictions a frame or two depending on double or triple buffering of the swap chain. If a tile is needed before the delay completes, the tile is simply rescued from the pending eviction data structure instead of being re-loaded.
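
The delayed eviction with rescue can be sketched as a small map from each tile to the frame on which its eviction was requested. This minimal sketch assumes a fixed delay in frames (for example 2 or 3 for double or triple buffering) and identifies tiles by a hypothetical 64-bit key; the `PendingEvictions` class is an illustration, not the sample's data structure.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical pending-eviction list. A tile whose reference count dropped to 0
// waits `delayFrames` frames before it may be unmapped, because draws that are
// still in flight could sample it.
class PendingEvictions {
public:
    explicit PendingEvictions(std::uint64_t delayFrames) : m_delay(delayFrames) {}

    // Record that `tileKey` became unused on `frame`.
    void Schedule(std::uint64_t tileKey, std::uint64_t frame) { m_pending[tileKey] = frame; }

    // The tile is needed again: returns true if it was rescued (still mapped, no reload).
    bool Rescue(std::uint64_t tileKey) { return m_pending.erase(tileKey) == 1; }

    // Remove and return the tiles whose delay has elapsed at `currentFrame`.
    // The order of the returned keys is unspecified.
    std::vector<std::uint64_t> Collect(std::uint64_t currentFrame) {
        std::vector<std::uint64_t> ready;
        for (auto it = m_pending.begin(); it != m_pending.end();) {
            if (currentFrame >= it->second + m_delay) {
                ready.push_back(it->first);
                it = m_pending.erase(it);
            } else {
                ++it;
            }
        }
        return ready;
    }

private:
    std::uint64_t m_delay;
    std::unordered_map<std::uint64_t, std::uint64_t> m_pending;  // tile -> frame scheduled
};
```

Rescuing is just an erase, which is why a tile that is needed again during the delay costs nothing to keep.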

The mechanics of loading, mapping, and unmapping tiles are all contained within the DataUploader class, which depends on a [FileStreamer](TileUpdateManager/FileStreamer.h) class to do the actual tile loads. The reference implementation of FileStreamer ([FileStreamerReference](TileUpdateManager/FileStreamerReference.h)) can easily be exchanged for DirectStorage for Windows.
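
Keeping the tile loader behind an abstract interface is what allows one I/O back end to be exchanged for another. The sketch below illustrates that design idea only: `TileRequest`, `TileStreamer`, `MemoryTileStreamer`, and `Uploader` are hypothetical names rather than the FileStreamer or DataUploader API, and the in-memory back end stands in for one that reads the file from disk so that the example runs on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical request: read `size` bytes starting at `offset` of a tile file.
struct TileRequest {
    std::size_t offset;
    std::size_t size;
};

// Hypothetical interface: the uploader only talks to this, so the back end
// (reference file I/O, DirectStorage, ...) can change without touching it.
class TileStreamer {
public:
    virtual ~TileStreamer() = default;
    virtual std::vector<std::uint8_t> LoadTile(const TileRequest& request) = 0;
};

// Hypothetical in-memory back end standing in for a file-based one.
class MemoryTileStreamer : public TileStreamer {
public:
    explicit MemoryTileStreamer(std::vector<std::uint8_t> data) : m_data(std::move(data)) {}
    std::vector<std::uint8_t> LoadTile(const TileRequest& request) override {
        assert(request.offset + request.size <= m_data.size());
        auto first = m_data.begin() + static_cast<std::ptrdiff_t>(request.offset);
        return std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(request.size));
    }

private:
    std::vector<std::uint8_t> m_data;
};

// Hypothetical uploader that depends only on the interface.
class Uploader {
public:
    explicit Uploader(std::unique_ptr<TileStreamer> streamer) : m_streamer(std::move(streamer)) {}
    // Loads each requested tile and returns the total bytes loaded; a real
    // uploader would copy the data to the GPU and update tile mappings here.
    std::size_t Upload(const std::vector<TileRequest>& requests) {
        std::size_t bytes = 0;
        for (const auto& request : requests) bytes += m_streamer->LoadTile(request).size();
        return bytes;
    }

private:
    std::unique_ptr<TileStreamer> m_streamer;
};
```

A DirectStorage-based back end would implement the same interface, leaving the uploader unchanged.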
### 6. Putting it all Together
There is some work that needs to be done before drawing objects that use feedback (clearing feedback resources), and some work that needs to be done after (resolving feedback resources). TileUpdateManager creates these commands, but does not execute them. Each frame, these command lists must be built and submitted with application draw commands, which you can find just before the call to Present() in [Scene.cpp](src/Scene.cpp) as follows:
```cpp
auto commandLists = m_pTileUpdateManager->EndFrame();
// ... (remainder not shown)
```

The sample and its code are provided under the MIT license; please see [LICENSE](/LICENSE). All third-party source code is provided under its own respective, MIT-compatible Open Source license.