This repository was archived by the owner on Dec 25, 2023. It is now read-only.
README.md: 55 additions & 61 deletions
@@ -56,8 +56,6 @@ The high-resolution textures in the first "release" package, [hubble-16k.zip](ht
56
56
57
57
## Keyboard controls
58
58
59
-
There are a lot of keyboard controls - a function of giving many demos:
60
-
61
59
* `qwe / asd` : strafe left, forward, strafe right / rotate left, back, rotate right
62
60
* `z c` : levitate up and down
63
61
* `x` : toggles "up lock". When hovering over the "terrain" object, locking the up direction "feels right" with mouse navigation. Otherwise, it should be turned off.
@@ -79,23 +77,20 @@ For a full list of command line options, pass the command line "?", e.g.
79
77
80
78
c:> expanse.exe ?
81
79
82
-
Most of the detailed controls for the system can be find in a *json* file. By default, the application loads[config.json](config/config.json).
80
+
Most of the detailed settings for the system can be found in the default configuration file [config.json](config/config.json).
83
81
84
82
The options in the json have corresponding command lines, e.g.:
85
83
86
84
json:
87
85
88
86
"mediaDir" : "media"
89
87
90
-
command line:
91
-
92
-
-mediadir c:\myMedia
88
+
equivalent command line:
93
89
90
+
-mediadir media
94
91
95
92
On nvidia devices using drivers prior to 496.13, it is recommended to add `-config nvidia.json` to the command line, e.g.:
In this case, the hardware sampler is reaching across tile boundaries to perform anisotropic sampling, but encounters tiles that are not physically mapped. D3D12 Reserved Resource tiles that are not physically mapped return black to the sampler. I believe this could be mitigated by "eroding" the min mip map such that there is no more than 1 mip level difference between neighboring tiles. That visual optimization is TBD.
140
+
In this case, the hardware sampler is reaching across tile boundaries to perform anisotropic sampling, but encounters tiles that are not physically mapped. D3D12 Reserved Resource tiles that are not physically mapped return black to the sampler. This could be mitigated by dilating or eroding the min mip map such that there is no more than 1 mip level difference between neighboring tiles. That visual optimization is TBD.
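The mitigation described above could be sketched roughly as follows. This is a hypothetical post-process, not something the sample implements: the function name `LimitMipDelta` and the row-major `uint8_t` layout are assumptions for illustration. Entries are only ever raised (made coarser), so the map never claims a mip level that is not resident.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not in the sample): coarsen entries of a row-major
// min mip map until 4-connected neighbors differ by at most one mip level.
inline void LimitMipDelta(std::vector<uint8_t>& minMip, int width, int height)
{
    const int dx[] = { -1, 1, 0, 0 };
    const int dy[] = { 0, 0, -1, 1 };
    bool changed = true;
    while (changed) { // values only increase and are bounded, so this terminates
        changed = false;
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                uint8_t& v = minMip[y * width + x];
                for (int n = 0; n < 4; ++n) {
                    const int nx = x + dx[n];
                    const int ny = y + dy[n];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const uint8_t neighbor = minMip[ny * width + nx];
                    if (neighbor > v + 1) {
                        // Raise this entry to one level finer than its coarse neighbor.
                        v = static_cast<uint8_t>(neighbor - 1);
                        changed = true;
                    }
                }
            }
        }
    }
}
```

The trade-off in this sketch is that tiles next to a coarse region are sampled at lower detail than is actually resident.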
146
141
147
142
There are also a few known bugs:
148
143
* entering full screen in a multi-gpu system moves the window to a monitor attached to the GPU by design. However, if the window starts on a different monitor, it "disappears" on the first maximization. Hit *escape* then maximize again, and it should work fine.
149
144
* full-screen mode over Remote Desktop is not borderless.
150
145
151
-
## How It Works
146
+
# How It Works
152
147
153
-
This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation). The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or copies and memory mapping has completed.
148
+
This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation) - in this sample, GPU time is mostly a function of the Sampler Feedback Resolve() operations described below. The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or copies and memory mapping have completed.
154
149
155
150
All the magic can be found in the **TileUpdateManager** library (see the internal file [TileUpdateManager.h](TileUpdateManager/TileUpdateManager.h) - applications should include [SamplerFeedbackStreaming.h](TileUpdateManager/SamplerFeedbackStreaming.h)), which abstracts the creation of StreamingResources and heaps while internally managing feedback resources, file I/O, and GPU memory mapping.
156
151
157
152
The technique works as follows:
158
153
159
-
### 1. Create a Texture to be Streamed
154
+
## 1. Create a Texture to be Streamed
160
155
161
156
The streaming textures are allocated as DX12 [Reserved Resources](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-createreservedresource), which behave like [VirtualAlloc](https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc) in C. Each resource takes no physical GPU memory until 64KB regions of the resource are committed in 1 or more GPU [heaps](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-createheap). The x/y dimensions of a reserved resource tile are a function of the texture format, such that the tile fills a 64KB GPU memory page. For example, BC7 textures have 256x256 tiles, while BC1 textures have 512x256 tiles.
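The tile dimensions quoted above follow from simple arithmetic, sketched here under the assumption of the standard 2D tile shapes (power-of-two sides, with the width taking any extra factor of two). `TileShape` and `TileShapeFor` are hypothetical names for illustration only; in an application the authoritative values come from GetResourceTiling.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the sample: texel dimensions of one 64KB tile.
struct TileShape { uint32_t width; uint32_t height; };

// bytesPerBlock: 16 for BC7, 8 for BC1. blockDim: 4 for BC formats (4x4 texel blocks).
constexpr TileShape TileShapeFor(uint32_t bytesPerBlock, uint32_t blockDim = 4)
{
    const uint32_t blocks = (64u * 1024u) / bytesPerBlock; // blocks per 64KB tile
    uint32_t w = 1;
    uint32_t h = 1;
    // Grow width and height alternately (width first) until the tile is full.
    while (w * h < blocks) {
        if (w == h) { w *= 2; } else { h *= 2; }
    }
    return TileShape{ w * blockDim, h * blockDim };
}
```

For BC7, 64KB / 16 bytes gives 4096 blocks, or 64x64 blocks of 4x4 texels, hence 256x256. BC1 blocks are half the size, so the tile is twice as wide.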
162
157
163
158
In Expanse, each tiled resource corresponds to a single .XeT file on a hard drive (though multiple resources can point to the same file). The file contains dimensions and format, but also information about how to access the tiles within the file.
164
159
165
-
### 2. Create and Pair a Min-Mip Feedback Map
160
+
## 2. Create and Pair a Min-Mip Feedback Map
166
161
167
-
To use sampler feedback, we create a feedback resource with identical dimensions to record information about which texels were sampled.
162
+
To use sampler feedback, we create a feedback resource corresponding to each streaming resource, with identical dimensions to record information about which texels were sampled.
168
163
169
-
For this streaming usage, we use the min mip feedback feature by [creating the resource](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device8-createcommittedresource2) with the format to DXGI_FORMAT_SAMPLER_FEEDBACK_MIN_MIP_OPAQUE. We set the region size of the feedback to match the tile dimensions through the SamplerFeedbackRegion member of [D3D12_RESOURCE_DESC1](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ns-d3d12-d3d12_resource_desc1).
164
+
For this streaming usage, we use the min mip feedback feature by [creating the resource](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device8-createcommittedresource2) with the format DXGI_FORMAT_SAMPLER_FEEDBACK_MIN_MIP_OPAQUE. We set the region size of the feedback to match the tile dimensions of the tiled resource (streaming resource) through the SamplerFeedbackRegion member of [D3D12_RESOURCE_DESC1](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ns-d3d12-d3d12_resource_desc1).
170
165
171
166
For the feedback to be written by GPU shaders (in this case, pixel shaders) the texture and feedback resources must be paired through a view created with [CreateSamplerFeedbackUnorderedAccessView](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device8-createsamplerfeedbackunorderedaccessview).
172
167
173
-
### 3. Determine Resident Tiles
174
-
175
-
Because textures are only partially resident, we only want the pixel shader to sample resident portions. Sampling texels that are not physically mapped that returns 0s, resulting in undesirable visual artifacts. To prevent this, we clamp all sampling operations based on a **residency map**. The residency map is relatively tiny: for a 16k x 16k BC7 texture, which would take 350MB of GPU memory, we only need a 4KB residency map. Note that the lowest-resolution "packed" mips are loaded for all objects, so there is always something available to sample. See also [GetResourceTiling](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-getresourcetiling).
176
-
177
-
When a texture tile has been loaded or evicted by TileUpdateManager, it updates the corresponding residency map. The residency map is an application-generated representation of the minimum mip available for each region in the texture, and is described in the [Sample Feedback spec](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) as follows:
178
-
179
-
```
180
-
The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
181
-
```
182
-
183
-
Below, the Visualization mode was set to "Color = Mip" and labels were added. TileUpdateManager processes the Min Mip Feedback (left window in top right), uploads and evicts tiles to form a Residency map, which is a proper min-mip-map (right window in top right). The contents of memory can be seen in the partially resident mips along the bottom (black is not resident). The last 3 mip levels are never evicted because they are packed mips (all fit within a 64KB tile). In this visualization mode, the colors of the texture on the bottom correspond to the colors of the visualization windows in the top right. Notice how the resident tiles do not exactly match what feedback says is required.
184
-
")
185
-
186
-
To reduce GPU memory, a single combined buffer contains all the residency maps for all the resources. The pixel shader samples the corresponding residency map to clamp the sampling function to the minimum available texture data available, thereby avoiding sampling tiles that have not been mapped.
187
-
188
-
We can see the lookup into the residency map in the pixel shader [terrainPS.hlsl](src/shaders/terrainPS.hlsl). Resources are defined at the top of the shader, including the reserved (tiled) resource g_streamingTexture, the residency map g_minmipmap, and the sampler:
189
-
190
-
```cpp
191
-
Texture2D g_streamingTexture : register(t0);
192
-
Buffer<uint> g_minmipmap: register(t1);
193
-
SamplerState g_sampler : register(s0);
194
-
```
195
-
196
-
The shader offsets into its region of the residency map (g_minmipmapOffset) and loads the minimum mip value for the region to be sampled.
197
-
198
-
```cpp
199
-
int2 uv = input.tex * g_minmipmapDim;
200
-
uint index = g_minmipmapOffset + uv.x + (uv.y * g_minmipmapDim.x);
201
-
uint mipLevel = g_minmipmap.Load(index);
202
-
```
203
-
204
-
The sampling operation is clamped to the minimum mip resident (mipLevel).
205
-
206
-
```cpp
207
-
float3 color = g_streamingTexture.Sample(g_sampler, input.tex, 0, mipLevel).rgb;
208
-
```
209
-
210
-
### 4. Draw Objects While Recording Feedback
168
+
## 3. Draw Objects While Recording Feedback
211
169
212
170
For Expanse, there is a "normal" non-feedback shader named [terrainPS.hlsl](src/shaders/terrainPS.hlsl) and a "feedback-enabled" version of the same shader, [terrainPS-FB.hlsl](src/shaders/terrainPS-FB.hlsl). The latter simply writes feedback using the [WriteSamplerFeedback](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) HLSL intrinsic, using the same sampler and texture coordinates, then calls the prior shader. Compare the WriteSamplerFeedback() call below to the Sample() call above.
Sampler Feedback resources are opaque, and must be *Resolved* before interpreting on the CPU.
233
193
234
194
Resolving feedback for one resource is inexpensive, but adds up when there are 1000 objects. Expanse has a configurable time limit for the amount of feedback resolved each frame. The "FB" shaders are only used for a subset of resources such that the amount of feedback produced can be resolved within the time limit. The time limit is managed by the application, not by the TileUpdateManager library, by keeping a running average of resolve time as reported by GPU timers.
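A minimal sketch of how such a time limit might be turned into a per-frame object count, assuming an exponential running average. The class `ResolveBudget` and its parameters are hypothetical and are not taken from Expanse.

```cpp
#include <cstddef>

// Hypothetical budget tracker; names are illustrative, not from Expanse.
class ResolveBudget {
public:
    explicit ResolveBudget(float timeLimitMs, float smoothing = 0.1f)
        : m_limitMs(timeLimitMs), m_smoothing(smoothing) {}

    // Call once per frame with the GPU-timer result for the resolves just completed.
    void Update(float resolveTimeMs, std::size_t numResolved)
    {
        if (numResolved == 0) return;
        const float perResolve = resolveTimeMs / static_cast<float>(numResolved);
        if (m_avgMsPerResolve == 0.0f) {
            m_avgMsPerResolve = perResolve; // first sample seeds the average
        } else {
            m_avgMsPerResolve += m_smoothing * (perResolve - m_avgMsPerResolve);
        }
    }

    // How many objects may use the feedback-enabled shader next frame.
    std::size_t MaxResolvesPerFrame() const
    {
        if (m_avgMsPerResolve <= 0.0f) return 1; // no data yet: start small
        const std::size_t n = static_cast<std::size_t>(m_limitMs / m_avgMsPerResolve);
        return n > 0 ? n : 1;
    }

private:
    float m_limitMs;
    float m_smoothing;
    float m_avgMsPerResolve = 0.0f;
};
```

The application would then pick at most `MaxResolvesPerFrame()` objects to draw with the "FB" shader each frame.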
235
195
@@ -243,11 +203,8 @@ You can find the time limit estimation, the eviction optimization, and the reque
243
203
1. tell the runtime to collect feedback for this object via TileUpdateManager::QueueFeedback(), which results in clearing and resolving the feedback resource for this resource for this frame
244
204
2. use the feedback-enabled pixel shader for this object
245
205
246
-
### 5. Determine Which Tiles to Load & Evict
247
-
248
-
Once the draw command is complete, the feedback is ready to read on the CPU - either by copying the feedback to a readback resource, or by resolving directly to a readback resource.
249
-
250
-
Min mip feedback tells us the minimum mip tile that should be loaded. The min mip feedback is traversed, updating an internal reference count for each tile. If a tile previously was unused (ref count = 0), it is queued for loading from the bottom (highest mip) up. If a tile is not needed for a particular region, its ref count is decreased (from the top down). When its ref count reaches 0, it might be ready to evict.
206
+
## 5. Determine Which Tiles to Load & Evict
207
+
The resolved Min mip feedback tells us the minimum mip tile that should be loaded. The min mip feedback is traversed, updating an internal reference count for each tile. If a tile previously was unused (ref count = 0), it is queued for loading from the bottom (highest mip) up. If a tile is not needed for a particular region, its ref count is decreased (from the top down). When its ref count reaches 0, it might be ready to evict.
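The reference counting described above can be sketched as follows. This is a simplified illustration, not the sample's code (the real bookkeeping lives in TileMappingState); `TileRefCounts`, `TileCoord`, and `SetMinMip` are hypothetical names, and the sketch assumes one feedback region per mip 0 tile.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct TileCoord { uint32_t x, y, mip; };

// Hypothetical sketch of per-tile reference counting driven by min mip feedback.
class TileRefCounts {
public:
    // width, height: mip 0 size in tiles. numMips: count of unpacked mips.
    TileRefCounts(uint32_t width, uint32_t height, uint32_t numMips)
        : m_width(width), m_numMips(numMips),
          m_current(static_cast<std::size_t>(width) * height, numMips)
    {
        for (uint32_t s = 0; s < numMips; ++s) {
            const uint32_t w = (width + (1u << s) - 1) >> s;  // tiles across at mip s
            const uint32_t h = (height + (1u << s) - 1) >> s; // tiles down at mip s
            m_mipWidth.push_back(w);
            m_refs.push_back(std::vector<uint32_t>(static_cast<std::size_t>(w) * h));
        }
    }

    // Apply one feedback value for region (x, y). Tiles whose count went 0 -> 1
    // are appended to loads; tiles whose count went 1 -> 0 to evictions.
    void SetMinMip(uint32_t x, uint32_t y, uint32_t desired,
                   std::vector<TileCoord>& loads, std::vector<TileCoord>& evictions)
    {
        if (desired > m_numMips) desired = m_numMips;
        uint32_t& current = m_current[y * m_width + x];
        // More detail needed: add refs from the coarsest missing mip toward the finest.
        while (current > desired) {
            --current;
            if (Ref(x, y, current)++ == 0)
                loads.push_back({ x >> current, y >> current, current });
        }
        // Less detail needed: release refs from the finest mip toward the coarsest.
        while (current < desired) {
            if (--Ref(x, y, current) == 0)
                evictions.push_back({ x >> current, y >> current, current });
            ++current;
        }
    }

private:
    uint32_t& Ref(uint32_t x, uint32_t y, uint32_t mip)
    {
        return m_refs[mip][(y >> mip) * m_mipWidth[mip] + (x >> mip)];
    }

    uint32_t m_width;
    uint32_t m_numMips;
    std::vector<uint32_t> m_current;            // min mip currently requested per region
    std::vector<uint32_t> m_mipWidth;           // tiles across, per mip
    std::vector<std::vector<uint32_t>> m_refs;  // ref count per tile, per mip
};
```

Because coarse tiles are shared by several regions, a mip 1 tile in this sketch stays referenced until every region beneath it has released it.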
251
208
252
209
Data structures for tracking reference count, residency state, and heap usage can be found in [StreamingResource.cpp](TileUpdateManager/StreamingResource.cpp) and [StreamingResource.h](TileUpdateManager/StreamingResource.h), look for TileMappingState. This class also has methods for interpreting the feedback buffer (ProcessFeedback) and updating the residency map (UpdateMinMipMap), which execute concurrently in separate CPU threads.
253
210
@@ -264,13 +221,50 @@ private:
264
221
TileMappingState m_tileMappingState;
265
222
```
266
223
267
-
Tiles can only be evicted if there are no lower-mip-level tiles that depend on them, e.g. a mip 1 tile may have 4 mip 0 tiles "above" it in the mip hierarchy, and may only be evicted if all 4 of those tiles have also been evicted. The ref count helps us determine this dependency.
224
+
Tiles can only be evicted if there are no lower-mip-level tiles that depend on them, e.g. a mip 1 tile may have four mip 0 tiles "above" it in the mip hierarchy, and may only be evicted if all 4 of those tiles have also been evicted. The ref count helps us determine this dependency.
268
225
269
-
A tile also cannot be evicted if it is being used by an outstanding draw command. We prevent this by delaying evictions a frame or two depending on double or triple buffering of the swap chain. If a tile is needed before the delay completes, the tile is simply rescued from the pending eviction data structure instead of being re-loaded.
226
+
A tile also cannot be evicted if it is being used by an outstanding draw command. We prevent this by delaying evictions a frame or two depending on swap chain buffer count (i.e. double or triple buffering). If a tile is needed before the eviction delay completes, the tile is simply rescued from the pending eviction data structure instead of being re-loaded.
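The delayed eviction and rescue behavior can be illustrated with a small sketch. The class `EvictionDelay` and the flat `TileId` type are assumptions made for this example; they are not the sample's actual data structures.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using TileId = uint32_t; // hypothetical flat tile identifier

// Hypothetical sketch: hold evictions back until in-flight frames have retired.
class EvictionDelay {
public:
    // One bucket per swap chain buffer (e.g. 2 or 3); count must be at least 1.
    explicit EvictionDelay(uint32_t swapChainBufferCount)
        : m_frames(swapChainBufferCount) {}

    // Queue a tile whose ref count reached 0 this frame.
    void Append(TileId tile) { m_frames.back().push_back(tile); }

    // Returns true if the tile was pending eviction; it is kept resident instead.
    bool Rescue(TileId tile)
    {
        for (auto& frame : m_frames) {
            auto it = std::find(frame.begin(), frame.end(), tile);
            if (it != frame.end()) {
                frame.erase(it);
                return true;
            }
        }
        return false;
    }

    // Call once per frame: returns the tiles that are now safe to unmap.
    std::vector<TileId> NextFrame()
    {
        std::vector<TileId> ready = std::move(m_frames.front());
        m_frames.pop_front();
        m_frames.emplace_back();
        return ready;
    }

private:
    std::deque<std::vector<TileId>> m_frames;
};
```

A rescued tile never leaves GPU memory, so no file read or upload is needed to use it again.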
270
227
271
228
The mechanics of loading, mapping, and unmapping tiles are all contained within the DataUploader class, which depends on a [FileStreamer](TileUpdateManager/FileStreamer.h) class to do the actual tile loads. The latter implementation ([FileStreamerReference](TileUpdateManager/FileStreamerReference.h)) can easily be exchanged with DirectStorage for Windows.
272
229
273
-
### 6. Putting it all Together
230
+
## 6. Update Residency Map
231
+
232
+
Because textures are only partially resident, we only want the pixel shader to sample resident portions. Sampling texels that are not physically mapped returns 0s, resulting in undesirable visual artifacts. To prevent this, we clamp all sampling operations based on a **residency map**. The residency map is relatively tiny: for a 16k x 16k BC7 texture, which would take 350MB of GPU memory, we only need a 4KB residency map. Note that the lowest-resolution "packed" mips are loaded for all objects, so there is always something available to sample. See also [GetResourceTiling](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-getresourcetiling).
233
+
234
+
When a texture tile has been loaded or evicted by TileUpdateManager, it updates the corresponding residency map. The residency map is an application-generated representation of the minimum mip available for each region in the texture, and is described in the [Sampler Feedback spec](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) as follows:
235
+
236
+
```
237
+
The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
238
+
```
239
+
240
+
Below, the Visualization mode was set to "Color = Mip" and labels were added. TileUpdateManager processes the Min Mip Feedback (left window in top right), uploads and evicts tiles to form a Residency map, which is a proper min-mip-map (right window in top right). The contents of memory can be seen in the partially resident mips along the bottom (black is not resident). The last 3 mip levels are never evicted because they are packed mips (all fit within a 64KB tile). In this visualization mode, the colors of the texture on the bottom correspond to the colors of the visualization windows in the top right. Notice how the resident tiles do not exactly match what feedback says is required.
241
+
")
242
+
243
+
To reduce GPU memory, a single combined buffer contains all the residency maps for all the resources. The pixel shader samples the corresponding residency map to clamp the sampling function to the minimum texture data available, thereby avoiding sampling tiles that have not been mapped.
244
+
245
+
We can see the lookup into the residency map in the pixel shader [terrainPS.hlsl](src/shaders/terrainPS.hlsl). Resources are defined at the top of the shader, including the reserved (tiled) resource g_streamingTexture, the residency map g_minmipmap, and the sampler:
246
+
247
+
```cpp
248
+
Texture2D g_streamingTexture : register(t0);
249
+
Buffer<uint> g_minmipmap: register(t1);
250
+
SamplerState g_sampler : register(s0);
251
+
```
252
+
253
+
The shader offsets into its region of the residency map (g_minmipmapOffset) and loads the minimum mip value for the region to be sampled.
254
+
255
+
```cpp
256
+
int2 uv = input.tex * g_minmipmapDim;
257
+
uint index = g_minmipmapOffset + uv.x + (uv.y * g_minmipmapDim.x);
258
+
uint mipLevel = g_minmipmap.Load(index);
259
+
```
260
+
261
+
The sampling operation is clamped to the minimum mip resident (mipLevel).
262
+
263
+
```cpp
264
+
float3 color = g_streamingTexture.Sample(g_sampler, input.tex, 0, mipLevel).rgb;
265
+
```
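For readers who want to check the indexing on the CPU, the shader lookup can be mirrored in C++ roughly as below. This is a sketch under assumptions: one byte per region, a hypothetical function name `ResidentMip`, and an added clamp so that a coordinate of exactly 1.0 stays in range. It also shows where the 4KB figure for a 16k x 16k BC7 texture comes from.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// 16k x 16k BC7 texture with 256x256 tiles: 64 x 64 regions, one byte each.
constexpr uint32_t kRegionsPerSide = 16384 / 256;
static_assert(kRegionsPerSide * kRegionsPerSide == 4096, "residency map is 4KB");

// Hypothetical CPU mirror of the shader lookup. combined holds every resource's
// residency map back to back; offset selects this resource's region of it.
inline uint32_t ResidentMip(const std::vector<uint8_t>& combined, uint32_t offset,
                            uint32_t dimX, uint32_t dimY, float u, float v)
{
    // Truncate like the shader's int2 conversion. The clamp to dim - 1 is an
    // addition of this sketch, assuming u and v lie in [0, 1].
    const uint32_t x = std::min(static_cast<uint32_t>(u * dimX), dimX - 1);
    const uint32_t y = std::min(static_cast<uint32_t>(v * dimY), dimY - 1);
    return combined[offset + x + y * dimX];
}
```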
266
+
267
+
## 7. Putting it all Together
274
268
275
269
There is some work that needs to be done before drawing objects that use feedback (clearing feedback resources), and some work that needs to be done after (resolving feedback resources). TileUpdateManager creates these commands, but does not execute them. Each frame, these command lists must be built and submitted with application draw commands, which you can find just before the call to Present() in [Scene.cpp](src/Scene.cpp) as follows: