This repository was archived by the owner on Dec 25, 2023. It is now read-only.

Commit 31ce7b6

Update Readme.
1 parent ae75e19 commit 31ce7b6

3 files changed

+58
-64
lines changed

README.md

Lines changed: 55 additions & 61 deletions
@@ -56,8 +56,6 @@ The high-resolution textures in the first "release" package, [hubble-16k.zip](ht
5656

5757
## Keyboard controls
5858

59-
There are a lot of keyboard controls - a function of giving many demos:
60-
6159
* `qwe / asd` : strafe left, forward, strafe right / rotate left, back, rotate right
6260
* `z c` : levitate up and down
6361
* `x` : toggles "up lock". When hovering over the "terrain" object, locking the up direction "feels right" with mouse navigation. Otherwise, it should be turned off.
@@ -79,23 +77,20 @@ For a full list of command line options, pass the command line "?", e.g.
7977

8078
c:> expanse.exe ?
8179

82-
Most of the detailed controls for the system can be find in a *json* file. By default, the application loads [config.json](config/config.json).
80+
Most of the detailed settings for the system can be found in the default configuration file [config.json](config/config.json).
8381

8482
The options in the json have corresponding command lines, e.g.:
8583

8684
json:
8785

8886
"mediaDir" : "media"
8987

90-
command line:
91-
92-
-mediadir c:\myMedia
88+
equivalent command line:
9389

90+
-mediadir media
9491

9592
On nvidia devices using drivers prior to 496.13, it is recommended to add `-config nvidia.json` to the command line, e.g.:
9693

97-
E.g.:
98-
9994
c:\SamplerFeedbackStreaming\x64\Release> demo.bat -config nvidia.json
10095
c:\SamplerFeedbackStreaming\x64\Release> stress.bat -mediadir c:\hubble-16k -config nvidia.json
10196

@@ -142,72 +137,35 @@ The following image shows an exaggerated version of the problem, created by disa
142137
143138
![Streaming Cracks](./readme-images/streaming-cracks.jpg "Streaming Cracks")
144139
145-
In this case, the hardware sampler is reaching across tile boundaries to perform anisotropic sampling, but encounters tiles that are not physically mapped. D3D12 Reserved Resource tiles that are not physically mapped return black to the sampler. I believe this could be mitigated by "eroding" the min mip map such that there is no more than 1 mip level difference between neighboring tiles. That visual optimization is TBD.
140+
In this case, the hardware sampler is reaching across tile boundaries to perform anisotropic sampling, but encounters tiles that are not physically mapped. D3D12 Reserved Resource tiles that are not physically mapped return black to the sampler. This could be mitigated by dilating or eroding the min mip map such that there is no more than 1 mip level difference between neighboring tiles. That visual optimization is TBD.
146141
147142
There are also a few known bugs:
148143
* entering full screen in a multi-gpu system moves the window to a monitor attached to the GPU by design. However, if the window starts on a different monitor, it "disappears" on the first maximization. Hit *escape* then maximize again, and it should work fine.
149144
* full-screen while remote desktop is not borderless.
150145
151-
## How It Works
146+
# How It Works
152147
153-
This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation). The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or copies and memory mapping has completed.
148+
This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation). In this sample, GPU time is mostly a function of the Sampler Feedback Resolve() operations described below. The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or when copies and memory mapping have completed.
154149
155150
All the magic can be found in the **TileUpdateManager** library (see the internal file [TileUpdateManager.h](TileUpdateManager/TileUpdateManager.h) - applications should include [SamplerFeedbackStreaming.h](TileUpdateManager/SamplerFeedbackStreaming.h)), which abstracts the creation of StreamingResources and heaps while internally managing feedback resources, file I/O, and GPU memory mapping.
156151
157152
The technique works as follows:
158153
159-
### 1. Create a Texture to be Streamed
154+
## 1. Create a Texture to be Streamed
160155
161156
The streaming textures are allocated as DX12 [Reserved Resources](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-createreservedresource), which behave like [VirtualAlloc](https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc) in C. Each resource takes no physical GPU memory until 64KB regions of the resource are committed in 1 or more GPU [heaps](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-createheap). The x/y dimensions of a reserved resource tile are a function of the texture format, such that the tile fills a 64KB GPU memory page. For example, BC7 textures have 256x256 tiles, while BC1 textures have 512x256 tiles.
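The two tile sizes quoted above follow from simple arithmetic on the 64KB page. The sketch below is only an illustration of that arithmetic: `TileShape` and `TileShapeForBlockFormat` are hypothetical names, and the rule used to split the block count into width and height is an assumption that happens to reproduce the BC7 and BC1 figures. A real application should ask the runtime (for example via GetResourceTiling) rather than compute this itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: tile dimensions in texels.
struct TileShape { uint32_t width; uint32_t height; };

// bytesPerBlock: size of one 4x4 compressed block (16 for BC7, 8 for BC1).
TileShape TileShapeForBlockFormat(uint32_t bytesPerBlock)
{
    constexpr uint32_t kTileBytes = 64 * 1024; // one 64KB GPU memory page
    constexpr uint32_t kBlockDim = 4;          // BC formats encode 4x4 texel blocks
    assert(bytesPerBlock != 0 && kTileBytes % bytesPerBlock == 0);

    const uint32_t blocksPerTile = kTileBytes / bytesPerBlock;

    // Assumption: pick the largest power-of-two height whose square still fits,
    // and give any remaining factor of two to the width.
    uint32_t heightBlocks = 1;
    while ((heightBlocks * 2) * (heightBlocks * 2) <= blocksPerTile)
    {
        heightBlocks *= 2;
    }
    const uint32_t widthBlocks = blocksPerTile / heightBlocks;

    return { widthBlocks * kBlockDim, heightBlocks * kBlockDim };
}
```

With 16 bytes per block (BC7) a tile holds 4096 blocks, giving 64x64 blocks or 256x256 texels; with 8 bytes per block (BC1) it holds 8192 blocks, giving 128x64 blocks or 512x256 texels.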
162157
163158
In Expanse, each tiled resource corresponds to a single .XeT file on a hard drive (though multiple resources can point to the same file). The file contains dimensions and format, but also information about how to access the tiles within the file.
164159
165-
### 2. Create and Pair a Min-Mip Feedback Map
160+
## 2. Create and Pair a Min-Mip Feedback Map
166161
167-
To use sampler feedback, we create a feedback resource with identical dimensions to record information about which texels were sampled.
162+
To use sampler feedback, we create a feedback resource corresponding to each streaming resource, with identical dimensions to record information about which texels were sampled.
168163
169-
For this streaming usage, we use the min mip feedback feature by [creating the resource](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device8-createcommittedresource2) with the format to DXGI_FORMAT_SAMPLER_FEEDBACK_MIN_MIP_OPAQUE. We set the region size of the feedback to match the tile dimensions through the SamplerFeedbackRegion member of [D3D12_RESOURCE_DESC1](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ns-d3d12-d3d12_resource_desc1).
164+
For this streaming usage, we use the min mip feedback feature by [creating the resource](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device8-createcommittedresource2) with the format DXGI_FORMAT_SAMPLER_FEEDBACK_MIN_MIP_OPAQUE. We set the region size of the feedback to match the tile dimensions of the tiled resource (streaming resource) through the SamplerFeedbackRegion member of [D3D12_RESOURCE_DESC1](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ns-d3d12-d3d12_resource_desc1).
170165
171166
For the feedback to be written by GPU shaders (in this case, pixel shaders) the texture and feedback resources must be paired through a view created with [CreateSamplerFeedbackUnorderedAccessView](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device8-createsamplerfeedbackunorderedaccessview).
172167
173-
### 3. Determine Resident Tiles
174-
175-
Because textures are only partially resident, we only want the pixel shader to sample resident portions. Sampling texels that are not physically mapped that returns 0s, resulting in undesirable visual artifacts. To prevent this, we clamp all sampling operations based on a **residency map**. The residency map is relatively tiny: for a 16k x 16k BC7 texture, which would take 350MB of GPU memory, we only need a 4KB residency map. Note that the lowest-resolution "packed" mips are loaded for all objects, so there is always something available to sample. See also [GetResourceTiling](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-getresourcetiling).
176-
177-
When a texture tile has been loaded or evicted by TileUpdateManager, it updates the corresponding residency map. The residency map is an application-generated representation of the minimum mip available for each region in the texture, and is described in the [Sample Feedback spec](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) as follows:
178-
179-
```
180-
The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
181-
```
182-
183-
Below, the Visualization mode was set to "Color = Mip" and labels were added. TileUpdateManager processes the Min Mip Feedback (left window in top right), uploads and evicts tiles to form a Residency map, which is a proper min-mip-map (right window in top right). The contents of memory can be seen in the partially resident mips along the bottom (black is not resident). The last 3 mip levels are never evicted because they are packed mips (all fit within a 64KB tile). In this visualization mode, the colors of the texture on the bottom correspond to the colors of the visualization windows in the top right. Notice how the resident tiles do not exactly match what feedback says is required.
184-
![Expanse UI showing feedback and residency maps](./readme-images/labels.jpg "Expanse UI showing Min Mip Feedback, Residency Map, and Texture Mips (labels added)")
185-
186-
To reduce GPU memory, a single combined buffer contains all the residency maps for all the resources. The pixel shader samples the corresponding residency map to clamp the sampling function to the minimum available texture data available, thereby avoiding sampling tiles that have not been mapped.
187-
188-
We can see the lookup into the residency map in the pixel shader [terrainPS.hlsl](src/shaders/terrainPS.hlsl). Resources are defined at the top of the shader, including the reserved (tiled) resource g_streamingTexture, the residency map g_minmipmap, and the sampler:
189-
190-
```cpp
191-
Texture2D g_streamingTexture : register(t0);
192-
Buffer<uint> g_minmipmap: register(t1);
193-
SamplerState g_sampler : register(s0);
194-
```
195-
196-
The shader offsets into its region of the residency map (g_minmipmapOffset) and loads the minimum mip value for the region to be sampled.
197-
198-
```cpp
199-
int2 uv = input.tex * g_minmipmapDim;
200-
uint index = g_minmipmapOffset + uv.x + (uv.y * g_minmipmapDim.x);
201-
uint mipLevel = g_minmipmap.Load(index);
202-
```
203-
204-
The sampling operation is clamped to the minimum mip resident (mipLevel).
205-
206-
```cpp
207-
float3 color = g_streamingTexture.Sample(g_sampler, input.tex, 0, mipLevel).rgb;
208-
```
209-
210-
### 4. Draw Objects While Recording Feedback
168+
## 3. Draw Objects While Recording Feedback
211169
212170
For Expanse, there is a "normal" non-feedback shader named [terrainPS.hlsl](src/shaders/terrainPS.hlsl) and a "feedback-enabled" version of the same shader, [terrainPS-FB.hlsl](src/shaders/terrainPS-FB.hlsl). The latter simply writes feedback using the [WriteSamplerFeedback](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) HLSL intrinsic, using the same sampler and texture coordinates, then calls the prior shader. Compare the WriteSamplerFeedback() call below to the Sample() call above.
213171
@@ -230,6 +188,8 @@ float4 psFB(VS_OUT input) : SV_TARGET0
230188
return ps(input);
231189
}
232190
```
191+
## 4. Process Feedback
192+
Sampler Feedback resources are opaque, and must be *Resolved* before being interpreted on the CPU.
233193

234194
Resolving feedback for one resource is inexpensive, but adds up when there are 1000 objects. Expanse has a configurable time limit for the amount of feedback resolved each frame. The "FB" shaders are only used for a subset of resources such that the amount of feedback produced can be resolved within the time limit. The time limit is managed by the application, not by the TileUpdateManager library, by keeping a running average of resolve time as reported by GPU timers.
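One way to turn a running average of resolve time into a per-frame budget is sketched below. This is not the code used by Expanse: `ResolveBudget`, its exponential smoothing, and the parameter names are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimates how many feedback resolves fit in a time limit.
class ResolveBudget
{
public:
    // limitMs: GPU time allowed for resolves each frame.
    // smoothing: weight of the newest sample in the running average, in (0, 1].
    ResolveBudget(float limitMs, float smoothing)
        : m_limitMs(limitMs), m_smoothing(smoothing)
    {
        assert(limitMs > 0.0f && smoothing > 0.0f && smoothing <= 1.0f);
    }

    // Feed back the GPU timer result for the resolves issued in one frame.
    void AddSample(float gpuMs, uint32_t numResolved)
    {
        if (numResolved == 0) return; // nothing measured this frame
        const float perResolve = gpuMs / static_cast<float>(numResolved);
        m_avgMs = m_hasSample ? m_avgMs + m_smoothing * (perResolve - m_avgMs)
                              : perResolve;
        m_hasSample = true;
    }

    // Number of objects that may use the feedback-enabled shader this frame.
    uint32_t MaxResolves() const
    {
        if (!m_hasSample || m_avgMs <= 0.0f) return 1; // no data yet: start small
        return std::max<uint32_t>(1, static_cast<uint32_t>(m_limitMs / m_avgMs));
    }

private:
    float m_limitMs;
    float m_smoothing;
    float m_avgMs = 0.0f;
    bool m_hasSample = false;
};
```

The application would call `AddSample` once the GPU timer for a frame is available, then use `MaxResolves` to decide how many objects are drawn with the "FB" shader in the next frame.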
235195

@@ -243,11 +203,8 @@ You can find the time limit estimation, the eviction optimization, and the reque
243203
1. tell the runtime to collect feedback for this object via TileUpdateManager::QueueFeedback(), which results in clearing and resolving the feedback resource for this resource for this frame
244204
2. use the feedback-enabled pixel shader for this object
245205

246-
### 5. Determine Which Tiles to Load & Evict
247-
248-
Once the draw command is complete, the feedback is ready to read on the CPU - either by copying the feedback to a readback resource, or by resolving directly to a readback resource.
249-
250-
Min mip feedback tells us the minimum mip tile that should be loaded. The min mip feedback is traversed, updating an internal reference count for each tile. If a tile previously was unused (ref count = 0), it is queued for loading from the bottom (highest mip) up. If a tile is not needed for a particular region, its ref count is decreased (from the top down). When its ref count reaches 0, it might be ready to evict.
206+
## 5. Determine Which Tiles to Load & Evict
207+
The resolved Min mip feedback tells us the minimum mip tile that should be loaded. The min mip feedback is traversed, updating an internal reference count for each tile. If a tile previously was unused (ref count = 0), it is queued for loading from the bottom (highest mip) up. If a tile is not needed for a particular region, its ref count is decreased (from the top down). When its ref count reaches 0, it might be ready to evict.
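The reference counting described above can be modeled in a few lines. The sketch below is a simplified stand-in for the library's TileMappingState, not its actual implementation: `TileRefTracker`, `TileCoord`, and `SetDesiredMinMip` are hypothetical names, and it assumes one feedback region per mip 0 tile and ignores packed mips.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TileCoord { uint32_t x, y, mip; };

class TileRefTracker
{
public:
    std::vector<TileCoord> pendingLoads;     // tiles whose ref count went 0 -> 1
    std::vector<TileCoord> pendingEvictions; // tiles whose ref count went 1 -> 0

    // width/height are in mip 0 tiles; numMips counts only the streamable mips.
    TileRefTracker(uint32_t width, uint32_t height, uint32_t numMips)
        : m_width(width), m_height(height), m_numMips(numMips),
          m_minMip(static_cast<size_t>(width) * height, numMips)
    {
        for (uint32_t mip = 0; mip < numMips; ++mip)
            m_refs.emplace_back(static_cast<size_t>(MipWidth(mip)) * MipHeight(mip));
    }

    // Apply one entry of resolved min mip feedback for region (x, y).
    void SetDesiredMinMip(uint32_t x, uint32_t y, uint32_t desired)
    {
        assert(x < m_width && y < m_height && desired <= m_numMips);
        uint32_t& current = m_minMip[static_cast<size_t>(y) * m_width + x];

        // More detail wanted: add refs from the highest mip number down.
        for (uint32_t s = current; s > desired; --s)
        {
            const uint32_t mip = s - 1;
            if (Ref(x, y, mip)++ == 0) pendingLoads.push_back({ x >> mip, y >> mip, mip });
        }
        // Less detail wanted: release refs starting from the most detailed mip.
        for (uint32_t mip = current; mip < desired; ++mip)
        {
            if (--Ref(x, y, mip) == 0) pendingEvictions.push_back({ x >> mip, y >> mip, mip });
        }
        current = desired;
    }

private:
    uint32_t MipWidth(uint32_t mip) const { return (m_width + (1u << mip) - 1) >> mip; }
    uint32_t MipHeight(uint32_t mip) const { return (m_height + (1u << mip) - 1) >> mip; }
    uint32_t& Ref(uint32_t x, uint32_t y, uint32_t mip)
    {
        return m_refs[mip][static_cast<size_t>(y >> mip) * MipWidth(mip) + (x >> mip)];
    }

    uint32_t m_width, m_height, m_numMips;
    std::vector<uint32_t> m_minMip;            // current min mip per region
    std::vector<std::vector<uint32_t>> m_refs; // ref count per tile, per mip
};
```

Because a coarser tile is referenced by every region it covers, its count only returns to 0 after all of the finer tiles that depend on it have been released.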
251208

252209
Data structures for tracking reference count, residency state, and heap usage can be found in [StreamingResource.cpp](TileUpdateManager/StreamingResource.cpp) and [StreamingResource.h](TileUpdateManager/StreamingResource.h), look for TileMappingState. This class also has methods for interpreting the feedback buffer (ProcessFeedback) and updating the residency map (UpdateMinMipMap), which execute concurrently in separate CPU threads.
253210

@@ -264,13 +221,50 @@ private:
264221
TileMappingState m_tileMappingState;
265222
```
266223

267-
Tiles can only be evicted if there are no lower-mip-level tiles that depend on them, e.g. a mip 1 tile may have 4 mip 0 tiles "above" it in the mip hierarchy, and may only be evicted if all 4 of those tiles have also been evicted. The ref count helps us determine this dependency.
224+
Tiles can only be evicted if there are no lower-mip-level tiles that depend on them, e.g. a mip 1 tile may have four mip 0 tiles "above" it in the mip hierarchy, and may only be evicted if all four of those tiles have also been evicted. The ref count helps us determine this dependency.
268225

269-
A tile also cannot be evicted if it is being used by an outstanding draw command. We prevent this by delaying evictions a frame or two depending on double or triple buffering of the swap chain. If a tile is needed before the delay completes, the tile is simply rescued from the pending eviction data structure instead of being re-loaded.
226+
A tile also cannot be evicted if it is being used by an outstanding draw command. We prevent this by delaying evictions a frame or two depending on swap chain buffer count (i.e. double or triple buffering). If a tile is needed before the eviction delay completes, the tile is simply rescued from the pending eviction data structure instead of being re-loaded.
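A delayed eviction queue with a rescue path could look roughly like the following. This is a sketch under assumptions, not the library's data structure: `DelayedEvictions` is a hypothetical name, tiles are identified by a plain integer, and the delay is taken to be exactly the swap chain buffer count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

class DelayedEvictions
{
public:
    explicit DelayedEvictions(uint32_t swapBufferCount) : m_frames(swapBufferCount)
    {
        assert(swapBufferCount > 0);
    }

    // Mark a tile for eviction; it stays mapped until the delay has elapsed.
    void Queue(uint32_t tileId) { m_frames.back().push_back(tileId); }

    // If the tile is still pending eviction, remove it and report success,
    // so the caller can keep using it instead of re-loading it.
    bool Rescue(uint32_t tileId)
    {
        for (auto& frame : m_frames)
        {
            auto it = std::find(frame.begin(), frame.end(), tileId);
            if (it != frame.end()) { frame.erase(it); return true; }
        }
        return false;
    }

    // Call once per frame: returns the tiles that are now safe to unmap.
    std::vector<uint32_t> NextFrame()
    {
        std::vector<uint32_t> ready = std::move(m_frames.front());
        m_frames.pop_front();
        m_frames.emplace_back();
        return ready;
    }

private:
    std::deque<std::vector<uint32_t>> m_frames; // oldest frame at the front
};
```

With a buffer count of 2, a tile queued in one frame is returned by the second call to `NextFrame`, unless `Rescue` removed it first.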
270227

271228
The mechanics of loading, mapping, and unmapping tiles are all contained within the DataUploader class, which depends on a [FileStreamer](TileUpdateManager/FileStreamer.h) class to do the actual tile loads. The reference implementation of the latter ([FileStreamerReference](TileUpdateManager/FileStreamerReference.h)) can easily be exchanged with DirectStorage for Windows.
272229

273-
### 6. Putting it all Together
230+
## 6. Update Residency Map
231+
232+
Because textures are only partially resident, we only want the pixel shader to sample resident portions. Sampling texels that are not physically mapped returns 0s, resulting in undesirable visual artifacts. To prevent this, we clamp all sampling operations based on a **residency map**. The residency map is relatively tiny: for a 16k x 16k BC7 texture, which would take 350MB of GPU memory, we only need a 4KB residency map. Note that the lowest-resolution "packed" mips are loaded for all objects, so there is always something available to sample. See also [GetResourceTiling](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-getresourcetiling).
233+
234+
When a texture tile has been loaded or evicted by TileUpdateManager, it updates the corresponding residency map. The residency map is an application-generated representation of the minimum mip available for each region in the texture, and is described in the [Sampler Feedback spec](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) as follows:
235+
236+
```
237+
The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
238+
```
239+
240+
Below, the Visualization mode was set to "Color = Mip" and labels were added. TileUpdateManager processes the Min Mip Feedback (left window in top right), uploads and evicts tiles to form a Residency map, which is a proper min-mip-map (right window in top right). The contents of memory can be seen in the partially resident mips along the bottom (black is not resident). The last 3 mip levels are never evicted because they are packed mips (all fit within a 64KB tile). In this visualization mode, the colors of the texture on the bottom correspond to the colors of the visualization windows in the top right. Notice how the resident tiles do not exactly match what feedback says is required.
241+
![Expanse UI showing feedback and residency maps](./readme-images/labels.jpg "Expanse UI showing Min Mip Feedback, Residency Map, and Texture Mips (labels added)")
242+
243+
To reduce GPU memory, a single combined buffer contains all the residency maps for all the resources. The pixel shader samples the corresponding residency map to clamp the sampling function to the minimum available texture data, thereby avoiding sampling tiles that have not been mapped.
244+
245+
We can see the lookup into the residency map in the pixel shader [terrainPS.hlsl](src/shaders/terrainPS.hlsl). Resources are defined at the top of the shader, including the reserved (tiled) resource g_streamingTexture, the residency map g_minmipmap, and the sampler:
246+
247+
```cpp
248+
Texture2D g_streamingTexture : register(t0);
249+
Buffer<uint> g_minmipmap: register(t1);
250+
SamplerState g_sampler : register(s0);
251+
```
252+
253+
The shader offsets into its region of the residency map (g_minmipmapOffset) and loads the minimum mip value for the region to be sampled.
254+
255+
```cpp
256+
int2 uv = input.tex * g_minmipmapDim;
257+
uint index = g_minmipmapOffset + uv.x + (uv.y * g_minmipmapDim.x);
258+
uint mipLevel = g_minmipmap.Load(index);
259+
```
260+
261+
The sampling operation is clamped to the minimum mip resident (mipLevel).
262+
263+
```cpp
264+
float3 color = g_streamingTexture.Sample(g_sampler, input.tex, 0, mipLevel).rgb;
265+
```
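For reference, the residency map lookup and the 4KB size quoted earlier can be reproduced on the CPU. The sketch below mirrors the shader's index arithmetic in C++; `ResidencyMapView`, `RegionCount`, and `ResidencyIndex` are hypothetical names, and it assumes one byte per region entry, which is what makes a 16k x 16k BC7 texture with 256x256 tiles come to 64 x 64 = 4096 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Where one resource's residency map lives inside the shared buffer.
struct ResidencyMapView
{
    uint32_t offset; // first entry for this resource, like g_minmipmapOffset
    uint32_t width;  // regions across, like g_minmipmapDim.x
    uint32_t height; // regions down, like g_minmipmapDim.y
};

// Number of regions along one axis when each region covers tileTexels texels.
uint32_t RegionCount(uint32_t textureTexels, uint32_t tileTexels)
{
    assert(tileTexels != 0);
    return (textureTexels + tileTexels - 1) / tileTexels;
}

// Index of the entry that covers texture coordinate (u, v), both in [0, 1].
uint32_t ResidencyIndex(const ResidencyMapView& view, float u, float v)
{
    assert(view.width != 0 && view.height != 0);
    assert(u >= 0.0f && v >= 0.0f);
    // Truncate like the shader's int2 conversion. The clamp is an addition of
    // this sketch so that a coordinate of exactly 1.0 stays in range.
    const uint32_t x = std::min(static_cast<uint32_t>(u * view.width), view.width - 1);
    const uint32_t y = std::min(static_cast<uint32_t>(v * view.height), view.height - 1);
    return view.offset + x + y * view.width;
}
```

Each resource only needs its own offset and dimensions to find its entries in the combined buffer, which is why one buffer can serve every streaming resource.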
266+
267+
## 7. Putting it all Together
274268

275269
There is some work that needs to be done before drawing objects that use feedback (clearing feedback resources), and some work that needs to be done after (resolving feedback resources). TileUpdateManager creates these commands, but does not execute them. Each frame, these command lists must be built and submitted with application draw commands, which you can find just before the call to Present() in [Scene.cpp](src/Scene.cpp) as follows:
276270

TileUpdateManager/TileUpdateManagerExt.cpp

Lines changed: 3 additions & 0 deletions
@@ -226,6 +226,9 @@ TileUpdateManager::CommandLists TileUpdateManager::EndFrame()
226226

227227
// transition packed mips if necessary
228228
// FIXME? if any 1 needs a transition, go ahead and check all of them. not worth optimizing.
229+
// NOTE: the debug layer will complain about CopyTextureRegion() if the resource state is not state_copy_dest (or common)
230+
// despite the fact the copy queue doesn't really care about resource state
231+
// CopyTiles() won't complain because this library always targets an atlas that is always state_copy_dest
229232
if (m_packedMipTransition)
230233
{
231234
m_packedMipTransition = false;
