This repository was archived by the owner on Dec 25, 2023. It is now read-only.

Commit ae75e19

Refactor introduces simplified SamplerFeedbackStreaming.h library interface.
1 parent 53a9cdc commit ae75e19

27 files changed: +1581 -1187 lines changed

README.md

Lines changed: 46 additions & 26 deletions
@@ -2,7 +2,7 @@
22

33
## Introduction
44

5-
This repository contains an [MIT licensed](LICENSE) demo of _DirectX12 Sampler Feedback Streaming_, a technique using [DirectX12 Sampler Feedback](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) to guide continuous loading and eviction of small portions (tiles) of assets. Sampler Feedback Streaming allows scenes consisting of 100s of gigabytes of resources to be drawn on GPUs containing much less physical memory. The scene below uses just ~200MB of a 1GB heap, despite over 350GB of total texture resources.
5+
This repository contains an [MIT licensed](LICENSE) demo of _DirectX12 Sampler Feedback Streaming_, a technique using [DirectX12 Sampler Feedback](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) to guide continuous loading and eviction of small portions (tiles) of assets. By making better use of GPU memory capacity, it allows for much higher visual quality than previously possible. Sampler Feedback Streaming allows scenes consisting of hundreds of gigabytes of resources to be drawn on GPUs containing much less physical memory. The scene below uses just ~200MB of a 1GB heap, despite over 350GB of total texture resources.
66

77
The demo requires ***Windows 10 20H1 (aka May 2020 Update, build 19041)*** or later and a GPU with Sampler Feedback Support, such as Intel Iris Xe Graphics as found in 11th Generation Intel® Core™ processors and discrete GPUs (driver version **[30.0.100.9667](https://downloadcenter.intel.com/product/80939/Graphics) or later**).
88

@@ -37,6 +37,10 @@ Or cd to the build directory (x64/Release or x64/Debug) and run from the command
3737

3838
c:\SamplerFeedbackStreaming\x64\Release> expanse.exe
3939

40+
On NVIDIA drivers **prior to 496.13**, it is recommended to add `-config nvidia.json` to the command line. See the description of json files and configurations below.
41+
42+
c:\SamplerFeedbackStreaming\x64\Release> expanse.exe -config nvidia.json
43+
4044
By default (no command line options) there will be a single object, "terrain", which allows for exploring sampler feedback streaming. In the top right find 2 windows: on the left is the raw GPU min mip feedback, on the right is the min mip map "residency map" generated by the application. Across the bottom are the mips of the texture, with mip 0 in the bottom left. Left-click drag the terrain to see sampler feedback streaming in action.
4145
![default startup](./readme-images/default-startup.jpg "default startup")
4246

@@ -50,24 +54,6 @@ The high-resolution textures in the first "release" package, [hubble-16k.zip](ht
5054

5155
c:\SamplerFeedbackStreaming\x64\Release> demo-hubble.bat -mediadir c:\hubble-16k
5256

53-
## Configurations
54-
55-
By default, the application loads [config.json](config/config.json).
56-
57-
However, it has been observed that performance decays over time on earlier nvidia devices/drivers (as the tiles in the heap become fragmented relative to resources). Specifically, the CPU time for [UpdateTileMappings](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12commandqueue-updatetilemappings) limits the system throughput.
58-
59-
If you observe this issue, obvious in demo mode or with [stress.bat](scripts/stress.bat), add `-config nvidia.json` to the command line.
60-
61-
E.g.:
62-
63-
c:\SamplerFeedbackStreaming\x64\Release> demo.bat -config nvidia.json
64-
c:\SamplerFeedbackStreaming\x64\Release> stress.bat -mediadir c:\hubble-16k -config nvidia.json
65-
66-
The main differences in [nvidia.json](configs/nvidia.json) distribute textures across 127 heaps each sized 32MB (512 tiles * 64KB per tile), for a total of ~4GB of GPU physical memory. Since this implementation restricts resources to a single heap, it is possible for the small heaps to fill resulting in visual artifacts. However, mapping and unmapping of small heaps appears to be significantly faster on some GPUs.
67-
68-
"heapSizeTiles": 512, // size for each heap. 64KB per tile * 16384 tiles -> 1GB heap
69-
"numHeaps": 127, // number of heaps. objects will be distributed among heaps
70-
7157
## Keyboard controls
7258

7359
There are a lot of keyboard controls - a function of giving many demos:
@@ -87,13 +73,15 @@ There are a lot of keyboard controls - a function of giving many demos:
8773
* `insert` : toggles frustum. This behaves a little wonky.
8874
* `esc` : while windowed, exit. while full-screen, return to windowed mode
8975

90-
## JSON configuration files and command lines
76+
## Configuration files and command lines
9177

9278
For a full list of command line options, pass the command line "?", e.g.
9379

9480
c:> expanse.exe ?
9581

96-
Most of the detailed controls for the system can be found in a *json* file. The options in the json have corresponding command lines, e.g.:
82+
Most of the detailed controls for the system can be found in a *json* file. By default, the application loads [config.json](config/config.json).
83+
84+
The options in the json have corresponding command lines, e.g.:
9785

9886
json:
9987

@@ -103,19 +91,48 @@ command line:
10391

10492
-mediadir c:\myMedia
10593

94+
95+
On NVIDIA devices using drivers prior to 496.13, it is recommended to add `-config nvidia.json` to the command line.
96+
97+
E.g.:
98+
99+
c:\SamplerFeedbackStreaming\x64\Release> demo.bat -config nvidia.json
100+
c:\SamplerFeedbackStreaming\x64\Release> stress.bat -mediadir c:\hubble-16k -config nvidia.json
101+
102+
This config works around an issue where performance decays over time as the tiles in the heap become fragmented relative to resources. Specifically, the CPU time for [UpdateTileMappings](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12commandqueue-updatetilemappings) limits the system throughput. The workaround distributes textures across many small heaps, which can result in artifacts if the small heaps fill.
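To make the heap sizing concrete, here is a minimal arithmetic sketch. It uses the values the earlier README text gave for nvidia.json (`heapSizeTiles` of 512, `numHeaps` of 127, 64KB per tile). The helper names `HeapSizeBytes` and `TotalBytes` are hypothetical and are not part of the library.

```cpp
#include <cstdint>

// Tile size used throughout the README: 64KB per tile.
constexpr uint64_t kTileSizeBytes = 64ull * 1024;

// Hypothetical helper: size of one heap for a given "heapSizeTiles" value.
constexpr uint64_t HeapSizeBytes(uint64_t heapSizeTiles)
{
    return heapSizeTiles * kTileSizeBytes;
}

// Hypothetical helper: total memory across "numHeaps" heaps of equal size.
constexpr uint64_t TotalBytes(uint64_t heapSizeTiles, uint64_t numHeaps)
{
    return HeapSizeBytes(heapSizeTiles) * numHeaps;
}

// 512 tiles * 64KB = 32MB per heap
static_assert(HeapSizeBytes(512) == 32ull * 1024 * 1024, "512 tiles -> 32MB");
// 127 heaps * 32MB = 4064MB, i.e. roughly 4GB in total
static_assert(TotalBytes(512, 127) == 4064ull * 1024 * 1024, "127 heaps -> ~4GB");
// for comparison, a single 16384-tile heap is 1GB
static_assert(HeapSizeBytes(16384) == 1024ull * 1024 * 1024, "16384 tiles -> 1GB");
```

Because each heap holds only 512 tiles, a resource bound to one heap can exhaust it long before total memory is used, which is the source of the artifacts mentioned above.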
103+
106104
## Creating Your Own Textures
107105

108106
The executable `DdsToXet.exe` converts BCn DDS textures to the custom XET format. Only BC1 and BC7 textures have been tested. Usage:
109107

110-
c:> ddstoxet.xet -in myfile.dds -out myfile.xet
108+
c:> ddstoxet.exe -in myfile.dds -out myfile.xet
111109

112110
The batch file [convert.bat](scripts/convert.bat) will read all the DDS files in one directory and write XET files to a second directory. The output directory must exist.
113111

114112
c:> convert c:\myDdsFiles c:\myXetFiles
115113

116114
## TileUpdateManager: a library for streaming textures
117115

118-
Within the source, there is a *TileUpdateManager* library that aspires to be stand-alone. The central object, *TileUpdateManager*, allows for the creation of streaming textures and heaps to contain them. These objects handle all the feedback resource creation, readback, processing, and file/IO.
116+
The sample includes a library *TileUpdateManager* with a minimal set of APIs defined in [SamplerFeedbackStreaming.h](TileUpdateManager/SamplerFeedbackStreaming.h). The central object, *TileUpdateManager*, allows for the creation of streaming textures and heaps to contain them. These objects handle all the feedback resource creation, readback, processing, and file/IO.
117+
118+
The application creates a TileUpdateManager and 1 or more heaps in Scene.cpp:
119+
120+
```cpp
121+
m_pTileUpdateManager = std::make_unique<TileUpdateManager>(m_device.Get(), m_commandQueue.Get(), tumDesc);
122+
123+
124+
// create 1 or more heaps to contain our StreamingResources
125+
for (UINT i = 0; i < m_args.m_numHeaps; i++)
126+
{
127+
m_sharedHeaps.push_back(m_pTileUpdateManager->CreateStreamingHeap(m_args.m_streamingHeapSize));
128+
}
129+
```
130+
131+
Each SceneObject creates its own StreamingResource. Note **a StreamingResource can be used by multiple objects**, but this sample was designed to emphasize the ability to manage many resources and so objects are 1:1 with StreamingResources.
132+
133+
```cpp
134+
m_pStreamingResource = std::unique_ptr<StreamingResource>(in_pTileUpdateManager->CreateStreamingResource(in_filename, in_pStreamingHeap));
135+
```
119136
120137
## Known issues
121138
@@ -135,7 +152,7 @@ There are also a few known bugs:
135152
136153
This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation). The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or when copies and memory mapping have completed.
137154
138-
All the magic can be found in the **TileUpdateManager** library (see [TileUpdateManager.h](TileUpdateManager/TileUpdateManager.h)), which abstracts the creation of StreamingResources and heaps while internally managing feedback resources, file I/O, and GPU memory mapping.
155+
All the magic can be found in the **TileUpdateManager** library (see the internal file [TileUpdateManager.h](TileUpdateManager/TileUpdateManager.h) - applications should include [SamplerFeedbackStreaming.h](TileUpdateManager/SamplerFeedbackStreaming.h)), which abstracts the creation of StreamingResources and heaps while internally managing feedback resources, file I/O, and GPU memory mapping.
139156
140157
The technique works as follows:
141158
@@ -168,21 +185,24 @@ Below, the Visualization mode was set to "Color = Mip" and labels were added. Ti
168185
169186
To reduce GPU memory, a single combined buffer contains all the residency maps for all the resources. The pixel shader samples the corresponding residency map to clamp the sampling function to the minimum mip level currently resident, thereby avoiding sampling tiles that have not been mapped.
170187
171-
We can see this in the shader [terrainPS.hlsl](src/shaders/terrainPS.hlsl). Resources are defined at the top of the shader, including the reserved buffer, the residency resource, and the sampler:
188+
We can see the lookup into the residency map in the pixel shader [terrainPS.hlsl](src/shaders/terrainPS.hlsl). Resources are defined at the top of the shader, including the reserved (tiled) resource g_streamingTexture, the residency map g_minmipmap, and the sampler:
172189
173190
```cpp
174191
Texture2D g_streamingTexture : register(t0);
175192
Buffer<uint> g_minmipmap: register(t1);
176193
SamplerState g_sampler : register(s0);
177194
```
178195

179-
The shader offsets into its region of the residency buffer (g_minmipmapOffset) and loads the minimum mip value for the region to be sampled.
196+
The shader offsets into its region of the residency map (g_minmipmapOffset) and loads the minimum mip value for the region to be sampled.
197+
180198
```cpp
181199
int2 uv = input.tex * g_minmipmapDim;
182200
uint index = g_minmipmapOffset + uv.x + (uv.y * g_minmipmapDim.x);
183201
uint mipLevel = g_minmipmap.Load(index);
184202
```
203+
185204
The sampling operation is clamped to the minimum mip resident (mipLevel).
205+
186206
```cpp
187207
float3 color = g_streamingTexture.Sample(g_sampler, input.tex, 0, mipLevel).rgb;
188208
```
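The indexing arithmetic in the shader can be checked on the CPU. The following is a minimal sketch, not code from the repository: `ResidencyRegion` and `LookupMinMip` are hypothetical names, and the clamp to the last row and column is an added assumption so that a coordinate of exactly 1.0 stays in range. Only the formula `offset + x + y * width` mirrors the shader. Inputs are assumed to lie in [0, 1].

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of the per-resource shader constants.
struct ResidencyRegion
{
    uint32_t offset; // plays the role of g_minmipmapOffset
    uint32_t width;  // plays the role of g_minmipmapDim.x
    uint32_t height; // plays the role of g_minmipmapDim.y
};

// Returns the minimum resident mip for texture coordinate (u, v).
// Assumes 0 <= u, v <= 1 and a non-empty region.
inline uint32_t LookupMinMip(const std::vector<uint32_t>& minMipMap,
                             const ResidencyRegion& region, float u, float v)
{
    // scale to region coordinates and truncate, as the int2 conversion does
    uint32_t x = static_cast<uint32_t>(u * static_cast<float>(region.width));
    uint32_t y = static_cast<uint32_t>(v * static_cast<float>(region.height));

    // assumption added for this sketch: keep u == 1 or v == 1 inside the region
    x = std::min(x, region.width - 1);
    y = std::min(y, region.height - 1);

    // same arithmetic as the shader: offset + uv.x + (uv.y * dim.x)
    return minMipMap[region.offset + x + y * region.width];
}
```

For a 2x2 region stored at offset 2, coordinate (0.75, 0.0) maps to element 3 of the combined buffer, and the value stored there is what the shader would pass as the mip clamp.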

TileUpdateManager/DataUploader.cpp

Lines changed: 20 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -26,8 +26,15 @@
2626

2727
#include "pch.h"
2828

29-
#include "Interfaces.h"
29+
#include "DataUploader.h"
30+
#include "StreamingResourceDU.h"
3031
#include "FileStreamerReference.h"
32+
#include "StreamingHeap.h"
33+
34+
// per-batch timing disabled
35+
// design change resulted in start vs. complete timestamps on different threads
36+
// batch start, when copying, is actually in a 3rd thread
37+
#define ENABLE_PER_BATCH_TIMING 0
3138

3239
//=============================================================================
3340
// Internal class that uploads texture data into a reserved resource
@@ -202,7 +209,7 @@ void Streaming::DataUploader::FlushCommands()
202209
//-----------------------------------------------------------------------------
203210
// tries to find an available UpdateList, may return null
204211
//-----------------------------------------------------------------------------
205-
Streaming::UpdateList* Streaming::DataUploader::AllocateUpdateList(StreamingResource* in_pStreamingResource)
212+
Streaming::UpdateList* Streaming::DataUploader::AllocateUpdateList(Streaming::StreamingResourceBase* in_pStreamingResource)
206213
{
207214
UpdateList* pUpdateList = nullptr;
208215

@@ -234,7 +241,9 @@ Streaming::UpdateList* Streaming::DataUploader::AllocateUpdateList(StreamingReso
234241
break;
235242
}
236243
}
237-
ASSERT(pUpdateList);
244+
// pUpdateList might be null: more than 1 thread can enter the loop with initial condition of 1 free updatelist
245+
// m_updateListFreeCount > 0 is an optimization, not a guarantee.
246+
// calling functions must handle nullptr returned
238247
}
239248
return pUpdateList;
240249
}
@@ -333,7 +342,7 @@ void Streaming::DataUploader::FenceMonitorThread()
333342

334343
// The UpdateList is complete
335344
// notify all tiles, evictions, and packed mips
336-
345+
#if ENABLE_PER_BATCH_TIMING
337346
auto& timings = m_streamingTimes[m_streamingTimeIndex];
338347
m_streamingTimeIndex = (m_streamingTimeIndex + 1) % m_streamingTimes.size();
339348

@@ -343,7 +352,7 @@ void Streaming::DataUploader::FenceMonitorThread()
343352
timings.m_numTilesUnMapped = updateList.GetNumEvictions();
344353
timings.m_copyTime = 0;
345354
timings.m_numTilesCopied = (UINT)updateList.GetNumStandardUpdates();
346-
355+
#endif
347356
// notify evictions
348357
if (updateList.GetNumEvictions())
349358
{
@@ -355,7 +364,9 @@ void Streaming::DataUploader::FenceMonitorThread()
355364
// notify regular tiles
356365
if (updateList.GetNumStandardUpdates())
357366
{
367+
#if ENABLE_PER_BATCH_TIMING
358368
timings.m_copyTime = updateList.m_copyTime;
369+
#endif
359370
// a gpu copy has completed, so we can update the corresponding timer
360371
//timings.m_gpuTime = m_gpuTimer.MapReadBack(in_updateList.m_streamingTimeIndex);
361372
m_numTotalUploads.fetch_add(updateList.GetNumStandardUpdates(), std::memory_order_relaxed);
@@ -414,10 +425,10 @@ void Streaming::DataUploader::SubmitThread()
414425

415426
// WARNING: UpdateTileMappings performance is an issue on some hardware
416427
// throughput will degrade if UpdateTileMappings isn't ~free
417-
428+
#if ENABLE_PER_BATCH_TIMING
418429
// record initial discovery time
419430
updateList.m_startTime = m_cpuTimer.GetTime();
420-
431+
#endif
421432
// unmap tiles that are being evicted
422433
if (updateList.GetNumEvictions())
423434
{
@@ -447,8 +458,9 @@ void Streaming::DataUploader::SubmitThread()
447458
}
448459

449460
// note: packed tile mapping has previously been submitted, but mapping may not be complete
450-
461+
#if ENABLE_PER_BATCH_TIMING
451462
updateList.m_mappingTime = m_cpuTimer.GetSecondsSince(updateList.m_startTime);
463+
#endif
452464
}
453465
break; // end STATE_SUBMITTED
454466

TileUpdateManager/DataUploader.h

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -34,8 +34,6 @@
3434

3535
#include "TileUpdateManager.h"
3636

37-
class StreamingResource;
38-
3937
//==================================================
4038
// UploadBuffer keeps an upload buffer per swapchain backbuffer
4139
// and tracks occupancy of current buffer
@@ -57,15 +55,15 @@ namespace Streaming
5755
);
5856
~DataUploader();
5957

60-
FileStreamer::FileHandle* OpenFile(const std::wstring& in_path) const { return m_pFileStreamer->OpenFile(in_path); }
58+
FileHandle* OpenFile(const std::wstring& in_path) const { return m_pFileStreamer->OpenFile(in_path); }
6159

6260
// wait for all outstanding commands to complete.
6361
void FlushCommands();
6462

6563
ID3D12CommandQueue* GetMappingQueue() const { return m_mappingCommandQueue.Get(); }
6664

6765
// may return null. called by StreamingResource.
68-
UpdateList* AllocateUpdateList(StreamingResource* in_pStreamingResource);
66+
UpdateList* AllocateUpdateList(StreamingResourceBase* in_pStreamingResource);
6967

7068
void SubmitUpdateList(Streaming::UpdateList& in_updateList);
7169

@@ -83,7 +81,7 @@ namespace Streaming
8381
//----------------------------------
8482
// statistics and visualization
8583
//----------------------------------
86-
const TileUpdateManager::BatchTimes& GetStreamingTimes() const { return m_streamingTimes; }
84+
const BatchTimes& GetStreamingTimes() const { return m_streamingTimes; }
8785

8886
float GetGpuStreamingTime() const { return m_gpuTimer.GetTimes()[0].first; }
8987

@@ -118,7 +116,7 @@ namespace Streaming
118116
UINT m_updateListAllocIndex{ 0 };
119117

120118
UINT m_streamingTimeIndex{ 0 }; // index into cpu or gpu streaming history arrays
121-
TileUpdateManager::BatchTimes m_streamingTimes;
119+
BatchTimes m_streamingTimes;
122120

123121
// object that performs UpdateTileMappings() requests
124122
Streaming::MappingUpdater m_mappingUpdater;

TileUpdateManager/FileStreamer.h

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -32,20 +32,19 @@ namespace Streaming
3232
{
3333
struct UpdateList;
3434

35+
// file handle internals different between reference and DS FileStreamers
36+
class FileHandle
37+
{
38+
public:
39+
virtual ~FileHandle() {}
40+
};
41+
3542
class FileStreamer
3643
{
3744
public:
3845
FileStreamer(ID3D12Device* in_pDevice);
3946
virtual ~FileStreamer() {}
4047

41-
// reference implementation returns a standard file handle
42-
// DS implementation may return an IDStorageFile*
43-
class FileHandle
44-
{
45-
public:
46-
virtual ~FileHandle() {}
47-
};
48-
4948
virtual FileHandle* OpenFile(const std::wstring& in_path) = 0;
5049

5150
virtual void StreamTexture(Streaming::UpdateList& in_updateList) = 0;
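
Moving `FileHandle` to namespace scope keeps it an opaque base class whose only member is a virtual destructor, so each streamer can derive its own handle type. The sketch below illustrates the idea, assuming a handle backed by `std::ifstream` for portability. `StdFileHandle` and `OpenFileHandle` are hypothetical names; the reference streamer in the diff uses a Win32 `HANDLE` from `CreateFile` instead.

```cpp
#include <fstream>
#include <memory>
#include <string>

// Opaque base type with the same shape as Streaming::FileHandle in the diff.
class FileHandle
{
public:
    virtual ~FileHandle() {}
};

// Hypothetical derived handle that owns a standard file stream.
class StdFileHandle : public FileHandle
{
public:
    explicit StdFileHandle(const std::string& in_path)
        : m_file(in_path, std::ios::binary) {}

    bool IsOpen() const { return m_file.is_open(); }

private:
    std::ifstream m_file; // closed automatically when the handle is destroyed
};

// Hypothetical factory: callers only ever see the opaque base type.
inline std::unique_ptr<FileHandle> OpenFileHandle(const std::string& in_path)
{
    return std::make_unique<StdFileHandle>(in_path);
}
```

Because the base destructor is virtual, destroying the handle through a `FileHandle*` still runs the derived destructor and releases the underlying file.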

TileUpdateManager/FileStreamerReference.cpp

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,8 +29,9 @@
2929
#include "FileStreamerReference.h"
3030
#include "UpdateList.h"
3131
#include "XeTexture.h"
32-
#include "Interfaces.h"
32+
#include "StreamingResourceDU.h"
3333
#include "DXSampleHelper.h"
34+
#include "StreamingHeap.h"
3435

3536
//-----------------------------------------------------------------------------
3637
// Constructor
@@ -141,7 +142,7 @@ bool Streaming::FileStreamerReference::CopyBatch::GetReadsComplete()
141142
//-----------------------------------------------------------------------------
142143
// opening a file returns an opaque file handle
143144
//-----------------------------------------------------------------------------
144-
Streaming::FileStreamer::FileHandle* Streaming::FileStreamerReference::OpenFile(const std::wstring& in_path)
145+
Streaming::FileHandle* Streaming::FileStreamerReference::OpenFile(const std::wstring& in_path)
145146
{
146147
// open the file
147148
HANDLE fileHandle = CreateFile(in_path.c_str(), GENERIC_READ,
