Commit 1e99b36

Update docs. Add process-wide CPU measurements.
We now report three CPU time measurements: full process time, API runner time, and readahead worker time. Process time minus the other two should tell you how much time is spent in any driver worker threads. Process time minus readahead worker time should give you the time the driver spends processing API calls. This includes some overhead from the replayer itself in preparing the calls, but that overhead should be a constant factor.
1 parent 903e1ee commit 1e99b36
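
The arithmetic described in the commit message can be sketched as follows. This is a minimal illustration, not code from the commit: the names `cpu_breakdown`, `saturating_sub` and `derive_breakdown` are hypothetical, all inputs are assumed to be in milliseconds, and clamping at zero is an added assumption to guard against timer jitter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; not part of the commit.
struct cpu_breakdown
{
    uint64_t api_processing_ms; // process time minus readahead worker time
    uint64_t driver_threads_ms; // process time minus runner and worker time
};

// Subtract without wrapping around if the samples are slightly inconsistent
// (assumption: clamping to zero is acceptable for reporting purposes).
static uint64_t saturating_sub(uint64_t a, uint64_t b)
{
    return a > b ? a - b : 0;
}

// Derive the two numbers the commit message says can be computed from the
// three reported measurements.
static cpu_breakdown derive_breakdown(uint64_t process_ms, uint64_t runner_ms, uint64_t worker_ms)
{
    cpu_breakdown out;
    out.api_processing_ms = saturating_sub(process_ms, worker_ms);
    out.driver_threads_ms = saturating_sub(out.api_processing_ms, runner_ms);
    assert(out.driver_threads_ms <= out.api_processing_ms);
    return out;
}
```

For example, with 1000 ms of process time, 600 ms of API runner time and 150 ms of readahead worker time, `derive_breakdown(1000, 600, 150)` yields 850 ms for API processing and 250 ms attributed to driver worker threads.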

4 files changed, with 36 additions and 13 deletions
README.md

Lines changed: 8 additions & 4 deletions
@@ -1,21 +1,25 @@
 Introduction
 ============
 
-Vulkan tracer designed for multi-threaded replay with a minimum overhead and maximum portability
-across different platforms. It is an experimental project that aims to explore Vulkan tracing
-options.
+API tracer designed for multi-threaded replay with a minimum overhead and maximum portability
+across different platforms. It is an experimental project that aims to explore options in API
+tracing.
 
 Features
 --------
 
 * Fully multi-threaded design. See [Multithread design](doc/Multithreading.md) for more information.
-* Focus on performance and generating stable, portable traces, sacrificing precise reproduction.
+* Focus on performance and generating stable, portable traces, sacrificing exact reproduction.
 * Autogenerates nearly all its code with support for tracing nearly all functions and extensions.
   Replay support may however vary.
 * Detects many unused features and removes erroneous enablement of them from the trace.
 * Blackhole replay where no work is actually submitted to the GPU.
 * Noscreen replay where we run any content without creating a window surface or displaying anything.
 * Implements the experimental [Common Benchmark Standard](external/tracetooltests/doc/BenchmarkingStandard.md)
+* Uses API usage analysis rather than a page guard to detect host-side changes (this was a mistake that
+  needs to be undone).
+
+Generally faster, uses less CPU resources and produces smaller trace files than gfxreconstruct.
 
 Performance
 -----------

TODO.md

Lines changed: 8 additions & 7 deletions
@@ -5,11 +5,12 @@ General:
 * More work needed on trace portability
 * More work on making buffer suballocations faster
 * Vulkan-SC support, look into on-the-fly conversion to and from normal Vulkan
-* Rayquery / raytracing
-* Push descriptors
-* Inline uniform blocks
-* Memory aliasing
+* Rayquery / raytracing support
+* Push descriptors support
+* Inline uniform blocks support
+* Memory aliasing support
 * Make VkLayer_lavatube.json truthful
+* Drop our own packfile format for using zip files instead
 * Improved multi-device support
 * Store internal Vulkan object metadata by Vulkan device
 
@@ -24,17 +25,17 @@ Missing Vulkan call implementations:
 * vkGetDeviceFaultInfoEXT
 
 Missing and desirable extension support:
+* VK_EXT_descriptor_buffer
 * VK_EXT_mutable_descriptor_type
 * VK_EXT_device_generated_commands
 * VK_KHR_pipeline_binary
 
 Replayer:
 * Add back Android build
-* Checkpoint and fastforward traces
+* Trace fastforwarding
 * VK_EXT_pipeline_creation_feedback
 * Built-in screenshotting support, reading from virtual swapchain
-* Blackhole and none WSI generate validation warnings
 
 Tools
 * Trace to text tool
-* Improve the python code generators
+* Improve the python code generators (very ugly code)

src/read.cpp

Lines changed: 14 additions & 1 deletion
@@ -125,7 +125,14 @@ void lava_reader::finalize(bool terminate)
         runner += runner_local;
         worker += worker_local;
     }
-    ILOG("CPU time spent in ms - worker %lu, runner %lu", (long unsigned)worker, (long unsigned)runner);
+    struct timespec stop_process_cpu_usage;
+    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop_process_cpu_usage) != 0)
+    {
+        ELOG("Failed to get process CPU usage at stop time: %s", strerror(errno));
+    }
+    assert(stop_process_cpu_usage.tv_sec >= process_cpu_usage.tv_sec);
+    const uint64_t process_time = diff_timespec(&stop_process_cpu_usage, &process_cpu_usage);
+    ILOG("CPU time spent in ms - readahead workers %lu, API runners %lu, full process %lu", (long unsigned)worker, (long unsigned)runner, (long unsigned)process_time);
     if (terminate)
     {
         for (auto& v : *thread_call_numbers) v = 0; // stop waiting threads from progressing
@@ -161,6 +168,12 @@ void lava_reader::init(const std::string& path, int heap_size)
     Json::Value trackable = packed_json("tracking.json", mPackedFile);
     trackable_read(trackable);
 
+    // Set initial value, in case no start frame reached
+    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_cpu_usage) != 0)
+    {
+        ELOG("Failed to initialize process CPU usage: %s", strerror(errno));
+    }
+
     // Set up buffer device address tracking
     if (trackable.isMember("VkBuffer"))
     {
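
The measurement pattern in this hunk (sample `CLOCK_PROCESS_CPUTIME_ID` at start and stop, then take the difference) can be sketched in isolation. The project's `diff_timespec()` is not shown in this diff, so `timespec_diff_ms` below is a hypothetical stand-in that assumes a millisecond result, and `sample_process_cpu` is likewise an illustrative name. The sketch also zero-initializes the sample on failure, since in the hunk above the `assert` would read `stop_process_cpu_usage` even when `clock_gettime` has failed and left it unset.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <time.h>

// Hypothetical stand-in for diff_timespec(): elapsed milliseconds between two
// samples, clamped to zero if stop is earlier than start.
static uint64_t timespec_diff_ms(const timespec& stop, const timespec& start)
{
    const int64_t sec = static_cast<int64_t>(stop.tv_sec) - static_cast<int64_t>(start.tv_sec);
    const int64_t nsec = static_cast<int64_t>(stop.tv_nsec) - static_cast<int64_t>(start.tv_nsec);
    const int64_t total_ns = sec * 1000000000LL + nsec;
    return total_ns > 0 ? static_cast<uint64_t>(total_ns / 1000000) : 0;
}

// Sample the CPU time consumed by the whole process (all threads).
// Returns false and leaves `out` zeroed if the clock cannot be read.
static bool sample_process_cpu(timespec& out)
{
    out = timespec{};
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &out) != 0)
    {
        std::fprintf(stderr, "Failed to get process CPU usage: %s\n", std::strerror(errno));
        out = timespec{};
        return false;
    }
    return true;
}
```

A caller would take one sample when the measured range begins and one when it ends, then pass both to `timespec_diff_ms` and report the result only if both samples succeeded.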

src/read.h

Lines changed: 6 additions & 1 deletion
@@ -98,7 +98,8 @@ class lava_reader
 private:
     /// Start time of frame range
     std::atomic_uint64_t mStartTime{ 0 };
-
+    /// Start CPU usage for whole process
+    struct timespec process_cpu_usage;
     lava::mutex global_mutex;
     std::string mPackedFile;
     std::unordered_map<int, lava_file_reader*> thread_streams GUARDED_BY(global_mutex);
@@ -176,6 +177,10 @@ class lava_file_reader : public file_reader
     if (mHaveFirstFrame)
     {
         ILOG("==== starting frame range ====");
+        if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &parent->process_cpu_usage) != 0)
+        {
+            ELOG("Failed to get process CPU usage: %s", strerror(errno));
+        }
         // Set start time in all threads
         parent->global_mutex.lock();
         for (unsigned i = 0; i < parent->threads.size(); i++)
