Update docs. Add process-wide CPU measurements.

per-mathisen-arm · per-mathisen-arm · commit 1e99b3685fde · 2025-11-15T12:27:10.000+01:00
We now report three CPU time measurements: full process
time, API runner time, and readahead worker time. Process
time minus the other two should tell you how much time is
spent in any driver worker threads. Process time minus
readahead worker time should give you the time the driver
spends processing API calls, including some overhead from
the replayer itself in preparing the calls, but this
overhead should be a constant factor.
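
The two subtractions described above can be sketched as follows. This is
a minimal illustration, not code from this commit: elapsed_ms,
saturating_sub, cpu_breakdown and derive_breakdown are hypothetical names,
and the diff_timespec helper used in src/read.cpp may compute its result
differently. The clamping to zero is also an assumption, added because the
process clock and the per-thread totals are sampled at slightly different
moments.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical helper: CPU time between two timespec samples, in milliseconds.
// Assumes stop was sampled after start on the same clock.
inline uint64_t elapsed_ms(const timespec& start, const timespec& stop)
{
	const int64_t ns = static_cast<int64_t>(stop.tv_sec - start.tv_sec) * 1000000000LL
		+ static_cast<int64_t>(stop.tv_nsec - start.tv_nsec);
	assert(ns >= 0);
	return static_cast<uint64_t>(ns / 1000000);
}

// Subtract without wrapping around if b is larger than a.
inline uint64_t saturating_sub(uint64_t a, uint64_t b)
{
	return a > b ? a - b : 0;
}

// Hypothetical container for the two derived numbers from the commit message.
struct cpu_breakdown
{
	uint64_t driver_threads_ms; // process - API runners - readahead workers
	uint64_t api_path_ms;       // process - readahead workers
};

// All inputs are in milliseconds, matching the units of the log line.
inline cpu_breakdown derive_breakdown(uint64_t process_ms, uint64_t runner_ms, uint64_t worker_ms)
{
	cpu_breakdown out;
	out.driver_threads_ms = saturating_sub(process_ms, runner_ms + worker_ms);
	out.api_path_ms = saturating_sub(process_ms, worker_ms);
	return out;
}
```

With a reported process time of 1000 ms, runner time of 600 ms and worker
time of 150 ms, this would attribute 250 ms to driver worker threads and
850 ms to the API call path.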
diff --git a/README.md b/README.md
@@ -1,21 +1,25 @@
 Introduction
 ============
 
-Vulkan tracer designed for multi-threaded replay with a minimum overhead and maximum portability
-across different platforms. It is an experimental project that aims to explore Vulkan tracing
-options.
+API tracer designed for multi-threaded replay with a minimum overhead and maximum portability
+across different platforms. It is an experimental project that aims to explore options in API
+tracing.
 
 Features
 --------
 
 * Fully multi-threaded design. See [Multithread design](doc/Multithreading.md) for more information.
-* Focus on performance and generating stable, portable traces, sacrificing precise reproduction.
+* Focus on performance and generating stable, portable traces, sacrificing exact reproduction.
 * Autogenerates nearly all its code with support for tracing nearly all functions and extensions.
   Replay support may however vary.
 * Detects many unused features and removes erroneous enablement of them from the trace.
 * Blackhole replay where no work is actually submitted to the GPU.
 * Noscreen replay where we run any content without creating a window surface or displaying anything.
 * Implements the experimental [Common Benchmark Standard](external/tracetooltests/doc/BenchmarkingStandard.md)
+* Uses API usage analysis rather than a page guard to detect host-side changes (this was a mistake that
+  needs to be undone).
+
+Generally faster, uses less CPU resources and produces smaller trace files than gfxreconstruct.
 
 Performance
 -----------
diff --git a/TODO.md b/TODO.md
@@ -5,11 +5,12 @@ General:
 * More work needed on trace portability
 * More work on making buffer suballocations faster
 * Vulkan-SC support, look into on-the-fly conversion to and from normal Vulkan
-* Rayquery / raytracing
-* Push descriptors
-* Inline uniform blocks
-* Memory aliasing
+* Rayquery / raytracing support
+* Push descriptors support
+* Inline uniform blocks support
+* Memory aliasing support
 * Make VkLayer_lavatube.json truthful
+* Drop our own packfile format for using zip files instead
 * Improved multi-device support
 	* Store internal Vulkan object metadata by Vulkan device
 
@@ -24,17 +25,17 @@ Missing Vulkan call implementations:
 * vkGetDeviceFaultInfoEXT
 
 Missing and desirable extension support:
+* VK_EXT_descriptor_buffer
 * VK_EXT_mutable_descriptor_type
 * VK_EXT_device_generated_commands
 * VK_KHR_pipeline_binary
 
 Replayer:
 * Add back Android build
-* Checkpoint and fastforward traces
+* Trace fastforwarding
 * VK_EXT_pipeline_creation_feedback
 * Built-in screenshotting support, reading from virtual swapchain
-* Blackhole and none WSI generate validation warnings
 
 Tools
 * Trace to text tool
-* Improve the python code generators
+* Improve the python code generators (very ugly code)
diff --git a/src/read.cpp b/src/read.cpp
@@ -125,7 +125,14 @@ void lava_reader::finalize(bool terminate)
 		runner += runner_local;
 		worker += worker_local;
 	}
-	ILOG("CPU time spent in ms - worker %lu, runner %lu", (long unsigned)worker, (long unsigned)runner);
+	struct timespec stop_process_cpu_usage;
+	if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop_process_cpu_usage) != 0)
+	{
+		ELOG("Failed to get process CPU usage at stop time: %s", strerror(errno));
+	}
+	assert(stop_process_cpu_usage.tv_sec >= process_cpu_usage.tv_sec);
+	const uint64_t process_time = diff_timespec(&stop_process_cpu_usage, &process_cpu_usage);
+	ILOG("CPU time spent in ms - readhead workers %lu, API runners %lu, full process %lu", (long unsigned)worker, (long unsigned)runner, (long unsigned)process_time);
 	if (terminate)
 	{
 		for (auto& v : *thread_call_numbers) v = 0; // stop waiting threads from progressing
@@ -161,6 +168,12 @@ void lava_reader::init(const std::string& path, int heap_size)
 	Json::Value trackable = packed_json("tracking.json", mPackedFile);
 	trackable_read(trackable);
 
+	// Set initial value, in case no start frame reached
+	if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_cpu_usage) != 0)
+	{
+		ELOG("Failed to initialize process CPU usage: %s", strerror(errno));
+	}
+
 	// Set up buffer device address tracking
 	if (trackable.isMember("VkBuffer"))
 	{
diff --git a/src/read.h b/src/read.h
@@ -98,7 +98,8 @@ class lava_reader
 private:
 	/// Start time of frame range
 	std::atomic_uint64_t mStartTime{ 0 };
-
+	/// Start CPU usage for whole process
+	struct timespec process_cpu_usage;
 	lava::mutex global_mutex;
 	std::string mPackedFile;
 	std::unordered_map<int, lava_file_reader*> thread_streams GUARDED_BY(global_mutex);
@@ -176,6 +177,10 @@ class lava_file_reader : public file_reader
 			if (mHaveFirstFrame)
 			{
 				ILOG("==== starting frame frange ====");
+				if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &parent->process_cpu_usage) != 0)
+				{
+					ELOG("Failed to get process CPU usage: %s", strerror(errno));
+				}
 				// Set start time in all threads
 				parent->global_mutex.lock();
 				for (unsigned i = 0; i < parent->threads.size(); i++)