Bolder captions

kvark · kvark · commit 5c2013d87f6b · 2018-08-10T11:34:16.000-04:00
diff --git a/_posts/2018-08-10-dota2-macos-performance.md b/_posts/2018-08-10-dota2-macos-performance.md
@@ -3,7 +3,7 @@ layout: post
 title: Portability benchmark of Dota2 on MacOS
 ---
 
-### The Race
+## The Race
 
 gfx-rs is a Rust project aiming to make graphics programming more accessible and portable, focusing on exposing a universal Vulkan-like API. It's a single Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL. We are also building a Vulkan Portability [implementation](https://github.com/gfx-rs/portability) based on it, which allows non-Rust applications using Vulkan to run everywhere. This post is focused on the Metal backend only.
 
@@ -13,7 +13,7 @@ Once Dota2 became functional under our portability implementation, we entered a
 
 ![simple test with gfx-portability](/img/dota-simple.jpg)
 
-#### Modes
+### Modes
 
 Our portability library has been tested in two different modes of the Metal backend:
   - "Immediate" - We record the underlying Metal command buffers "live" as the corresponding Vulkan command buffers are recorded. Doing this allows the user to parallelize the recording as they see fit, allowing them to make sure everything is ready to be submitted.
@@ -23,20 +23,20 @@ Please note that technically "Immediate" recording has less CPU overhead. Additi
 
 Another interesting observation is that 71% of the time spent by gfx/immediate (when only recording the commands) is actually spent inside the driver, according to our instrumented profile. This puts the upper bound for Vulkan Portability overhead on Metal at about 40%, although that includes some OS and Metal run-time bits as well.
 
-### Results
+## Results
 
 | Test/Library                     | gfx/immediate | gfx/deferred | MoltenVK     | OpenGL      |
 | -------------------------------- | ------------- | ------------ | ------------ | ----------- |
 | CPU % of Main thread             | 35%           | 12%          | 21%          | ?           |
 | _platform A_ (Intel, dual-core)  | | | |
-| fps/variability on low settings  | 41.5 / 4.4    | 47.9 / 4.6   | 40.5 / 6.3   | 45.0 / 5.2  |
-| fps/variability on high settings | 33.9 / 3.5    | 41.3 / 4.0   | 35.9 / 5.3   | 34.9 / 6.6  |
+| fps/variability on low settings  | 41.5 / 4.4    | **47.9** / 4.6   | 40.5 / 6.3   | 45.0 / 5.2  |
+| fps/variability on high settings | 33.9 / 3.5    | **41.3** / 4.0   | 35.9 / 5.3   | 34.9 / 6.6  |
 | _platform B_ (AMD, quad core)    | | | |
-| fps/variability on low settings  | 58.1 / 11.4   | 74.5 / 11.2  | 71.7 / 12.6  | 77.0 / 12.7 |
-| fps/variability on high settings | 51.1 / 9.3    | 59.2 / 7.4   | 61.4 / 10.0  | 49.0 / 5.8  |
+| fps/variability on low settings  | 58.1 / 11.4   | 74.5 / 11.2  | 71.7 / 12.6  | **77.0** / 12.7 |
+| fps/variability on high settings | 51.1 / 9.3    | 59.2 / 7.4   | **61.4** / 10.0  | 49.0 / 5.8  |
 | _platform C_ (NV, quad core)     | | | |
-| fps/variability on low settings  | 54.3 / 10.0   | 66.0 / 7.7   | 64.0 / 7.8   | 56.7 / 7.0  |
-| fps/variability on high settings | 40.6 / 4.4    | 43.1 / 3.8   | 42.1 / 3.9   | 37.6 / 3.2  |
+| fps/variability on low settings  | 54.3 / 10.0   | **66.0** / 7.7   | 64.0 / 7.8   | 56.7 / 7.0  |
+| fps/variability on high settings | 40.6 / 4.4    | **43.1** / 3.8   | 42.1 / 3.9   | 37.6 / 3.2  |
 
 The first metric shows how much time of the main thread is spent inside the portability library, compared to the total execution time (which is the actual time minus all the sleeping). It was measured with Time Profiler on a simple scene (see screenshot). Note that the submission is done on a separate thread by Dota2, so that time isn't taken into account here. Interestingly, the less time we spend on the main thread the faster our frame rate ends up being. Or, in other words, it's a race of who gets to the submission first :)
 
@@ -49,7 +49,7 @@ make dota-bench-orig # for MoltenVK bundled with Dota2
 make dota-bench-gfx GFX_METAL_RECORDING=deferred # for gfx-portability with "Deferred" recording
 ```
 
-#### Platforms
+### Platforms
 
 ![macbook fleet](/img/macbook-fleet.jpg)
 
@@ -62,7 +62,7 @@ make dota-bench-gfx GFX_METAL_RECORDING=deferred # for gfx-portability with "Def
 | OS               | macOS 10.14 beta | macOS 10.14 beta | macOS 10.13.6 |
 | resolution       | 1440 x 900 | 1680 x 1050 | 1440 x 900 |
 
-### Conclusions
+## Conclusions
 
 MoltenVK does a good job translating Vulkan to Metal. Interestingly, though, it can be slower than OpenGL on low settings. This matches neither Phoronix's nor Valve/MoltenVK's numbers. We suspect the difference is explained by Phoronix using a 10x shorter run with pre-loaded pipeline caches. In that case GL would struggle to create all the new pipelines and would not have time to reach full speed - not exactly an Apples to Apples comparison :)
 
@@ -72,7 +72,7 @@ Either way, our benchmarks show that OpenGL is still fairly good on MacOS, and i
 
 We believe that "Immediate" command recording has great potential that hasn't yet been realized with Dota2 as it is architected today, with its one big submission per frame, which increases latency on the already latency-limited program. Hopefully, we'll see more applications taking advantage of this in the future.
 
-#### Rust
+### Rust
 
 Rust has proven itself viable in complex high-performance systems. We were able to build solid abstractions and hide the complexity behind tiny interfaces, while still being able to reason about and optimize the low-level performance. Iterating on large architectural changes was a breeze - we would just change the single most important piece and then fix all the compile errors.