_posts/2018-08-10-dota2-macos-performance.md
12 additions & 12 deletions
@@ -3,7 +3,7 @@ layout: post
title: Portability benchmark of Dota2 on MacOS
---
-###The Race
+## The Race
gfx-rs is a Rust project aiming to make graphics programming more accessible and portable, focusing on exposing a universal Vulkan-like API. It's a single Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL. We are also building a Vulkan Portability [implementation](https://github.com/gfx-rs/portability) based on it, which allows non-Rust applications using Vulkan to run everywhere. This post is focused on the Metal backend only.
@@ -13,7 +13,7 @@ Once Dota2 became functional under our portability implementation, we entered a
-####Modes
+### Modes
Our portability library has been tested in two different modes of the Metal backend:
- "Immediate" - We record the underlying Metal command buffers "live" as the corresponding Vulkan command buffers are recorded. Doing this allows the user to parallelize the recording as they see fit, allowing them to make sure everything is ready to be submitted.
@@ -23,20 +23,20 @@ Please note that technically "Immediate" recording has less CPU overhead. Additi
Another interesting observation is that 71% of the time spent by gfx/immediate (when only recording the commands) is actually spent inside the driver, according to our instrumented profile. This puts an upper bound of about 40% on the Vulkan Portability overhead on Metal, although that includes some OS and Metal run-time bits as well.
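One plausible reading of the 40% figure, which the text here does not confirm: if 71% of the recording time is inside the driver, the remaining 29% is the most that could be attributed to the portability layer, and 29/71 is roughly 0.41 relative to the driver's own time. The helper name `overhead_upper_bound` is hypothetical and only illustrates that arithmetic:

```rust
/// Upper bound on translation overhead relative to driver time,
/// assuming everything outside the driver counts as overhead.
/// `driver_fraction` is the share of recording time spent in the driver.
/// (Hypothetical helper, not part of gfx-rs.)
fn overhead_upper_bound(driver_fraction: f64) -> f64 {
    (1.0 - driver_fraction) / driver_fraction
}

fn main() {
    // 71% is the figure quoted from the instrumented profile.
    let bound = overhead_upper_bound(0.71);
    println!("overhead upper bound: {:.1}%", bound * 100.0);
}
```

Since the non-driver 29% also covers OS and Metal run-time work, the real overhead of the translation layer should sit below this bound.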
The first metric shows how much time of the main thread is spent inside the portability library, compared to the total execution time (which is the actual time minus all the sleeping). It was measured with Time Profiler on a simple scene (see screenshot). Note that the submission is done on a separate thread by Dota2, so that time isn't taken into account here. Interestingly, the less time we spend on the main thread the faster our frame rate ends up being. Or, in other words, it's a race of who gets to the submission first :)
@@ -49,7 +49,7 @@ make dota-bench-orig # for MoltenVK bundled with Dota2
make dota-bench-gfx GFX_METAL_RECORDING=deferred # for gfx-portability with "Deferred" recording
```
-####Platforms
+### Platforms

@@ -62,7 +62,7 @@ make dota-bench-gfx GFX_METAL_RECORDING=deferred # for gfx-portability with "Def
| resolution | 1440 x 900 | 1680 x 1050 | 1440 x 900 |
-###Conclusions
+## Conclusions
MoltenVK does a good job translating Vulkan to Metal. Interestingly, though, it can be slower than OpenGL on low settings. This matches neither Phoronix's nor Valve/MoltenVK's numbers. We suspect the difference is explained by Phoronix using a 10x shorter run with pre-loaded pipeline caches. In that case GL would struggle to create all the new pipelines and would not have time to reach full speed - not exactly an Apples to Apples comparison :)
@@ -72,7 +72,7 @@ Either way, our benchmarks show that OpenGL is still fairly good on MacOS, and i
We believe that "Immediate" command recording has great potential that hasn't yet been realized with Dota2 as it is architected today, with its one big submission per frame, which increases latency on the already latency-limited program. Hopefully, we'll see more applications taking advantage of this in the future.
-####Rust
+### Rust
Rust has proven itself viable in complex high-performance systems. We were able to build solid abstractions and hide the complexity behind tiny interfaces, while still being able to reason about and optimize the low-level performance. Iterating on large architectural changes was a breeze - we would just change the one most important piece and then fix all the compile errors.