
Commit 5c2013d

Bolder captions
1 parent 1891e11 commit 5c2013d


1 file changed (+12 / -12 lines)


_posts/2018-08-10-dota2-macos-performance.md

Lines changed: 12 additions & 12 deletions
```diff
@@ -3,7 +3,7 @@ layout: post
 title: Portability benchmark of Dota2 on MacOS
 ---
 
-### The Race
+## The Race
 
 gfx-rs is a Rust project aiming to make graphics programming more accessible and portable, focusing on exposing a universal Vulkan-like API. It's a single Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL. We are also building a Vulkan Portability [implementation](https://github.com/gfx-rs/portability) based on it, which allows non-Rust applications using Vulkan to run everywhere. This post is focused on the Metal backend only.
 
```
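The gfx-rs paragraph in this hunk describes one Rust API that several backends implement. Below is a minimal sketch of that shape. It assumes a plain trait with one implementation per backend. The names `Backend`, `MetalLike` and `VulkanLike`, and the strings they record, are hypothetical. They are not gfx-hal's actual types or signatures.

```rust
// Hypothetical sketch: one Vulkan-like front-end API, several backends behind it.
// None of these names come from gfx-hal; they only illustrate the structure.

/// The backend-agnostic interface that application code is written against.
trait Backend {
    /// Short name of the backend, for logging.
    fn name(&self) -> &'static str;
    /// Record a draw call and return a label for the native call it maps to.
    fn draw(&mut self, vertex_count: u32) -> String;
}

/// Stand-in for a Metal backend: keeps the "encoded" native calls.
struct MetalLike {
    encoded: Vec<String>,
}

/// Stand-in for a Vulkan backend: keeps the "recorded" native calls.
struct VulkanLike {
    recorded: Vec<String>,
}

impl Backend for MetalLike {
    fn name(&self) -> &'static str {
        "metal"
    }
    fn draw(&mut self, vertex_count: u32) -> String {
        let call = format!("drawPrimitives(vertexCount: {})", vertex_count);
        self.encoded.push(call.clone());
        call
    }
}

impl Backend for VulkanLike {
    fn name(&self) -> &'static str {
        "vulkan"
    }
    fn draw(&mut self, vertex_count: u32) -> String {
        let call = format!("vkCmdDraw(vertexCount: {})", vertex_count);
        self.recorded.push(call.clone());
        call
    }
}

/// Written once, compiled for any backend through generics.
fn render_frame<B: Backend>(backend: &mut B) -> Vec<String> {
    vec![backend.draw(3), backend.draw(6)]
}

fn main() {
    let mut metal = MetalLike { encoded: Vec::new() };
    let mut vulkan = VulkanLike { recorded: Vec::new() };
    println!("{}: {:?}", metal.name(), render_frame(&mut metal));
    println!("{}: {:?}", vulkan.name(), render_frame(&mut vulkan));
}
```

Because `render_frame` is generic, the backend is selected at compile time and each call is statically dispatched. gfx-rs's real design may choose differently.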
```diff
@@ -13,7 +13,7 @@ Once Dota2 became functional under our portability implementation, we entered a
 
 ![simple test with gfx-portability](/img/dota-simple.jpg)
 
-#### Modes
+### Modes
 
 Our portability library has been tested in two different modes of the Metal backend:
 - "Immediate" - We record the underlying Metal command buffers "live" as the corresponding Vulkan command buffers are recorded. This lets the application parallelize recording as it sees fit and make sure everything is ready by the time it submits.
```
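The "Immediate" bullet above contrasts with a "Deferred" mode. The results table and the `GFX_METAL_RECORDING=deferred` benchmark command both refer to it. Below is a minimal sketch of the difference. It assumes "Deferred" means storing the Vulkan-side commands and encoding them into the native command buffer only at submission. The types, the encoded strings, and the choice to treat any value other than `deferred` as "Immediate" are all hypothetical, not gfx-portability's actual code.

```rust
use std::env;

/// The two recording strategies described in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecordingMode {
    Immediate,
    Deferred,
}

impl RecordingMode {
    /// Assumption: "deferred" selects Deferred; anything else (or unset) selects Immediate.
    fn from_env_value(value: Option<&str>) -> RecordingMode {
        match value {
            Some("deferred") => RecordingMode::Deferred,
            _ => RecordingMode::Immediate,
        }
    }
}

/// A recorded (Vulkan-side) command; real command sets are far larger.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    Draw { vertex_count: u32 },
}

/// Stand-in for a native Metal command buffer: just a list of encoded calls.
struct NativeBuffer {
    encoded: Vec<String>,
}

impl NativeBuffer {
    fn encode(&mut self, cmd: &Command) {
        match cmd {
            Command::Draw { vertex_count } => self.encoded.push(format!("draw {}", vertex_count)),
        }
    }
}

struct CommandBuffer {
    mode: RecordingMode,
    pending: Vec<Command>,
    native: NativeBuffer,
}

impl CommandBuffer {
    fn new(mode: RecordingMode) -> Self {
        CommandBuffer { mode, pending: Vec::new(), native: NativeBuffer { encoded: Vec::new() } }
    }

    /// Immediate: encode now, on the recording thread. Deferred: only store the command.
    fn record(&mut self, cmd: Command) {
        match self.mode {
            RecordingMode::Immediate => self.native.encode(&cmd),
            RecordingMode::Deferred => self.pending.push(cmd),
        }
    }

    /// At submission, Deferred pays the encoding cost; Immediate has nothing left to encode.
    fn submit(mut self) -> Vec<String> {
        for cmd in self.pending.drain(..) {
            self.native.encode(&cmd);
        }
        self.native.encoded
    }
}

fn main() {
    let value = env::var("GFX_METAL_RECORDING").ok();
    let mode = RecordingMode::from_env_value(value.as_deref());
    let mut cb = CommandBuffer::new(mode);
    cb.record(Command::Draw { vertex_count: 3 });
    cb.record(Command::Draw { vertex_count: 6 });
    println!("{:?}: {:?}", mode, cb.submit());
}
```

Under this assumption, Deferred moves the encoding work to whichever thread submits. That would be consistent with the lower main-thread CPU share of gfx/deferred in the results table, since Dota2 submits from a separate thread.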
```diff
@@ -23,20 +23,20 @@ Please note that technically "Immediate" recording has less CPU overhead. Additi
 
 Another interesting observation is that, according to our instrumented profile, 71% of the time spent by gfx/immediate (when only recording the commands) is actually spent inside the driver. The remaining 29% amounts to roughly 40% of the driver's own time, which gives us an upper bound of about 40% for the Vulkan Portability overhead on Metal, although that figure also includes some OS and Metal run-time bits.
 
-### Results
+## Results
 
 | Test/Library | gfx/immediate | gfx/deferred | MoltenVK | OpenGL |
 | -------------------------------- | ------------- | ------------ | ------------ | ----------- |
 | CPU % of Main thread | 35% | 12% | 21% | ? |
 | _platform A_ (Intel, dual-core) | | | | |
-| fps/variability on low settings | 41.5 / 4.4 | 47.9 / 4.6 | 40.5 / 6.3 | 45.0 / 5.2 |
-| fps/variability on high settings | 33.9 / 3.5 | 41.3 / 4.0 | 35.9 / 5.3 | 34.9 / 6.6 |
+| fps/variability on low settings | 41.5 / 4.4 | **47.9** / 4.6 | 40.5 / 6.3 | 45.0 / 5.2 |
+| fps/variability on high settings | 33.9 / 3.5 | **41.3** / 4.0 | 35.9 / 5.3 | 34.9 / 6.6 |
 | _platform B_ (AMD, quad core) | | | | |
-| fps/variability on low settings | 58.1 / 11.4 | 74.5 / 11.2 | 71.7 / 12.6 | 77.0 / 12.7 |
-| fps/variability on high settings | 51.1 / 9.3 | 59.2 / 7.4 | 61.4 / 10.0 | 49.0 / 5.8 |
+| fps/variability on low settings | 58.1 / 11.4 | 74.5 / 11.2 | 71.7 / 12.6 | **77.0** / 12.7 |
+| fps/variability on high settings | 51.1 / 9.3 | 59.2 / 7.4 | **61.4** / 10.0 | 49.0 / 5.8 |
 | _platform C_ (NV, quad core) | | | | |
-| fps/variability on low settings | 54.3 / 10.0 | 66.0 / 7.7 | 64.0 / 7.8 | 56.7 / 7.0 |
-| fps/variability on high settings | 40.6 / 4.4 | 43.1 / 3.8 | 42.1 / 3.9 | 37.6 / 3.2 |
+| fps/variability on low settings | 54.3 / 10.0 | **66.0** / 7.7 | 64.0 / 7.8 | 56.7 / 7.0 |
+| fps/variability on high settings | 40.6 / 4.4 | **43.1** / 3.8 | 42.1 / 3.9 | 37.6 / 3.2 |
 
 The first metric shows the share of the main thread's time spent inside the portability library, relative to its total execution time (the elapsed time minus all the sleeping). It was measured with Time Profiler on a simple scene (see screenshot). Note that Dota2 performs submission on a separate thread, so that time isn't counted here. Interestingly, the less time we spend on the main thread, the faster our frame rate ends up being. In other words, it's a race to see who gets to the submission first :)
 
```
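The 40% figure in this hunk is easier to follow as arithmetic. The post does not spell out the calculation. Our reading is that the 29% of recording time spent outside the driver is compared against the 71% spent inside it. Below is a minimal Rust sketch of that calculation; the function name is hypothetical.

```rust
/// Time spent outside the driver, expressed relative to the time spent inside it.
/// `driver_share` is the fraction (0..1] of recording time the profile attributes to the driver.
fn overhead_relative_to_driver(driver_share: f64) -> f64 {
    assert!(driver_share > 0.0 && driver_share <= 1.0, "share must be in (0, 1]");
    (1.0 - driver_share) / driver_share
}

fn main() {
    let driver_share = 0.71; // from the instrumented profile of gfx/immediate
    let outside = 1.0 - driver_share;
    let ratio = overhead_relative_to_driver(driver_share);
    // Prints 29% and 41%, i.e. "about 40%" on top of the driver's own work.
    println!("outside the driver: {:.0}% of recording time", outside * 100.0);
    println!("relative to driver time: {:.0}%", ratio * 100.0);
}
```

As the post notes, the 29% also contains OS and Metal run-time work. So this is an upper bound on the translation layer's own cost, not a direct measurement of it.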
````diff
@@ -49,7 +49,7 @@ make dota-bench-orig # for MoltenVK bundled with Dota2
 make dota-bench-gfx GFX_METAL_RECORDING=deferred # for gfx-portability with "Deferred" recording
 ```
 
-#### Platforms
+### Platforms
 
 ![macbook fleet](/img/macbook-fleet.jpg)
 
````
```diff
@@ -62,7 +62,7 @@ make dota-bench-gfx GFX_METAL_RECORDING=deferred # for gfx-portability with "Def
 | OS | macOS 10.14 beta | macOS 10.14 beta | macOS 10.13.6 |
 | resolution | 1440 x 900 | 1680 x 1050 | 1440 x 900 |
 
-### Conclusions
+## Conclusions
 
 MoltenVK does a good job translating Vulkan to Metal. Interestingly, though, it can be slower than OpenGL on low settings. This matches neither Phoronix's nor Valve/MoltenVK's numbers. We suspect the difference is explained by Phoronix using a 10x shorter run with pre-loaded pipeline caches. In that case GL would struggle to create all the new pipelines and would not have time to reach full speed - not exactly an Apples to Apples comparison :)
 
```
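The conclusion attributes part of the Phoronix discrepancy to pre-loaded pipeline caches: without them, every new pipeline has to be built during the measured run. Below is a minimal sketch of that idea as plain memoization. It assumes a pipeline is identified only by its shader pair. `PipelineKey`, `PipelineCache` and the compile counter are hypothetical. They are not Vulkan's `VkPipelineCache` API or any driver's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical pipeline identity; real keys also cover blend, depth, vertex layout, etc.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct PipelineKey {
    vertex_shader: String,
    fragment_shader: String,
}

/// Stand-in for a compiled pipeline object.
#[derive(Debug)]
struct Pipeline {
    id: u32,
}

struct PipelineCache {
    pipelines: HashMap<PipelineKey, Pipeline>,
    /// How many (expensive) compilations actually happened.
    compiles: u32,
}

impl PipelineCache {
    fn new() -> Self {
        PipelineCache { pipelines: HashMap::new(), compiles: 0 }
    }

    /// Return the cached pipeline, compiling it only the first time a key is seen.
    fn get_or_compile(&mut self, key: PipelineKey) -> &Pipeline {
        let compiles = &mut self.compiles;
        self.pipelines.entry(key).or_insert_with(|| {
            *compiles += 1; // placeholder for a costly shader/pipeline compile
            Pipeline { id: *compiles }
        })
    }

    /// "Pre-loading": compile known pipelines before the measured run starts.
    fn prewarm(&mut self, keys: impl IntoIterator<Item = PipelineKey>) {
        for key in keys {
            self.get_or_compile(key);
        }
    }
}

fn main() {
    let key = |vs: &str, fs: &str| PipelineKey {
        vertex_shader: vs.to_string(),
        fragment_shader: fs.to_string(),
    };
    let mut cache = PipelineCache::new();
    cache.prewarm(vec![key("terrain.vs", "terrain.fs"), key("hero.vs", "hero.fs")]);
    // During the "measured" frames, these lookups hit the cache and compile nothing new.
    for _frame in 0..3 {
        cache.get_or_compile(key("terrain.vs", "terrain.fs"));
        cache.get_or_compile(key("hero.vs", "hero.fs"));
    }
    println!("compilations: {}", cache.compiles);
}
```

With a warm cache, a short run never pays the compile cost. A cold run pays it during the frames being measured. That is the asymmetry the conclusion suspects.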
```diff
@@ -72,7 +72,7 @@ Either way, our benchmarks show that OpenGL is still fairly good on MacOS, and i
 
 We believe that "Immediate" command recording has great potential that hasn't yet been realized with Dota2, as it is architected today with its one big submission per frame, which adds latency to an already latency-limited program. Hopefully, we'll see more applications taking advantage of this in the future.
 
-#### Rust
+### Rust
 
 Rust has proven itself viable in complex high-performance systems. We were able to build solid abstractions and hide the complexity behind tiny interfaces, while still being able to reason about and optimize the low-level performance. Iterating on large architectural changes was a breeze - we would just change the single most important piece and then fix all the compile errors.
 
```
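The conclusion that "Immediate" recording has unrealized potential depends on applications recording command buffers in parallel, instead of building one big submission per frame. Below is a minimal sketch of that pattern with scoped threads (Rust 1.63+). It assumes the scene splits into independent chunks. `record_chunk` and the string "commands" are hypothetical stand-ins for real command-buffer recording.

```rust
use std::thread;

/// Hypothetical: record one chunk of the scene into its own command list.
fn record_chunk(chunk_id: usize, draws: &[u32]) -> Vec<String> {
    draws
        .iter()
        .map(|vertex_count| format!("chunk {}: draw {}", chunk_id, vertex_count))
        .collect()
}

/// Record every chunk on its own thread, then return the lists in submission order.
fn record_frame_in_parallel(scene: &[Vec<u32>]) -> Vec<Vec<String>> {
    thread::scope(|s| {
        // Spawn all recording threads first; each borrows its chunk of `scene`.
        let handles: Vec<_> = scene
            .iter()
            .enumerate()
            .map(|(i, draws)| s.spawn(move || record_chunk(i, draws)))
            .collect();
        // Joining in spawn order keeps the submission order deterministic.
        handles
            .into_iter()
            .map(|h| h.join().expect("recording thread panicked"))
            .collect()
    })
}

fn main() {
    let scene: Vec<Vec<u32>> = vec![vec![3, 6], vec![9], vec![12, 15, 18]];
    let lists = record_frame_in_parallel(&scene);
    for list in &lists {
        println!("{:?}", list);
    }
}
```

Collecting the handles before joining is what makes the recording actually parallel. Chaining `spawn` and `join` in one lazy iterator would run the threads one after another.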