Commit 6f1df83

alyssarosenzweig authored and marcan committed
Add new post: AAA gaming on M1!
Signed-off-by: Alyssa Rosenzweig <[email protected]>
1 parent 8aaa511 commit 6f1df83

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
+++
date = "2024-10-01T10:00:00-04:00"
draft = false
title = "AAA gaming on Asahi Linux"
slug = "aaa-gaming-on-asahi-linux"
author = "Alyssa Rosenzweig"
+++

Gaming on Linux on M1 is here! We're thrilled to release our Asahi game playing
toolkit, which integrates our Vulkan 1.3 drivers with x86 emulation and Windows
compatibility. Plus a bonus: conformant OpenCL 3.0.

Asahi Linux now ships the only conformant [OpenGL®](https://www.khronos.org/conformance/adopters/conformant-products/opengl#submission_3470),<!--
[OpenGL® ES](https://www.khronos.org/conformance/adopters/conformant-products/opengles#submission_1045),-->
[OpenCL™](https://www.khronos.org/conformance/adopters/conformant-products/opencl#submission_433), and
[Vulkan®](https://www.khronos.org/conformance/adopters/conformant-products#submission_7910)
drivers for this hardware. As for gaming... while today's release is an alpha, [**Control**](https://store.steampowered.com/app/870780/Control_Ultimate_Edition/) runs well!

<figure><a href="/img/blog/2024/10/Control-small.png"><img src="/img/blog/2024/10/Control-small.avif" alt="Control"></a></figure>

## Installation

First, install [Fedora Asahi Remix](https://asahilinux.org/fedora/). Once
installed, get the latest drivers with <code style="white-space:nowrap">dnf
upgrade \-\-refresh && reboot</code>. Then just <code
style="white-space:nowrap">dnf install steam</code> and play. While all M1/M2-series systems work, most games require 16GB of memory due to emulation overhead.

## The stack

Games are typically x86 Windows binaries rendering with DirectX, while our target is Arm Linux
with Vulkan. We need to handle each difference:

* [FEX](https://fex-emu.com/) emulates x86 on Arm.
* [Wine](https://www.winehq.org/) translates Windows to Linux.
* [DXVK](https://github.com/doitsujin/dxvk) and [vkd3d-proton](https://github.com/HansKristian-Work/vkd3d-proton) translate DirectX to Vulkan.

There's one curveball: page size. Operating systems allocate memory in fixed
size "pages". If an application expects smaller pages than the system uses,
it will break due to insufficient alignment of allocations. That's a problem:
x86 expects 4K pages but Apple systems use 16K pages.

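
To make the mismatch concrete, here is a minimal Python sketch of one way it can show up. It assumes a hypothetical application that wants a mapping at a fixed address chosen with 4K pages in mind; the helper names and the example address are illustrative and do not come from FEX, Wine, or the kernel.

```python
import mmap

PAGE_4K = 4 * 1024    # page size x86 software is written against
PAGE_16K = 16 * 1024  # page size used by Apple systems


def is_aligned(address: int, page_size: int) -> bool:
    """Return True if the address sits on a page boundary of the given size."""
    return address % page_size == 0


def round_down(address: int, page_size: int) -> int:
    """Return the start of the page that contains the address."""
    return address - (address % page_size)


# Hypothetical address an application built for 4K pages might ask for.
requested = 0x7000_1000

print("page size of the machine running this script:", mmap.PAGESIZE)
print(is_aligned(requested, PAGE_4K))        # True
print(is_aligned(requested, PAGE_16K))       # False
print(hex(round_down(requested, PAGE_16K)))  # 0x70000000
```

The address is a valid page boundary at 4K but falls in the middle of a 16K page, so a 16K system has no page that starts where the application expects one.
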
While Linux can't mix page sizes between processes, it *can* virtualize another
Arm Linux kernel with a different page size. So we run games inside a tiny
virtual machine using [muvm](https://github.com/AsahiLinux/muvm), passing through
devices like the GPU and game controllers. The
hardware is happy because the system is 16K, the game is happy because the
virtual machine is 4K, and you're happy because you can play [**Fallout
4**](https://store.steampowered.com/app/377160/Fallout_4/).

<figure><a href="/img/blog/2024/10/Fallout4-small.png"><img src="/img/blog/2024/10/Fallout4-small.avif" alt="Fallout 4"></a></figure>

## Vulkan

The final piece is an adult-level Vulkan driver, since translating DirectX requires Vulkan 1.3
with many extensions. Back in April, I wrote
[Honeykrisp](https://rosenzweig.io/blog/vk13-on-the-m1-in-1-month.html), the
only Vulkan 1.3 driver for Apple hardware. I've since added DXVK support. Let's look at some new features.

### Tessellation

Tessellation enables games like [**The Witcher 3**](https://store.steampowered.com/app/292030/The_Witcher_3_Wild_Hunt/) to generate
geometry. The M1 has hardware tessellation, but it is
too limited for DirectX, Vulkan, or OpenGL. We must instead tessellate with arcane compute shaders, as detailed in [today's talk at XDC2024](https://www.youtube.com/live/pDsksRBLXPk).

<figure><a href="/img/blog/2024/10/Witcher3-small.png"><img src="/img/blog/2024/10/Witcher3-small.avif" alt="The Witcher 3"></a></figure>

### Geometry shaders

Geometry shaders are an older, cruder method to generate geometry. As with
tessellation, we emulate them with compute, since the M1 lacks geometry shader hardware.
Is that fast? No, but geometry shaders are slow [even on desktop
GPUs](http://www.joshbarczak.com/blog/?p=667). They don't need to be fast --
just fast enough for games like
[**Ghostrunner**](https://store.steampowered.com/app/1139900/Ghostrunner/).

<figure><a href="/img/blog/2024/10/Ghostrunner-small.png"><img src="/img/blog/2024/10/Ghostrunner-small.avif" alt="Ghostrunner"></a></figure>

### Enhanced robustness

"Robustness" permits an application's shaders to access buffers out-of-bounds
without crashing the hardware. In OpenGL and Vulkan, out-of-bounds loads may
return arbitrary elements, and out-of-bounds stores may corrupt the buffer.
Our OpenGL driver [exploits this
definition](https://rosenzweig.io/blog/conformant-gl46-on-the-m1.html) for
efficient robustness on the M1.

Some games require stronger guarantees. In DirectX, out-of-bounds loads return zero, and
out-of-bounds stores are ignored. DXVK therefore requires
[`VK_EXT_robustness2`](https://docs.vulkan.org/guide/latest/robustness.html#_vk_ext_robustness2),
a Vulkan extension strengthening robustness.

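
The stronger DirectX-style guarantee can be modelled in a few lines of Python. This is only a sketch of the semantics described above, using a plain list as a stand-in for a GPU buffer; `robust_load` and `robust_store` are hypothetical names, not driver or Vulkan entry points.

```python
def robust_load(buffer: list[int], index: int) -> int:
    """Out-of-bounds loads return zero instead of an arbitrary element."""
    if 0 <= index < len(buffer):
        return buffer[index]
    return 0


def robust_store(buffer: list[int], index: int, value: int) -> None:
    """Out-of-bounds stores are ignored, so the buffer is never corrupted."""
    if 0 <= index < len(buffer):
        buffer[index] = value


data = [10, 20, 30]
robust_store(data, 7, 99)    # out of bounds: silently dropped
print(robust_load(data, 1))  # 20
print(robust_load(data, 7))  # 0
print(data)                  # [10, 20, 30]
```
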
Like before, we implement robustness with compare-and-select instructions. A
naïve implementation would *compare* a loaded index with the buffer size and
*select* a zero result if out-of-bounds. However, our GPU loads are vector
while arithmetic is scalar. Even if we disabled page faults, we would need up
to four compare-and-selects per load.

```asm
load R, buffer, index * 16
ulesel R[0], index, size, R[0], 0
ulesel R[1], index, size, R[1], 0
ulesel R[2], index, size, R[2], 0
ulesel R[3], index, size, R[3], 0
```

There's a trick: reserve *64 gigabytes* of zeroes using virtual memory voodoo.
Since every 32-bit index multiplied by 16 fits in 64 gigabytes, any index into
this region loads zeroes. For out-of-bounds loads, we simply replace the buffer
address with the reserved address while preserving the index. Replacing a
64-bit address costs just two 32-bit compare-and-selects.

```asm
ulesel buffer.lo, index, size, buffer.lo, RESERVED.lo
ulesel buffer.hi, index, size, buffer.hi, RESERVED.hi
load R, buffer, index * 16
```

Two instructions, not four.

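
The arithmetic behind the reserved region is 2^32 indices × 16 bytes = 2^36 bytes, which is 64 GiB. The Python sketch below simulates both approaches on a toy address space to check that they agree. Everything in it is an assumption made for illustration: the `ToyMemory` class, the `RESERVED_BASE` address, the `GARBAGE` value, and the bounds test written as `index < count`. It is not how the driver itself is written.

```python
ELEMENT_BYTES = 16                         # one four-component vector of 32-bit words
RESERVED_SIZE = (1 << 32) * ELEMENT_BYTES  # 2**36 bytes = 64 GiB
RESERVED_BASE = 1 << 40                    # hypothetical address of the zero region
GARBAGE = 0xDEADBEEF                       # stand-in for whatever unmapped memory holds
MASK32 = 0xFFFFFFFF


class ToyMemory:
    """Sparse toy address space with a reserved region that always reads zero."""

    def __init__(self) -> None:
        self.words: dict[int, int] = {}

    def store_vec4(self, base: int, index: int, values: list[int]) -> None:
        address = base + index * ELEMENT_BYTES
        for i, value in enumerate(values):
            self.words[address + 4 * i] = value

    def load_vec4(self, base: int, index: int) -> list[int]:
        address = base + index * ELEMENT_BYTES
        if RESERVED_BASE <= address < RESERVED_BASE + RESERVED_SIZE:
            return [0, 0, 0, 0]
        return [self.words.get(address + 4 * i, GARBAGE) for i in range(4)]


def naive_load(mem: ToyMemory, base: int, index: int, count: int) -> list[int]:
    """Load first, then select zero per component: four selects."""
    vec = mem.load_vec4(base, index)
    in_bounds = index < count
    return [component if in_bounds else 0 for component in vec]


def reserved_load(mem: ToyMemory, base: int, index: int, count: int) -> list[int]:
    """Select each 32-bit half of the address, then load: two selects."""
    in_bounds = index < count
    lo = (base if in_bounds else RESERVED_BASE) & MASK32
    hi = (base if in_bounds else RESERVED_BASE) >> 32
    return mem.load_vec4((hi << 32) | lo, index)


mem = ToyMemory()
BUFFER = 0x1000  # hypothetical buffer address holding two elements
mem.store_vec4(BUFFER, 0, [1, 2, 3, 4])
mem.store_vec4(BUFFER, 1, [5, 6, 7, 8])

# Both strategies agree for in-bounds and out-of-bounds indices.
for index in (0, 1, 2, MASK32):
    assert naive_load(mem, BUFFER, index, 2) == reserved_load(mem, BUFFER, index, 2)

print(reserved_load(mem, BUFFER, 1, 2))       # [5, 6, 7, 8]
print(reserved_load(mem, BUFFER, MASK32, 2))  # [0, 0, 0, 0]
```

Even the largest 32-bit index stays inside the reserved region, because its last byte lands exactly at the end of the 64 GiB range.
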
## Next steps

Sparse texturing is next for Honeykrisp, which will unlock more DX12 games. The alpha already runs DX12 games that don't require sparse, like [**Cyberpunk
2077**](https://store.steampowered.com/app/1091500/Cyberpunk_2077/).

<figure><a href="/img/blog/2024/10/Cyberpunk2077-small.png"><img src="/img/blog/2024/10/Cyberpunk2077-small.avif" alt="Cyberpunk 2077"></a></figure>

While many games are playable, newer AAA titles don't hit 60fps *yet*.
Correctness comes first. Performance improves next. Indie games like
[**Hollow Knight**](https://store.steampowered.com/app/367520/Hollow_Knight/) do run full speed.

<figure><a href="/img/blog/2024/10/HollowKnight-small.png"><img src="/img/blog/2024/10/HollowKnight-small.avif" alt="Hollow Knight"></a></figure>

Beyond gaming, we're adding general purpose x86 emulation based on this
stack. For more information, [see the
FAQ](https://docs.fedoraproject.org/en-US/fedora-asahi-remix/x86-support/).

Today's alpha is a taste of what's to come. Not the final form, but
enough to enjoy [**Portal 2**](https://store.steampowered.com/app/620/Portal_2/) while we work towards "1.0".

<figure><a href="/img/blog/2024/10/Portal2-small.png"><img src="/img/blog/2024/10/Portal2-small.avif" alt="Portal 2"></a></figure>

## Acknowledgements

This work has been years in the making with major contributions from...

* [Alyssa Rosenzweig](https://rosenzweig.io)
* [Asahi Lina](https://lina.yt/me)
* [chaos_princess](https://social.treehouse.systems/@chaos_princess)
* [Davide Cavalca](https://github.com/davide125)
* [Dougall Johnson](https://mastodon.social/@dougall)
* [Ella Stanforth](https://ella.gay)
* [Faith Ekstrand](https://www.gfxstrand.net/faith/welcome/)
* [Janne Grunau](https://social.treehouse.systems/@janne)
* [Karol Herbst](https://chaos.social/@karolherbst)
* [marcan](https://social.treehouse.systems/@marcan)
* [Mary Guillemard](https://mary.zone)
* [Neal Gompa](https://neal.gompa.dev/)
* [Sergio López](https://sinrega.org)
* [TellowKrinkle](https://github.com/TellowKrinkle)
* [Teoh Han Hui](https://github.com/teohhanhui)
* [Rob Clark](https://mastodon.gamedev.place/@robclark)
* [Ryan Houdek](https://github.com/sonicadvance1)

... Plus hundreds of developers whose work we build upon, spanning the Linux,
Mesa, Wine, and FEX projects. Today's release is
thanks to the magic of open source.

We hope you enjoy the magic.

Happy gaming.
