Native arm64 dynamic core support for Apple Silicon #2
Replies: 0 comments 6 replies
-
Cross-linked to VOGONS: https://www.vogons.org/viewtopic.php?p=958225#p958225
-
Based on jmarsh's comment in the VOGONS thread, I moved the write-protect toggle and cache invalidation up to just the CreateCacheBlock call, so they run once per generated block instead of once per write. This resulted in a significant performance improvement. (Screenshots: JITing Quake with the old approach, and JITing Quake with the new approach.) I also added a naive mprotect implementation for SELinux, tested under Fedora 33 ARM64 in a VM, and it seems to work fine. I have the experiment at https://github.com/kklobe/dosbox-staging/tree/kklobe/arm64_dynamic.
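To illustrate why hoisting the toggle helps, here is a minimal sketch (not the code from the branch above) of a scope guard that makes the region writable once, lets a whole block be emitted, then restores execute permission and flushes the instruction cache a single time. `JitWriteScope` and `emit_block` are hypothetical names; the sketch assumes GCC or Clang (for `__builtin___clear_cache`), a page-aligned region obtained from `mmap`, and on Apple Silicon a region mapped with `MAP_JIT`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#if defined(__APPLE__)
#include <pthread.h>
#endif

// Hypothetical RAII guard: writable for the lifetime of the object,
// executable (and cache-flushed) again when it goes out of scope.
class JitWriteScope {
public:
    JitWriteScope(void *start, std::size_t size)
        : start_(static_cast<char *>(start)), size_(size)
    {
        ok_ = set_writable(true);
    }
    ~JitWriteScope()
    {
        set_writable(false);
        // One flush for everything written while the guard was alive.
        __builtin___clear_cache(start_, start_ + size_);
    }
    JitWriteScope(const JitWriteScope &) = delete;
    JitWriteScope &operator=(const JitWriteScope &) = delete;

    bool ok() const { return ok_; }

private:
    bool set_writable(bool writable)
    {
#if defined(__APPLE__) && defined(__aarch64__)
        // Per-thread toggle; 0 disables write protection, 1 enables it.
        pthread_jit_write_protect_np(writable ? 0 : 1);
        return true;
#else
        // Fallback for W^X-enforcing systems such as SELinux.
        const int prot = PROT_READ | (writable ? PROT_WRITE : PROT_EXEC);
        return mprotect(start_, size_, prot) == 0;
#endif
    }

    char *start_;
    std::size_t size_;
    bool ok_ = false;
};

// Hypothetical block emitter: many byte writes, a single toggle pair.
// Returns the number of bytes written, or 0 on failure.
std::size_t emit_block(void *region, std::size_t region_size,
                       const std::uint8_t *code, std::size_t code_size)
{
    if (code_size > region_size)
        return 0;
    JitWriteScope scope(region, region_size);
    if (!scope.ok())
        return 0;
    std::memcpy(region, code, code_size);
    return code_size;
}
```

With this shape, the cost of the toggle and the flush is paid per block rather than per emitted byte or word, which is the effect described above.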
-
My reply, reposted from VOGONS:

Hi all, author of the code here. I just wanted to respond to some good points jmarsh has raised.

TL;DR: I'm in favor of a "1.5" approach that combines a single mmap region with write-protect toggling.

I started this experiment because I wanted DOSBox Staging's dynrec running on the M1 MacBook Pro I've had since December. I like Approach 1 because of its simplicity, along with the relative portability of a single mmap call with MAP_ANON | MAP_PRIVATE (| MAP_JIT on Apple platforms).

With regard to the 2016 presentation mentioned, a few things are new in the Apple ecosystem in the last several years:
Apple's recommended solution for security in 2021 seems to include the above mentioned components: enable the Hardened Runtime, enable the JIT Entitlement, and use per-thread write protect toggling to help reduce the attack surface. And, I reasoned, if it's going to toggle, it might as well just have a single mapping, and not have to deal with the fiddly issues pointed out for dual mappings. After looking around at some other projects that do codegen, I found that the toggling approach is common:
So it looks like there's a decent precedent for the toggling approach. It also worked in a quick test on Fedora 33 arm64 with SELinux enabled, using an mprotect call in place of pthread_jit_write_protect_np(). In summary, I came to the same conclusion as the author of the OpenJDK PR: "It's implemented with pthread_jit_write_protect_np provided by Apple... This approach of managing W^X mode turned out to be simple and efficient enough." Thanks for taking the time to discuss.
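As a rough sketch of the "1.5" approach described in this reply (one mapping plus toggling), the following shows a single anonymous mmap region and a toggle that uses pthread_jit_write_protect_np() on Apple Silicon and mprotect elsewhere. The helper names `alloc_code_region` and `set_code_region_writable` are made up for illustration and are not from the DOSBox Staging source. On macOS under the Hardened Runtime, `MAP_JIT` additionally assumes the JIT entitlement is enabled.

```cpp
#include <cstddef>
#include <sys/mman.h>
#if defined(__APPLE__)
#include <pthread.h>
#endif

// Hypothetical helper: map one anonymous, private region for generated code.
// Returns nullptr on failure.
void *alloc_code_region(std::size_t size)
{
    int flags = MAP_PRIVATE | MAP_ANON;
    int prot = PROT_READ | PROT_WRITE;
#if defined(__APPLE__)
    // MAP_JIT regions are mapped RWX once; the per-thread toggle then
    // decides whether the current thread may write to them.
    flags |= MAP_JIT;
    prot |= PROT_EXEC;
#endif
    void *mem = mmap(nullptr, size, prot, flags, -1, 0);
    return (mem == MAP_FAILED) ? nullptr : mem;
}

// Hypothetical helper: writable == true allows writes, false allows execution.
// The region must be the page-aligned address returned by mmap.
bool set_code_region_writable(void *region, std::size_t size, bool writable)
{
#if defined(__APPLE__) && defined(__aarch64__)
    (void)region;
    (void)size;
    // Affects all MAP_JIT regions for the calling thread only.
    pthread_jit_write_protect_np(writable ? 0 : 1);
    return true;
#else
    // Never request write and execute together, so W^X policies
    // (for example SELinux) are satisfied.
    const int prot = PROT_READ | (writable ? PROT_WRITE : PROT_EXEC);
    return mprotect(region, size, prot) == 0;
#endif
}
```

The same call sites work on both platforms, which is the portability argument for the single-mapping approach: only the body of the toggle differs.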
-
Adding a link showing that this code allows the dynamic core to run on a 64-bit x86-64 installation of Fedora 34 with SELinux in its default "enforcing" state. See: dosbox-staging/dosbox-staging#1010. With these configurations tested and working, minimal measured performance impact, a lean implementation (i.e., the dynamic core is left essentially as-is), and an approach that follows best practices per the list above, I can't see anything holding this back from landing in Staging!
-
Discussion archived in PR: dosbox-staging/dosbox-staging#1031
-
Archived discussion to https://github.com/dosbox-staging/archived-discussions-for-dosbox-staging
-
Last night I threw together a minimum working hack for native arm64 dynamic core on my MacBook Pro M1, with some encouraging results.
Using Future Crew's Unreal (`unreal p8` with GUS 44kHz, surely one of the better moments in DOS history), I was able to raise the audio stuttering threshold from ~100k cycles on the normal core to ~400k on the dynamic core.

The short list: `cache_addX` write with write protect disable and enable, then flush the written memory.

In practice, this is all in `dyn_cache.h`.
I understand from Discord chats that there's a plan for this work and I am very happy to help out with coding, testing, benchmarking (the fun part), and anything else.
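For readers who want a concrete picture of the write-protect wrapping described in the short list, here is a hedged sketch; it is not the snippet from `dyn_cache.h`. `CodeCache`, `set_writable`, and the templated `cache_add` are hypothetical stand-ins for the `cache_addX` family, and the sketch assumes GCC or Clang for `__builtin___clear_cache`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#if defined(__APPLE__)
#include <pthread.h>
#endif

// Hypothetical stand-in for the code cache and its write cursor.
struct CodeCache {
    char *start;      // page-aligned start of the mapped region
    std::size_t size; // size of the mapped region in bytes
    char *pos;        // next byte to be written
};

// Hypothetical toggle: true allows writes, false allows execution.
static bool set_writable(const CodeCache &cache, bool writable)
{
#if defined(__APPLE__) && defined(__aarch64__)
    (void)cache;
    pthread_jit_write_protect_np(writable ? 0 : 1);
    return true;
#else
    const int prot = PROT_READ | (writable ? PROT_WRITE : PROT_EXEC);
    return mprotect(cache.start, cache.size, prot) == 0;
#endif
}

// Hypothetical generic form of cache_addX: disable write protection,
// write the value, re-enable protection, then flush the written bytes.
template <typename T>
bool cache_add(CodeCache &cache, T value)
{
    const std::size_t used = static_cast<std::size_t>(cache.pos - cache.start);
    if (used + sizeof(T) > cache.size)
        return false;
    if (!set_writable(cache, true))
        return false;
    std::memcpy(cache.pos, &value, sizeof(T));
    const bool ok = set_writable(cache, false);
    __builtin___clear_cache(cache.pos, cache.pos + sizeof(T));
    cache.pos += sizeof(T);
    return ok;
}
```

Toggling around every individual write keeps the change small and local, at the cost of one toggle pair and one flush per emitted value.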
A few references:
Porting Just-In-Time Compilers to Apple Silicon
Attempts to mprotect() with MAP_JIT failing on Apple Silicon as of macOS 11.2
Apple M1 Support for MacOS