
MacOS Port of Xenia#2332

Closed
wmarti wants to merge 3249 commits into xenia-project:master from wmarti:metal-backend-clean-msc

Conversation

@wmarti

@wmarti wmarti commented Dec 21, 2025

This is a work-in-progress port of Xenia to MacOS, currently tested only on Apple Silicon, built on top of @Wunkolo's ARM64 backend #2259. In theory this would also work on iOS devices, but only in regions where JIT compilation is available and distribution is allowed outside the App Store, like the EU.

The Metal backend translates Xbox 360 shader microcode through multiple stages:

Xbox 360 Microcode (ucode)
    ↓ DxbcShaderTranslator (shared with D3D12)
DXBC (DirectX Bytecode)
    ↓ dxbc_to_dxil_converter (native dxbc2dxil)
DXIL (DirectX Intermediate Language)
    ↓ metal_shader_converter (Apple Metal Shader Converter)
Metal IR
    ↓ MTLDevice newLibraryWithData
MTLLibrary (GPU-executable)

The pipeline leverages:

  • DxbcShaderTranslator: Existing Xenia infrastructure for microcode → DXBC SM 5.1
  • dxbc2dxil: DirectXShaderCompiler tool (ported to MacOS for use as a native library) → DXIL SM 6.0
  • Metal Shader Converter: Apple's metalirconverter library for DXIL → Metal IR
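The chain above can be sketched as a simple composition. This is illustrative only: the stage names match the pipeline, but the function bodies are placeholders, not the real DxbcShaderTranslator, dxbc2dxil, or Metal Shader Converter entry points.

```cpp
#include <cstdint>
#include <vector>

using Blob = std::vector<uint8_t>;

// Placeholder stages standing in for the real converters.
Blob TranslateUcodeToDxbc(const Blob& ucode) { return ucode; }  // -> DXBC SM 5.1
Blob ConvertDxbcToDxil(const Blob& dxbc) { return dxbc; }       // -> DXIL SM 6.0
Blob ConvertDxilToMetalIr(const Blob& dxil) { return dxil; }    // -> Metal IR

// Chains the stages; the resulting Metal IR blob would then be loaded with
// -[MTLDevice newLibraryWithData:error:] to obtain an MTLLibrary.
Blob TranslateShader(const Blob& ucode) {
  return ConvertDxilToMetalIr(ConvertDxbcToDxil(TranslateUcodeToDxbc(ucode)));
}
```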

Maybe eventually I'll go the SPIR-V -> MSL route, but this seemed the easiest for now (even though there's a big performance penalty).

The entire thing has been essentially “vibecoded” over the last ~year, so there's probably a minefield of issues, many merge conflicts, and tons of bloat that's not meant to be committed (sorry, I'm still learning how to use git), but I'll get those issues ironed out over time. I'm not expecting this to get merged anytime soon; I'm just opening this PR for tracking. The "app" builds but does not run games yet. I've got xenia-gpu-metal-trace-dump reproducing traces captured in the D3D12 backend from Gears of War mostly correctly. Other games are WIP, as you can see below.

Gears of War

4D5307D5_9639 4D5307D5_12436 4D5307D5_12994

Halo 3

4D5307E6_31051 4D5307E6_33934 4D5307E6_37345

GTA IV

545407F2_2784 545407F2_3320

Wunkolo and others added 30 commits May 8, 2024 09:24
Additionally fixes some instruction forms to use the more general `STR` instruction with an offset
You wouldn't believe how much time this bug cost me
Guest-function calls will use W17 for indirect calls
Fixes some offset generation as well
Was picking up `W0` rather than src1
Operand order is wrong.
Writing to the wrong register!
Potential input-register stomping and operand order is seemingly wrong.

Passes generated unit tests.
Passes generated unit tests
Accessing the same memory as different types (other than char) using
reinterpret_cast or a union is undefined behavior that has already caused
issues like xenia-project#1971.

Also adds a XE_RESTRICT_VAR definition for declaring non-aliasing pointers
in performance-critical areas in the future.
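The aliasing rule being fixed here can be illustrated with a minimal sketch (function names are mine, not Xenia's): copy the bytes with memcpy instead of reading the storage through an unrelated pointer type.

```cpp
#include <cstdint>
#include <cstring>

// Standards-compliant type punning: memcpy through the byte representation.
// Compilers lower this to a plain register move, so there is no runtime cost.
inline float FloatFromBits(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

inline uint32_t BitsFromFloat(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// By contrast, *reinterpret_cast<float*>(&bits) reads a uint32_t object
// through a float lvalue, which violates strict aliasing and is undefined
// behavior.
```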
Hopefully prevents some potential xenia-project#1971-like situations.

WAIT_REG_MEM's implementation also allowed the compiler to load the value
only once, which caused an infinite loop with the other changes in the
commit (even in debug builds), so it's now accessed as volatile. Possibly
it would be even better to replace it with some (acquire/release?) atomic
load/store some day at least for the registers actually seen as
participating in those waits.

Also fixes the endianness being handled only on the first wait iteration in
WAIT_REG_MEM.
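The single-load pitfall described in this commit can be modeled with a small sketch (a simplified stand-in, not the actual command processor code):

```cpp
#include <cstdint>

// Simplified model of a WAIT_REG_MEM-style poll. Reading through a plain
// pointer lets the optimizer hoist the load out of the loop and spin on a
// stale value forever; the volatile qualifier forces a fresh load on every
// iteration. (An acquire atomic load would be the stricter alternative.)
inline uint32_t WaitRegMemPoll(const uint32_t* location, uint32_t ref,
                               uint32_t mask) {
  const volatile uint32_t* reg = location;
  uint32_t value;
  do {
    value = *reg & mask;
  } while (value != ref);
  return value;
}
```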
Hopefully should reduce the CI failure rate, although this testing
approach is fundamentally flawed as it depends on OS scheduling.
Metal's xesl_firstOneBitHigh used 32 - clz(x), which is off by one and
returns 32 for zero, unlike HLSL/GLSL firstbithigh/findMSB.

Switch to 31 - clz(x) with uint casts for signed inputs to match backend
semantics and fix float10 denorm decode in resolve.
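The off-by-one can be demonstrated with a small model (Clz32 here is defined to return 32 for zero input, matching Metal's clz for uint):

```cpp
#include <cstdint>

// Count leading zeros of a 32-bit value; returns 32 for x == 0.
inline int Clz32(uint32_t x) {
  int n = 0;
  while (n < 32 && !(x & 0x80000000u)) {
    x <<= 1;
    ++n;
  }
  return n;
}

// Old, buggy form: off by one for every nonzero input.
inline int FirstOneBitHighBuggy(uint32_t x) { return 32 - Clz32(x); }

// Fixed form, matching HLSL firstbithigh / GLSL findMSB semantics
// (bit index of the most significant set bit, -1 for zero).
inline int FirstOneBitHighFixed(uint32_t x) { return 31 - Clz32(x); }
```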
@wmarti
Author

wmarti commented Jan 2, 2026

There's been quite a few issues to iron out in the Metal backend, and I even caught one Metal-specific bug in the ui/shader code that was causing a compression-like artifact in some traces! That one took, like, a whole day to figure out...

Additionally, there's been quite a few a64 backend tune-ups I've had to re-work for Mac, again... As well as various other things. But anyways, it's coming along nicely and I'm working on trying to get the xenia-app (aka the actual emulator) to boot games at this point. Even have a little extra surprise in the works. Merry Christmas and Happy New Year! あけおめ!

4D5307D5_7856 4D5307E6_33934 4D5307E6_37345 4D53085B_910 545407F2_9032

@OpenSauce04

OpenSauce04 commented Jan 4, 2026

What's going on with the commit history in this PR?
Why does it add/change ~1.5x the number of lines that are currently in the entire Xenia codebase?

@Wunkolo
Contributor

Wunkolo commented Jan 4, 2026

What's going on with the commit history in this PR? Why does it add/change ~1.5x the number of lines that are currently in the entire Xenia codebase?

This seems to be an artifact of the vibe coding mentioned from the original post:

The entire thing has been essentially “vibecoded” over the last ~year, so there's probably a minefield of issues, and there are many merge conflicts, and tons of bloat that's not meant to be committed (sorry, I'm still learning how to use git), but I'll get those issues ironed out over time.

In addition to the fact that it uses my arm64 jit backend as a base, which already involves a lot of changes.

In this PR there is an addition of an AGENTS.md and GEMINI.md and other artifacts that imply that this PR is using a lot of AI assistance. Merging this PR is also going to set a precedent about AI usage within Xenia moving forward. Not too sure how to feel about this.

It looks like it would take an incredible amount of time to recenter this PR into the minimal set of edits needed to deliver this feature in a concise way, without any of the superficial changes and merge conflicts.
Some project files seem like they were deleted and re-added rather than moved with something like git mv to properly preserve file history. Also, there seem to be a whole bunch of new Android project files added.

@wmarti
Author

wmarti commented Jan 5, 2026

What's going on with the commit history in this PR? Why does it add/change ~1.5x the number of lines that are currently in the entire Xenia codebase?

This seems to be an artifact of the vibe coding mentioned from the original post:

The entire thing has been essentially “vibecoded” over the last ~year, so there's probably a minefield of issues, and there are many merge conflicts, and tons of bloat that's not meant to be committed (sorry, I'm still learning how to use git), but I'll get those issues ironed out over time.

In addition to the fact that it uses my arm64 jit backend as a base which already involves a lot of changes.

In this PR there is an addition of an AGENTS.md and GEMINI.md and other artifacts that imply that this PR is using a lot of AI assistance. Merging this PR is also going to set a precedent about AI usage within Xenia moving forward. Not too sure how to feel about this.

It looks like it would take an incredible amount of time to recenter this PR into the minimal set of edits needed to deliver this feature in a concise way, without any of the superficial changes and merge conflicts. Some project files seem like they were deleted and re-added rather than moved with something like git mv to properly preserve file history. Also, there seem to be a whole bunch of new Android project files added.

Despite the branch I've included in the PR being named metal-backend-clean-msc, it is, well, not totally clean. There's a bunch of garbage accidentally included throughout development (the third_party folder, the dxbc2dxil port, scratch data, debug scripts, logs, etc.) that would be tricky to fully purge. Rebases on top of both master and canary are in the works, but for the moment I'm more focused on getting the emulator functional. That said, there is a non-trivial number of changes required to the base shared code to get it building on Mac that may be tricky to merge, but the real changes, including @Wunkolo's backend and the new Metal GPU backend, roughly equate to 47544 insertions(+), 1149 deletions(-) compared to master, give or take a few thousand.

Also, for anyone reading, please keep in mind that this is my first ever pull request. I am not a professional software engineer with 10 or 20 years of experience like the rest of the contributors to Xenia; I am just a student. When I started out on this project to port Xenia to MacOS a year ago, I never thought I'd actually get here. It's taken hundreds of hours of work or more, and the fact that it's even booting at all is, I think, pretty incredible. It's really a testament to what can be accomplished with LLMs and a little determination. Rest assured I'm going to get this thing into a state that's usable and mergeable into one of the mainstream codebases. 😎

For now, here's some more screenshots. Enjoy!

image image image image image image image

@OpenSauce04

OpenSauce04 commented Jan 5, 2026

I think it'd be in everyone's best interest for a maintainer to share their thoughts here. In its current state, this PR seems unreviewable and unmergeable due to the sheer volume of code in the pull request, and because of the aforementioned "vibe coding", the code would absolutely warrant a thorough review. A lot of the code changes also seem unrelated to the PR topic.

I don't know how you could clean this up without just starting over and manually moving over and cleaning up changes which are actually relevant to the pull request.

That's without even considering what Xenia's stance on AI-generated code of this scale would be.

@has207
Contributor

has207 commented Jan 5, 2026

I think it'd be in everyone's best interest for a maintainer to share their thoughts here. In its current state, this PR seems unreviewable and unmergeable due to the sheer volume of code in the pull request, and because of the aforementioned "vibe coding", the code would absolutely warrant a thorough review. A lot of the code changes also seem unrelated to the PR topic.

I don't know how you could clean this up without just starting over and manually moving over and cleaning up changes which are actually relevant to the pull request.

That's without even considering what Xenia's stance on AI-generated code of this scale would be.

I don't pretend to represent the xenia project, but there's no danger of any code getting merged into master at this point regardless. He said he'll make a better rebase on top of canary; let him work on it. It's not to anyone's detriment, and only to everyone's benefit, if he makes something that works. Criticizing something that's clearly still a work in progress and hasn't even requested a review is certainly not helpful, however.

@Triang3l
Member

Triang3l commented Jan 5, 2026

I think it'd be in everyone's best interest for a maintainer to share their thoughts here. In its current state, this PR seems unreviewable and unmergeable due to the sheer volume of code in the pull request, and because of the aforementioned "vibe coding", the code would absolutely warrant a thorough review. A lot of the code changes also seem unrelated to the PR topic.

I don't know how you could clean this up without just starting over and manually moving over and cleaning up changes which are actually relevant to the pull request.

That's without even considering what Xenia's stance on AI-generated code of this scale would be.

While I can't speak for other developers, my personal stance is that code submissions should be held to the same standards during reviews regardless of the tools used to produce them.

Cats playing musical instruments and boss fights with a huge butthurt dude are just the current state of our timeline, and if machine learning can improve our lives in various ways, why not take advantage of what we have. We, humans, don't even really create anything out of nowhere, rather just transform things that we encounter using other things that we encounter — and that's especially applicable to interoperability tools like emulators — and so does generative machine learning, although currently nowhere as meaningfully as humans.

However, when you're importing large pieces of code from an LLM, it's much easier to let some unhandled edge cases sneak into the code than when you're carefully thinking about every line you're writing, so it's likely that ML-generated code will raise a lot of questions during the review process.

Another issue to consider is copyright, and it's complicated because you don't know the origin of the code you're getting from your AI assistant. In general, code can be separated into 3 categories with their copyright implications:

  • General host-side code. This includes overall Metal usage, for example. However, if you can formulate the code you need in a natural language, I'm not sure why you'd even need an LLM for it instead of just rewriting your prompt in C++ by yourself. Maybe for some reusable snippets? In my opinion, vibe coding takes even more time because you constantly have to review and integrate the code the LLM produces with passive vigilance, although I've never tried it so I can't be sure.
  • Third-party host code. This is, I think, the most problematic part of AI assistant usage, as it's difficult to identify more or less unique functionality implementations (as opposed to boilerplate or generic data structures) copied from third-party libraries. But if the assistant ends up copying a substantial piece of code from a library, you need to make sure it's properly licensed, attributed, and that its license is compatible with the BSD license of Xenia (this includes avoiding GPL code). Again, this adds more work that needs to be done by the reviewers. However, this is mostly relevant to areas such as file formats and codecs. In the case of this merge request, however, there is host-side shader translation, but there's no translation code itself added into src, only invocation of third-party library functions, so that's fine.
  • Guest emulation code. Now this is something you should totally avoid while contributing to Xenia — asking an LLM about the functionality of the internals of the Xbox 360 console or the NT kernel. Xbox consoles from the 2000s are notorious for quite careless handling of the confidentiality of their licensee-only tooling by the parties involved, so it's not unlikely that the training data of a public LLM may include leaked game source code, leaked or shared-source Windows source code, XDK-based homebrew, or even XDK documentation or headers themselves. For recreation of the kernel, or other Microsoft-specific components such as the SMC or the boot loaders, you probably should not use an LLM. On the GPU side, the implications of that are not as strict, since a large fraction of its programming interface is shared with other ATI lineage GPUs (like the Adreno 200, and to a lesser extent the R600, which both have a sizeable amount of public reference information), and many other details can be inferred from researching retail games in different ways, or behavior of other ATI hardware in similar situations, or publications made by Microsoft and game developers. However, if you're introducing some very specific handling for a rare register configuration, you need to make sure your findings can be reproduced by other people working with the Xenia code based on information, software and hardware available to the general public (not the XDK documentation) — such as by providing runnable source code for a hardware test validating your implementation (that doesn't require the XDK to compile and run — such as via LibXenon, or manual XEX generation, or Adreno 200 OpenGL ES or KGSL/DRM where relevant), or by leaving a comment in the code that references a game where the correctness of the implementation is observable and can be clearly verified. The requirements here are exactly the same as for code written without the use of an LLM.

Now, regarding this merge request itself.

After briefly looking at various files in it, it feels to me like the insistence on vibe coding keeps distracting you from something that may accelerate your work far more than any AI assistant: actual awareness of the architecture.

Many changes here seem to be unnecessary for porting to Mac or to POSIX systems in general. I don't know how exactly they appeared here, but it seems like there may be insufficient understanding of what's needed and what isn't, and possibly some reluctance to explore the code base, maybe due to inflated expectations about the AI's knowledge of the code.

And that's observable not only in the semi-related commits that somehow ended up in this merge request, but in the new code as well:

  • The whole "buffer cache" concept implemented in the MetalBufferCache is something that has never been used by the current rendering architecture of Xenia — instead, vertex attributes are fetched directly from the 512 MB SharedMemory in shaders, and vertex indices are also read directly from that buffer. This is a leftover from the old, pre-2018 Vulkan and OpenGL renderers that didn't use a continuous shadow of the guest memory on the host GPU. It's also very unusual that the diff for this branch contains deletion of the gl4 renderer — which was abandoned in 2017 and already deleted back in 2018. This, by the way, sort of explains why there are so many commits and changes in this merge request — looks like the branch was created from the wrong master branch commit, and it needs to be rebased somehow.
  • The MetalGeometryShader class contains DXBC geometry shader building code copied from the Direct3D 12 renderer. However, Metal doesn't have geometry shaders at all. Instead, on Metal, Xenia will have to rely on expansion of point sprites and rectangles in vertex shaders, by having multiple host vertices corresponding to one guest vertex, or by running the guest vertex shader code multiple times in a host invocation — see the HostVertexShaderType::kPointListAsTriangleStrip and HostVertexShaderType::kRectangleListAsTriangleStrip implementation in the SPIR-V shader translator. Alternatively, mesh shaders may be used for this purpose, but the vertex shader fallback will still be required for hardware not supporting them. It's confusing that a whole new file was added for functionality that will never be used — it seems like it wasn't even checked whether it actually would be needed.
  • The MetalPresenter is in the wrong directory (gpu rather than ui).
  • There is fflush(stdout); fflush(stderr); after some XELOG calls in the Metal code. A quick look at base/logging.cc would reveal that Xenia already flushes the log output by default — that's toggled via the flush_log configuration variable defined in the beginning of that file.
  • Lots of scaffolding, lots of logging, various probings and self-tests. It feels like you don't trust the code you're getting at all, are you essentially just making more and more prompts until you suddenly get something working? In many cases, you can more or less see what's wrong by exploring related render targets, textures and buffers in the Metal debugger, or even by logging and looking at related variables on the CPU. The self-test for raster order groups — just why, what does it do? If you need to know if they're supported by the host device, you merely need to check the return value of [MTLDevice areRasterOrderGroupsSupported] (and whether this method exists in the used OS version at all), not to rely on errors or undefined behavior (unless there's some driver bug that you're trying to work around).

The whole idea of a compute shader fallback for programmable blending — how is it even supposed to work? Are you trying to create a full-blown software renderer in compute shaders? I see some EDRAM blend compute shaders, are they even currently used at all, or are they just another bunch of code from the LLM stashed for an unknown purpose for an indeterminate point in the future? If you're trying to, for instance, render a primitive into a separate framebuffer, and then blend that framebuffer with the main one, a lot of problems will need to be solved for it to work even at a more or less acceptable speed — how to detect if overlap actually happens to avoid inserting a barrier between every pair of triangles, how to calculate the bounding rectangles for triangles on the screen to dispatch compute shader grids of a necessary size, etc. A lot of that sounds like having to transform triangles on the CPU to detect overlap… I'm very very confused. The whole fallback isn't really even needed in the first place because in most cases the guest framebuffer pixel formats can be emulated in a reasonable way without ROV/ROG. Xenia already has code for converting depth to 24-bit floating-point in translated pixel shaders (that is disabled by default, but I think should be enabled in the future as its performance cost doesn't seem to be that severe, but it, if I recall correctly, fixes very annoying issues in prominent games like Halo 4), and I'm planning to try promoting the 8_8_8_8_GAMMA format with that piecewise linear precision distribution to rgba16Unorm (mostly similarly to how 2_10_10_10_FLOAT is emulated via rgba16Float) to ensure linear space blending and eliminate the bright squares around decals in Halo 3.

The primary goals that a GPU emulation host backend needs to achieve are:

  • Completeness.
  • Correctness.
  • Reasonably wide support for different host hardware and OS versions.

For the first two, I don't think the current development approach is helpful. From an external point of view, it seems like the idea is not merely to use the AI as an assistant, but rather to just ask for random changes and hope that they're going to work and pass some "probing" tests. It feels like not only is AI used for code generation, but the whole development process also kind of resembles an AI genetic algorithm. What makes correctness very doubtful is the presence of a lot of code that just has no meaning in the architecture, like the buffer cache from the pre-2018 renderer, and the geometry shader generation. It's like the code not only doesn't get properly verified, especially in edge cases, but a lot of it has never even been run at all?

And regarding host configuration support, there already are some strange decisions, such as checking the presence of formats like MTLPixelFormatBGR10A2Unorm at compilation time, rather than targeting the header version that contains all of them and checking the actual support at runtime. Though that seems to be done only in probing code, but again that sounds like copying what the LLM produced without actually looking at the code.

This whole merge request is really weird, in my opinion (and it took me a few hours to write this comment, and I guess it's pretty evident how my attitude changed over time). It looks like a pretty straightforward task: replacing Direct3D 12 and Vulkan code with roughly equivalent Metal code, possibly moving some parts to backend-agnostic code along the way (maybe with the exception of integrating vertex shader generation for point sprites and rectangles, as well as adapting the tessellation logic to the more Xbox 360-like interface of Metal tessellation). It can be solved quite easily by breaking it down into a sort of dependency tree and simply rewriting each part that's needed at each point in development. Instead, it was massively overcomplicated by wasting a lot of time trying to get something functional out of an LLM, while being reluctant to spend a couple of weeks learning the structure of the renderer and the primary challenges of emulating the GPU of the Xbox 360.

@mrdc

mrdc commented Jan 9, 2026

I ported Wunkolo's ARM64 code to canary some time ago and was able to run it on Win 11 ARM, without LLMs or vibe coding, so without so many PRs or changes. But it was quite rough, and I wasn't able to test the video backend: Win 11 ARM on a Mac has only DX11, so Xenia wasn't happy. I need to give it another try this year, but no ETA.

@wmarti
Author

wmarti commented Jan 12, 2026

While I can't speak for other developers, my personal stance is that code submissions should be held to the same standards during reviews regardless of the tools used to produce them.

Cats playing musical instruments and boss fights with a huge butthurt dude are just the current state of our timeline, and if machine learning can improve our lives in various ways, why not take advantage of what we have. We, humans, don't even really create anything out of nowhere, rather just transform things that we encounter using other things that we encounter — and that's especially applicable to interoperability tools like emulators — and so does generative machine learning, although currently nowhere as meaningfully as humans.

However, when you're importing large pieces of code from an LLM, it's much easier to let some unhandled edge cases sneak into the code than when you're carefully thinking about every line you're writing, so it's likely that ML-generated code will raise a lot of questions during the review process.

Another issue to consider is copyright, and it's complicated because you don't know the origin of the code you're getting from your AI assistant. In general, code can be separated into 3 categories with their copyright implications:

  • General host-side code. This includes overall Metal usage, for example. However, if you can formulate the code you need in a natural language, I'm not sure why you'd even need an LLM for it instead of just rewriting your prompt in C++ by yourself. Maybe for some reusable snippets? In my opinion, vibe coding takes even more time because you constantly have to review and integrate the code the LLM produces with passive vigilance, although I've never tried it so I can't be sure.
  • Third-party host code. This is, I think, the most problematic part of AI assistant usage, as it's difficult to identify more or less unique functionality implementations (as opposed to boilerplate or generic data structures) copied from third-party libraries. But if the assistant ends up copying a substantial piece of code from a library, you need to make sure it's properly licensed, attributed, and that its license is compatible with the BSD license of Xenia (this includes avoiding GPL code). Again, this adds more work that needs to be done by the reviewers. However, this is mostly relevant to areas such as file formats and codecs. In the case of this merge request, however, there is host-side shader translation, but there's no translation code itself added into src, only invocation of third-party library functions, so that's fine.
  • Guest emulation code. Now this is something you should totally avoid while contributing to Xenia — asking an LLM about the functionality of the internals of the Xbox 360 console or the NT kernel. Xbox consoles from the 2000s are notorious for quite careless handling of the confidentiality of their licensee-only tooling by the parties involved, so it's not unlikely that the training data of a public LLM may include leaked game source code, leaked or shared-source Windows source code, XDK-based homebrew, or even XDK documentation or headers themselves. For recreation of the kernel, or other Microsoft-specific components such as the SMC or the boot loaders, you probably should not use an LLM. On the GPU side, the implications of that are not as strict, since a large fraction of its programming interface is shared with other ATI lineage GPUs (like the Adreno 200, and to a lesser extent the R600, which both have a sizeable amount of public reference information), and many other details can be inferred from researching retail games in different ways, or behavior of other ATI hardware in similar situations, or publications made by Microsoft and game developers. However, if you're introducing some very specific handling for a rare register configuration, you need to make sure your findings can be reproduced by other people working with the Xenia code based on information, software and hardware available to the general public (not the XDK documentation) — such as by providing runnable source code for a hardware test validating your implementation (that doesn't require the XDK to compile and run — such as via LibXenon, or manual XEX generation, or Adreno 200 OpenGL ES or KGSL/DRM where relevant), or by leaving a comment in the code that references a game where the correctness of the implementation is observable and can be clearly verified. The requirements here are exactly the same as for code written without the use of an LLM.

Now, regarding this merge request itself.

After briefly looking at various files in it, it feels to me like the insistence on vibe coding keeps distracting you from something that may accelerate your work far more than any AI assistant: actual awareness of the architecture.

Many changes here seem to be unnecessary for porting to Mac or to POSIX systems in general. I don't know how exactly they appeared here, but it seems like there may be insufficient understanding of what's needed and what isn't, and possibly some reluctance to explore the code base, maybe due to inflated expectations about the AI's knowledge of the code.

And that's observable not only in the semi-related commits that somehow ended up in this merge request, but in the new code as well:

  • The whole "buffer cache" concept implemented in the MetalBufferCache is something that has never been used by the current rendering architecture of Xenia — instead, vertex attributes are fetched directly from the 512 MB SharedMemory in shaders, and vertex indices are also read directly from that buffer. This is a leftover from the old, pre-2018 Vulkan and OpenGL renderers that didn't use a continuous shadow of the guest memory on the host GPU. It's also very unusual that the diff for this branch contains deletion of the gl4 renderer — which was abandoned in 2017 and already deleted back in 2018. This, by the way, sort of explains why there are so many commits and changes in this merge request — looks like the branch was created from the wrong master branch commit, and it needs to be rebased somehow.
  • The MetalGeometryShader class contains DXBC geometry shader building code copied from the Direct3D 12 renderer. However, Metal doesn't have geometry shaders at all. Instead, on Metal, Xenia will have to rely on expansion of point sprites and rectangles in vertex shaders, by having multiple host vertices corresponding to one guest vertex, or by running the guest vertex shader code multiple times in a host invocation — see the HostVertexShaderType::kPointListAsTriangleStrip and HostVertexShaderType::kRectangleListAsTriangleStrip implementation in the SPIR-V shader translator. Alternatively, mesh shaders may be used for this purpose, but the vertex shader fallback will still be required for hardware not supporting them. It's confusing that a whole new file was added for functionality that will never be used — it seems like it wasn't even checked whether it actually would be needed.
  • The MetalPresenter is in the wrong directory (gpu rather than ui).
  • There is fflush(stdout); fflush(stderr); after some XELOG calls in the Metal code. A quick look at base/logging.cc would reveal that Xenia already flushes the log output by default — that's toggled via the flush_log configuration variable defined in the beginning of that file.
  • Lots of scaffolding, lots of logging, various probings and self-tests. It feels like you don't trust the code you're getting at all, are you essentially just making more and more prompts until you suddenly get something working? In many cases, you can more or less see what's wrong by exploring related render targets, textures and buffers in the Metal debugger, or even by logging and looking at related variables on the CPU. The self-test for raster order groups — just why, what does it do? If you need to know if they're supported by the host device, you merely need to check the return value of [MTLDevice areRasterOrderGroupsSupported] (and whether this method exists in the used OS version at all), not to rely on errors or undefined behavior (unless there's some driver bug that you're trying to work around).

The whole idea of a compute shader fallback for programmable blending — how is it even supposed to work? Are you trying to create a full-blown software renderer in compute shaders? I see some EDRAM blend compute shaders — are they even currently used at all, or are they just another bunch of code from the LLM, stashed for an unknown purpose at an indeterminate point in the future? If you're trying to, for instance, render a primitive into a separate framebuffer and then blend that framebuffer with the main one, a lot of problems will need to be solved for it to work at even a more or less acceptable speed — how to detect whether overlap actually happens so a barrier isn't inserted between every pair of triangles, how to calculate the bounding rectangles of triangles on the screen to dispatch compute shader grids of the necessary size, and so on. A lot of that sounds like having to transform triangles on the CPU just to detect overlap… I'm very, very confused.

The whole fallback isn't really even needed in the first place, because in most cases the guest framebuffer pixel formats can be emulated in a reasonable way without ROV/ROG. Xenia already has code for converting depth to 24-bit floating-point in translated pixel shaders (disabled by default, but I think it should be enabled in the future — its performance cost doesn't seem to be that severe, and, if I recall correctly, it fixes very annoying issues in prominent games like Halo 4), and I'm planning to try promoting the 8_8_8_8_GAMMA format with its piecewise linear precision distribution to rgba16Unorm (mostly similarly to how 2_10_10_10_FLOAT is emulated via rgba16Float) to ensure linear-space blending and eliminate the bright squares around decals in Halo 3.
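To illustrate why the 2_10_10_10_FLOAT → rgba16Float promotion works, here is a rough decoder sketch for one 10-bit color component (not code from this PR — it assumes the usual 7e3 layout of this format: 7-bit mantissa, 3-bit exponent, bias 3, no sign, denormals when the exponent field is zero):

```python
def float7e3_to_float32(bits: int) -> float:
    """Decode one 10-bit 7e3 component of 2_10_10_10_FLOAT.

    Assumed layout: 7-bit mantissa, 3-bit exponent, exponent bias 3,
    unsigned, with denormals when the exponent field is zero.
    """
    mantissa = bits & 0x7F
    exponent = bits >> 7
    if exponent == 0:
        # Denormal: mantissa * 2^(1 - bias - 7) = mantissa * 2^-9.
        return mantissa * 2.0 ** -9
    # Normal: (1 + mantissa/128) * 2^(exponent - bias).
    return (1.0 + mantissa / 128.0) * 2.0 ** (exponent - 3)

print(float7e3_to_float32(0))              # 0.0
print(float7e3_to_float32(3 << 7))         # 1.0
print(float7e3_to_float32((7 << 7) | 0x7F))  # 31.875, the format maximum
```

Since a half float has a 10-bit mantissa and an exponent range that comfortably covers [2^-9, 31.875], every 7e3 value is exactly representable in rgba16Float, making the promotion lossless — the same reasoning motivates an rgba16Unorm promotion for the piecewise linear gamma format.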

The primary goals that a GPU emulation host backend needs to achieve are:

  • Completeness.
  • Correctness.
  • Reasonably wide support for different host hardware and OS versions.

For the first two, I don't think the current development approach is helpful. From an external point of view, it seems like the idea is not merely to use the AI as an assistant, but rather to just ask for random changes and hope that they're going to work and pass some "probing" tests. Not only is AI used for code generation — the whole development process also kind of resembles an AI-driven genetic algorithm. What makes correctness very doubtful is the presence of a lot of code that has no meaning in the architecture, like the buffer cache from the pre-2018 renderer and the geometry shader generation. It's as if the code not only doesn't get properly verified, especially in edge cases, but a lot of it has never even been run at all.

And regarding host configuration support, there already are some strange decisions, such as checking the presence of formats like MTLPixelFormatBGR10A2Unorm at compilation time, rather than targeting a header version that contains all of them and checking the actual support at runtime. That seems to be done only in probing code, but again it sounds like copying what the LLM produced without actually looking at the code.

This whole merge request is really weird in my opinion (and it took me a few hours to write this comment — I guess how my attitude changed over time is pretty evident). What looks like a pretty straightforward task (maybe with the exception of integrating vertex shader generation for point sprites and rectangles, as well as adapting the tessellation logic to the more Xbox 360-like interface of Metal tessellation) — replacing Direct3D 12 and Vulkan code with roughly equivalent Metal code, possibly moving some parts to backend-agnostic code along the way — can be solved quite easily by breaking it down into a sort of dependency tree and simply rewriting each part that's needed at each point in development. Instead, it was massively overcomplicated by spending a lot of time trying to get something functional out of an LLM, while being reluctant to spend a couple of weeks learning the structure of the renderer and the primary challenges of emulating the GPU of the Xbox 360.

@Triang3l First of all, I would like to thank you for taking the time to write this reply. Having taken just a few hours to do so, you have probably spent more time reviewing my work than anyone else in the last 5 years. I have tremendous respect and admiration for what you have built here, you are clearly a very talented, capable and passionate engineer. Additionally, I think your analysis of the situation is spot on, and I am pretty much in agreement with your perspective regarding the merge request at this point.

I’ve been keeping up with the status of a Mac port of Xenia for a while. When I realized an ARM64 backend had been developed, I was surprised to see that it had not been completed to the point of being merged into master or canary. I also realized that it provided one of the key missing pieces required for supporting Apple Silicon based devices in Xenia in a meaningful way (although running x86_64 Xenia on ARM64-based Macs is now “technically” possible under Rosetta 2 through its somewhat limited AVX support, I was never aware of or interested in making use of this). I was also aware that a Metal backend would have to be developed due to a variety of limitations in MoltenVK — the issue of primitive restart and the lack of geometry shader emulation, among other things.

Initially, I thought I'd use LLMs to assist me in understanding the codebase and how to conduct a proper port. Unfortunately, I quickly abandoned the idea of "hand-crafting" a port to MacOS — of “doing it the right way” — when I realized (and simply when it became possible with tools like claude-code, codex, etc.) that it was faster and easier to just rely on LLMs to do all of the heavy lifting. These days, it can be pretty easy to just let them "run", intervening when they lose focus or are obviously heading in the wrong direction, until you get the output you expect — treating the underlying system as something like a black box, and then verifying the output manually when necessary.

From the perspective of maximal learning, correctness, maintainability, completeness, etc., this is obviously not the right way to go about a porting project of this scale, but I'm not sure that was my goal. I don't think I ever really thought the project would get to a point where it was actually running games with a relatively functional Metal graphics backend. This effort evolved into a way for me to closely follow the progression of the capabilities of LLMs over the course of the last year or so. Given the complexity and scale of the codebase, I feel that Xenia is a unique testbed for this topic, both now and going forward.

Consequently, this is also one of the many reasons you see unused remnants like the MetalBufferCache (introduced, it seems, in a commit on August 4th on a separate branch that somehow lingered in the "clean" backend, and today entirely unused) and a variety of other things that probably remain, which I have ignored out of laziness, in the pursuit of speed, or simply in ignorance. The total reliance on these systems is also what led me to spend a day or two on a "compute-based fallback" to ROV, which I quickly realized, as you rightly pointed out, doesn't even make any sense.
It took about 2 or 3 weeks of dedicated effort to bring up the full Metal backend (currently lacking ROG, and also yet to be uploaded to GitHub), effectively from scratch, with Codex 5.2. It was only when I realized a few weeks ago that it was going to be possible that I gave it a go, and I opted for results over everything else. With previous models, at least in my experience, this kind of work was not realistically achievable or automatable for someone without a pretty deep understanding of the existing GPU backend, expert-level knowledge of Metal, or a willingness to really dive into the codebase. Having come this far conducting the port in the way that I have, my only real goal is to get Halo 3 booting and relatively playable (which coincidentally has also led many other games to become bootable), just out of my own interest in doing so, and to give people a way to use Xenia natively on their Mac, if it's beneficial for them to do so. As you pointed out, it is not really mergeable in its current state anyway, and with the way I've done it I'm not sure I've gained much technically speaking, other than proving that it can be done. Maybe this will serve as the motivation to rebase over canary and "do it right" to maximize personal learning and correctness — who knows?

One technical point from your investigation of the code that may be of interest or use to you or others in the future: the Metal backend currently upgrades the generated DXBC SM 5.1 code to DXIL SM 6.0 via a native MacOS port of DXC’s dxbc2dxil tool, and the result is then piped through Apple’s Metal Shader Converter, which does all of the heavy lifting of translating the shaders to Metal IR. Note that this includes geometry and tessellation emulation via Metal’s mesh shaders, which is essentially why I've been able to get correct output from the backend so quickly. This also limits the backend to MacOS only, meaning that if iOS devices were ever to be targeted (although JIT there is an evolving, region-dependent issue), MSC would need to be dropped. It was a gamble to rely on this chain of translations, but the method seems to work correctly and performantly enough for my purposes.

Both in software and many other fields, I think we are at a bit of a turning point. LLMs, and artificial intelligence more generally, have become extremely capable at completing difficult "tasks", whether that be generating code or driving a car, and will only continue to get better or be replaced by a more competent alternative architecture. As I have learned so well through this exercise, these kinds of tools are a double-edged sword. They're certainly capable of empowering people to do some amazing things, both in the right hands with proper guidance, and without. However, what worries me is that it has become far easier to ignore the details and get something that "looks" correct and functional on the outside, when underneath there may be many dragons lurking, waiting to be found. As a result, it has also unfortunately become harder for me to see the value in investing in the skills and dedication it would take to hand-port a project like this to Mac, as would have been required before the times of ChatGPT. It's getting easier and cheaper to build new things that bring value to people, which in the end I guess is a good thing. Maybe for now these tools aren't capable of maintaining or "owning" correctness or completeness on their own, but eventually I suspect they will be. As the value of knowing a particular "tool" or being able to perform a particular "task" diminishes and its cost goes to zero, I am left wondering what engineering will become when the "how" is devalued. What kinds of skillsets will remain eternally useful? What are the right things to spend time on? As a student a few years ago, potential pathways were more definable: you could pick something you "wanted to do", spend the time to acquire a particular knowledge or skillset in a topic of interest that provided value to people, hone your depth in that field, and the reward was compounding. Has that not changed?

These days and in the near future, I'm not so sure what happens; it's not at all clear to me what to invest time in, and I worry that the things I'm interested in now might not exist in a few years. The landscape is shifting so quickly that it's hard to keep track of what's going on. I'm curious what additional advice or guidance you might have to offer.

@has207 commented Jan 12, 2026

The LLMs are a tool, and like any tool you can use them well or you can use them poorly. Nobody starts out using their tools well, and with this particular tool there's not a built up body of knowledge or experience in terms of best practices yet, so we're all just trying to make the best of it that we can.

Don't get discouraged — you've made pretty remarkable progress, and it sounds like it was a good learning experience and you had a good time doing it, which is really the most important thing in the end. I personally hope to see you keep going and whip this port into good shape.

@wmarti (author) commented Jan 17, 2026

I have officially uploaded the first release of the MacOS Xenia port: Pre-release v0.1. Currently only ARM64 Macs (Apple Silicon M-series) are supported. Note that this port is, as this pull request has suggested repeatedly, very much a work in progress. Some games boot and are "playable"; most do not. Additionally, there are a variety of rendering bugs (with severity depending on the title, it seems) that will take more time to narrow down. As an example, video playback seems to be broken pretty much across the board. There also appears to be a subtle shader translation error causing incorrect rendering in a variety of titles, such as in GTA IV below. A workaround / fix is in progress, but it will take more time. Please note that as of now, Metal backend performance is also not that great unless you're on a Pro-series chip — I've tested on the M4 and M4 Pro only. This will be resolved over time. The good news, at least, is that MetalFX upscaling is implemented and enabled by default (toggleable via the menu), and it seems to really breathe new life into these games!

Please note that a clean rebase over xenia-canary, as suggested by @has207 is in progress, and coming very very soon. Stay tuned.

Examples of good-ish rendering:

[Four screenshots, captured 2026-01-18]

Examples of bad rendering:

[Four screenshots, captured 2026-01-18]

It appears that as of now, there may be more examples of bad rendering than good :/

Enjoy!

@mrdc commented Jan 23, 2026

> first release of the MacOS Xenia port

What about performance in comparison to Canary x86_64 via Wine?

wmarti closed this Jan 28, 2026