Additionally fixes some instruction forms to use the more general `STR` instruction with an offset
You wouldn't believe how much time this bug cost me
Guest-function calls will use W17 for indirect calls
Fixes some offset generation as well
Fixes indirect branches
Was picking up `W0` rather than src1
Operand order is wrong.
Writing to the wrong register!
Potential input-register stomping, and the operand order was seemingly wrong. Passes generated unit tests.
Passes generated unit tests
Accessing the same memory as different types (other than char) using reinterpret_cast or a union is undefined behavior that has already caused issues like xenia-project#1971. Also adds a XE_RESTRICT_VAR definition for declaring non-aliasing pointers in performance-critical areas in the future.
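To make the hazard concrete, here's a minimal sketch. Only the `XE_RESTRICT_VAR` name comes from the commit itself; its expansion below and the helper functions are illustrative assumptions, not Xenia's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Safe type punning: copy the bytes instead of dereferencing a
// reinterpret_cast'ed pointer, which would violate strict aliasing.
// Illustrative helper, not Xenia's actual API.
inline float BitCastFloat(uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));  // Optimizes to a plain move.
  return value;
}

// Assumed expansion of XE_RESTRICT_VAR: promises the optimizer that the
// pointer doesn't alias any other pointer in scope.
#if defined(_MSC_VER)
#define XE_RESTRICT_VAR __restrict
#else
#define XE_RESTRICT_VAR __restrict__
#endif

void AddArrays(const float* XE_RESTRICT_VAR a,
               const float* XE_RESTRICT_VAR b,
               float* XE_RESTRICT_VAR out, std::size_t n) {
  // With the no-alias guarantee, the compiler can keep values in registers
  // and vectorize without inserting overlap re-checks.
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```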
Hopefully prevents some potential xenia-project#1971-like situations. WAIT_REG_MEM's implementation also allowed the compiler to load the value only once, which caused an infinite loop with the other changes in the commit (even in debug builds), so it's now accessed as volatile. Possibly it would be even better to replace it with some (acquire/release?) atomic load/store some day at least for the registers actually seen as participating in those waits. Also fixes the endianness being handled only on the first wait iteration in WAIT_REG_MEM.
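A sketch of the failure mode described here, with hypothetical names; this is a simplification of a WAIT_REG_MEM-style poll, not Xenia's actual implementation (`xe::byte_swap` does exist in Xenia's base library, but a stand-in is defined below to keep the sketch self-contained):

```cpp
#include <cstdint>

namespace xe {
// Stand-in for Xenia's xe::byte_swap from base/byte_order.h.
inline uint32_t byte_swap(uint32_t v) {
  return (v << 24) | ((v & 0xFF00u) << 8) | ((v >> 8) & 0xFF00u) | (v >> 24);
}
}  // namespace xe

// Simplified WAIT_REG_MEM-style poll. Two details matter:
// 1. The load goes through a volatile pointer, so the compiler must re-read
//    memory on every iteration instead of hoisting the load out of the loop
//    (a hoisted load is exactly what caused the infinite loop).
// 2. The endian swap is applied on *every* iteration, not just the first.
void WaitRegMem(const uint32_t* address, uint32_t ref, uint32_t mask,
                bool is_big_endian) {
  const volatile uint32_t* value_ptr =
      reinterpret_cast<const volatile uint32_t*>(address);
  for (;;) {
    uint32_t value = *value_ptr;
    if (is_big_endian) {
      value = xe::byte_swap(value);
    }
    if ((value & mask) == ref) {
      break;
    }
    // A real implementation would yield or sleep between polls; a C++20
    // std::atomic_ref acquire load would be the more principled replacement
    // the commit message alludes to.
  }
}
```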
Hopefully should reduce the CI failure rate, although this testing approach is fundamentally flawed as it depends on OS scheduling.
Metal's xesl_firstOneBitHigh used 32 - clz(x), which is off by one and returns 32 for zero, unlike HLSL/GLSL firstbithigh/findMSB. Switch to 31 - clz(x) with uint casts for signed inputs to match backend semantics and fix float10 denorm decode in resolve.
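For reference, the semantics being matched, as a hedged host-side sketch (using C++20 `std::countl_zero`, which, unlike raw `clz` intrinsics, is well defined for zero):

```cpp
#include <bit>
#include <cstdint>

// HLSL firstbithigh / GLSL findMSB on an unsigned int: the bit index of the
// most significant set bit, or 0xFFFFFFFF when no bit is set.
uint32_t FirstBitHigh(uint32_t x) {
  // countl_zero(0u) == 32, so 31 - 32 == -1 == 0xFFFFFFFF, matching the
  // "no bit found" convention. For any other x this is the MSB index.
  return static_cast<uint32_t>(31 - static_cast<int>(std::countl_zero(x)));
}

// The broken form, 32 - clz(x), is off by one for every nonzero input
// (e.g. it maps 1 to 1 instead of 0) and returns 32 instead of ~0u for 0.
```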
- Replace dxbc2dxil CLI spawns with per-thread IDxbcConverter conversion and expose extra options via XENIA_DXBC2DXIL_FLAGS (default -skip-container-parts)
- Link dxilconv/LLVMDxcSupport in Metal targets and add dxilconv include/lib paths
- Route shader dumps under dump_shaders/metal_shaders and only emit when enabled
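A rough sketch of the per-thread converter idea from this commit. The `Convert` call is paraphrased from dxilconv's `DxbcConverter.h` and should be verified against the actual header; the surrounding helper and its names are hypothetical:

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

#include "DxbcConverter.h"  // From dxilconv; declares IDxbcConverter.

// One converter instance per thread: avoids both the cost of spawning a
// dxbc2dxil process per shader and sharing one COM object across threads.
thread_local IDxbcConverter* tls_converter = nullptr;

bool ConvertDxbcToDxil(const void* dxbc, uint32_t dxbc_size,
                       void** dxil_out, uint32_t* dxil_size_out) {
  if (!tls_converter &&
      FAILED(DxcCreateInstance(CLSID_DxbcConverter,
                               IID_PPV_ARGS(&tls_converter)))) {
    return false;
  }
  // Extra options come from the environment, defaulting to the flag the
  // commit message names.
  std::wstring options = L"-skip-container-parts";
  if (const char* flags = std::getenv("XENIA_DXBC2DXIL_FLAGS")) {
    options.assign(flags, flags + std::strlen(flags));  // ASCII widen.
  }
  return SUCCEEDED(tls_converter->Convert(dxbc, dxbc_size, options.c_str(),
                                          dxil_out, dxil_size_out,
                                          /*ppDiag=*/nullptr));
}
```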
- Track resolve dest_swap for presenter copies and RB swizzle control.
- Upload gamma ramps and set 8bpc vs PWL for 10bpc swap formats.
- Gate internal swap capture and remove trace dump dummy presenter.
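A hedged illustration of the two decisions this commit describes, with hypothetical type names; only the 8bpc-table vs 10bpc-PWL split and the dest_swap tracking come from the commit message itself:

```cpp
#include <cstdint>

// Hypothetical names for illustration only.
enum class GammaRampType : uint32_t {
  kTable256,  // 8bpc swap formats: plain 256-entry lookup table.
  kPWL,       // 10bpc swap formats: piecewise-linear ramp.
};

struct PresentSource {
  bool dest_swap;        // Recorded when the resolve copy was made.
  bool format_is_10bpc;  // e.g. a 2:10:10:10 swap format.
};

inline GammaRampType ChooseGammaRamp(const PresentSource& src) {
  return src.format_is_10bpc ? GammaRampType::kPWL
                             : GammaRampType::kTable256;
}

// dest_swap tracked at resolve time drives the red/blue swizzle when the
// presenter copies the frame out, so the channels end up un-swapped.
inline bool PresentNeedsRBSwap(const PresentSource& src) {
  return src.dest_swap;
}
```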
What's going on with the commit history in this PR?
This seems to be an artifact of the vibe coding mentioned in the original post:
In addition to the fact that it uses my arm64 jit backend as a base, which already involves a lot of changes, this PR adds an AGENTS.md, a GEMINI.md, and other artifacts that imply heavy AI assistance. Merging this PR is also going to set a precedent about AI usage within Xenia moving forward. Not too sure how to feel about this. It looks like it would take an incredible amount of time to recenter this PR into the minimal set of edits needed to deliver this feature concisely, without any of the superficial changes and merge conflicts.
Despite the branch I've included in the PR being named

Also, for anyone reading, please keep in mind that this is my first ever pull request. I am not a professional software engineer with 10 or 20 years of experience like the rest of the contributors to Xenia; I am just a student. When I started out on this project to port Xenia to macOS a year ago, I never thought I'd actually get here. It's taken hundreds of hours or more of work, and the fact that it's even booting at all is, I think, pretty incredible. It's really a testament to what can be accomplished with LLMs and a little determination. Rest assured, I'm going to get this thing into a state that's usable and mergeable into one of the mainstream codebases. 😎

For now, here are some more screenshots. Enjoy!
I think it'd be in everyone's best interest for a maintainer to share their thoughts here. In its current state, this PR seems unreviewable and unmergeable due to the sheer volume of code in the pull request, and because of the aforementioned "vibe coding", the code would absolutely warrant a thorough review. A lot of the code changes also seem unrelated to the PR topic. I don't know how you could clean this up without just starting over and manually moving over and cleaning up the changes that are actually relevant to the pull request. That's without even considering what Xenia's stance on AI-generated code of this scale would be.
I don't pretend to represent the xenia project, but there's no danger of any code getting merged into master at this point regardless. He said he'll make a better rebase on top of canary; let him work on it. It's not to anyone's detriment, and it's only to everyone's benefit if he makes something that works. Criticizing something that's clearly still a work in progress and hasn't even requested a review is certainly not helpful, however.
While I can't speak for other developers, my personal stance is that code submissions should be held to the same standards during review regardless of the tools used to produce them. Cats playing musical instruments and boss fights with a huge butthurt dude are just the current state of our timeline, and if machine learning can improve our lives in various ways, why not take advantage of what we have. We humans don't really create anything out of nowhere either; rather, we transform things that we encounter using other things that we encounter — and that's especially applicable to interoperability tools like emulators — and so does generative machine learning, although currently nowhere near as meaningfully as humans.

However, when you're importing large pieces of code from an LLM, it's much easier to let some unhandled edge cases sneak into the code than when you're carefully thinking about every line you're writing, so it's likely that ML-generated code will raise a lot of questions during the review process.

Another issue to consider is copyright, and it's complicated because you don't know the origin of the code you're getting from your AI assistant. In general, code can be separated into 3 categories with their copyright implications:
Now, regarding this merge request itself. After briefly looking at various files in it, it feels to me like the insistence on vibe coding keeps distracting you from something that may accelerate your work far more than any AI assistant — actual awareness of the architecture. Many changes here seem to be unnecessary for porting to Mac or to POSIX systems in general. I don't know how exactly they appeared here, but it seems like there may be insufficient understanding of what's needed and what isn't, and possibly some reluctance to explore the code base, maybe due to inflated expectations about the AI's knowledge of the code. And that's observable not only in the semi-related commits that somehow ended up in this merge request, but in the new code as well:
The whole idea of a compute shader fallback for programmable blending — how is it even supposed to work? Are you trying to create a full-blown software renderer in compute shaders? I see some EDRAM blend compute shaders; are they even currently used at all, or are they just another bunch of code from the LLM, stashed for an unknown purpose at some indeterminate point in the future?

If you're trying to, for instance, render a primitive into a separate framebuffer and then blend that framebuffer with the main one, a lot of problems will need to be solved for it to work at even a more or less acceptable speed — how to detect whether overlap actually happens to avoid inserting a barrier between every pair of triangles, how to calculate the bounding rectangles of triangles on the screen to dispatch compute shader grids of the necessary size, etc. A lot of that sounds like having to transform triangles on the CPU to detect overlap… I'm very, very confused.

The whole fallback isn't really even needed in the first place, because in most cases the guest framebuffer pixel formats can be emulated in a reasonable way without ROV/ROG. Xenia already has code for converting depth to 24-bit floating-point in translated pixel shaders (it's disabled by default, but I think it should be enabled in the future, as its performance cost doesn't seem to be that severe, and, if I recall correctly, it fixes very annoying issues in prominent games like Halo 4), and I'm planning to try promoting the
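For readers unfamiliar with that depth conversion: the Xbox 360 stores floating-point depth in a 24-bit 20e4 format (unsigned, 4-bit exponent, 20-bit mantissa, with denormals). Below is a from-first-principles sketch of the float32 → 20e4 encoding, assuming exponent bias 15 and truncation; it is not Xenia's actual conversion helper, which also supports round-to-nearest-even:

```cpp
#include <bit>
#include <cstdint>

// Sketch only, not Xenia's code. Assumes unsigned 20e4: 4-bit exponent,
// 20-bit mantissa, exponent bias 15, denormals, representable range [0, 2).
uint32_t Float32To20e4Truncating(float f) {
  if (!(f > 0.0f)) return 0;        // NaN and non-positive clamp to 0.
  if (f >= 2.0f) return 0xFFFFFF;   // Clamp to the maximum, 2 - 2^-20.
  uint32_t u = std::bit_cast<uint32_t>(f);
  int32_t exponent = static_cast<int32_t>(u >> 23) - 127;  // Unbiased.
  uint32_t mantissa = u & 0x7FFFFF;                        // 23 bits.
  if (exponent < -14) {
    // Below the smallest normal (2^-14): shift into the denormal range,
    // where one ULP is 2^(1-15) * 2^-20 = 2^-34.
    uint32_t full = mantissa | 0x800000;   // Restore the implicit 1.
    int32_t shift = 3 + (-14 - exponent);  // The 3 drops 23 -> 20 bits.
    return shift >= 32 ? 0 : full >> shift;
  }
  // Normal case: re-bias the exponent, drop 3 mantissa bits.
  return static_cast<uint32_t>(exponent + 15) << 20 | mantissa >> 3;
}
```

The reason to do this in the pixel shader is that host depth formats can't represent the 20e4 encoding exactly, so games that read depth back or reinterpret it need the exact guest bit patterns.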
The primary goals that a GPU emulation host backend needs to achieve are:

For the first two, I don't think the current development approach is helpful. From an external point of view, it seems like the idea is not merely to use the AI as an assistant, but rather to just ask it for random changes and hope that they're going to work and pass some "probing" tests. It feels like not only is AI used for code generation, but the whole development process also kind of resembles an AI genetic algorithm. What makes correctness very doubtful is the presence of a lot of code that just has no meaning in the architecture, like the buffer cache from the pre-2018 renderer and the geometry shader generation. It's like the code not only doesn't get properly verified, especially in edge cases, but a lot of it has never even been run at all? And regarding host configuration support, there are already some strange decisions, such as checking the presence of formats like

All in all, this merge request is really weird in my opinion (it took me a few hours to write this comment, and I guess how my attitude changed over time is pretty evident). What looks like a pretty straightforward task of replacing Direct3D 12 and Vulkan code with roughly equivalent Metal code, possibly moving some parts to backend-agnostic code along the way (maybe with the exception of integrating vertex shader generation for point sprites and rectangles, as well as adapting the tessellation logic to the more Xbox 360-like interface of Metal tessellation), could be solved quite easily by breaking it down into a sort of dependency tree and simply rewriting each part that's needed at each point in development. Instead, it was massively overcomplicated by spending a lot of time trying to get something functional out of an LLM while being reluctant to spend a couple of weeks learning the structure of the renderer and the primary challenges of emulating the GPU of the Xbox 360.
I ported Wunkolo's ARM64 code to canary some time ago and was able to run it on Win 11 ARM, without LLMs or vibe coding, so there weren't so many PRs or too many changes. But it was quite rough, and I wasn't able to test the video backend: Win 11 ARM on Mac has only DX11, so Xenia wasn't happy. I need to give it another try this year, but no ETA.
@Triang3l First of all, I would like to thank you for taking the time to write this reply. Having taken just a few hours to do so, you have probably spent more time reviewing my work than anyone else in the last 5 years. I have tremendous respect and admiration for what you have built here; you are clearly a very talented, capable, and passionate engineer. Additionally, I think your analysis of the situation is spot on, and I am pretty much in agreement with your perspective on the merge request at this point.

I've been keeping up with the status of a Mac port of Xenia for a while. When I realized an ARM64 backend had been developed, I was surprised to see that it had not been completed to the point of being merged into master or canary. I also realized that it supplied one of the key missing pieces required for supporting Apple Silicon based devices in a meaningful way in Xenia (although x86_64 Xenia on ARM64 based Macs is now "technically" supported by Rosetta 2 through its introduction of somewhat limited AVX support, I was not aware of this, nor ever interested in making use of it). I was also aware that a Metal backend would have to be developed due to a variety of limitations in MoltenVK: the issue of primitive restart and the lack of emulation for geometry shaders, among other things.

Initially, I thought I'd utilize LLMs to assist me in understanding the codebase and how to conduct a proper port. Unfortunately, I quickly abandoned this idea of "hand-crafting" a port to macOS, or "doing it the right way", when I realized (and simply when it became possible with tools like claude-code, codex, etc.) that it was faster and easier to just rely on LLMs to do all of the heavy lifting. These days, it can be pretty easy to just let them "run", intervening when they lose focus or are obviously heading the wrong direction, until you get the output you want, treating the underlying system as something like a black box and then verifying the expected output manually when necessary. From the perspective of maximal learning, correctness, maintainability, completeness, etc., this is obviously not the correct way to go about conducting a porting project of this scale, but I'm not sure that was my goal. I don't think I ever really thought the project would get to a point where it was actually running games with a relatively functional Metal graphics backend.

This effort evolved as a way for me to closely follow and get in touch with the progression of the capabilities of LLMs over the course of the last year or so. Given the complexity and scale of the codebase, I feel that Xenia is a unique testbed for this topic, both now and going forward. And consequently, this is also one of the many reasons you see unused remnants like the

One technical point about your investigation of the code I'd like to mention that may be of interest or of use to you or others in the future: the Metal backend currently upgrades the generated DXBC 5.1 code to DXIL SM 6.0 via a macOS-native port of DXC's dxbc2dxil tool, which is then piped through Apple's Metal Shader Converter to do all of the heavy lifting of shader translation to MSL. Note that this includes geometry and tessellation emulation via Metal's mesh shaders, so that's essentially why I've been able to get correct output from the backend so quickly.
This also limits the backend to functioning only on macOS, meaning that if iOS devices were ever to be targeted (although JIT there is an evolving and region-dependent issue), MSC would need to be dropped. It was a gamble to rely on this chain of translations, but the method seems to work correctly and performantly enough for my purposes.

Both in software and in many other fields, I think we are at a bit of a turning point. LLMs, and artificial intelligence more generally, have become extremely capable at completing difficult "tasks", whether that be generating code or driving a car, and will only continue to get better or be replaced by a more competent alternative architecture. As I have learned so very well through this exercise, these kinds of tools are a double-edged sword. They're certainly capable of empowering people to do some amazing things, both in the right hands with proper guidance, and without. However, what worries me is that it has become far easier to ignore the details and to get something that "looks" correct and functional on the outside, when underneath there may be many dragons lurking, waiting to be found. As a result, it has also unfortunately become harder for me to see the value in investing in the skills and dedication it would require to hand-port a project like this to Mac, as would have been necessary before the times of ChatGPT. It's getting easier and cheaper to build new things that bring value to people, which in the end I guess is a good thing. Maybe for now these tools aren't capable of maintaining or "owning" correctness or completeness on their own, but eventually I suspect they will be.

As the value of the knowledge of a particular "tool" or the ability to perform a particular "task" diminishes and its cost goes to zero, I am left wondering what engineering will become when the "how" is devalued. What kinds of skillsets will remain eternally useful? What are the right things to spend time on? As a student, a few years ago, potential pathways were more definable: you could pick something you "wanted to do", spend the time to acquire a particular body of knowledge or skillset in a topic of interest that provided value to people, and hone your depth in that field, and the reward was compounding. Has that not changed? These days and in the near future, I'm not so sure what happens; it's not all that clear to me what to invest time in, and I'm left worried that the things I'm interested in now might not exist in a few years. The landscape is shifting so quickly that it's hard to keep track of what's going on. I'm curious what additional advice or guidance you might have to offer?
LLMs are a tool, and like any tool, you can use them well or you can use them poorly. Nobody starts out using their tools well, and with this particular tool there isn't a built-up body of knowledge or best practices yet, so we're all just trying to make the best of it that we can. Don't get discouraged: you've made pretty remarkable progress, and it sounds like it was a good learning experience and you had a good time doing it, which is really the most important thing in the end. I personally hope to see you keep going and whip this port into good shape.
I have officially uploaded the first release of the macOS Xenia port: Pre-release v0.1. Currently only ARM64 Macs (Apple Silicon M-series) are supported.

Note that this port is, as this pull request has suggested repeatedly, very much a work in progress. Some games boot and are "playable"; most do not. Additionally, there are a variety of rendering bugs (severity depending on the title, it seems) that will take more time to narrow down. As an example, it seems like video playback is broken pretty much across the board. There also appears to be a subtle shader translation error causing incorrect rendering in a variety of titles, such as in GTA IV below. A workaround / fix is in progress, but it will take more time.

Please note that as of now, Metal backend performance is also not that great unless you're on a Pro-series chip; I've tested on the M4 and M4 Pro only. This will be resolved over time. The good news, at least, is that MetalFX upscaling is implemented and enabled by default (toggleable via the menu), and it seems to really breathe new life into these games!

A clean rebase over xenia-canary, as suggested by @has207, is in progress and coming very, very soon. Stay tuned.

Examples of good-ish rendering:
Examples of bad rendering:
It appears that as of now, there may be more examples of bad rendering than good :/ Enjoy!
What about performance in comparison to Canary x86_64 via Wine?
This is a work-in-progress port of Xenia for macOS, currently only tested on Apple Silicon, built on top of @Wunkolo's ARM64 backend #2259. In theory this would also work on iOS devices, but only in regions where JIT compilation is available and distribution outside of the App Store is possible, like the EU.
The Metal backend translates Xbox 360 shader microcode through multiple stages: Xenia's existing shader translator emits DXBC, a native port of DXC's dxbc2dxil upgrades it to DXIL, and Metal Shader Converter handles the rest.

The pipeline leverages:

- the `metalirconverter` library for DXIL → Metal IR

Maybe eventually I'll go the SPIR-V → MSL route, but this seemed the easiest for now (even though there's a big performance penalty).
The entire thing has been essentially "vibecoded" over the last ~year, so there's probably a minefield of issues, many merge conflicts, and tons of bloat that was never meant to be committed (sorry, I'm still learning how to use git), but I'll get those issues ironed out over time. Not expecting this to get merged anytime soon; just opening this PR for tracking.

The "app" builds but does not run games yet. I've got `xenia-gpu-metal-trace-dump` reproducing traces captured in the D3D12 backend from Gears of War ~mostly correctly. Other games are WIP, as you can see below.

Gears of War
Halo 3
GTA IV