-
-
Notifications
You must be signed in to change notification settings - Fork 382
nv2a: Fix R12 input in paired/multiout ops writing to oPos #2720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
nv2a: Fix R12 input in paired/multiout ops writing to oPos #2720
Conversation
Summary of ChangesHello @abaire, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical bug in the NV2A vertex shader emulation concerning the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses an issue where paired instructions reading from R12 could get an incorrect value if a previous instruction in the same token wrote to oPos. The fix involves latching the value of oPos into a new R12 variable at the start of the instruction token if R12 is used as an input. The changes look correct and effectively solve the problem. A refactoring of decode_opcode also makes the code cleaner. However, this refactoring has introduced a minor memory leak, which I've pointed out in a specific comment with a suggested fix.
32d2b5d to
edc15c1
Compare
Paired MAC+ILU and duplicate output instructions can write to oPos while reading from R12. This leads to a case where xemu's serialized emulation erroneously uses the output of a previous instruction when calculating the value of a later one. E.g., in ``` /* 0x00000000 0x0080201A 0xC4002868 0x7CB0E800 */ MAD oPos.xyz, R12.xyz, R1.x, C[1].xyz + MAD R11.xy, R12.xyz, R1.x, C[1].xyz ``` the value of oPos prior to the first instruction should be used for both MAD calculations. This could alternatively be fixed by writing oPos to a temp vector and deferring the output vector update until after the token is fully processed. Note that we always process output registers writes before temp register writes so the only interesting case should be the oPos/R12 alias. An op that writes to one of its temp inputs will always execute the non-modifying output register write before updating the temp register. [Tests](https://github.com/abaire/nxdk_pgraph_tests/blob/1745a45290a5d607f9582303d6882cbf62f003de/src/tests/vertex_shader_independence_tests.cpp#L125) [HW results](https://abaire.github.io/nxdk_pgraph_tests_golden_results/results/Vertex_shader_independence_tests/index.html) Fixes xemu-project#1864
edc15c1 to
8ff499f
Compare
|
Test a few game don't think any known issues are fixed or have same behaviour. |
|
Thanks for the patch!
The extra register and additional latching/stale checks are adding more complexity. I think this alternative suggestion sounds simple and more maintainable--not limited to oPos, but analyze inputs/outputs to mitigate the hazard generally. We've encountered this issue already, so we should unify the approaches. |
I don't think it can be trivially unified; the current temp register usage will only work for the MAC+ILU case. Then we'd be covered for the most complicated case, which would be a MAC + ILU pairing where one or the other writes to oPos + a temp reg and either/both of them read from R12. The downside is that we don't (currently) parse the ILU args until after the MAC is fully processed, so there may be a situation where we assign to the temp unnecessarily since a MAC write to oPos would have to cover the worst case of an R12 ILU read. |
…tially problematic situations.
|
Committed a version that piggybacks on the existing suffix workaround to conservatively defend modifications of oPos. I think ideally this code would be refactored to have a bit more state tracking so that all inputs could be resolved with special cases (like R12 use) flagged as they're expanded into strings (we can also drop a couple string copies by threading the buffer through). Then we could reserve this hackery for the cases where |

Paired MAC+ILU and duplicate output instructions can write to oPos while reading from R12. This leads to a case where xemu's serialized emulation erroneously uses the output of a previous instruction when calculating the value of a later one.
E.g., in
the value of oPos prior to the first instruction should be used for both MAD calculations.
This could alternatively be fixed by writing oPos to a temp vector and deferring the output vector update until after the token is fully processed.
Tests
HW results
PR results
Fixes #1864