nv2a: Fix R12 input in paired/multiout ops writing to oPos #2720

abaire · 2026-02-03T17:19:28Z

Paired MAC+ILU and duplicate output instructions can write to oPos while reading from R12. This leads to a case where xemu's serialized emulation erroneously uses the output of a previous instruction when calculating the value of a later one.

E.g., in

 /* 0x00000000 0x0080201A 0xC4002868 0x7CB0E800 */
 MAD oPos.xyz, R12.xyz, R1.x, C[1].xyz
   +  MAD R11.xy, R12.xyz, R1.x, C[1].xyz

the value of oPos prior to the first instruction should be used for both MAD calculations.

This could alternatively be fixed by writing oPos to a temp vector and deferring the output vector update until after the token is fully processed.
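
The hazard can be reproduced outside the emulator with a small model. The following is a minimal sketch in C, not xemu's code: `Vec4`, `mad`, `write_xyz`, `run_serialized`, and `run_latched` are hypothetical names, and the model assumes R12 is simply a view of oPos. It evaluates the paired MAD from the example twice, once in naive serialized order and once with the oPos value latched before either half runs.

```c
/* Illustrative model only; every name here is hypothetical, not from xemu. */
typedef struct { float x, y, z, w; } Vec4;

/* MAD with a scalar multiplier: a * s + c, per component. */
Vec4 mad(Vec4 a, float s, Vec4 c)
{
    Vec4 r = { a.x * s + c.x, a.y * s + c.y, a.z * s + c.z, a.w * s + c.w };
    return r;
}

/* Write only the .xyz components of dst, leaving .w untouched. */
void write_xyz(Vec4 *dst, Vec4 src)
{
    dst->x = src.x;
    dst->y = src.y;
    dst->z = src.z;
}

/* Naive serialization: R12 aliases oPos, so the second MAD reads the
 * value the first MAD has just written. Returns the full MAD result;
 * only .xy would land in R11. */
Vec4 run_serialized(Vec4 *oPos, float r1x, Vec4 c1)
{
    write_xyz(oPos, mad(*oPos, r1x, c1));
    return mad(*oPos, r1x, c1); /* stale-alias read: uses the updated oPos */
}

/* Latched: capture oPos once, before either half of the token executes,
 * and feed that copy to both MADs. */
Vec4 run_latched(Vec4 *oPos, float r1x, Vec4 c1)
{
    Vec4 r12 = *oPos;
    write_xyz(oPos, mad(r12, r1x, c1));
    return mad(r12, r1x, c1);
}
```

With oPos = (1, 2, 3, 4), R1.x = 2 and C[1] = (10, 10, 10, 10), `run_latched` produces 12 and 14 for the R11.xy result, while `run_serialized` produces 34 and 38 because it multiplies the already updated position.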

Tests
HW results
PR results

Fixes #1864

gemini-code-assist · 2026-02-03T17:19:48Z

Summary of Changes

This pull request resolves a critical bug in the NV2A vertex shader emulation concerning the R12 register, which acts as a mirror for oPos. Previously, in complex instructions involving both reading from R12 and writing to oPos, the serialized emulation could incorrectly use an updated oPos value for subsequent parts of the same instruction. The changes introduce a "latching" mechanism for R12 by making it an independent variable and explicitly assigning oPos to it at the beginning of an instruction if R12 is used as an input, ensuring consistent input values throughout the instruction's execution.

Highlights

- R12 Input Correction: Modified the emulation of R12 in the NV2A vertex shader to correctly capture the oPos value at the start of an instruction, preventing erroneous reads in paired or multi-output operations where oPos is also being written.
- R12 Definition Change: Changed R12 from a direct #define alias of oPos to an independent vec4 variable, allowing its value to be explicitly latched before instruction execution.
- Shader Code Generation Refactoring: Refactored the decode_opcode function to append generated shader code to an existing string, streamlining the code generation process.

Changelog

hw/xbox/nv2a/pgraph/glsl/vsh-prog.c
- Introduced a bool *uses_r12_latch parameter to decode_opcode_input to track R12 usage.
- Modified decode_opcode_input to set uses_r12_latch to true if R12 is detected as an input.
- Refactored decode_opcode to take an MString *ret argument and append to it, rather than returning a new MString.
- Implemented logic in decode_token to assign R12 = oPos; at the start of an instruction if R12 is used as an input.
- Changed the definition of R12 from a #define R12 oPos to a vec4 R12 = vec4(0.0,0.0,0.0,0.0); to enable the latching behavior.
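
The latching step in the changelog can be pictured as a prefix emitted ahead of a token's generated GLSL. This is a rough sketch under stated assumptions, not the code in vsh-prog.c: `operand_is_r12` and `emit_token` are hypothetical helpers, a plain `char` buffer stands in for `MString`, and operands are assumed to arrive as unnegated strings such as `R12.xyz`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if the operand text names R12, with or
 * without a swizzle suffix. "R1.x" and "R1" do not match. */
bool operand_is_r12(const char *operand)
{
    return strncmp(operand, "R12", 3) == 0 &&
           (operand[3] == '\0' || operand[3] == '.');
}

/* Hypothetical emitter: writes the GLSL for one instruction token into
 * out. If any input reads R12, the current oPos is copied into R12
 * before the token body so every read in the token sees the same value.
 * Returns the snprintf result (length that was or would be written). */
int emit_token(char *out, size_t out_size,
               const char *const *inputs, size_t n_inputs,
               const char *body)
{
    bool uses_r12_latch = false;
    for (size_t i = 0; i < n_inputs; i++) {
        if (operand_is_r12(inputs[i])) {
            uses_r12_latch = true;
        }
    }
    return snprintf(out, out_size, "%s%s",
                    uses_r12_latch ? "R12 = oPos;\n" : "", body);
}
```

Tokens that never read R12 get no prefix in this sketch, so the extra assignment only appears where the alias could matter.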

Activity

- The author, abaire, created this pull request to fix issue #1864, "Prince of Persia: The Sands of Time: Incorrect rendering of water surface".
- The pull request includes links to tests and hardware results, with PR results marked as TODO, indicating ongoing validation.

gemini-code-assist

Code Review

This pull request addresses an issue where paired instructions reading from R12 could get an incorrect value if a previous instruction in the same token wrote to oPos. The fix involves latching the value of oPos into a new R12 variable at the start of the instruction token if R12 is used as an input. The changes look correct and effectively solve the problem. A refactoring of decode_opcode also makes the code cleaner. However, this refactoring has introduced a minor memory leak, which I've pointed out in a specific comment with a suggested fix.

hw/xbox/nv2a/pgraph/glsl/vsh-prog.c

Paired MAC+ILU and duplicate output instructions can write to oPos while reading from R12. This leads to a case where xemu's serialized emulation erroneously uses the output of a previous instruction when calculating the value of a later one.

E.g., in

```
/* 0x00000000 0x0080201A 0xC4002868 0x7CB0E800 */
MAD oPos.xyz, R12.xyz, R1.x, C[1].xyz
+ MAD R11.xy, R12.xyz, R1.x, C[1].xyz
```

the value of oPos prior to the first instruction should be used for both MAD calculations.

This could alternatively be fixed by writing oPos to a temp vector and deferring the output vector update until after the token is fully processed.

Note that we always process output register writes before temp register writes, so the only interesting case should be the oPos/R12 alias. An op that writes to one of its temp inputs will always execute the non-modifying output register write before updating the temp register.

[Tests](https://github.com/abaire/nxdk_pgraph_tests/blob/1745a45290a5d607f9582303d6882cbf62f003de/src/tests/vertex_shader_independence_tests.cpp#L125)
[HW results](https://abaire.github.io/nxdk_pgraph_tests_golden_results/results/Vertex_shader_independence_tests/index.html)

Fixes xemu-project#1864

viniciusol263 · 2026-02-03T20:55:51Z

It's fixed indeed, thanks!

Triticum0 · 2026-02-04T05:10:47Z

Tested a few games. I don't think any known issues are fixed; they show the same behaviour as before.

mborgerson · 2026-02-05T04:19:01Z

Thanks for the patch!

> This could alternatively be fixed by writing oPos to a temp vector and deferring the output vector update until after the token is fully processed.

The extra register and additional latching/stale checks add more complexity. I think this alternative suggestion sounds simpler and more maintainable--not limited to oPos, but analyzing inputs/outputs to mitigate the hazard generally. We've encountered this issue already, so we should unify the approaches.

abaire · 2026-02-05T04:29:45Z

> The extra register and additional latching/stale checks are adding more complexity. I think this alternative suggestion sounds simple and more maintainable--not limited to oPos, but analyze inputs/outputs to mitigate the hazard generally. We've encountered this issue already, so we should unify the approaches.

I don't think it can be trivially unified; the current temp register usage will only work for the MAC+ILU case.
The least complex approach is to detect writes to oPos and always defer them until after the instruction is fully processed with an additional special temp register. E.g.,

tempPos.mask = something;
oPos.mask = tempPos.mask;

Then we'd be covered for the most complicated case, which would be a MAC + ILU pairing where one or the other writes to oPos + a temp reg and either/both of them read from R12.

The downside is that we don't (currently) parse the ILU args until after the MAC is fully processed, so there may be a situation where we assign to the temp unnecessarily since a MAC write to oPos would have to cover the worst case of an R12 ILU read.
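
The deferred-write idea can be sketched as a staging register that collects masked writes and is copied into oPos only once the whole token has executed. This is an illustrative C model under stated assumptions, not the generated GLSL: `Vec4`, `DeferredPos`, `masked_write`, `defer_opos_write`, and `commit_opos` are hypothetical names, and the mask encoding (bit i selects component i) is chosen for the example only.

```c
/* Illustrative model only; names and mask encoding are hypothetical. */
typedef struct { float v[4]; } Vec4;

/* Staging register for oPos plus the union of components written so far. */
typedef struct {
    Vec4 value;
    unsigned mask; /* bit i set means component i is pending */
} DeferredPos;

/* Copy only the components selected by mask from src into dst. */
void masked_write(Vec4 *dst, Vec4 src, unsigned mask)
{
    for (int i = 0; i < 4; i++) {
        if (mask & (1u << i)) {
            dst->v[i] = src.v[i];
        }
    }
}

/* Called wherever an op in the token would have written oPos directly.
 * oPos itself stays untouched, so later R12 reads in the same token
 * still observe the pre-token position. */
void defer_opos_write(DeferredPos *d, Vec4 src, unsigned mask)
{
    masked_write(&d->value, src, mask);
    d->mask |= mask;
}

/* Called once after the token is fully processed. */
void commit_opos(Vec4 *oPos, DeferredPos *d)
{
    masked_write(oPos, d->value, d->mask);
    d->mask = 0;
}
```

In this model the staging register is used whenever oPos is written, which mirrors the conservative behaviour described above: the cost is an extra copy even in tokens where no op actually reads R12.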

abaire · 2026-02-05T05:18:23Z

Committed a version that piggybacks on the existing suffix workaround to conservatively defend modifications of oPos.

I think ideally this code would be refactored to have a bit more state tracking so that all inputs could be resolved with special cases (like R12 use) flagged as they're expanded into strings (we can also drop a couple string copies by threading the buffer through). Then we could reserve this hackery for the cases where oPos is modified before a use of R12. For now I'm optimistic that this simple approach won't have much perf impact since it's only triggered in MAC+ILU or multi-write situations that output position.

gemini-code-assist bot reviewed Feb 3, 2026


abaire force-pushed the fix_1864_r12_input_parallel_with_opos_write branch from 32d2b5d to edc15c1 Compare February 3, 2026 17:54

abaire force-pushed the fix_1864_r12_input_parallel_with_opos_write branch from edc15c1 to 8ff499f Compare February 3, 2026 18:34

abaire marked this pull request as ready for review February 3, 2026 19:01

SQUASHME: Alternate approach that simply defends oPos writes in potentially problematic situations.

47e579e
