forked from kyx0r/FA-Binary-Patches
Fix/range ring stencil overflow #149
Open
M3RT1N99 wants to merge 3 commits into FAForever:master from M3RT1N99:fix/range-ring-stencil-overflow
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| // Fix range ring stencil overflow with 128+ overlapping units. | ||
| // | ||
| // Background: func_RenderRings (0x007EF5A0) renders range rings via two | ||
| // loops sharing the same "Cast" effect technique that increments the GPU | ||
| // stencil per overdraw layer. The stencil is 7-bit (max 127), so 128+ | ||
| // overlapping rings cause counter wrap-around and the early-out optimization | ||
| // fails -- individual circle outlines appear instead of a unified shape. | ||
| // | ||
| // Loop 1 (0x007EF774-0x007EF7FC) draws the FILL of the ring band (1 quad | ||
| // per ring, high per-pixel overdraw on overlap) -- this is the loop that | ||
| // overflows. Loop 2 draws only the thin EDGE strips (2 quads per ring) and | ||
| // has near-zero per-pixel overdraw, so it does not need patching. | ||
| // | ||
| // Fix: reduce Loop 1 batch size from 1000 to 30, and inject an intermediate | ||
| // RangeMask flush between batches that resets the stencil counter. | ||
| | ||
| // Loop 1 batch cap: cmp eax, 0x3E8 -> cmp eax, 30 | ||
| // (patches the imm32 of the 5-byte 'cmp eax, imm32' instruction at 0x007EF77D) | ||
| 0x007EF77E: | ||
| .byte 0x1E, 0x00, 0x00, 0x00 | ||
| | ||
| // Loop 1 batch ceiling: mov eax, 0x3E8 -> mov eax, 30 | ||
| // (patches the imm32 of the 5-byte 'mov eax, imm32' instruction at 0x007EF784) | ||
| 0x007EF785: | ||
| .byte 0x1E, 0x00, 0x00, 0x00 | ||
| | ||
| // Inject intermediate RangeMask flush at the bottom of the Loop 1 body. | ||
| // Replaces the 7-byte 'mov eax, [esp+0x84]' with a 5-byte call + 2 NOPs. | ||
| // IntermediateRangeMask reproduces the replaced instruction after the flush. | ||
| 0x007EF7EA: | ||
| call @IntermediateRangeMask | ||
| nop | ||
| nop |
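
The overflow reasoning in the comments above can be checked with a small model. This is only a sketch: it assumes the stencil behaves as a plain counter that wraps modulo 128, which is how the patch comments describe it, and the function names are hypothetical and not part of the engine or the patch.

```cpp
#include <cstdint>

// Assumption: the stencil counter is 7 bits wide and wraps modulo 128,
// as stated in the patch comments (valid values 0..127).
constexpr std::uint32_t kStencilMax = 0x7F;

// Stencil value at one pixel after `overlaps` rings are drawn without any
// intermediate flush.
constexpr std::uint32_t StencilWithoutFlush(std::uint32_t overlaps) {
    return overlaps & kStencilMax;
}

// Highest stencil value a pixel can reach when the counter is reset after
// every `batch` rings (the role of the intermediate RangeMask flush).
constexpr std::uint32_t PeakStencilWithFlush(std::uint32_t overlaps,
                                             std::uint32_t batch) {
    return overlaps < batch ? overlaps : batch;
}

// A batch size is safe if even full overlap within one batch cannot wrap.
constexpr bool BatchIsSafe(std::uint32_t batch) {
    return batch <= kStencilMax;
}

static_assert(StencilWithoutFlush(127) == 127, "127 overlaps still fit");
static_assert(StencilWithoutFlush(128) == 0, "128 overlaps wrap to zero");
static_assert(PeakStencilWithFlush(500, 30) == 30, "flush caps the count");
static_assert(BatchIsSafe(30), "patched batch size cannot wrap");
static_assert(!BatchIsSafe(1000), "original batch size can wrap");
```

Under this model any batch size up to 127 would avoid the wrap. The patch uses 30, which leaves a wide margin.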
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| // Intermediate RangeMask flush to prevent 7-bit stencil counter overflow | ||
| // in the first Cast loop of func_RenderRings (0x007EF5A0). | ||
| // | ||
| // === Why this exists === | ||
| // | ||
| // func_RenderRings draws range rings via two loops, both calling | ||
| // func_Draw_Rings with the same "Cast" effect technique. That technique | ||
| // increments the GPU stencil per overdraw layer, then later passes use | ||
| // the stencil count for early-out culling and edge masking. The stencil | ||
| // is 7-bit (max 127). With 128+ overlapping rings the counter wraps and | ||
| // the visual collapses to individual circle outlines instead of a merged | ||
| // shape. | ||
| // | ||
| // Loop 1 (0x007EF774-0x007EF7FC): draws the FILL band of each ring. | ||
| // sub_7EDC80 emits 1 quad per ring covering the area between | ||
| // inner-radius+thickness and outer-radius-thickness. High per-pixel | ||
| // overdraw at dense build sites -- this is the loop that overflows. | ||
| // | ||
| // Loop 2 (0x007EF861-0x007EF8D5): draws the two thin EDGE strips per ring. | ||
| // sub_7EDC80 emits 2 quads per ring at the inner and outer borders. | ||
| // Near-zero per-pixel overdraw because edges of differently-sized rings | ||
| // almost never coincide -- does not need patching in practice. | ||
| // | ||
| // Fix: drop Loop 1 batch size 1000 -> 30 and inject this flush between | ||
| // batches. The flush replays the engine's own end-of-loop sequence | ||
| // (InitTransformedVerts -> RangeMask string -> Render) which clears the | ||
| // stencil and reapplies the mask before the next batch starts fresh. | ||
| // | ||
| // === Calling conventions === | ||
| // | ||
| // Despite the IDA mangled-name display showing standard __thiscall, this | ||
| // build uses a custom register-based thiscall for the CRenFrame methods | ||
| // (verified by inspecting the function bodies and the engine's own call | ||
| // sites at 0x007EF820-0x007EF849). A direct C++ call is impossible | ||
| // because no C++ calling convention puts `this` in EBX or EDI, so the | ||
| // flush is implemented in raw asm. | ||
| // | ||
| // 0x007F5DA0 Moho::CRenFrame::InitTransformedVerts | ||
| // this @ EBX, (float w, float h), retn 8 | ||
| // first usage at 0x007F5DD1: cmp [ebx+0x1C], 0 | ||
| // | ||
| // 0x004059E0 std::string::string (the engine builds an std::string | ||
| // in-place at the technique slot to select "RangeMask") | ||
| // this @ ECX, (char const* s, size_t n), retn 8 | ||
| // standard MSVC __thiscall | ||
| // | ||
| // 0x007F6030 Moho::CRenFrame::Render | ||
| // this @ EDI, (int w, int h), retn 8 | ||
| // first usage at 0x007F6055: cmp [edi+0x18], 0x10 | ||
| // | ||
| // === Stack layout === | ||
| // | ||
| // At entry esp = caller_esp - 4 (return address pushed by the call hook). | ||
| // After pushad esp = caller_esp - 36. The hook is injected at 0x007EF7EA | ||
| // inside func_RenderRings, where the host's esp-relative offsets are: | ||
| // | ||
| // [caller_esp + 0x78] = idxa (gpg::gal::Head*) -- needed for width/height | ||
| // [caller_esp + 0x74] = arg_0 (CRenFrame container, +0x4C = technique ptr) | ||
| // | ||
| // Adding 36 (32 bytes of pushad plus the 4-byte return address) gives the | ||
| // offsets used below: 0x9C and 0x98. | ||
| // | ||
| // The instruction replaced at 0x007EF7EA is the 7-byte | ||
| // `mov eax, [esp+0x84]` (= load loop counter `i`). It is reproduced after | ||
| // popad as `mov eax, [esp+0x88]` because the call's return address is | ||
| // still on the stack, leaving esp 4 bytes lower than in the host function. | ||
| | ||
| void IntermediateRangeMask() | ||
| { | ||
| asm( | ||
| "pushad;" | ||
| | ||
| // esi = idxa (Head*), edi = technique ptr | ||
| "mov esi, dword ptr [esp+0x9C];" // [caller+0x78] | ||
| "mov edi, dword ptr [esp+0x98];" // [caller+0x74] | ||
| "add edi, 0x4C;" | ||
| | ||
| // --- InitTransformedVerts(this=ebx, width, height) --- | ||
| "cvtsi2ss xmm0, dword ptr [esi+0x14];" | ||
| "sub esp, 8;" | ||
| "movss dword ptr [esp+4], xmm0;" | ||
| "cvtsi2ss xmm0, dword ptr [esi+0x10];" | ||
| "mov ebx, edi;" // this @ ebx | ||
| "movss dword ptr [esp], xmm0;" | ||
| "call 0x007F5DA0;" | ||
| | ||
| // --- std::string::string(this=ecx, "RangeMask", 9) --- | ||
| "push 9;" | ||
| "push 0x00E3F8E8;" // address of "RangeMask" string | ||
| "mov ecx, edi;" // this @ ecx | ||
| "call 0x004059E0;" | ||
| | ||
| // --- Render(this=edi, width, height) --- | ||
| "mov ecx, dword ptr [esi+0x14];" | ||
| "mov edx, dword ptr [esi+0x10];" | ||
| "push ecx;" | ||
| "push edx;" | ||
| "call 0x007F6030;" // this @ edi (preserved through above) | ||
| | ||
| "popad;" | ||
| // Reproduce the 7-byte instruction we overwrote at 0x007EF7EA. | ||
| // The call's return address is still on the stack, so esp is 4 bytes | ||
| // lower than in the host function and [esp+0x88] here equals | ||
| // [original_esp+0x84] there. | ||
| "mov eax, dword ptr [esp+0x88];" | ||
| "ret;" | ||
| ); | ||
| } | ||
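
The stack-offset arithmetic described in the "Stack layout" comment can be verified at compile time. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes the hook is entered by a near `call` (4-byte return address) followed by `pushad` (eight 32-bit registers), as the comments state.

```cpp
#include <cstdint>

// Bytes the hook adds to the stack relative to the host function's esp.
constexpr std::uint32_t kReturnAddressBytes = 4;   // pushed by the `call`
constexpr std::uint32_t kPushadBytes = 8 * 4;      // pushad saves 8 registers

// esp-relative offset inside the hook while the pushad frame is live.
constexpr std::uint32_t HookOffsetAfterPushad(std::uint32_t host_offset) {
    return host_offset + kReturnAddressBytes + kPushadBytes;
}

// esp-relative offset inside the hook after popad, before `ret`.
constexpr std::uint32_t HookOffsetAfterPopad(std::uint32_t host_offset) {
    return host_offset + kReturnAddressBytes;
}

// Host offsets taken from the comments in the diff above.
static_assert(HookOffsetAfterPushad(0x78) == 0x9C, "idxa (Head*)");
static_assert(HookOffsetAfterPushad(0x74) == 0x98, "arg_0 (container)");
static_assert(HookOffsetAfterPopad(0x84) == 0x88, "loop counter i");
```

If the hook ever pushes additional values before reading these slots, the same helpers would need the extra bytes added, which is why keeping the arithmetic in one place may be easier to audit than hand-computed literals.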