Skip to content

Fix/range ring stencil overflow#149

Open
M3RT1N99 wants to merge 3 commits intoFAForever:masterfrom
M3RT1N99:fix/range-ring-stencil-overflow
Open

Fix/range ring stencil overflow#149
M3RT1N99 wants to merge 3 commits intoFAForever:masterfrom
M3RT1N99:fix/range-ring-stencil-overflow

Conversation

@M3RT1N99
Copy link
Copy Markdown

@M3RT1N99 M3RT1N99 commented Mar 31, 2026

Fix range ring stencil overflow with 128+ overlapping units

Problem
When 128+ units overlap (e.g., 200 engineers in a cluster), their range ring overlays render as individual circles instead of merging into a single unified shape. This is caused by the 7-bit stencil counter in func_RenderRings (0x7EF5A0) wrapping from 0x7F to 0x00, creating holes in the fill area that the boundary Cast loop then fills with visible ring outlines.

Old:
image

New:
image

Fix
Insert intermediate RangeMask flushes in the first Cast loop of func_RenderRings. Batch size is reduced from 1000 to 30 units. After each batch, a RangeMask fullscreen pass converts accumulated stencil marks to bit 7 before overflow can occur. Subsequent Cast operations skip already-marked pixels via StencilFunc=notequal on bit 7, which also enables GPU stencil early-out for dense clusters.

Files
hooks/HRenderRingsFlush.cpp — 3 hooks: two data patches reducing batch size at 0x7EF77E/0x7EF785, one call hook at 0x7EF7EA
section/RenderRingsFlush.cpp — IntermediateRangeMask() function replicating the RangeMask sequence from 0x7EF802-0x7EF849
section.ld — external symbols 0x7F5DA0 (InitTransformedVerts) and 0x7F6030 (CRenFrame::Render)
Test instructions
Spawn 200+ engineers at the same location
Enable Build Range overlay — should show one merged fill area, no individual circles
Select a single unit — normal range ring, no artifacts
Compare FPS with large clusters before/after — should be equal or better due to stencil early-out

Checklist

  • Read all guides
  • Clear naming of variables and structs
  • Add new data into moho.h/global.h/Info.txt
  • Add description of changes to README.md
  • Add test hints/instructions

Summary by CodeRabbit

  • Bug Fixes
    • Fixed range ring rendering issue that occurred when 128 or more overlapping units were displayed simultaneously. The bug previously caused visible individual circle outlines to appear instead of a properly merged fill. Overall rendering performance for dense unit clusters has been improved through optimized batch processing and GPU efficiency enhancements.

Marti and others added 2 commits March 31, 2026 17:44
The 7-bit stencil counter in func_RenderRings overflows when 128+
units overlap, causing individual circle outlines instead of merged
fill. Fix: batch size 1000->30 with intermediate RangeMask flushes
in the first Cast loop. Enables GPU stencil early-out for dense
clusters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Info.txt: document CRenFrame::InitTransformedVerts and CRenFrame::Render addresses and calling conventions
- changelog.md: add entry under Optimizations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

This change patches the range ring rendering system to fix a visual artifact occurring when 128+ overlapping units create stencil counter overflow. The fix reduces the per-batch processing size from 1000 to 30, introduces an intermediate flush mechanism via a new assembly function, and enables GPU stencil early-out optimization to prevent individual circle outlines from appearing in dense clusters.

Changes

Cohort / File(s) Summary
Symbol Registration
Info.txt
Updated RenderRings exported symbol with qualified label and added three new public symbol entries for CRenFrame::InitTransformedVerts, CRenFrame::Render, and std::string::string constructor with retn 8 annotations.
Documentation
changelog.md
Added technical-patch entry documenting the stencil counter wrap fix for range ring rendering, describing batch size reduction (1000 → 30), intermediate flush operations, and GPU stencil early-out optimization.
Rendering Flush Implementation
section/RenderRingsFlush.cpp
Introduced new inline x86 assembly function IntermediateRangeMask() that performs intermediate rendering flushes by invoking CRenFrame::InitTransformedVerts and CRenFrame::Render with proper stack frame manipulation and technique construction.
Rendering Hook
hooks/RenderRingsFlush.hook
Added hook to patch func_RenderRings at address 0x007EF5A0, reducing Loop 1 batch cap and ceiling from 1000 to 30, and injecting @IntermediateRangeMask call to reset stencil counter between batches.

Sequence Diagram

sequenceDiagram
    participant RenderRings as func_RenderRings
    participant Loop1 as Loop 1 (batch)
    participant Flush as IntermediateRangeMask
    participant Frame as CRenFrame
    participant GPU as GPU Stencil

    RenderRings->>Loop1: Process rings (batch size: 30)
    Loop1->>Flush: Call intermediate flush
    Flush->>Frame: InitTransformedVerts(width, height)
    Flush->>Flush: Construct std::string("RangeMask")
    Flush->>Frame: Render(width, height)
    Frame->>GPU: Execute with stencil early-out
    GPU-->>Frame: Return
    Frame-->>Flush: Return
    Flush-->>Loop1: Return
    Loop1->>Loop1: Continue next batch
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Rings now render clear and bright,
No stencil wrap-around in sight!
Batches flush at thirty-go,
Dense clusters steal the show.
GPU cheers with early-out glow,
Moho's rendering steals the show! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix/range ring stencil overflow' is concise, clear, and directly describes the main change in the changeset.
Description check ✅ Passed The PR description is comprehensive and addresses all template sections: clear memo with problem/fix explanation and visuals, checklist items all marked as complete, and test instructions included.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
section/RenderRingsFlush.cpp (2)

8-8: Consider adding AssignString to Info.txt for consistency.

InitTransformedVerts (0x7F5DA0) and CRenFrame::Render (0x7F6030) were added to Info.txt, but AssignString (0x4059E0) referenced here is not. For maintainability, consider adding it alongside the other entries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@section/RenderRingsFlush.cpp` at line 8, Add the missing AssignString entry
to Info.txt to match the other symbols; locate the existing entries for
InitTransformedVerts and CRenFrame::Render and add a line for AssignString with
its signature and address ("AssignString (0x4059E0): this@<ecx>, (char* s,
size_t n), retn 8") so Info.txt consistently documents all referenced symbols.

1-1: Unused include.

The desync_fix_global.h header is included but none of its declarations (tick_num, tag_sent, num_clients, etc.) are used in IntermediateRangeMask(). Consider removing it or replacing with a more minimal include if only specific definitions are needed.

🔧 Suggested fix
-#include "desync_fix_global.h"
+#include "global.h"

Or if no includes are actually required:

-#include "desync_fix_global.h"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@section/RenderRingsFlush.cpp` at line 1, The include "desync_fix_global.h" at
the top of RenderRingsFlush.cpp is unused in the IntermediateRangeMask()
implementation; remove that include (or replace it with a more specific header
that provides only the symbols actually needed) and then rebuild to ensure no
unresolved symbols remain; specifically check the IntermediateRangeMask function
and any helper/inline functions in this translation unit for usages of tick_num,
tag_sent, num_clients, or other symbols from desync_fix_global.h before
removing, and if any are required, include only the minimal header that declares
those specific symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@section/RenderRingsFlush.cpp`:
- Line 8: Add the missing AssignString entry to Info.txt to match the other
symbols; locate the existing entries for InitTransformedVerts and
CRenFrame::Render and add a line for AssignString with its signature and address
("AssignString (0x4059E0): this@<ecx>, (char* s, size_t n), retn 8") so Info.txt
consistently documents all referenced symbols.
- Line 1: The include "desync_fix_global.h" at the top of RenderRingsFlush.cpp
is unused in the IntermediateRangeMask() implementation; remove that include (or
replace it with a more specific header that provides only the symbols actually
needed) and then rebuild to ensure no unresolved symbols remain; specifically
check the IntermediateRangeMask function and any helper/inline functions in this
translation unit for usages of tick_num, tag_sent, num_clients, or other symbols
from desync_fix_global.h before removing, and if any are required, include only
the minimal header that declares those specific symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 086a5596-9b2e-4ce4-9447-2ded30da7df3

📥 Commits

Reviewing files that changed from the base of the PR and between 8da86fe and 8027c74.

📒 Files selected for processing (4)
  • Info.txt
  • changelog.md
  • hooks/HRenderRingsFlush.cpp
  • section/RenderRingsFlush.cpp

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert to the .hook format. Maybe:

// === FIRST Cast loop (fill rings) ===
// Reduce batch size from 1000 to 30 for stencil early-out optimization
0x007EF77D:
    cmp eax, 0x1E
    jl .+0x9
    mov eax, 0x1E
    nop
    nop

// Hook after func_Draw_Rings: intermediate RangeMask flush
// Replaces 7-byte mov eax,[esp+0x84] with call + 2 NOPs
0x007EF7EA:
    call IntermediateRangeMask
    nop
    nop

"movss dword ptr [esp], xmm0;"
);
asm(
"call -0x1000 +8351136;" // InitTransformedVerts
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call -0x1000 +8351136; -> call 0x007F5DA0;

Address review feedback on PR FAForever#149:
- Move hook injection from hooks/HRenderRingsFlush.cpp (legacy .cpp form)
  to hooks/RenderRingsFlush.hook (canonical .hook format)
- Replace `call -0x1000 +N` address arithmetic with explicit hex addresses
  (call 0x007F5DA0 / 0x004059E0 / 0x007F6030)
- Consolidate the four split asm() blocks in IntermediateRangeMask into a
  single asm() to prevent the compiler from inserting code between them
- Document the custom register-based thiscall used by CRenFrame methods
  (this@ebx for InitTransformedVerts, this@edi for Render) and explain
  why a direct C++ call is impossible
- Add detailed header comment explaining the loop1/loop2 stencil-overflow
  analysis and why only loop 1 needs patching (loop 2 draws thin edge
  strips with negligible per-pixel overdraw)
- Drop unused desync_fix_global.h include
- Add std::string::string ctor (0x004059E0) entry to Info.txt
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
section/RenderRingsFlush.cpp (2)

86-90: Minor: consider documenting the hardcoded string address.

The address 0x00E3F8E8 for the "RangeMask" string literal works but could benefit from a brief comment noting where this address comes from (e.g., "string literal in .rdata at 0x00E3F8E8" or referencing the engine's existing usage).

📝 Suggested documentation
         // --- std::string::string(this=ecx, "RangeMask", 9) ---
         "push 9;"
-        "push 0x00E3F8E8;"                 // address of "RangeMask" string
+        "push 0x00E3F8E8;"                 // "RangeMask" literal in .rdata (used by engine at 0x7EF82C)
         "mov ecx, edi;"                    // this @ ecx
         "call 0x004059E0;"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@section/RenderRingsFlush.cpp` around lines 86 - 90, Document the hardcoded
string address 0x00E3F8E8 used when pushing the "RangeMask" literal (seen in the
push sequence before the call to 0x004059E0) by adding a brief comment
explaining its origin (for example: "string literal in .rdata at 0x00E3F8E8" or
referencing the engine global where "RangeMask" is defined/used), so future
readers can trace why this specific address is used and where the literal comes
from.

67-106: Consider using __attribute__((naked)) for explicit stack frame control.

The stack offset calculations assume esp = caller_esp - 4 at entry. While the extensive header comments show careful stack analysis, this is fragile if the compiler generates prologue code (push ebp; mov ebp, esp).

However, similar inline-asm-only functions in the codebase (e.g., selectionPriority() in section/selectionPriority.cpp) work without this attribute, suggesting the compiler does not generate prologue for such functions in the current build environment. To make the intent explicit and improve robustness across compiler versions, consider marking the function as naked:

-void IntermediateRangeMask()
+void __attribute__((naked)) IntermediateRangeMask()

This is not a confirmed issue but a defensive measure to prevent future fragility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@section/RenderRingsFlush.cpp` around lines 67 - 106, Add the GCC/Clang naked
attribute to the IntermediateRangeMask function declaration to prevent the
compiler from emitting a prologue/epilogue (which would invalidate the current
esp-based offsets); update the function signature to include
__attribute__((naked)) so the inline asm remains the sole emitted code and stack
layout is stable, and verify the function body remains a single asm block that
performs its own push/pop and returns (keeping the existing
IntermediateRangeMask asm sequence and restoring registers as it does now).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@section/RenderRingsFlush.cpp`:
- Around line 86-90: Document the hardcoded string address 0x00E3F8E8 used when
pushing the "RangeMask" literal (seen in the push sequence before the call to
0x004059E0) by adding a brief comment explaining its origin (for example:
"string literal in .rdata at 0x00E3F8E8" or referencing the engine global where
"RangeMask" is defined/used), so future readers can trace why this specific
address is used and where the literal comes from.
- Around line 67-106: Add the GCC/Clang naked attribute to the
IntermediateRangeMask function declaration to prevent the compiler from emitting
a prologue/epilogue (which would invalidate the current esp-based offsets);
update the function signature to include __attribute__((naked)) so the inline
asm remains the sole emitted code and stack layout is stable, and verify the
function body remains a single asm block that performs its own push/pop and
returns (keeping the existing IntermediateRangeMask asm sequence and restoring
registers as it does now).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e69a21d4-189a-469e-a495-3a3bc2bcb2d9

📥 Commits

Reviewing files that changed from the base of the PR and between 8027c74 and fe4f13b.

📒 Files selected for processing (4)
  • Info.txt
  • changelog.md
  • hooks/RenderRingsFlush.hook
  • section/RenderRingsFlush.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • changelog.md
  • Info.txt

Copy link
Copy Markdown

@RutreD RutreD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks satisfactory to me. The patch works and, in addition, increases FPS by about 15% (500 units)

@M3RT1N99
Copy link
Copy Markdown
Author

M3RT1N99 commented Apr 8, 2026

@RutreD I’m currently implementing a feature that displays health bars only for damaged units, as constantly rendering them—especially in dense unit clusters—may impact FPS.

@RutreD
Copy link
Copy Markdown

RutreD commented Apr 9, 2026

@RutreD I’m currently implementing a feature that displays health bars only for damaged units, as constantly rendering them—especially in dense unit clusters—may impact FPS.

Nice
Take a look at the shaders too. I’m no expert, calculating view * proj matrix once on the CPU and passing it to the shader might be more efficient than doing it per-vertex in the shader. See mul(viewMatrix, projMatrix) in some files under https://github.com/FAForever/fa/blob/develop/effects . We can use the float4x4 viewProjMatrix implementation from sky.fx as a reference and apply it to the rest of the shaders

@M3RT1N99
Copy link
Copy Markdown
Author

M3RT1N99 commented Apr 9, 2026

@RutreD

The real performance bottleneck with range rings is not the stencil overflow itself — it's the sheer number of rings being processed per frame. With all rings enabled and 600+ units crowded together, func_RenderRings (0x7EF5A0) drops FPS from ~110 to ~16 because every single unit renders its own ring even when fully hidden by neighbours.

I confirmed this by toggling ren_Ranges 0 in-game which instantly recovered FPS from 30 → 80. No other render toggle (ren_Shadows, ren_Fx, ren_MeshSkinned, ren_Decals) had any effect.

The stencil overflow fix (batch cap + intermediate flush) addresses the visual bug at 128+ overlapping rings but does not help with the underlying performance cost. A separate hull-cull patch on feature/range-ring-hull-cull tackles the FPS side by dropping interior rings whose outer circle is fully covered by neighbours, reducing 600 rings to ~50 with no visible change in the merged outline.

M3RT1N99 added a commit to M3RT1N99/FA-Binary-Patches that referenced this pull request Apr 9, 2026
The engine's range ring pipeline uses a 7-bit GPU stencil counter that
increments per overlapping ring fill. At 128+ rings the counter wraps,
breaking the stencil-based outline merge (individual broken circles
instead of a unified shape). This hard cap guarantees correct visuals
even for pathological formations. Makes PR FAForever#149 fully obsolete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants