Conversation
The 7-bit stencil counter in func_RenderRings overflows when 128+ units overlap, causing individual circle outlines instead of merged fill. Fix: batch size 1000->30 with intermediate RangeMask flushes in the first Cast loop. Enables GPU stencil early-out for dense clusters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Info.txt: document CRenFrame::InitTransformedVerts and CRenFrame::Render addresses and calling conventions - changelog.md: add entry under Optimizations Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis change patches the range ring rendering system to fix a visual artifact occurring when 128+ overlapping units create stencil counter overflow. The fix reduces the per-batch processing size from 1000 to 30, introduces an intermediate flush mechanism via a new assembly function, and enables GPU stencil early-out optimization to prevent individual circle outlines from appearing in dense clusters. Changes
Sequence DiagramsequenceDiagram
participant RenderRings as func_RenderRings
participant Loop1 as Loop 1 (batch)
participant Flush as IntermediateRangeMask
participant Frame as CRenFrame
participant GPU as GPU Stencil
RenderRings->>Loop1: Process rings (batch size: 30)
Loop1->>Flush: Call intermediate flush
Flush->>Frame: InitTransformedVerts(width, height)
Flush->>Flush: Construct std::string("RangeMask")
Flush->>Frame: Render(width, height)
Frame->>GPU: Execute with stencil early-out
GPU-->>Frame: Return
Frame-->>Flush: Return
Flush-->>Loop1: Return
Loop1->>Loop1: Continue next batch
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
section/RenderRingsFlush.cpp (2)
8-8: Consider addingAssignStringto Info.txt for consistency.
InitTransformedVerts(0x7F5DA0) andCRenFrame::Render(0x7F6030) were added toInfo.txt, butAssignString(0x4059E0) referenced here is not. For maintainability, consider adding it alongside the other entries.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@section/RenderRingsFlush.cpp` at line 8, Add the missing AssignString entry to Info.txt to match the other symbols; locate the existing entries for InitTransformedVerts and CRenFrame::Render and add a line for AssignString with its signature and address ("AssignString (0x4059E0): this@<ecx>, (char* s, size_t n), retn 8") so Info.txt consistently documents all referenced symbols.
1-1: Unused include.The
desync_fix_global.hheader is included but none of its declarations (tick_num,tag_sent,num_clients, etc.) are used inIntermediateRangeMask(). Consider removing it or replacing with a more minimal include if only specific definitions are needed.🔧 Suggested fix
-#include "desync_fix_global.h" +#include "global.h"Or if no includes are actually required:
-#include "desync_fix_global.h"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@section/RenderRingsFlush.cpp` at line 1, The include "desync_fix_global.h" at the top of RenderRingsFlush.cpp is unused in the IntermediateRangeMask() implementation; remove that include (or replace it with a more specific header that provides only the symbols actually needed) and then rebuild to ensure no unresolved symbols remain; specifically check the IntermediateRangeMask function and any helper/inline functions in this translation unit for usages of tick_num, tag_sent, num_clients, or other symbols from desync_fix_global.h before removing, and if any are required, include only the minimal header that declares those specific symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@section/RenderRingsFlush.cpp`:
- Line 8: Add the missing AssignString entry to Info.txt to match the other
symbols; locate the existing entries for InitTransformedVerts and
CRenFrame::Render and add a line for AssignString with its signature and address
("AssignString (0x4059E0): this@<ecx>, (char* s, size_t n), retn 8") so Info.txt
consistently documents all referenced symbols.
- Line 1: The include "desync_fix_global.h" at the top of RenderRingsFlush.cpp
is unused in the IntermediateRangeMask() implementation; remove that include (or
replace it with a more specific header that provides only the symbols actually
needed) and then rebuild to ensure no unresolved symbols remain; specifically
check the IntermediateRangeMask function and any helper/inline functions in this
translation unit for usages of tick_num, tag_sent, num_clients, or other symbols
from desync_fix_global.h before removing, and if any are required, include only
the minimal header that declares those specific symbols.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 086a5596-9b2e-4ce4-9447-2ded30da7df3
📒 Files selected for processing (4)
Info.txtchangelog.mdhooks/HRenderRingsFlush.cppsection/RenderRingsFlush.cpp
hooks/HRenderRingsFlush.cpp
Outdated
There was a problem hiding this comment.
Convert to the .hook format. Maybe:
// === FIRST Cast loop (fill rings) ===
// Reduce batch size from 1000 to 30 for stencil early-out optimization
0x007EF77D:
cmp eax, 0x1E
jl .+0x9
mov eax, 0x1E
nop
nop
// Hook after func_Draw_Rings: intermediate RangeMask flush
// Replaces 7-byte mov eax,[esp+0x84] with call + 2 NOPs
0x007EF7EA:
call IntermediateRangeMask
nop
nop
section/RenderRingsFlush.cpp
Outdated
| "movss dword ptr [esp], xmm0;" | ||
| ); | ||
| asm( | ||
| "call -0x1000 +8351136;" // InitTransformedVerts |
There was a problem hiding this comment.
call -0x1000 +8351136; -> call 0x007F5DA0;
Address review feedback on PR FAForever#149: - Move hook injection from hooks/HRenderRingsFlush.cpp (legacy .cpp form) to hooks/RenderRingsFlush.hook (canonical .hook format) - Replace `call -0x1000 +N` address arithmetic with explicit hex addresses (call 0x007F5DA0 / 0x004059E0 / 0x007F6030) - Consolidate the four split asm() blocks in IntermediateRangeMask into a single asm() to prevent the compiler from inserting code between them - Document the custom register-based thiscall used by CRenFrame methods (this@ebx for InitTransformedVerts, this@edi for Render) and explain why a direct C++ call is impossible - Add detailed header comment explaining the loop1/loop2 stencil-overflow analysis and why only loop 1 needs patching (loop 2 draws thin edge strips with negligible per-pixel overdraw) - Drop unused desync_fix_global.h include - Add std::string::string ctor (0x004059E0) entry to Info.txt
There was a problem hiding this comment.
🧹 Nitpick comments (2)
section/RenderRingsFlush.cpp (2)
86-90: Minor: consider documenting the hardcoded string address.The address
0x00E3F8E8for the"RangeMask"string literal works but could benefit from a brief comment noting where this address comes from (e.g., "string literal in .rdata at 0x00E3F8E8" or referencing the engine's existing usage).📝 Suggested documentation
// --- std::string::string(this=ecx, "RangeMask", 9) --- "push 9;" - "push 0x00E3F8E8;" // address of "RangeMask" string + "push 0x00E3F8E8;" // "RangeMask" literal in .rdata (used by engine at 0x7EF82C) "mov ecx, edi;" // this @ ecx "call 0x004059E0;"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@section/RenderRingsFlush.cpp` around lines 86 - 90, Document the hardcoded string address 0x00E3F8E8 used when pushing the "RangeMask" literal (seen in the push sequence before the call to 0x004059E0) by adding a brief comment explaining its origin (for example: "string literal in .rdata at 0x00E3F8E8" or referencing the engine global where "RangeMask" is defined/used), so future readers can trace why this specific address is used and where the literal comes from.
67-106: Consider using__attribute__((naked))for explicit stack frame control.The stack offset calculations assume
esp = caller_esp - 4at entry. While the extensive header comments show careful stack analysis, this is fragile if the compiler generates prologue code (push ebp; mov ebp, esp).However, similar inline-asm-only functions in the codebase (e.g.,
selectionPriority()insection/selectionPriority.cpp) work without this attribute, suggesting the compiler does not generate prologue for such functions in the current build environment. To make the intent explicit and improve robustness across compiler versions, consider marking the function as naked:-void IntermediateRangeMask() +void __attribute__((naked)) IntermediateRangeMask()This is not a confirmed issue but a defensive measure to prevent future fragility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@section/RenderRingsFlush.cpp` around lines 67 - 106, Add the GCC/Clang naked attribute to the IntermediateRangeMask function declaration to prevent the compiler from emitting a prologue/epilogue (which would invalidate the current esp-based offsets); update the function signature to include __attribute__((naked)) so the inline asm remains the sole emitted code and stack layout is stable, and verify the function body remains a single asm block that performs its own push/pop and returns (keeping the existing IntermediateRangeMask asm sequence and restoring registers as it does now).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@section/RenderRingsFlush.cpp`:
- Around line 86-90: Document the hardcoded string address 0x00E3F8E8 used when
pushing the "RangeMask" literal (seen in the push sequence before the call to
0x004059E0) by adding a brief comment explaining its origin (for example:
"string literal in .rdata at 0x00E3F8E8" or referencing the engine global where
"RangeMask" is defined/used), so future readers can trace why this specific
address is used and where the literal comes from.
- Around line 67-106: Add the GCC/Clang naked attribute to the
IntermediateRangeMask function declaration to prevent the compiler from emitting
a prologue/epilogue (which would invalidate the current esp-based offsets);
update the function signature to include __attribute__((naked)) so the inline
asm remains the sole emitted code and stack layout is stable, and verify the
function body remains a single asm block that performs its own push/pop and
returns (keeping the existing IntermediateRangeMask asm sequence and restoring
registers as it does now).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e69a21d4-189a-469e-a495-3a3bc2bcb2d9
📒 Files selected for processing (4)
Info.txtchangelog.mdhooks/RenderRingsFlush.hooksection/RenderRingsFlush.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- changelog.md
- Info.txt
|
@RutreD I’m currently implementing a feature that displays health bars only for damaged units, as constantly rendering them—especially in dense unit clusters—may impact FPS. |
Nice |
|
The real performance bottleneck with range rings is not the stencil overflow itself — it's the sheer number of rings being processed per frame. With all rings enabled and 600+ units crowded together, func_RenderRings (0x7EF5A0) drops FPS from ~110 to ~16 because every single unit renders its own ring even when fully hidden by neighbours. I confirmed this by toggling ren_Ranges 0 in-game which instantly recovered FPS from 30 → 80. No other render toggle (ren_Shadows, ren_Fx, ren_MeshSkinned, ren_Decals) had any effect. The stencil overflow fix (batch cap + intermediate flush) addresses the visual bug at 128+ overlapping rings but does not help with the underlying performance cost. A separate hull-cull patch on feature/range-ring-hull-cull tackles the FPS side by dropping interior rings whose outer circle is fully covered by neighbours, reducing 600 rings to ~50 with no visible change in the merged outline. |
The engine's range ring pipeline uses a 7-bit GPU stencil counter that increments per overlapping ring fill. At 128+ rings the counter wraps, breaking the stencil-based outline merge (individual broken circles instead of a unified shape). This hard cap guarantees correct visuals even for pathological formations. Makes PR FAForever#149 fully obsolete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix range ring stencil overflow with 128+ overlapping units
Problem
When 128+ units overlap (e.g., 200 engineers in a cluster), their range ring overlays render as individual circles instead of merging into a single unified shape. This is caused by the 7-bit stencil counter in func_RenderRings (0x7EF5A0) wrapping from 0x7F to 0x00, creating holes in the fill area that the boundary Cast loop then fills with visible ring outlines.
Old:

New:

Fix
Insert intermediate RangeMask flushes in the first Cast loop of func_RenderRings. Batch size is reduced from 1000 to 30 units. After each batch, a RangeMask fullscreen pass converts accumulated stencil marks to bit 7 before overflow can occur. Subsequent Cast operations skip already-marked pixels via StencilFunc=notequal on bit 7, which also enables GPU stencil early-out for dense clusters.
Files
hooks/HRenderRingsFlush.cpp — 3 hooks: two data patches reducing batch size at 0x7EF77E/0x7EF785, one call hook at 0x7EF7EA
section/RenderRingsFlush.cpp — IntermediateRangeMask() function replicating the RangeMask sequence from 0x7EF802-0x7EF849
section.ld — external symbols 0x7F5DA0 (InitTransformedVerts) and 0x7F6030 (CRenFrame::Render)
Test instructions
Spawn 200+ engineers at the same location
Enable Build Range overlay — should show one merged fill area, no individual circles
Select a single unit — normal range ring, no artifacts
Compare FPS with large clusters before/after — should be equal or better due to stencil early-out
Checklist
Summary by CodeRabbit