Add range ring hull cull for dense crowd FPS recovery#150
Add range ring hull cull for dense crowd FPS recovery#150M3RT1N99 wants to merge 6 commits intoFAForever:masterfrom
Conversation
func_RenderRings (0x007EF5A0) renders one ring per visible unit per weapon category. With dense crowds (600+ units) the per-ring setup cost dominates the renderer and tanks frame time even though most interior units' rings are completely hidden by their neighbours' rings. This patch installs a JMP trampoline at 0x007EF5E2 (replacing the 5-byte `mov eax, sWldMap` right after the count is computed) that calls a small C compaction routine. The routine walks the position vector and keeps only entries whose disk is NOT covered by an already-kept neighbour, using a greedy 16-sample geometric coverage test on the outer circle. The trampoline patches the loop counter (`ebp`) on the way out so the two downstream batch loops only iterate the smaller, hull-only count. Vector entry layout was reverse-engineered from Moho::WeaponExtractor::Range (0x7EC650): [0]=worldX, [1]=worldZ, [2]=innerRadius, [3]=outerRadius Disabled by default. Enable with `ui_RangeRingClusterHull 1` in the in-game console. Visual is identical to vanilla because culled units' disks are guaranteed to be inside the kept set's union, so the engine's stencil-based outline rendering produces the same merged hull. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update header comment to describe actual greedy 16-sample algorithm (was still describing the old 4-quadrant topological test) - Fix ConVar description to match current behavior - Fix hook comment referencing old ConVar name ui_RangeRingClusterDistance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds an optional hull-culling stage for range-ring rendering: a greedy, in-place compaction that drops rings whose outer circles are fully covered by earlier-kept rings. Introduces Changes
Sequence Diagram(s)sequenceDiagram
participant Renderer as Renderer
participant Trampoline as `@0x007EF5E2` Trampoline
participant Cluster as ClusterRingPositions
participant RingBuf as RingBuffer (float* data)
Renderer->>Trampoline: jmp (hook at 0x007EF5E2)
Trampoline->>Cluster: call ClusterRingPositions(RingBuf.ptr, orig_count)
Cluster-->>Trampoline: returns clustered_count
Trampoline->>Trampoline: write clustered_count to saved ebp slot
Trampoline->>Renderer: replay displaced instr, jmp back to 0x007EF5E7
Renderer->>RingBuf: render using clustered_count
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@section/RangeRingCluster.cpp`:
- Line 73: The current early-return when g_RingClusterHull == 0.0f || count <= 4
does not guarantee the hard cap the renderer expects because writeIdx may still
equal the original count; modify the logic around writeIdx (and the return
value) in RangeRingCluster.cpp so after any packing/compaction (the code using
writeIdx, and also in the similar block around lines 129-139) you explicitly
clamp writeIdx (or the final returned count) to the renderer's hard limit (e.g.,
127) using something like writeIdx = std::min(writeIdx, HARD_RING_COUNT_LIMIT)
and return that clamped value to ensure the renderer never sees >127 entries
regardless of g_RingClusterHull or sparsity.
- Around line 103-127: The fixed 16-sample loop can under-sample large rings;
replace the magic "16" in the sampling loop with an adaptive sampleCount
computed from pOuter and nearby ring sizes: compute the smallest neighbour outer
radius (iterate j over writeIdx and read Q[3]) and set sampleCount = max(16,
ceil(2*PI * pOuter / max(1e-6f, minNeighbourOuter * 0.5f))) (or another
conservative fraction), then iterate k from 0..sampleCount-1 using COSDIR/SINDIR
scaled or computed per-index; this increases angular resolution for large pOuter
so fully_covered is only set when sampled spacing is small enough to avoid
missing uncovered arcs. Ensure you still fall back to 16 when no neighbours
found (minNeighbourOuter undefined).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4c607c04-ce22-4d3c-ba55-6571879106e4
📒 Files selected for processing (2)
hooks/RangeRingCluster.cppsection/RangeRingCluster.cpp
- Rename hooks/RangeRingCluster.cpp → .hook (pure JMP trampoline) - Add changelog.md entry under Optimizations - Document discovered addresses in Info.txt (func_RenderRings, WeaponExtractor::Range, sWldMap, ring entry layout, sub_7F32E0, sub_7EDC80) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hooks/RangeRingCluster.hook`:
- Around line 6-7: The trampoline at RangeRingClusterTrampoline currently
preserves registers with pushad/popad but not EFLAGS; modify the trampoline to
pushfd immediately after pushad and popfd immediately before popad so EFLAGS are
saved and restored across the C call (i.e., add pushfd right after the existing
pushad and add popfd just before the existing popad) ensuring the original code
at 0x7EF5E7 sees the same EFLAGS state as without the hook.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 661e08b7-6763-455f-9a25-b184b0011f3c
📒 Files selected for processing (3)
Info.txtchangelog.mdhooks/RangeRingCluster.hook
✅ Files skipped from review due to trivial changes (1)
- Info.txt
| 0x007EF5E2: | ||
| jmp @RangeRingClusterTrampoline No newline at end of file |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Inspect trampoline register/flag preservation:"
rg -n -C5 'RangeRingClusterTrampoline|pushad|popad|pushfd|popfd|0x007EF5E7|0x10a6438' section/RangeRingCluster.cpp
echo
echo "Inspect any documented disassembly around 0x007EF5E2/0x007EF5E7 (if present):"
fd -i 'Info.txt' | xargs -r rg -n -C8 '007EF5E2|007EF5E7|7EF5E'Repository: FAForever/FA-Binary-Patches
Length of output: 1884
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for any references to 0x007EF5E7 or 0x7EF5E7 in codebase ==="
rg -i '0x*7EF5E7|RangeRingCluster' --max-count=20
echo
echo "=== Check for test files or additional documentation ==="
fd -i 'test|spec|info|readme' | head -20
echo
echo "=== Look for any conditional code patterns after the hook site ==="
fd -i '\.cpp$|\.asm$|\.txt$' | xargs -r rg -l 'func_RenderRings|RangeRingCluster' 2>/dev/null | head -10Repository: FAForever/FA-Binary-Patches
Length of output: 1140
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Reading Info.txt to find RangeRingCluster or 0x7EF5E context ==="
rg -n -B5 -A10 'RangeRingCluster|0x7EF5E|007EF5E2|007EF5E7' Info.txt
echo
echo "=== Full Info.txt file size and structure ==="
wc -l Info.txt
head -50 Info.txtRepository: FAForever/FA-Binary-Patches
Length of output: 139
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check Info.txt structure ==="
head -100 Info.txt
echo
echo "=== Search for any binary info or disassembly files ==="
find . -type f \( -name '*.txt' -o -name '*.md' -o -name '*disasm*' \) | head -20
echo
echo "=== Examine the hook file directly ==="
cat hooks/RangeRingCluster.hookRepository: FAForever/FA-Binary-Patches
Length of output: 3642
Add pushfd/popfd to preserve EFLAGS across the C call in the trampoline.
The trampoline executes add esp, 8 after the C call, which modifies EFLAGS (ZF, CF, OF, SF, AF, PF). These flags are never restored before the re-executed mov and return to 0x007EF5E7. The code's own comment states the goal is "so the original code at 0x7EF5E7 sees the same machine state it would have without the hook," but the current implementation contradicts this by leaking flag changes.
Add pushfd immediately after pushad and popfd immediately before popad to preserve flags across the flag-modifying stack adjustment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hooks/RangeRingCluster.hook` around lines 6 - 7, The trampoline at
RangeRingClusterTrampoline currently preserves registers with pushad/popad but
not EFLAGS; modify the trampoline to pushfd immediately after pushad and popfd
immediately before popad so EFLAGS are saved and restored across the C call
(i.e., add pushfd right after the existing pushad and add popfd just before the
existing popad) ensuring the original code at 0x7EF5E7 sees the same EFLAGS
state as without the hook.
The engine's range ring pipeline uses a 7-bit GPU stencil counter that increments per overlapping ring fill. At 128+ rings the counter wraps, breaking the stencil-based outline merge (individual broken circles instead of a unified shape). This hard cap guarantees correct visuals even for pathological formations. Makes PR FAForever#149 fully obsolete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
section/RangeRingCluster.cpp (1)
129-136: Minor: Avoid self-copy wheni == writeIdx.When a ring is the first kept entry or happens to be at the write position, it copies to itself unnecessarily.
♻️ Proposed optimization
if (!fully_covered) { - float *dest = data + writeIdx * 4; - dest[0] = px; - dest[1] = pz; - dest[2] = pInner; - dest[3] = pOuter; + if (writeIdx != i) { + float *dest = data + writeIdx * 4; + dest[0] = px; + dest[1] = pz; + dest[2] = pInner; + dest[3] = pOuter; + } writeIdx++; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@section/RangeRingCluster.cpp` around lines 129 - 136, The loop that writes kept rings unconditionally copies four floats into data at index writeIdx even when the source index equals writeIdx, causing an unnecessary self-copy; update the write logic (the block that assigns dest[0..3] = px,pz,pInner,pOuter and increments writeIdx) to only perform the copy when the source index differs from writeIdx (i.e., if (i != writeIdx) { ... }), otherwise just increment writeIdx (or skip assignment), referencing the existing variables/data buffer and writeIdx to avoid changing surrounding logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@section/RangeRingCluster.cpp`:
- Around line 129-136: The loop that writes kept rings unconditionally copies
four floats into data at index writeIdx even when the source index equals
writeIdx, causing an unnecessary self-copy; update the write logic (the block
that assigns dest[0..3] = px,pz,pInner,pOuter and increments writeIdx) to only
perform the copy when the source index differs from writeIdx (i.e., if (i !=
writeIdx) { ... }), otherwise just increment writeIdx (or skip assignment),
referencing the existing variables/data buffer and writeIdx to avoid changing
surrounding logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1050ddb7-91ac-492f-bc89-37928bb15b88
📒 Files selected for processing (1)
section/RangeRingCluster.cpp
- Bump outer-circle samples from 16 (22.5 deg) to 24 (15 deg) to reduce outline choppiness at cluster boundaries - Remove the 127 hard cap: it drops real boundary entries that contribute visible outline. For pathological formations with 128+ boundary units, the minor stencil overflow artifact is preferable to missing rings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@section/RangeRingCluster.cpp`:
- Around line 108-123: The current cull loop only samples the outer radius
(pOuter) to set fully_covered, so annuli with a visible inner hole can be
incorrectly culled; update the loop that iterates k (the 24-direction samples)
to also test samples at pInner when pInner > 0 using the same coverage logic
against existing kept entries (writeIdx, data, Q) and require both the outer and
inner sample to be covered before marking fully_covered, or alternatively, if
you prefer simpler conservative behavior, skip annulus culling when pInner > 0
(i.e., do not set fully_covered for rings with pInner > 0). Ensure you reference
and reuse the same distance/inner/outer checks (distSq vs qOuter*qOuter and
qInner*qInner) as currently used for pOuter.
- Around line 21-23: Update the console/help description string that still says
"greedy 16-sample geometric test" to reflect the current implementation using 24
samples; locate the descriptive string in RangeRingCluster.cpp (the
option/description registered for the range-ring culling knob used by
RangeRingCluster) and replace "16-sample" with "24-sample" so the help text
matches the implementation and performance/quality guidance is accurate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: db09cf1c-268c-49e6-9d4b-8222526980ce
📒 Files selected for processing (1)
section/RangeRingCluster.cpp
| for (int k = 0; k < 24 && fully_covered; ++k) { | ||
| float tx = px + pOuter * COSDIR[k]; | ||
| float tz = pz + pOuter * SINDIR[k]; | ||
|
|
||
| bool sample_covered = false; | ||
| for (int j = 0; j < writeIdx; ++j) { // ONLY already-kept set | ||
| float *Q = data + j * 4; | ||
| float dx = Q[0] - tx; | ||
| float dz = Q[1] - tz; | ||
| float distSq = dx * dx + dz * dz; | ||
|
|
||
| float qOuter = Q[3]; | ||
| if (distSq > qOuter * qOuter) continue; | ||
|
|
||
| float qInner = Q[2]; | ||
| if (qInner > 0.0f && distSq < qInner * qInner) continue; |
There was a problem hiding this comment.
Outer-circle coverage is not enough for annuli.
fully_covered is decided only from samples on pOuter, but pInner never participates in the cull test. For mixed [innerR, outerR] batches, that can drop a ring whose outer boundary is hidden while its min-range hole is still visible in the merged stencil. Please either add an inner-circle coverage check when pInner > 0, or conservatively skip culling annuli until both boundaries are proven hidden.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@section/RangeRingCluster.cpp` around lines 108 - 123, The current cull loop
only samples the outer radius (pOuter) to set fully_covered, so annuli with a
visible inner hole can be incorrectly culled; update the loop that iterates k
(the 24-direction samples) to also test samples at pInner when pInner > 0 using
the same coverage logic against existing kept entries (writeIdx, data, Q) and
require both the outer and inner sample to be covered before marking
fully_covered, or alternatively, if you prefer simpler conservative behavior,
skip annulus culling when pInner > 0 (i.e., do not set fully_covered for rings
with pInner > 0). Ensure you reference and reuse the same distance/inner/outer
checks (distSq vs qOuter*qOuter and qInner*qInner) as currently used for pOuter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Regarding the self-copy nitpick (guard Regarding inner-radius (annuli) sampling: most units have Both CodeRabbit review suggestions addressed. The only code change applied was the description string fix (16 → 24 samples). |
Summary
func_RenderRings(0x007EF5A0) that drops interior range rings whose outer circle is fully covered by already-kept neighboursren_Ranges 0baseline)ui_RangeRingClusterHull 1(disabled by default)fix/range-ring-stencil-overflow) obsolete — with hull-cull active, ring count per batch never exceeds 127 so the stencil counter cannot overflowDemo
How it works
JMP trampoline at 0x007EF5E2 (replaces 5-byte
mov eax, sWldMap) calls a C compaction routine before the rendering loops. For each entry (processed in order):[innerR, outerR]bandThe greedy order guarantees every culled unit is covered by units that survive in the final set, so the engine's stencil-based outline rendering produces the same merged hull from fewer rings.
Entry layout was reverse-engineered from
Moho::WeaponExtractor::Range(0x7EC650):[0]=worldX, [1]=worldZ, [2]=innerRadius, [3]=outerRadiusPerformance
Test plan
ui_RangeRingClusterHull 1and compare visual to vanilla (0) — merged ring outline should be identical🤖 Generated with Claude Code
Summary by CodeRabbit