Add range ring hull cull for dense crowd FPS recovery #150

Open
M3RT1N99 wants to merge 6 commits into FAForever:master from M3RT1N99:feature/range-ring-hull-cull

Conversation

@M3RT1N99 M3RT1N99 commented Apr 9, 2026

Summary

  • Adds a greedy hull-cull pass to func_RenderRings (0x007EF5A0) that drops interior range rings whose outer circle is fully covered by already-kept neighbours
  • With 600+ units crowded together and all range rings enabled, FPS recovers from ~16 to ~80 (confirmed via ren_Ranges 0 baseline)
  • Toggle via console: ui_RangeRingClusterHull 1 (disabled by default)
  • Makes PR #149 (Fix/range ring stencil overflow, branch fix/range-ring-stencil-overflow) obsolete: with hull-cull active, ring count per batch never exceeds 127, so the stencil counter cannot overflow

Demo

Before: 600 units, all rings enabled
After: same scene with hull-cull

How it works

A JMP trampoline at 0x007EF5E2 (replacing the 5-byte mov eax, sWldMap) calls a C compaction routine before the rendering loops. For each entry, processed in order:

  1. Sample 16 points on the outer circle (every 22.5°)
  2. Check if each sample lies inside any already-kept neighbour's [innerR, outerR] band
  3. If all 16 covered → drop (interior). Otherwise → keep (boundary/isolated)

The greedy order guarantees every culled unit is covered by units that survive in the final set, so the engine's stencil-based outline rendering produces the same merged hull from fewer rings.
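
For readers who want to see the three steps in code, here is a minimal sketch of the greedy pass. It is not the PR's ClusterRingPositions: the names cullCoveredRings, pointCovered and kSamples are hypothetical, and angles are computed with std::cos/std::sin rather than any precomputed table. It assumes the 4-float entry layout described in this PR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of the greedy hull-cull; not the code from this PR.
// Each entry is 4 floats: [0]=worldX, [1]=worldZ, [2]=innerRadius, [3]=outerRadius.
static const int kSamples = 16;  // one sample every 22.5 degrees

// True if (tx, tz) lies inside the [innerR, outerR] band of any of the
// first keptCount entries.
static bool pointCovered(const float *data, int keptCount, float tx, float tz) {
    for (int j = 0; j < keptCount; ++j) {
        const float *q = data + j * 4;
        float dx = q[0] - tx, dz = q[1] - tz;
        float distSq = dx * dx + dz * dz;
        if (distSq > q[3] * q[3]) continue;                 // outside the outer circle
        if (q[2] > 0.0f && distSq < q[2] * q[2]) continue;  // inside the min-range hole
        return true;
    }
    return false;
}

// Compacts the buffer in place and returns the number of kept entries.
int cullCoveredRings(float *data, int count) {
    assert(data != nullptr && count >= 0);
    const float kTwoPi = 6.28318530718f;
    int writeIdx = 0;
    for (int i = 0; i < count; ++i) {
        // Copy to locals first: the write slot may alias the read slot.
        float px = data[i * 4], pz = data[i * 4 + 1];
        float pInner = data[i * 4 + 2], pOuter = data[i * 4 + 3];

        bool fullyCovered = writeIdx > 0;  // nothing kept yet means nothing can cover
        for (int k = 0; k < kSamples && fullyCovered; ++k) {
            float a = kTwoPi * k / kSamples;
            fullyCovered = pointCovered(data, writeIdx,
                                        px + pOuter * std::cos(a),
                                        pz + pOuter * std::sin(a));
        }
        if (!fullyCovered) {  // boundary or isolated ring: keep it
            float *dest = data + writeIdx * 4;
            dest[0] = px; dest[1] = pz; dest[2] = pInner; dest[3] = pOuter;
            ++writeIdx;
        }
    }
    return writeIdx;
}
```

Because writeIdx never runs ahead of i, the in-place write cannot clobber an entry that has not been read yet, and the coverage test only ever looks at entries that are already final.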

Entry layout was reverse-engineered from Moho::WeaponExtractor::Range (0x7EC650):
[0]=worldX, [1]=worldZ, [2]=innerRadius, [3]=outerRadius
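
As an illustration only, the layout above could be given names like this. RingEntry and readEntry are hypothetical and do not appear in the PR; the sketch assumes nothing beyond the four packed floats listed above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical named view of one ring entry; the field names mirror the
// reverse-engineered layout and are illustrative only.
struct RingEntry {
    float worldX;       // [0]
    float worldZ;       // [1]
    float innerRadius;  // [2]
    float outerRadius;  // [3]
};
static_assert(sizeof(RingEntry) == 4 * sizeof(float),
              "entry must be exactly four packed floats");
static_assert(offsetof(RingEntry, outerRadius) == 3 * sizeof(float),
              "outerRadius must sit at float index 3");

// Reads entry i from the flat float buffer by index, without pointer casts.
inline RingEntry readEntry(const float *data, int i) {
    assert(data != nullptr && i >= 0);
    const float *p = data + i * 4;
    return RingEntry{p[0], p[1], p[2], p[3]};
}
```

The static_asserts make the 16-byte stride explicit, which is the assumption any consumer of the position vector relies on.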

Performance

| Scenario | Before | After |
|---|---|---|
| 600 units, all rings, zoomed in | ~16 FPS | ~80 FPS |
| CPU cost of cull pass | n/a | ~0.5 ms/frame |
| Rings rendered | 600 | ~50 (hull only) |

Test plan

  • Spawn 600+ units in sandbox, enable all range rings, zoom in — verify FPS stays above 60
  • Toggle ui_RangeRingClusterHull 1 and compare visual to vanilla (0) — merged ring outline should be identical
  • Verify ACU / unique-radius units keep their rings when surrounded by smaller units
  • Verify isolated units and sparse formations render all rings unchanged

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added an optional "Range Ring Hull Culling" toggle (off by default) that hides interior range rings fully covered by neighbors. When enabled, visual clutter and overdraw in dense unit groups are reduced, preserving outermost tactical rings for clearer range visualization and modest rendering improvement.

M3RT1N99 and others added 2 commits April 9, 2026 02:42
func_RenderRings (0x007EF5A0) renders one ring per visible unit per
weapon category. With dense crowds (600+ units) the per-ring setup cost
dominates the renderer and tanks frame time even though most interior
units' rings are completely hidden by their neighbours' rings.

This patch installs a JMP trampoline at 0x007EF5E2 (replacing the
5-byte `mov eax, sWldMap` right after the count is computed) that
calls a small C compaction routine. The routine walks the position
vector and keeps only entries whose disk is NOT covered by an
already-kept neighbour, using a greedy 16-sample geometric coverage
test on the outer circle. The trampoline patches the loop counter
(`ebp`) on the way out so the two downstream batch loops only iterate
the smaller, hull-only count.

Vector entry layout was reverse-engineered from
Moho::WeaponExtractor::Range (0x7EC650):
  [0]=worldX, [1]=worldZ, [2]=innerRadius, [3]=outerRadius

Disabled by default. Enable with `ui_RangeRingClusterHull 1` in the
in-game console. Visual is identical to vanilla because culled units'
disks are guaranteed to be inside the kept set's union, so the engine's
stencil-based outline rendering produces the same merged hull.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update header comment to describe actual greedy 16-sample algorithm
  (was still describing the old 4-quadrant topological test)
- Fix ConVar description to match current behavior
- Fix hook comment referencing old ConVar name ui_RangeRingClusterDistance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b5c6f4c-ec63-4f8f-a745-4f4cb0b2346c

📥 Commits

Reviewing files that changed from the base of the PR and between daa19e3 and 9fa8743.

📒 Files selected for processing (1)
  • section/RangeRingCluster.cpp

📝 Walkthrough

Adds an optional hull-culling stage for range-ring rendering: a greedy, in-place compaction that drops rings whose outer circles are fully covered by earlier-kept rings. Introduces g_RingClusterHull (exposed as ui_RangeRingClusterHull), exported ClusterRingPositions, and an assembly trampoline hook at 0x007EF5E2.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Culling Implementation (section/RangeRingCluster.cpp) | New module adding float g_RingClusterHull, ConDescReg ring_cluster_hull_reg (ui_RangeRingClusterHull) and extern "C" int ClusterRingPositions(float *data, int count). Implements greedy in-place compaction of ring entries (4 floats each: world X, world Z, inner, outer) sampling 24 directions; returns kept count or original when disabled/too small. |
| Hook / Trampoline (hooks/RangeRingCluster.hook) | New hook at 0x007EF5E2 replacing mov eax, ds:0x10a6438 with a jump to RangeRingClusterTrampoline. Trampoline preserves registers, calls _ClusterRingPositions with the ring buffer pointer and count, writes returned clustered count into the saved ebp slot, replays the displaced instruction, and resumes at 0x007EF5E7. |
| Docs / Changelog / Notes (changelog.md, Info.txt) | Added changelog entry and annotated comments documenting the range-ring hull-cull patch, function entry points, and ring-entry memory layout; no exported/public symbol changes in docs. |

Sequence Diagram(s)

sequenceDiagram
    participant Renderer as Renderer
    participant Trampoline as `@0x007EF5E2` Trampoline
    participant Cluster as ClusterRingPositions
    participant RingBuf as RingBuffer (float* data)

    Renderer->>Trampoline: jmp (hook at 0x007EF5E2)
    Trampoline->>Cluster: call ClusterRingPositions(RingBuf.ptr, orig_count)
    Cluster-->>Trampoline: returns clustered_count
    Trampoline->>Trampoline: write clustered_count to saved ebp slot
    Trampoline->>Renderer: replay displaced instr, jmp back to 0x007EF5E7
    Renderer->>RingBuf: render using clustered_count

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble angles, twenty-four a-go,
I fold the rings where neighbors grow.
A hop, a copy, a compacted tune,
Fewer circles beneath the moon.
Hop-hop hooray — the render's light and new!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: a range ring hull cull optimization for improving FPS in dense crowd scenarios. |
| Description check | ✅ Passed | The description provides comprehensive details on implementation, performance, and testing, but the checklist items are not explicitly marked as completed. |
| Linked Issues check | ✅ Passed | The PR description references issue #149 and explains how this change makes it obsolete, demonstrating awareness of related work and proper issue linkage. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing the range ring hull cull feature; no unrelated modifications or scope creep detected across the four modified files. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@section/RangeRingCluster.cpp`:
- Line 73: The current early-return when g_RingClusterHull == 0.0f || count <= 4
does not guarantee the hard cap the renderer expects because writeIdx may still
equal the original count; modify the logic around writeIdx (and the return
value) in RangeRingCluster.cpp so after any packing/compaction (the code using
writeIdx, and also in the similar block around lines 129-139) you explicitly
clamp writeIdx (or the final returned count) to the renderer's hard limit (e.g.,
127) using something like writeIdx = std::min(writeIdx, HARD_RING_COUNT_LIMIT)
and return that clamped value to ensure the renderer never sees >127 entries
regardless of g_RingClusterHull or sparsity.
- Around line 103-127: The fixed 16-sample loop can under-sample large rings;
replace the magic "16" in the sampling loop with an adaptive sampleCount
computed from pOuter and nearby ring sizes: compute the smallest neighbour outer
radius (iterate j over writeIdx and read Q[3]) and set sampleCount = max(16,
ceil(2*PI * pOuter / max(1e-6f, minNeighbourOuter * 0.5f))) (or another
conservative fraction), then iterate k from 0..sampleCount-1 using COSDIR/SINDIR
scaled or computed per-index; this increases angular resolution for large pOuter
so fully_covered is only set when sampled spacing is small enough to avoid
missing uncovered arcs. Ensure you still fall back to 16 when no neighbours
found (minNeighbourOuter undefined).
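
The adaptive sample count from the second comment can be sketched as follows. This is only an illustration of the reviewer's formula: adaptiveSampleCount is a hypothetical helper, and the kMaxSamples ceiling is an added assumption (not in the review) to keep the per-frame cost bounded.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper following the reviewer's formula:
//   sampleCount = max(16, ceil(2*pi*pOuter / max(1e-6, 0.5 * minNeighbourOuter)))
// Pass minNeighbourOuter <= 0 when no kept neighbour exists; the result then
// falls back to 16 samples.
int adaptiveSampleCount(float pOuter, float minNeighbourOuter) {
    const int kMinSamples = 16;
    const int kMaxSamples = 1024;  // assumed cap, not part of the review comment
    if (minNeighbourOuter <= 0.0f) return kMinSamples;

    const float kTwoPi = 6.28318530718f;
    float spacing = std::max(1e-6f, 0.5f * minNeighbourOuter);  // target arc length
    float needed = std::ceil(kTwoPi * pOuter / spacing);

    if (!(needed >= kMinSamples)) return kMinSamples;  // also catches NaN
    if (needed > kMaxSamples) return kMaxSamples;
    return static_cast<int>(needed);
}
```

Note that a variable count no longer fits fixed-size COSDIR/SINDIR tables, so a real change would need per-index trigonometry or a larger table, as the comment itself hints.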

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c607c04-ce22-4d3c-ba55-6571879106e4

📥 Commits

Reviewing files that changed from the base of the PR and between 8da86fe and da66f12.

📒 Files selected for processing (2)
  • hooks/RangeRingCluster.cpp
  • section/RangeRingCluster.cpp

- Rename hooks/RangeRingCluster.cpp → .hook (pure JMP trampoline)
- Add changelog.md entry under Optimizations
- Document discovered addresses in Info.txt (func_RenderRings,
  WeaponExtractor::Range, sWldMap, ring entry layout, sub_7F32E0,
  sub_7EDC80)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hooks/RangeRingCluster.hook`:
- Around line 6-7: The trampoline at RangeRingClusterTrampoline currently
preserves registers with pushad/popad but not EFLAGS; modify the trampoline to
pushfd immediately after pushad and popfd immediately before popad so EFLAGS are
saved and restored across the C call (i.e., add pushfd right after the existing
pushad and add popfd just before the existing popad) ensuring the original code
at 0x7EF5E7 sees the same EFLAGS state as without the hook.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 661e08b7-6763-455f-9a25-b184b0011f3c

📥 Commits

Reviewing files that changed from the base of the PR and between da66f12 and d29d932.

📒 Files selected for processing (3)
  • Info.txt
  • changelog.md
  • hooks/RangeRingCluster.hook
✅ Files skipped from review due to trivial changes (1)
  • Info.txt

Comment on lines +6 to +7
0x007EF5E2:
    jmp @RangeRingClusterTrampoline

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Inspect trampoline register/flag preservation:"
rg -n -C5 'RangeRingClusterTrampoline|pushad|popad|pushfd|popfd|0x007EF5E7|0x10a6438' section/RangeRingCluster.cpp

echo
echo "Inspect any documented disassembly around 0x007EF5E2/0x007EF5E7 (if present):"
fd -i 'Info.txt' | xargs -r rg -n -C8 '007EF5E2|007EF5E7|7EF5E'

Repository: FAForever/FA-Binary-Patches

Length of output: 1884


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for any references to 0x007EF5E7 or 0x7EF5E7 in codebase ==="
rg -i '0x*7EF5E7|RangeRingCluster' --max-count=20

echo
echo "=== Check for test files or additional documentation ==="
fd -i 'test|spec|info|readme' | head -20

echo
echo "=== Look for any conditional code patterns after the hook site ==="
fd -i '\.cpp$|\.asm$|\.txt$' | xargs -r rg -l 'func_RenderRings|RangeRingCluster' 2>/dev/null | head -10

Repository: FAForever/FA-Binary-Patches

Length of output: 1140


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Reading Info.txt to find RangeRingCluster or 0x7EF5E context ==="
rg -n -B5 -A10 'RangeRingCluster|0x7EF5E|007EF5E2|007EF5E7' Info.txt

echo
echo "=== Full Info.txt file size and structure ==="
wc -l Info.txt
head -50 Info.txt

Repository: FAForever/FA-Binary-Patches

Length of output: 139


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check Info.txt structure ==="
head -100 Info.txt

echo
echo "=== Search for any binary info or disassembly files ==="
find . -type f \( -name '*.txt' -o -name '*.md' -o -name '*disasm*' \) | head -20

echo
echo "=== Examine the hook file directly ==="
cat hooks/RangeRingCluster.hook

Repository: FAForever/FA-Binary-Patches

Length of output: 3642


Add pushfd/popfd to preserve EFLAGS across the C call in the trampoline.

The trampoline executes add esp, 8 after the C call, which modifies EFLAGS (ZF, CF, OF, SF, AF, PF). These flags are never restored before the re-executed mov and return to 0x007EF5E7. The code's own comment states the goal is "so the original code at 0x7EF5E7 sees the same machine state it would have without the hook," but the current implementation contradicts this by leaking flag changes.

Add pushfd immediately after pushad and popfd immediately before popad to preserve flags across the flag-modifying stack adjustment.

The engine's range ring pipeline uses a 7-bit GPU stencil counter that
increments per overlapping ring fill. At 128+ rings the counter wraps,
breaking the stencil-based outline merge (individual broken circles
instead of a unified shape). This hard cap guarantees correct visuals
even for pathological formations. Makes PR FAForever#149 fully obsolete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
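
To make the wrap-around in the commit message above concrete, here is a tiny model of a 7-bit counter. It is an arithmetic illustration only: stencilAfterFills is a hypothetical name, and the model assumes the counter wraps rather than saturates, as that commit message states.

```cpp
#include <cassert>

// Models a 7-bit counter that is incremented once per overlapping ring fill.
// Only the low 7 bits survive each increment, so values run 0..127.
unsigned stencilAfterFills(unsigned overlappingRings) {
    unsigned counter = 0;
    for (unsigned i = 0; i < overlappingRings; ++i) {
        counter = (counter + 1) & 0x7Fu;  // wrap at 128
    }
    return counter;
}
```

With 127 overlapping fills the counter reads 127; with 128 it reads 0, which is indistinguishable from "no ring here" and is why the merged outline breaks at that point.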

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
section/RangeRingCluster.cpp (1)

129-136: Minor: Avoid self-copy when i == writeIdx.

When a ring is the first kept entry or happens to be at the write position, it copies to itself unnecessarily.

♻️ Proposed optimization
         if (!fully_covered) {
-            float *dest = data + writeIdx * 4;
-            dest[0] = px;
-            dest[1] = pz;
-            dest[2] = pInner;
-            dest[3] = pOuter;
+            if (writeIdx != i) {
+                float *dest = data + writeIdx * 4;
+                dest[0] = px;
+                dest[1] = pz;
+                dest[2] = pInner;
+                dest[3] = pOuter;
+            }
             writeIdx++;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@section/RangeRingCluster.cpp` around lines 129 - 136, The loop that writes
kept rings unconditionally copies four floats into data at index writeIdx even
when the source index equals writeIdx, causing an unnecessary self-copy; update
the write logic (the block that assigns dest[0..3] = px,pz,pInner,pOuter and
increments writeIdx) to only perform the copy when the source index differs from
writeIdx (i.e., if (i != writeIdx) { ... }), otherwise just increment writeIdx
(or skip assignment), referencing the existing variables/data buffer and
writeIdx to avoid changing surrounding logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1050ddb7-91ac-492f-bc89-37928bb15b88

📥 Commits

Reviewing files that changed from the base of the PR and between d29d932 and acd7b06.

📒 Files selected for processing (1)
  • section/RangeRingCluster.cpp

- Bump outer-circle samples from 16 (22.5 deg) to 24 (15 deg) to reduce
  outline choppiness at cluster boundaries
- Remove the 127 hard cap: it drops real boundary entries that contribute
  visible outline. For pathological formations with 128+ boundary units,
  the minor stencil overflow artifact is preferable to missing rings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@section/RangeRingCluster.cpp`:
- Around line 108-123: The current cull loop only samples the outer radius
(pOuter) to set fully_covered, so annuli with a visible inner hole can be
incorrectly culled; update the loop that iterates k (the 24-direction samples)
to also test samples at pInner when pInner > 0 using the same coverage logic
against existing kept entries (writeIdx, data, Q) and require both the outer and
inner sample to be covered before marking fully_covered, or alternatively, if
you prefer simpler conservative behavior, skip annulus culling when pInner > 0
(i.e., do not set fully_covered for rings with pInner > 0). Ensure you reference
and reuse the same distance/inner/outer checks (distSq vs qOuter*qOuter and
qInner*qInner) as currently used for pOuter.
- Around line 21-23: Update the console/help description string that still says
"greedy 16-sample geometric test" to reflect the current implementation using 24
samples; locate the descriptive string in RangeRingCluster.cpp (the
option/description registered for the range-ring culling knob used by
RangeRingCluster) and replace "16-sample" with "24-sample" so the help text
matches the implementation and performance/quality guidance is accurate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db09cf1c-268c-49e6-9d4b-8222526980ce

📥 Commits

Reviewing files that changed from the base of the PR and between acd7b06 and daa19e3.

📒 Files selected for processing (1)
  • section/RangeRingCluster.cpp

Comment on lines +108 to +123
for (int k = 0; k < 24 && fully_covered; ++k) {
    float tx = px + pOuter * COSDIR[k];
    float tz = pz + pOuter * SINDIR[k];

    bool sample_covered = false;
    for (int j = 0; j < writeIdx; ++j) {   // ONLY already-kept set
        float *Q = data + j * 4;
        float dx = Q[0] - tx;
        float dz = Q[1] - tz;
        float distSq = dx * dx + dz * dz;

        float qOuter = Q[3];
        if (distSq > qOuter * qOuter) continue;

        float qInner = Q[2];
        if (qInner > 0.0f && distSq < qInner * qInner) continue;

⚠️ Potential issue | 🟠 Major

Outer-circle coverage is not enough for annuli.

fully_covered is decided only from samples on pOuter, but pInner never participates in the cull test. For mixed [innerR, outerR] batches, that can drop a ring whose outer boundary is hidden while its min-range hole is still visible in the merged stencil. Please either add an inner-circle coverage check when pInner > 0, or conservatively skip culling annuli until both boundaries are proven hidden.
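
One way the first option could look is sketched below. It is not code from the PR: bothBoundariesCovered, circleCovered and pointInAnyBand are hypothetical names, and sampling the inner circle is still only a sampled test, not a proof that the hole is hidden.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch. Entry layout: [0]=worldX, [1]=worldZ,
// [2]=innerRadius, [3]=outerRadius.
static bool pointInAnyBand(const float *kept, int keptCount, float tx, float tz) {
    for (int j = 0; j < keptCount; ++j) {
        const float *q = kept + j * 4;
        float dx = q[0] - tx, dz = q[1] - tz;
        float distSq = dx * dx + dz * dz;
        if (distSq > q[3] * q[3]) continue;                 // beyond outer radius
        if (q[2] > 0.0f && distSq < q[2] * q[2]) continue;  // inside the hole
        return true;
    }
    return false;
}

// True if every sample on the circle of the given radius is covered.
static bool circleCovered(const float *kept, int keptCount,
                          float px, float pz, float radius, int samples) {
    const float kTwoPi = 6.28318530718f;
    for (int k = 0; k < samples; ++k) {
        float a = kTwoPi * k / samples;
        if (!pointInAnyBand(kept, keptCount,
                            px + radius * std::cos(a),
                            pz + radius * std::sin(a))) {
            return false;
        }
    }
    return true;
}

// Cull only when the outer circle is covered and, for annuli, the inner
// circle is covered as well.
bool bothBoundariesCovered(const float *kept, int keptCount, float px, float pz,
                           float pInner, float pOuter, int samples = 24) {
    assert(kept != nullptr && keptCount >= 0 && samples > 0);
    if (!circleCovered(kept, keptCount, px, pz, pOuter, samples)) return false;
    if (pInner > 0.0f && !circleCovered(kept, keptCount, px, pz, pInner, samples)) return false;
    return true;
}
```

For solid disks (pInner = 0) this costs the same as the outer-only test; the doubled sample count applies only to annuli.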

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@M3RT1N99
Author

Regarding the self-copy nitpick (guard writeIdx != i): we tested this and it actually worsened FPS. The unconditional 4-float copy (16 bytes, likely a single cache line hit) is cheaper than a branch that mispredicts in the hot loop. The self-copy is harmless — same memory written with same values — so we're keeping the branchless version.

Regarding inner-radius (annuli) sampling: most units have innerR = 0 (solid disks). Only artillery/missiles have inner radii, and those are rare in the dense-crowd scenario this patch targets. Adding inner-edge sampling doubles the sample count for a case that doesn't measurably affect the visual in practice. Skipping for now — can revisit if a real-world case surfaces.

Both CodeRabbit review suggestions addressed. The only code change applied was the description string fix (16 → 24 samples).
