perf: Overwhelming Audit w/ Multi-Threading #8203
Open
AzmodiusX wants to merge 26 commits into cataclysmbn:main
Map cache, scent map, and light map all threaded. Made thread pool
monster_plan restructure
Monster planning distance culling
Thread local RNG
skew_vision_cache
Parallel monmove dispatch preplanning
memset in map.cpp
rebuild_pq() called after every successful act_on_map(), not just on destruction
Extracted map::update_weather_transparency_lookup()
ITEM_PROCESS_RADIUS_SM now mapped to MAPSIZE, kept for testing or easy changes
parallel_for → parallel_for_chunked(..., 8) for monster planning
rate_target gains optional precalc_dist parameter
Pre-warm pass: calls mon->sees(u) for every plannable monster before parallel_for_chunked
active_items.empty() early-out
deferred + plan_lookup merged into a single plan_index map
Fix vehicle collisions
Fix heap corruption during vehmove
Removed stale comments
Documented non-deterministic RNG under threading in some cases
Fixed performance losses in vehmove, scent_map::decay, and monmove
Removed process-items distance check entirely
Added sight cache
Added has_cargo_recharge check
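For readers unfamiliar with the "thread local RNG" idea, here is a minimal sketch of the general technique. It is not the code in this PR: `thread_rng` and `thread_rng_int` are hypothetical names, and seeding from `std::random_device` is an assumption. Each thread lazily gets its own engine, so workers never contend on shared RNG state. The trade-off is that results depend on which thread handles which monster, which is one plausible source of the non-determinism noted above.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper (not the PR's API): one engine per thread, seeded once
// on first use, so worker threads never share or lock RNG state.
inline std::mt19937 &thread_rng()
{
    thread_local std::mt19937 engine( std::random_device{}() );
    return engine;
}

// Uniform integer in [lo, hi], drawn from the calling thread's own engine.
inline int thread_rng_int( int lo, int hi )
{
    assert( lo <= hi );
    std::uniform_int_distribution<int> dist( lo, hi );
    return dist( thread_rng() );
}
```

Any code path that must stay reproducible (for example, seeded tests) would still need to run on a single, explicitly seeded engine.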
Collaborator
Author
I see the failing tests, looking into it.
Use monster budget to tier monsters based on distance. Defers and simplifies monmove for monsters when counts grow
Collaborator
Author
I am cooking.
Also laid groundwork for future threading work. That work is deferred for fear of colliding with the sound rework and chunk loading.
Fixed macro step drifting monsters toward player_pos
--friendly moved from after the idle-path early returns to before them
shove_vehicle(dest, dest) fixed, now shove_vehicle(goal, dest)
Added effective_friendly = friendly > 0 ? friendly - 1 : friendly alongside the existing effective_wandf simulation
decide_action and execute_action: Changed lod_tier == 0 to lod_tier <= 1 in both the repath signal (decide) and the A* call site (execute)
Added monster::prewarm_sight(const Creature &)
plan_index is constructed fresh each call; plan_index.size() was always 0 at that point
Removed the outer turn_cached_sees(*this, g->u) guard before the player target block
monster_action_kind::special commented out as a Phase 3 placeholder (future implementation)
must_serialize field removed from monster_action_t. The only setter (action.must_serialize = true in the push case) removed, note left in comment for future.
Consolidated overly verbose comments (useful for development at the time)
Tier-1 monsters (20-60 tiles) can now run A* when genuinely stuck
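A rough sketch of the distance tiering and the `lod_tier <= 1` gate described above, for illustration only. The helper names are hypothetical. The 20 and 60 tile bounds come from the Tier-1 note, but how the exact boundary values are bucketed is an assumption, and the "genuinely stuck" check is not modelled here.

```cpp
#include <cassert>

// Illustrative thresholds. The 20 and 60 tile bounds are taken from the
// "Tier-1 monsters (20-60 tiles)" note; which tier the boundary values
// themselves fall into is an assumption made for this sketch.
constexpr int lod_near_limit = 20;
constexpr int lod_mid_limit = 60;

// Hypothetical helper: map distance to the player onto a level-of-detail tier.
// Tier 0 is closest (full simulation), higher tiers get cheaper processing.
constexpr int lod_tier_for_distance( int dist )
{
    return dist < lod_near_limit ? 0 : ( dist <= lod_mid_limit ? 1 : 2 );
}

// Mirrors the described change from `lod_tier == 0` to `lod_tier <= 1`:
// tier 0 and tier 1 monsters are both eligible for A*. Whether a monster is
// actually stuck would be decided elsewhere and is left out here.
constexpr bool may_run_astar( int lod_tier )
{
    return lod_tier <= 1;
}
```

With the old `== 0` condition, a monster 40 tiles away could never repath; with `<= 1` it can.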
Split sleep performance options for NPCs, with force NPC sleep functionality added
Collaborator
Author
I need to do my own testing and profiling some more, but based on what I've done so far, this should be working.
Collaborator
Author
I marked it ready so it would run the tests while I slept, plz forgib
Purpose of change (The Why)
Always looking for performance gains.
Notably, monmove seems to be an incredible source of lag.
Many things that could use distance culling don't.
Describe the solution (The How)
Implemented robust thread pool.
Map cache, scent map, and light map all threaded.
vehmove, monmove, and process_items all gained some form of performance upgrade.
Not every change was multi-threading related, but much was.
Also added several performance options. One of them splits the setting that blocks movement for sleeping monsters into two, with a separate NPC-specific version. The NPC version also forces NPCs to sleep.
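To show the chunked dispatch idea for anyone reviewing the threading parts, here is a self-contained sketch. It is not the pool in this PR: `parallel_for_chunked_sketch` is a hypothetical stand-in, its argument order is an assumption, and it spawns plain `std::thread`s per call where a real pool would reuse its workers. Each worker claims `chunk` consecutive indices at a time from a shared atomic counter, which cuts per-item scheduling overhead compared to handing out one index at a time.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in, not the PR's parallel_for_chunked.
// Calls fn(i) once for every i in [begin, end). Workers grab `chunk`
// consecutive indices per claim; fn must be safe to call concurrently
// for different indices.
inline void parallel_for_chunked_sketch( std::size_t begin, std::size_t end, std::size_t chunk,
        const std::function<void( std::size_t )> &fn )
{
    assert( chunk > 0 );
    if( begin >= end ) {
        return;
    }
    const std::size_t hw = std::max<std::size_t>( 1, std::thread::hardware_concurrency() );
    const std::size_t n_chunks = ( end - begin + chunk - 1 ) / chunk;
    const std::size_t n_workers = std::min( hw, n_chunks );

    // Shared cursor: fetch_add hands each caller a distinct chunk start.
    std::atomic<std::size_t> next{ begin };
    auto worker = [&]() {
        for( ;; ) {
            const std::size_t start = next.fetch_add( chunk );
            if( start >= end ) {
                return;
            }
            const std::size_t stop = std::min( end, start + chunk );
            for( std::size_t i = start; i < stop; ++i ) {
                fn( i );
            }
        }
    };

    std::vector<std::thread> threads;
    for( std::size_t w = 1; w < n_workers; ++w ) {
        threads.emplace_back( worker );
    }
    worker(); // the calling thread takes chunks too
    for( std::thread &t : threads ) {
        t.join();
    }
}
```

A call such as `parallel_for_chunked_sketch( 0, monsters.size(), 8, plan_one )` (with hypothetical `monsters` and `plan_one`) would correspond to the chunk size of 8 mentioned for monster planning. The sketch does not handle exceptions thrown by `fn`.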
Describe alternatives you've considered
Focusing on non-threaded improvements
Threading increases code complexity and maintenance burden, so it has to be worth it.
Testing
I used Tracy for profiling. Auto saves are off.
Load into a world, in the middle of a city with several mid-sized fires. There are many enemies, vehicles, and items nearby. The player is in complete debug mode, meaning nothing interacts with them. This removes many confounding variables such as combat or aggro checks. Though there are some downsides to this setup, it is incredibly reliable and standardized. I make the player wait 2 in-game hours (7200 turns) while profiling, then stop profiling right away.
At most steps, individual commits were timed. Applied sequentially, each change resulted in at worst no change in timing (the purely structural ones), and most resulted in an improvement.
The final comparison showed do_turn taking 66.36% of the baseline time. That's about 34% less time per turn, or roughly 50% faster in turns per second.
Necropolis got even more extreme gains, having a 56.5% comparison time.
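Since "X% comparison time" and "Y% faster" are easy to mix up, here is the arithmetic behind those figures as a small sketch. The function names are made up for illustration; `time_ratio` is the new do_turn time divided by the baseline time.

```cpp
#include <cassert>
#include <cmath>

// time_ratio = new do_turn time / baseline do_turn time, e.g. 0.6636.

// Share of wall time saved, in percent: (1 - ratio) * 100.
constexpr double time_saved_percent( double time_ratio )
{
    return ( 1.0 - time_ratio ) * 100.0;
}

// Increase in turns processed per unit time, in percent: (1 / ratio - 1) * 100.
constexpr double throughput_gain_percent( double time_ratio )
{
    return ( 1.0 / time_ratio - 1.0 ) * 100.0;
}
```

For the city test, a ratio of 0.6636 gives about 33.6% time saved and about 50.7% more turns per second. For the Necropolis figure of 0.565, that works out to about 43.5% time saved and about 77% more turns per second.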
I've also tested the following:
Vehicle collisions work correctly
Light sources activate and deactivate as expected; no "ghost lights"
Fire processing and monster movement still occur correctly near the edge of vision
I did another 2 hour test in necropolis, and the performance gains were more limited. The comparison is 88.5% (13% increase). Honestly, what is going on in necropolis?
Those were old numbers so I can flex the massive improvement for worst case. Wowee.
Additional context
I'm sure I missed something, there's a lot here.
Of note: The graphics performance costs are actually marginal from my estimations. SDL2 isn't really the bottleneck for most setups.
Though, even if it were, SDL2 is not thread safe. Only some minor gains could be had from doing some setup steps via multi-threading beforehand. I think the largest benefit for that would be latency, not total game speed (which seems to be a more common issue).
Fair warning to people profiling this PR:
I added a lot of profiling. It might be a bit overwhelming, but it sure is informative. For a PR of this complexity, it seems valuable to keep.