-
-
Notifications
You must be signed in to change notification settings - Fork 7
Improve particle_engine performance #144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughReplaces dynamic linked-list particle storage with fixed-size arrays (MAX_PARTICLES = 4096) and per-array indices; adds in-place builders and NEXT_* macros; updates all particle spawn/update/render paths to index-based loops with end-swap compaction; removes dynamic allocation/cleanup and updates engine struct and function signatures. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Game
participant Engine as ParticleEngine (s_engine)
participant Arrays as Particles[game/global]
participant Renderer
Game->>Engine: spawn event (e.g., explosion)
Engine->>Arrays: NEXT_*_PARTICLE() -> slot or warn
Engine->>Arrays: build_rect_particle(p[slot], params)
loop per frame
Game->>Engine: particle_engine_update(dt)
Engine->>Arrays: update_particles(p, &idx, dt)
Note right of Engine #f4f4f8: decrement timers, move alive\ncompact dead by swapping from end
Game->>Engine: particle_engine_render_game(cam)
Engine->>Renderer: render_particles(game_particles, game_idx, cam)
Game->>Engine: particle_engine_render_global(cam)
Engine->>Renderer: render_particles(global_particles, global_idx, cam)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
1950139 to
7ec7212
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
🧹 Nitpick comments (20)
src/particle_engine.c (1)
256-258: Remove unreachable NULL check.The
NEXT_GAME_PARTICLEmacro already returns whenpis NULL, making this check dead code.Apply this diff:
for (unsigned int i = 0; i < count; ++i) { NEXT_GAME_PARTICLE(p); - - if (p == NULL) { - return; - } x = get_random(dim.width) + pos.x;lib/profiler/include/rdtsc.h (3)
1-2: Consider using a project-specific include guard prefix.The include guard
_RDTSC_H_uses a leading underscore followed by a capital letter, which is reserved for the implementation in C. While this may work in practice, it could theoretically conflict with system headers.Consider using a project-specific prefix:
-#ifndef _RDTSC_H_ -#define _RDTSC_H_ +#ifndef PROFILER_RDTSC_H_ +#define PROFILER_RDTSC_H_And update line 19:
-#endif // _RDTSC_H_ +#endif // PROFILER_RDTSC_H_
6-7: Remove unused include.
stddef.his included butsize_tis not used in this header. Onlystdint.his needed foruint64_t.#include <stdint.h> -#include <stddef.h>
9-15: Consider adding API documentation.The function declarations lack documentation comments explaining their purpose, parameters, and return values. This would improve usability, especially for
estimate_cpu_freq(uint64_t test_time)where the unit and meaning oftest_timeis not clear.Example documentation:
/** * Get the OS timer frequency in Hz. * @return Frequency in Hz (typically 1,000,000 for microsecond resolution) */ uint64_t get_os_time_freq(void); /** * Read the current OS timer value. * @return Timer value in units of get_os_time_freq() */ uint64_t read_os_timer(void); /** * Read the CPU timestamp counter (RDTSC). * @return CPU cycle count */ uint64_t read_cpu_timer(void); /** * Estimate CPU frequency by comparing CPU timer to OS timer. * @param test_time Duration of the test in milliseconds * @return Estimated CPU frequency in Hz */ uint64_t estimate_cpu_freq(uint64_t test_time);lib/profiler/src/rdtsc.c (1)
23-28: No error handling for gettimeofday failure.
gettimeofdaycan fail (returns -1), but the code doesn't check for this. While failures are rare, they could occur in edge cases (e.g., system clock issues).Consider adding error handling:
uint64_t read_os_timer(void) { struct timeval tv; - gettimeofday(&tv, NULL); + if (gettimeofday(&tv, NULL) != 0) { + // Fallback or error handling + return 0; + } return get_os_time_freq()*tv.tv_sec + tv.tv_usec; }However, given this is a profiler library where failure modes are less critical than in production code, this may be acceptable as-is.
lib/profiler/README.md (2)
54-60: Add language specifier to fenced code block.The example output code block lacks a language specifier, which prevents proper syntax highlighting and may cause linting warnings.
As per coding guidelines (static analysis hint), add a language specifier:
-``` +```text ===== Benchmarks : =====
69-70: TODO: Complete API documentation.The API section is marked as TODO. Consider adding documentation for the public API functions to help users understand the available interfaces.
Would you like me to generate API documentation based on the profiler headers, or open a new issue to track this task?
lib/profiler/include/internal/common.h (1)
1-2: Consider adding a profiler-specific prefix to the include guard.The include guard
COMMON_H_is generic and could conflict with other headers namedcommon.hin a larger project. Since this is in an internal directory, the risk is lower, but a prefix would improve clarity.-#ifndef COMMON_H_ -#define COMMON_H_ +#ifndef PROFILER_INTERNAL_COMMON_H_ +#define PROFILER_INTERNAL_COMMON_H_And update line 13:
-#endif // COMMON_H_ +#endif // PROFILER_INTERNAL_COMMON_H_lib/profiler/include/profiler.h (1)
1-2: Consider using a non-reserved include guard name.Similar to
rdtsc.h, the include guard_PROFILER_H_uses a leading underscore followed by a capital letter, which is reserved by the C standard. While unlikely to cause issues in practice, using a project-specific prefix is safer.-#ifndef _PROFILER_H_ -#define _PROFILER_H_ +#ifndef PROFILER_H_ +#define PROFILER_H_And update line 21:
-#endif // _PROFILER_H_ +#endif // PROFILER_H_lib/profiler/include/perf.h (2)
1-2: Consider using a non-reserved include guard name.The include guard
_PERF_H_uses a leading underscore followed by a capital letter, which is reserved by the C standard. This is the same pattern seen in other profiler headers.-#ifndef _PERF_H_ -#define _PERF_H_ +#ifndef PROFILER_PERF_H_ +#define PROFILER_PERF_H_And update line 14:
-#endif // _PERF_H_ +#endif // PROFILER_PERF_H_
6-12: Consider documenting platform requirements.The perf API is Linux-specific (based on
perf_event_openin the implementation). The header doesn't indicate this platform dependency, which could confuse users on other platforms.Add a comment documenting the platform requirement:
+/** + * Linux-specific performance counter API. + * Uses perf_event_open to count page faults. + * On non-Linux platforms, these functions will be no-ops or return 0. + */ + void perf_setup(void);Also consider whether the declarations should be platform-guarded:
#ifdef __linux__ void perf_setup(void); void perf_reset_page_fault_count(void); uint64_t perf_read_page_fault_count(void); void perf_close(void); #else // Stub declarations or empty inline functions for non-Linux static inline void perf_setup(void) {} static inline void perf_reset_page_fault_count(void) {} static inline uint64_t perf_read_page_fault_count(void) { return 0; } static inline void perf_close(void) {} #endifThis would make the API work safely on all platforms without requiring conditional compilation at call sites.
CMakeLists.txt (1)
13-13: Use option()/BOOL and target-scoped definitions; rely on transitive includes
- Prefer a BOOL option and target_compile_definitions to limit scope.
- profiler target already exposes PUBLIC include dir; the extra include is redundant.
-set(PROFILER OFF CACHE STRING "Enable profiler") +option(PROFILER "Enable profiler" OFF) @@ -if (PROFILER) - add_subdirectory(lib/profiler) -endif() +if (PROFILER) + add_subdirectory(lib/profiler) +endif() @@ -if (PROFILER) - add_definitions("-DPROFILER") - target_link_libraries(${PROJECT_NAME} profiler) - target_include_directories(${PROJECT_NAME} PRIVATE lib/profiler/include) -endif () +if (PROFILER) + target_link_libraries(${PROJECT_NAME} PRIVATE profiler) + target_compile_definitions(${PROJECT_NAME} PRIVATE PROFILER=1) + # Optional: profiler's PUBLIC include makes this unnecessary + # target_include_directories(${PROJECT_NAME} PRIVATE lib/profiler/include) +endif ()Also applies to: 54-56, 223-227
lib/profiler/CMakeLists.txt (1)
5-10: Gate perf.c to Linux to avoid cross‑platform build breaksperf.c includes linux/perf_event.h; enabling PROFILER on non‑Linux will fail.
add_library(profiler src/rdtsc.c src/profiler.c src/repetition_tester.c - src/perf.c ) target_include_directories(profiler PUBLIC include PRIVATE include/internal ) + +if (CMAKE_SYSTEM_NAME STREQUAL "Linux") + target_sources(profiler PRIVATE src/perf.c) + target_compile_definitions(profiler PUBLIC PROFILER_HAS_PERF=1) +endif()Please verify CI on macOS/Windows with PROFILER=ON after this change.
Also applies to: 12-15
lib/profiler/test/calc_cpu_freq.c (1)
1-31: Fix uint64_t printf formats and return from mainUse inttypes PRIu64 for portability (Windows LLP64) and return 0 from main.
#include <stdio.h> #include <stdint.h> +#include <inttypes.h> #include "rdtsc.h" @@ - printf(" OS Freq: %lu (reported)\n", os_freq); - printf(" OS Seconds: %.4f\n", (f64) os_elapsed/ (f64) os_freq); - printf(" CPU Timer: %lu -> %lu = %lu\n", cpu_start, cpu_end, cpu_elapsed); - printf(" CPU Freq: %lu (guessed)\n", cpu_freq); + printf(" OS Freq: %" PRIu64 " (reported)\n", os_freq); + printf(" OS Seconds: %.4f\n", (f64) os_elapsed / (f64) os_freq); + printf(" CPU Timer: %" PRIu64 " -> %" PRIu64 " = %" PRIu64 "\n", cpu_start, cpu_end, cpu_elapsed); + printf(" CPU Freq: %" PRIu64 " (guessed)\n", cpu_freq); + return 0; }lib/profiler/src/calc_cpu_freq.c (1)
1-31: Portability: use PRIu64 for uint64_t and return 0Same fix as the test variant.
#include <stdio.h> #include <stdint.h> +#include <inttypes.h> #include <rdtsc.h> @@ - printf(" OS Freq: %lu (reported)\n", os_freq); - printf(" OS Seconds: %.4f\n", (f64) os_elapsed/ (f64) os_freq); - printf(" CPU Timer: %lu -> %lu = %lu\n", cpu_start, cpu_end, cpu_elapsed); - printf(" CPU Freq: %lu (guessed)\n", cpu_freq); + printf(" OS Freq: %" PRIu64 " (reported)\n", os_freq); + printf(" OS Seconds: %.4f\n", (f64) os_elapsed / (f64) os_freq); + printf(" CPU Timer: %" PRIu64 " -> %" PRIu64 " = %" PRIu64 "\n", cpu_start, cpu_end, cpu_elapsed); + printf(" CPU Freq: %" PRIu64 " (guessed)\n", cpu_freq); + return 0; }lib/profiler/src/rep_return_data.c (1)
46-56: Minor: check malloc result in test helperAvoid potential NULL deref in malloc_create_data; quick guard improves robustness.
static ReturnData* malloc_create_data(void) { - ReturnData *data = malloc(sizeof(ReturnData)); + ReturnData *data = malloc(sizeof(ReturnData)); + if (!data) return NULL; data->fnum = 1.2; @@ return data; }And in test_allocated_data, guard free/use:
- ReturnData *data = malloc_create_data(); + ReturnData *data = malloc_create_data(); + if (!data) { rept_end(tester); break; }lib/profiler/src/perf.c (1)
11-21: FD sentinel, ioctl arg, and type fixes; improve robustness
- Use -1 as invalid FD (0 can be valid).
- Pass 0 to PERF_EVENT_IOC_ENABLE.
- Read into uint64_t to match API.
-static int _FD; +static int _FD = -1; @@ - _FD = perf_event_open(&pe, 0, -1, -1, 0); - if (_FD == -1) { + _FD = perf_event_open(&pe, 0, -1, -1, 0); + if (_FD == -1) { fprintf(stderr, "Error opening leader %llx : %s\n", pe.config, strerror(errno)); fprintf(stderr, "Page errors will not be counted\n"); - _FD = 0; return; } @@ - ioctl(_FD, PERF_EVENT_IOC_RESET, 0); - ioctl(_FD, PERF_EVENT_IOC_ENABLE); + ioctl(_FD, PERF_EVENT_IOC_RESET, 0); + ioctl(_FD, PERF_EVENT_IOC_ENABLE, 0); @@ - if (_FD) { + if (_FD != -1) { ioctl(_FD, PERF_EVENT_IOC_RESET, 0); ioctl(_FD, PERF_EVENT_IOC_ENABLE); } @@ - size_t count = 0; - if (_FD) { - read(_FD, &count, sizeof(count)); + uint64_t count = 0; + if (_FD != -1) { + (void)read(_FD, &count, sizeof(count)); } return count; @@ - if (_FD) { - close(_FD); + if (_FD != -1) { + close(_FD); + _FD = -1; }Also ensure perf.c is only compiled on Linux (see my CMake comment).
Also applies to: 34-44, 46-52, 54-61, 63-68
lib/profiler/src/rep_read_test.c (1)
78-79: Use binary mode for portabilityOpen file in "rb" to avoid newline translation on platforms that differentiate text/binary.
lib/profiler/src/profiler.c (1)
52-60: Consider adding prof_init to reset stateHeader declares prof_init but no implementation here. Provide a simple initializer.
Add this function:
void prof_init(void) { memset(&_Prof, 0, sizeof(_Prof)); CurrentAnchorLabel = NULL; }lib/profiler/include/internal/profiler_c.h (1)
24-31: Const-correctness for parent anchor labelparent_anchor should be const to match usage and avoid accidental mutation.
- char *parent_anchor; + const char *parent_anchor;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
.gitignore(1 hunks)CMakeLists.txt(3 hunks)lib/profiler/CMakeLists.txt(1 hunks)lib/profiler/README.md(1 hunks)lib/profiler/include/internal/common.h(1 hunks)lib/profiler/include/internal/profiler_c.h(1 hunks)lib/profiler/include/macros.h(1 hunks)lib/profiler/include/perf.h(1 hunks)lib/profiler/include/profiler.h(1 hunks)lib/profiler/include/rdtsc.h(1 hunks)lib/profiler/include/repetition_tester.h(1 hunks)lib/profiler/src/calc_cpu_freq.c(1 hunks)lib/profiler/src/perf.c(1 hunks)lib/profiler/src/profiler.c(1 hunks)lib/profiler/src/rdtsc.c(1 hunks)lib/profiler/src/rep_read_test.c(1 hunks)lib/profiler/src/rep_return_data.c(1 hunks)lib/profiler/src/repetition_tester.c(1 hunks)lib/profiler/test/calc_cpu_freq.c(1 hunks)src/main.c(3 hunks)src/particle_engine.c(16 hunks)
🧰 Additional context used
🧬 Code graph analysis (11)
lib/profiler/include/perf.h (1)
lib/profiler/src/perf.c (4)
perf_setup(23-44)perf_reset_page_fault_count(46-52)perf_read_page_fault_count(54-61)perf_close(63-68)
lib/profiler/src/profiler.c (1)
lib/profiler/src/rdtsc.c (2)
read_cpu_timer(30-33)estimate_cpu_freq(35-55)
lib/profiler/src/calc_cpu_freq.c (4)
lib/profiler/src/rep_read_test.c (1)
main(125-181)lib/profiler/src/rep_return_data.c (1)
main(89-112)lib/profiler/test/calc_cpu_freq.c (1)
main(8-31)lib/profiler/src/rdtsc.c (3)
get_os_time_freq(18-21)read_cpu_timer(30-33)read_os_timer(23-28)
lib/profiler/include/rdtsc.h (1)
lib/profiler/src/rdtsc.c (4)
get_os_time_freq(18-21)read_os_timer(23-28)read_cpu_timer(30-33)estimate_cpu_freq(35-55)
lib/profiler/src/perf.c (1)
src/main.c (1)
close(1357-1423)
lib/profiler/src/repetition_tester.c (2)
lib/profiler/src/perf.c (4)
perf_setup(23-44)perf_reset_page_fault_count(46-52)perf_read_page_fault_count(54-61)perf_close(63-68)lib/profiler/src/rdtsc.c (1)
read_cpu_timer(30-33)
lib/profiler/include/internal/profiler_c.h (1)
lib/profiler/src/profiler.c (6)
prof_start(52-55)prof_stop(57-60)prof_add_anchor(69-85)prof_print(119-174)make_anchor_block(176-182)read_anchor_block(184-189)
lib/profiler/include/repetition_tester.h (1)
lib/profiler/src/repetition_tester.c (6)
rept_setup(10-36)rept_begin(38-45)rept_end(47-54)rept_count_bytes(56-60)rept_is_testing(98-146)rept_print_results(148-158)
lib/profiler/test/calc_cpu_freq.c (1)
lib/profiler/src/rdtsc.c (3)
get_os_time_freq(18-21)read_cpu_timer(30-33)read_os_timer(23-28)
lib/profiler/src/rep_read_test.c (3)
lib/profiler/src/repetition_tester.c (5)
rept_is_testing(98-146)rept_begin(38-45)rept_end(47-54)rept_count_bytes(56-60)rept_setup(10-36)lib/profiler/src/rep_return_data.c (1)
main(89-112)lib/profiler/src/rdtsc.c (1)
estimate_cpu_freq(35-55)
src/particle_engine.c (1)
src/random.c (1)
get_random(76-81)
🪛 markdownlint-cli2 (0.18.1)
lib/profiler/README.md
54-54: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Mac (Clang)
- GitHub Check: Ubuntu (mingw)
- GitHub Check: Ubuntu (Clang)
- GitHub Check: Mac (GCC)
- GitHub Check: Windows (MSVC)
- GitHub Check: Ubuntu (GCC)
🔇 Additional comments (12)
src/particle_engine.c (4)
68-84: LGTM! Clean slot allocation with capacity checks.The helpers correctly validate capacity and return NULL on exhaustion, with helpful warnings for debugging.
86-96: LGTM! Macros follow best practices.The do-while(0) pattern ensures safe macro expansion, and early return on NULL appropriately handles capacity exhaustion.
507-529: LGTM! Array-based rendering is correct.The iteration logic correctly uses indices as counts with proper bounds checking.
532-535: Verify if indices should be reset.With static allocation, no memory needs to be freed. However, confirm whether
particle_engine_clearshould resetglobal_idxandgame_idxto 0 to clear all active particles. The current no-op may be intentional if particles are expected to persist across clear operations..gitignore (1)
20-20: LGTM!Adding
profile.txtto.gitignoreis appropriate, as this file is generated by the profiler at runtime and should not be version-controlled.lib/profiler/include/internal/common.h (1)
4-11: LGTM!The ANSI color escape codes are standard and correctly defined. They'll provide useful colorization for profiler output.
lib/profiler/include/profiler.h (1)
4-19: LGTM!The header structure is well-organized with proper C++ compatibility guards and conditional compilation. The includes are appropriately grouped under the PROFILER guard.
src/main.c (2)
65-67: LGTM!The conditional inclusion of the profiler header is correctly guarded by the PROFILER preprocessor flag.
1472-1474: LGTM!The profiler setup is correctly placed early in main(), after PHYSFS initialization but before the main game initialization. This ensures profiling captures the full execution.
lib/profiler/README.md (1)
14-16: Correct header include in README example
Replace the non-existentbenchmark.hwith the actualprofiler.hin the usage snippet:```c #include <stdio.h> -#include "benchmark.h" +#include "profiler.h"lib/profiler/src/rdtsc.c (1)
23-28: Potential integer overflow in OS timer calculation.Line 27 computes
get_os_time_freq() * tv.tv_sec, which multiplies 1,000,000 by the seconds value. For largetv_secvalues (e.g., years of uptime), this could overflow uint64_t.Consider using a safer calculation order or checking for overflow:
uint64_t read_os_timer(void) { struct timeval tv; gettimeofday(&tv, NULL); - return get_os_time_freq()*tv.tv_sec + tv.tv_usec; + uint64_t freq = get_os_time_freq(); + // freq is 1,000,000, tv_usec is < 1,000,000 + // So tv_usec is already in the right units + return freq * (uint64_t)tv.tv_sec + (uint64_t)tv.tv_usec; }Note: While overflow is unlikely in practice (uint64_t can hold ~584 million years at microsecond resolution), casting ensures no implicit conversions occur.
Likely an incorrect or invalid review comment.
lib/profiler/include/repetition_tester.h (1)
8-12: Macro side effects and semicolonCurrent macro ends with a semicolon inside; safe due to do{ }while(0); just ensure callers don’t add an extra semicolon if compiling with pedantic warnings. No change required.
7ec7212 to
6624042
Compare
6624042 to
97c69bf
Compare
Replaces usage of linkedlists for statically allocated swaplists which are a lot more cpu cache friendly. The size cap can be addressed with a dynamiccally allocated swaplist should the need arise in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/particle_engine.c (1)
186-186: Consider adding robustness for the c_count-1 pattern.The expression
(unsigned int) c_count-1will underflow toUINT_MAXifc_countis 0, causing an out-of-bounds array access. Current call sites (lines 194, 200, 206) always passc_count >= 1, so this is safe in practice.For robustness, consider adding an assertion at the function entry:
static void create_explosion(Position pos, Dimension dim, unsigned int c_count, ...) { + assert(c_count > 0 && "c_count must be at least 1"); SDL_Color *colors = ec_malloc(c_count * sizeof(SDL_Color));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/particle_engine.c(16 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/particle_engine.c (1)
src/random.c (1)
get_random(76-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Mac (GCC)
- GitHub Check: Mac (Clang)
- GitHub Check: Windows (MSVC)
- GitHub Check: Ubuntu (mingw)
- GitHub Check: Ubuntu (Clang)
- GitHub Check: Ubuntu (GCC)
🔇 Additional comments (3)
src/particle_engine.c (3)
58-96: LGTM! Solid allocation and capacity management.The fixed-size array approach with bounds checking is well-implemented:
- Zero-initialization of the static engine is correct
- Allocation helpers check capacity before returning a pointer
- Post-increment ensures valid indices (0-4095) are used
- Early-return macros prevent NULL propagation to callers
This eliminates dynamic allocation overhead and improves cache locality compared to linked lists.
425-443: LGTM! Critical issue from previous review has been correctly fixed.The update logic with array compaction is now correct:
- Prefix decrement
--(*idx)decrements first, then accessesparticles[*idx], preventing the out-of-bounds read flagged in the previous review- When a particle dies at position
i, the last active particle is moved to positioni, and*idxis decremented- The loop re-checks position
i(without incrementing) to handle the newly moved particle, correctly processing chains of dead particles- Direct struct assignment is clearer than
memcpyand allows compiler optimizationThe compaction strategy maintains a dense array of active particles, improving cache locality during iteration.
98-110: Excellent performance refactor: array-based particle system.The migration from linked lists to fixed-size arrays delivers the intended cache-locality improvements:
- In-place construction (
build_rect_particle) eliminates allocation overhead- Static arrays remove all dynamic memory management (no allocations, no frees)
- Index-based iteration in update and render paths is cache-friendly
- Simplified cleanup (line 534) - no-op due to static allocation
The consistent pattern across all particle generator functions (bloodspray, explosion, dust, wind, heat, sparkle, speed lines) makes the code maintainable and predictable.
Also applies to: 448-450, 507-529, 534-534
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/particle_engine.c (1)
528-531: LGTM!The no-op implementation is correct for static allocation. If desired,
particle_engine_closecould be simplified since it now just calls an empty function, but this is purely optional.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/particle_engine.c(16 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/particle_engine.c (1)
src/random.c (1)
get_random(76-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Mac (GCC)
- GitHub Check: Mac (Clang)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Ubuntu (Clang)
- GitHub Check: Ubuntu (GCC)
- GitHub Check: Windows (MSVC)
- GitHub Check: Ubuntu (mingw)
🔇 Additional comments (8)
src/particle_engine.c (8)
58-66: Well-designed cache-friendly refactor.The switch from dynamic linked lists to fixed-size arrays is a solid performance optimization. Zero-initialization correctly sets indices to 0, and MAX_PARTICLES = 4096 provides a reasonable upper bound with overflow warnings in place.
68-96: LGTM!The helper functions correctly check capacity before allocation, and the macros cleanly eliminate the need for NULL checks at each call site. The warning messages on capacity overflow are helpful for debugging.
98-110: LGTM!The shift to in-place initialization via
build_rect_particleeliminates allocation overhead. All particle fields are properly initialized.
120-396: LGTM!All particle creation functions correctly use the new allocation macros. The past review concern about unreachable NULL checks has been resolved—the code no longer has redundant checks after
NEXT_*_PARTICLEcalls.
421-439: Excellent: Past critical issue resolved + correct compaction logic.The swap-with-last pattern is implemented correctly with prefix decrement at line 436. This resolves the past critical out-of-bounds issue and ensures that when a particle dies, it's replaced by the last active particle without accessing beyond array bounds. The logic correctly re-checks the swapped particle on the next iteration.
442-446: LGTM!The update calls correctly pass array pointers and index addresses to the refactored
update_particlesfunction.
503-525: LGTM!The render functions correctly iterate over arrays by index. Array bounds are safe since
size(passed ass_engine.*_idx) never exceeds MAX_PARTICLES.
58-531: Outstanding performance optimization.The refactor achieves its stated goal: particle_engine_update dropped from 1.27% to 0.03% of total runtime (>40× improvement). The array-based approach provides superior cache locality, eliminates allocation overhead, and simplifies memory management—all while maintaining correctness. The past critical out-of-bounds issue has been properly resolved with prefix decrement.
Introduced a profiling lib (from computerenhance). Profiled the particle
engine code in the update loop and implemented a more cpu-cache friendly
solution that doesn't rely on linked lists.
Profile result before
Profile result after
Details
The above outputs lists the total time the program was run and how much percent
of the
run_game_updatefunction total execution time was used by the particleengine.
It's more or less obvious that things improved.
In both runs the
maproomgenerato.luacode had been modified so that all roomswould be windy to allow for a lot of particles being rendered.
Summary by CodeRabbit
New Features
Performance
Refactor
Known Impacts