Skip to content

Conversation

@LiquidityC
Copy link
Member

@LiquidityC LiquidityC commented Oct 11, 2025

Introduced a profiling lib (from computerenhance). Profiled the particle
engine code in the update loop and implemented a more cpu-cache friendly
solution that doesn't rely on linked lists.

Profile result before

                  =====   Benchmarks :   =====
                          Total time : 106691427524 (34.7301 s)
               run_game_update[1933] : 20781332     0.02% (2.53% w/children)
 update_before_particle_engine[1933] : 195193852    0.18%
  update_after_particle_engine[1933] : 1123800376   1.05%
        particle_engine_update[1933] : 1354889225   1.27%

Profile result after

                  =====   Benchmarks :   =====
                          Total time : 177223245943 (57.6895 s)
               run_game_update[3293] : 45724050     0.03% (1.56% w/children)
 update_before_particle_engine[3293] : 463608710    0.26%
  update_after_particle_engine[3293] : 2209962002   1.25%
        particle_engine_update[3293] : 51584450     0.03%

Details

The above outputs lists the total time the program was run and how much percent
of the run_game_update function total execution time was used by the particle
engine.

It's more or less obvious that things improved.

In both runs the maproomgenerato.lua code had been modified so that all rooms
would be windy to allow for a lot of particles being rendered.

Summary by CodeRabbit

  • New Features

    • None
  • Performance

    • Particle effects run faster and smoother with fewer frame drops and reduced runtime allocation overhead.
  • Refactor

    • Particle system rewritten to use a fixed-size pool for more predictable performance and stability.
  • Known Impacts

    • Particle counts are capped under extreme load, which may slightly reduce effect density in rare cases.

@coderabbitai
Copy link

coderabbitai bot commented Oct 11, 2025

Walkthrough

Replaces dynamic linked-list particle storage with fixed-size arrays (MAX_PARTICLES = 4096) and per-array indices; adds in-place builders and NEXT_* macros; updates all particle spawn/update/render paths to index-based loops with end-swap compaction; removes dynamic allocation/cleanup and updates engine struct and function signatures.

Changes

Cohort / File(s) Summary
Particle engine core
src/particle_engine.c
Replaced LinkedList storage with global_particles[MAX_PARTICLES] and game_particles[MAX_PARTICLES] plus global_idx/game_idx; added NEXT_GLOBAL_PARTICLE/NEXT_GAME_PARTICLE macros and next_* helpers; introduced build_rect_particle for in-place init; removed allocation/free and linked-list management; updated update/render to index-based iteration and compact-by-swapping dead entries; simplified clear and exposed arrays/indices.
Particle creation sites (in-file rewrites)
src/particle_engine.c (bloodspray, explosion, dust puff, speed lines, wind, heat, sparkle, etc.)
Replaced create_*_particle + linkedlist_append usage with NEXT_*_PARTICLE + build_rect_particle in-place construction for all particle spawn locations; removed linked-list node creation code.
Public API / headers (implicit signature changes)
include/... (engine headers implied)
Updated function signatures and Engine layout to use Particle * arrays and size_t indices instead of LinkedList*; update_particles(Particle *particles, size_t *idx, float deltaTime) and render_particles(Particle *particles, size_t size, Camera *cam); engine callers updated to pass arrays and indices.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Game
  participant Engine as ParticleEngine (s_engine)
  participant Arrays as Particles[game/global]
  participant Renderer

  Game->>Engine: spawn event (e.g., explosion)
  Engine->>Arrays: NEXT_*_PARTICLE() -> slot or warn
  Engine->>Arrays: build_rect_particle(p[slot], params)

  loop per frame
    Game->>Engine: particle_engine_update(dt)
    Engine->>Arrays: update_particles(p, &idx, dt)
    Note right of Engine #f4f4f8: decrement timers, move alive\ncompact dead by swapping from end
    Game->>Engine: particle_engine_render_game(cam)
    Engine->>Renderer: render_particles(game_particles, game_idx, cam)
    Game->>Engine: particle_engine_render_global(cam)
    Engine->>Renderer: render_particles(global_particles, global_idx, cam)
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

I nibble code and count each hop,
Slots lined up — no heap to stop.
Indices hop, I build in place,
Dead ones swapped without a race.
Tiny sparks stay in their space ✨🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “Improve particle_engine performance” accurately and concisely describes the main intent of this pull request, namely refactoring the particle engine to use static arrays and optimize cache usage for improved performance.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch improve_particle_engine

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@LiquidityC LiquidityC force-pushed the improve_particle_engine branch from 1950139 to 7ec7212 Compare October 11, 2025 13:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (20)
src/particle_engine.c (1)

256-258: Remove unreachable NULL check.

The NEXT_GAME_PARTICLE macro already returns when p is NULL, making this check dead code.

Apply this diff:

 	for (unsigned int i = 0; i < count; ++i) {
 		NEXT_GAME_PARTICLE(p);
-
-		if (p == NULL) {
-			return;
-		}
 
 		x = get_random(dim.width) + pos.x;
lib/profiler/include/rdtsc.h (3)

1-2: Consider using a project-specific include guard prefix.

The include guard _RDTSC_H_ uses a leading underscore followed by a capital letter, which is reserved for the implementation in C. While this may work in practice, it could theoretically conflict with system headers.

Consider using a project-specific prefix:

-#ifndef _RDTSC_H_
-#define _RDTSC_H_
+#ifndef PROFILER_RDTSC_H_
+#define PROFILER_RDTSC_H_

And update line 19:

-#endif // _RDTSC_H_
+#endif // PROFILER_RDTSC_H_

6-7: Remove unused include.

stddef.h is included but size_t is not used in this header. Only stdint.h is needed for uint64_t.

 #include <stdint.h>
-#include <stddef.h>

9-15: Consider adding API documentation.

The function declarations lack documentation comments explaining their purpose, parameters, and return values. This would improve usability, especially for estimate_cpu_freq(uint64_t test_time) where the unit and meaning of test_time is not clear.

Example documentation:

/**
 * Get the OS timer frequency in Hz.
 * @return Frequency in Hz (typically 1,000,000 for microsecond resolution)
 */
uint64_t get_os_time_freq(void);

/**
 * Read the current OS timer value.
 * @return Timer value in units of get_os_time_freq()
 */
uint64_t read_os_timer(void);

/**
 * Read the CPU timestamp counter (RDTSC).
 * @return CPU cycle count
 */
uint64_t read_cpu_timer(void);

/**
 * Estimate CPU frequency by comparing CPU timer to OS timer.
 * @param test_time Duration of the test in milliseconds
 * @return Estimated CPU frequency in Hz
 */
uint64_t estimate_cpu_freq(uint64_t test_time);
lib/profiler/src/rdtsc.c (1)

23-28: No error handling for gettimeofday failure.

gettimeofday can fail (returns -1), but the code doesn't check for this. While failures are rare, they could occur in edge cases (e.g., system clock issues).

Consider adding error handling:

 uint64_t read_os_timer(void)
 {
     struct timeval tv;
-    gettimeofday(&tv, NULL);
+    if (gettimeofday(&tv, NULL) != 0) {
+        // Fallback or error handling
+        return 0;
+    }
     return get_os_time_freq()*tv.tv_sec + tv.tv_usec;
 }

However, given this is a profiler library where failure modes are less critical than in production code, this may be acceptable as-is.

lib/profiler/README.md (2)

54-60: Add language specifier to fenced code block.

The example output code block lacks a language specifier, which prevents proper syntax highlighting and may cause linting warnings.

As per coding guidelines (static analysis hint), add a language specifier:

-```
+```text
          =====   Benchmarks :   =====

69-70: TODO: Complete API documentation.

The API section is marked as TODO. Consider adding documentation for the public API functions to help users understand the available interfaces.

Would you like me to generate API documentation based on the profiler headers, or open a new issue to track this task?

lib/profiler/include/internal/common.h (1)

1-2: Consider adding a profiler-specific prefix to the include guard.

The include guard COMMON_H_ is generic and could conflict with other headers named common.h in a larger project. Since this is in an internal directory, the risk is lower, but a prefix would improve clarity.

-#ifndef COMMON_H_
-#define COMMON_H_
+#ifndef PROFILER_INTERNAL_COMMON_H_
+#define PROFILER_INTERNAL_COMMON_H_

And update line 13:

-#endif  // COMMON_H_
+#endif  // PROFILER_INTERNAL_COMMON_H_
lib/profiler/include/profiler.h (1)

1-2: Consider using a non-reserved include guard name.

Similar to rdtsc.h, the include guard _PROFILER_H_ uses a leading underscore followed by a capital letter, which is reserved by the C standard. While unlikely to cause issues in practice, using a project-specific prefix is safer.

-#ifndef _PROFILER_H_
-#define _PROFILER_H_
+#ifndef PROFILER_H_
+#define PROFILER_H_

And update line 21:

-#endif // _PROFILER_H_
+#endif // PROFILER_H_
lib/profiler/include/perf.h (2)

1-2: Consider using a non-reserved include guard name.

The include guard _PERF_H_ uses a leading underscore followed by a capital letter, which is reserved by the C standard. This is the same pattern seen in other profiler headers.

-#ifndef _PERF_H_
-#define _PERF_H_
+#ifndef PROFILER_PERF_H_
+#define PROFILER_PERF_H_

And update line 14:

-#endif  // _PERF_H_
+#endif  // PROFILER_PERF_H_

6-12: Consider documenting platform requirements.

The perf API is Linux-specific (based on perf_event_open in the implementation). The header doesn't indicate this platform dependency, which could confuse users on other platforms.

Add a comment documenting the platform requirement:

+/**
+ * Linux-specific performance counter API.
+ * Uses perf_event_open to count page faults.
+ * On non-Linux platforms, these functions will be no-ops or return 0.
+ */
+
 void perf_setup(void);

Also consider whether the declarations should be platform-guarded:

#ifdef __linux__
void perf_setup(void);
void perf_reset_page_fault_count(void);
uint64_t perf_read_page_fault_count(void);
void perf_close(void);
#else
// Stub declarations or empty inline functions for non-Linux
static inline void perf_setup(void) {}
static inline void perf_reset_page_fault_count(void) {}
static inline uint64_t perf_read_page_fault_count(void) { return 0; }
static inline void perf_close(void) {}
#endif

This would make the API work safely on all platforms without requiring conditional compilation at call sites.

CMakeLists.txt (1)

13-13: Use option()/BOOL and target-scoped definitions; rely on transitive includes

  • Prefer a BOOL option and target_compile_definitions to limit scope.
  • profiler target already exposes PUBLIC include dir; the extra include is redundant.
-set(PROFILER OFF CACHE STRING "Enable profiler")
+option(PROFILER "Enable profiler" OFF)
@@
-if (PROFILER)
-    add_subdirectory(lib/profiler)
-endif()
+if (PROFILER)
+    add_subdirectory(lib/profiler)
+endif()
@@
-if (PROFILER)
-    add_definitions("-DPROFILER")
-    target_link_libraries(${PROJECT_NAME} profiler)
-    target_include_directories(${PROJECT_NAME} PRIVATE lib/profiler/include)
-endif ()
+if (PROFILER)
+    target_link_libraries(${PROJECT_NAME} PRIVATE profiler)
+    target_compile_definitions(${PROJECT_NAME} PRIVATE PROFILER=1)
+    # Optional: profiler's PUBLIC include makes this unnecessary
+    # target_include_directories(${PROJECT_NAME} PRIVATE lib/profiler/include)
+endif ()

Also applies to: 54-56, 223-227

lib/profiler/CMakeLists.txt (1)

5-10: Gate perf.c to Linux to avoid cross‑platform build breaks

perf.c includes linux/perf_event.h; enabling PROFILER on non‑Linux will fail.

 add_library(profiler
   src/rdtsc.c
   src/profiler.c
   src/repetition_tester.c
-  src/perf.c
 )
 
 target_include_directories(profiler
   PUBLIC include
   PRIVATE include/internal
 )
+
+if (CMAKE_SYSTEM_NAME STREQUAL "Linux")
+  target_sources(profiler PRIVATE src/perf.c)
+  target_compile_definitions(profiler PUBLIC PROFILER_HAS_PERF=1)
+endif()

Please verify CI on macOS/Windows with PROFILER=ON after this change.

Also applies to: 12-15

lib/profiler/test/calc_cpu_freq.c (1)

1-31: Fix uint64_t printf formats and return from main

Use inttypes PRIu64 for portability (Windows LLP64) and return 0 from main.

 #include <stdio.h>
 #include <stdint.h>
+#include <inttypes.h>
 #include "rdtsc.h"
@@
-    printf("    OS Freq: %lu (reported)\n", os_freq);
-    printf(" OS Seconds: %.4f\n", (f64) os_elapsed/ (f64) os_freq);
-    printf("  CPU Timer: %lu -> %lu = %lu\n", cpu_start, cpu_end, cpu_elapsed);
-    printf("   CPU Freq: %lu (guessed)\n", cpu_freq);
+    printf("    OS Freq: %" PRIu64 " (reported)\n", os_freq);
+    printf(" OS Seconds: %.4f\n", (f64) os_elapsed / (f64) os_freq);
+    printf("  CPU Timer: %" PRIu64 " -> %" PRIu64 " = %" PRIu64 "\n", cpu_start, cpu_end, cpu_elapsed);
+    printf("   CPU Freq: %" PRIu64 " (guessed)\n", cpu_freq);
+    return 0;
 }
lib/profiler/src/calc_cpu_freq.c (1)

1-31: Portability: use PRIu64 for uint64_t and return 0

Same fix as the test variant.

 #include <stdio.h>
 #include <stdint.h>
+#include <inttypes.h>
 #include <rdtsc.h>
@@
-    printf("    OS Freq: %lu (reported)\n", os_freq);
-    printf(" OS Seconds: %.4f\n", (f64) os_elapsed/ (f64) os_freq);
-    printf("  CPU Timer: %lu -> %lu = %lu\n", cpu_start, cpu_end, cpu_elapsed);
-    printf("   CPU Freq: %lu (guessed)\n", cpu_freq);
+    printf("    OS Freq: %" PRIu64 " (reported)\n", os_freq);
+    printf(" OS Seconds: %.4f\n", (f64) os_elapsed / (f64) os_freq);
+    printf("  CPU Timer: %" PRIu64 " -> %" PRIu64 " = %" PRIu64 "\n", cpu_start, cpu_end, cpu_elapsed);
+    printf("   CPU Freq: %" PRIu64 " (guessed)\n", cpu_freq);
+    return 0;
 }
lib/profiler/src/rep_return_data.c (1)

46-56: Minor: check malloc result in test helper

Avoid potential NULL deref in malloc_create_data; quick guard improves robustness.

 static ReturnData* malloc_create_data(void)
 {
-    ReturnData *data = malloc(sizeof(ReturnData));
+    ReturnData *data = malloc(sizeof(ReturnData));
+    if (!data) return NULL;
     data->fnum = 1.2;
@@
     return data;
 }

And in test_allocated_data, guard free/use:

-        ReturnData *data = malloc_create_data();
+        ReturnData *data = malloc_create_data();
+        if (!data) { rept_end(tester); break; }
lib/profiler/src/perf.c (1)

11-21: FD sentinel, ioctl arg, and type fixes; improve robustness

  • Use -1 as invalid FD (0 can be valid).
  • Pass 0 to PERF_EVENT_IOC_ENABLE.
  • Read into uint64_t to match API.
-static int _FD;
+static int _FD = -1;
@@
-    _FD = perf_event_open(&pe, 0, -1, -1, 0);
-    if (_FD == -1) {
+    _FD = perf_event_open(&pe, 0, -1, -1, 0);
+    if (_FD == -1) {
         fprintf(stderr, "Error opening leader %llx : %s\n", pe.config, strerror(errno));
         fprintf(stderr, "Page errors will not be counted\n");
-        _FD = 0;
         return;
     }
@@
-    ioctl(_FD, PERF_EVENT_IOC_RESET, 0);
-    ioctl(_FD, PERF_EVENT_IOC_ENABLE);
+    ioctl(_FD, PERF_EVENT_IOC_RESET, 0);
+    ioctl(_FD, PERF_EVENT_IOC_ENABLE, 0);
@@
-    if (_FD) {
+    if (_FD != -1) {
         ioctl(_FD, PERF_EVENT_IOC_RESET, 0);
         ioctl(_FD, PERF_EVENT_IOC_ENABLE);
     }
@@
-    size_t count = 0;
-    if (_FD) {
-        read(_FD, &count, sizeof(count));
+    uint64_t count = 0;
+    if (_FD != -1) {
+        (void)read(_FD, &count, sizeof(count));
     }
     return count;
@@
-    if (_FD) {
-        close(_FD);
+    if (_FD != -1) {
+        close(_FD);
+        _FD = -1;
     }

Also ensure perf.c is only compiled on Linux (see my CMake comment).

Also applies to: 34-44, 46-52, 54-61, 63-68

lib/profiler/src/rep_read_test.c (1)

78-79: Use binary mode for portability

Open file in "rb" to avoid newline translation on platforms that differentiate text/binary.

lib/profiler/src/profiler.c (1)

52-60: Consider adding prof_init to reset state

Header declares prof_init but no implementation here. Provide a simple initializer.

Add this function:

void prof_init(void)
{
    memset(&_Prof, 0, sizeof(_Prof));
    CurrentAnchorLabel = NULL;
}
lib/profiler/include/internal/profiler_c.h (1)

24-31: Const-correctness for parent anchor label

parent_anchor should be const to match usage and avoid accidental mutation.

-    char *parent_anchor;
+    const char *parent_anchor;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d530239 and 7ec7212.

📒 Files selected for processing (21)
  • .gitignore (1 hunks)
  • CMakeLists.txt (3 hunks)
  • lib/profiler/CMakeLists.txt (1 hunks)
  • lib/profiler/README.md (1 hunks)
  • lib/profiler/include/internal/common.h (1 hunks)
  • lib/profiler/include/internal/profiler_c.h (1 hunks)
  • lib/profiler/include/macros.h (1 hunks)
  • lib/profiler/include/perf.h (1 hunks)
  • lib/profiler/include/profiler.h (1 hunks)
  • lib/profiler/include/rdtsc.h (1 hunks)
  • lib/profiler/include/repetition_tester.h (1 hunks)
  • lib/profiler/src/calc_cpu_freq.c (1 hunks)
  • lib/profiler/src/perf.c (1 hunks)
  • lib/profiler/src/profiler.c (1 hunks)
  • lib/profiler/src/rdtsc.c (1 hunks)
  • lib/profiler/src/rep_read_test.c (1 hunks)
  • lib/profiler/src/rep_return_data.c (1 hunks)
  • lib/profiler/src/repetition_tester.c (1 hunks)
  • lib/profiler/test/calc_cpu_freq.c (1 hunks)
  • src/main.c (3 hunks)
  • src/particle_engine.c (16 hunks)
🧰 Additional context used
🧬 Code graph analysis (11)
lib/profiler/include/perf.h (1)
lib/profiler/src/perf.c (4)
  • perf_setup (23-44)
  • perf_reset_page_fault_count (46-52)
  • perf_read_page_fault_count (54-61)
  • perf_close (63-68)
lib/profiler/src/profiler.c (1)
lib/profiler/src/rdtsc.c (2)
  • read_cpu_timer (30-33)
  • estimate_cpu_freq (35-55)
lib/profiler/src/calc_cpu_freq.c (4)
lib/profiler/src/rep_read_test.c (1)
  • main (125-181)
lib/profiler/src/rep_return_data.c (1)
  • main (89-112)
lib/profiler/test/calc_cpu_freq.c (1)
  • main (8-31)
lib/profiler/src/rdtsc.c (3)
  • get_os_time_freq (18-21)
  • read_cpu_timer (30-33)
  • read_os_timer (23-28)
lib/profiler/include/rdtsc.h (1)
lib/profiler/src/rdtsc.c (4)
  • get_os_time_freq (18-21)
  • read_os_timer (23-28)
  • read_cpu_timer (30-33)
  • estimate_cpu_freq (35-55)
lib/profiler/src/perf.c (1)
src/main.c (1)
  • close (1357-1423)
lib/profiler/src/repetition_tester.c (2)
lib/profiler/src/perf.c (4)
  • perf_setup (23-44)
  • perf_reset_page_fault_count (46-52)
  • perf_read_page_fault_count (54-61)
  • perf_close (63-68)
lib/profiler/src/rdtsc.c (1)
  • read_cpu_timer (30-33)
lib/profiler/include/internal/profiler_c.h (1)
lib/profiler/src/profiler.c (6)
  • prof_start (52-55)
  • prof_stop (57-60)
  • prof_add_anchor (69-85)
  • prof_print (119-174)
  • make_anchor_block (176-182)
  • read_anchor_block (184-189)
lib/profiler/include/repetition_tester.h (1)
lib/profiler/src/repetition_tester.c (6)
  • rept_setup (10-36)
  • rept_begin (38-45)
  • rept_end (47-54)
  • rept_count_bytes (56-60)
  • rept_is_testing (98-146)
  • rept_print_results (148-158)
lib/profiler/test/calc_cpu_freq.c (1)
lib/profiler/src/rdtsc.c (3)
  • get_os_time_freq (18-21)
  • read_cpu_timer (30-33)
  • read_os_timer (23-28)
lib/profiler/src/rep_read_test.c (3)
lib/profiler/src/repetition_tester.c (5)
  • rept_is_testing (98-146)
  • rept_begin (38-45)
  • rept_end (47-54)
  • rept_count_bytes (56-60)
  • rept_setup (10-36)
lib/profiler/src/rep_return_data.c (1)
  • main (89-112)
lib/profiler/src/rdtsc.c (1)
  • estimate_cpu_freq (35-55)
src/particle_engine.c (1)
src/random.c (1)
  • get_random (76-81)
🪛 markdownlint-cli2 (0.18.1)
lib/profiler/README.md

54-54: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Mac (Clang)
  • GitHub Check: Ubuntu (mingw)
  • GitHub Check: Ubuntu (Clang)
  • GitHub Check: Mac (GCC)
  • GitHub Check: Windows (MSVC)
  • GitHub Check: Ubuntu (GCC)
🔇 Additional comments (12)
src/particle_engine.c (4)

68-84: LGTM! Clean slot allocation with capacity checks.

The helpers correctly validate capacity and return NULL on exhaustion, with helpful warnings for debugging.


86-96: LGTM! Macros follow best practices.

The do-while(0) pattern ensures safe macro expansion, and early return on NULL appropriately handles capacity exhaustion.


507-529: LGTM! Array-based rendering is correct.

The iteration logic correctly uses indices as counts with proper bounds checking.


532-535: Verify if indices should be reset.

With static allocation, no memory needs to be freed. However, confirm whether particle_engine_clear should reset global_idx and game_idx to 0 to clear all active particles. The current no-op may be intentional if particles are expected to persist across clear operations.

.gitignore (1)

20-20: LGTM!

Adding profile.txt to .gitignore is appropriate, as this file is generated by the profiler at runtime and should not be version-controlled.

lib/profiler/include/internal/common.h (1)

4-11: LGTM!

The ANSI color escape codes are standard and correctly defined. They'll provide useful colorization for profiler output.

lib/profiler/include/profiler.h (1)

4-19: LGTM!

The header structure is well-organized with proper C++ compatibility guards and conditional compilation. The includes are appropriately grouped under the PROFILER guard.

src/main.c (2)

65-67: LGTM!

The conditional inclusion of the profiler header is correctly guarded by the PROFILER preprocessor flag.


1472-1474: LGTM!

The profiler setup is correctly placed early in main(), after PHYSFS initialization but before the main game initialization. This ensures profiling captures the full execution.

lib/profiler/README.md (1)

14-16: Correct header include in README example
Replace the non-existent benchmark.h with the actual profiler.h in the usage snippet:

```c
 #include <stdio.h>
-#include "benchmark.h"
+#include "profiler.h"
lib/profiler/src/rdtsc.c (1)

23-28: Potential integer overflow in OS timer calculation.

Line 27 computes get_os_time_freq() * tv.tv_sec, which multiplies 1,000,000 by the seconds value. For large tv_sec values (e.g., years of uptime), this could overflow uint64_t.

Consider using a safer calculation order or checking for overflow:

 uint64_t read_os_timer(void)
 {
     struct timeval tv;
     gettimeofday(&tv, NULL);
-    return get_os_time_freq()*tv.tv_sec + tv.tv_usec;
+    uint64_t freq = get_os_time_freq();
+    // freq is 1,000,000, tv_usec is < 1,000,000
+    // So tv_usec is already in the right units
+    return freq * (uint64_t)tv.tv_sec + (uint64_t)tv.tv_usec;
 }

Note: While overflow is unlikely in practice (uint64_t can hold ~584 million years at microsecond resolution), casting ensures no implicit conversions occur.

Likely an incorrect or invalid review comment.

lib/profiler/include/repetition_tester.h (1)

8-12: Macro side effects and semicolon

Current macro ends with a semicolon inside; safe due to do{ }while(0); just ensure callers don’t add an extra semicolon if compiling with pedantic warnings. No change required.

@LiquidityC LiquidityC marked this pull request as draft October 11, 2025 13:13
@LiquidityC LiquidityC force-pushed the improve_particle_engine branch from 7ec7212 to 6624042 Compare October 11, 2025 14:08
@LiquidityC LiquidityC marked this pull request as ready for review October 11, 2025 14:08
@LiquidityC LiquidityC force-pushed the improve_particle_engine branch from 6624042 to 97c69bf Compare October 11, 2025 14:10
Replaces usage of linkedlists for statically allocated swaplists which
are a lot more cpu cache friendly.
The size cap can be addressed with a dynamiccally allocated swaplist
should the need arise in the future.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/particle_engine.c (1)

186-186: Consider adding robustness for the c_count-1 pattern.

The expression (unsigned int) c_count-1 will underflow to UINT_MAX if c_count is 0, causing an out-of-bounds array access. Current call sites (lines 194, 200, 206) always pass c_count >= 1, so this is safe in practice.

For robustness, consider adding an assertion at the function entry:

 static void
 create_explosion(Position pos, Dimension dim, unsigned int c_count, ...)
 {
+	assert(c_count > 0 && "c_count must be at least 1");
 	SDL_Color *colors = ec_malloc(c_count * sizeof(SDL_Color));
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6624042 and 97c69bf.

📒 Files selected for processing (1)
  • src/particle_engine.c (16 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/particle_engine.c (1)
src/random.c (1)
  • get_random (76-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Mac (GCC)
  • GitHub Check: Mac (Clang)
  • GitHub Check: Windows (MSVC)
  • GitHub Check: Ubuntu (mingw)
  • GitHub Check: Ubuntu (Clang)
  • GitHub Check: Ubuntu (GCC)
🔇 Additional comments (3)
src/particle_engine.c (3)

58-96: LGTM! Solid allocation and capacity management.

The fixed-size array approach with bounds checking is well-implemented:

  • Zero-initialization of the static engine is correct
  • Allocation helpers check capacity before returning a pointer
  • Post-increment ensures valid indices (0-4095) are used
  • Early-return macros prevent NULL propagation to callers

This eliminates dynamic allocation overhead and improves cache locality compared to linked lists.


425-443: LGTM! Critical issue from previous review has been correctly fixed.

The update logic with array compaction is now correct:

  • Prefix decrement --(*idx) decrements first, then accesses particles[*idx], preventing the out-of-bounds read flagged in the previous review
  • When a particle dies at position i, the last active particle is moved to position i, and *idx is decremented
  • The loop re-checks position i (without incrementing) to handle the newly moved particle, correctly processing chains of dead particles
  • Direct struct assignment is clearer than memcpy and allows compiler optimization

The compaction strategy maintains a dense array of active particles, improving cache locality during iteration.


98-110: Excellent performance refactor: array-based particle system.

The migration from linked lists to fixed-size arrays delivers the intended cache-locality improvements:

  • In-place construction (build_rect_particle) eliminates allocation overhead
  • Static arrays remove all dynamic memory management (no allocations, no frees)
  • Index-based iteration in update and render paths is cache-friendly
  • Simplified cleanup (line 534) - no-op due to static allocation

The consistent pattern across all particle generator functions (bloodspray, explosion, dust, wind, heat, sparkle, speed lines) makes the code maintainable and predictable.

Also applies to: 448-450, 507-529, 534-534

@LiquidityC LiquidityC enabled auto-merge (squash) October 11, 2025 14:18
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/particle_engine.c (1)

528-531: LGTM!

The no-op implementation is correct for static allocation. If desired, particle_engine_close could be simplified since it now just calls an empty function, but this is purely optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 97c69bf and c3773ae.

📒 Files selected for processing (1)
  • src/particle_engine.c (16 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/particle_engine.c (1)
src/random.c (1)
  • get_random (76-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Mac (GCC)
  • GitHub Check: Mac (Clang)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Ubuntu (Clang)
  • GitHub Check: Ubuntu (GCC)
  • GitHub Check: Windows (MSVC)
  • GitHub Check: Ubuntu (mingw)
🔇 Additional comments (8)
src/particle_engine.c (8)

58-66: Well-designed cache-friendly refactor.

The switch from dynamic linked lists to fixed-size arrays is a solid performance optimization. Zero-initialization correctly sets indices to 0, and MAX_PARTICLES = 4096 provides a reasonable upper bound with overflow warnings in place.


68-96: LGTM!

The helper functions correctly check capacity before allocation, and the macros cleanly eliminate the need for NULL checks at each call site. The warning messages on capacity overflow are helpful for debugging.


98-110: LGTM!

The shift to in-place initialization via build_rect_particle eliminates allocation overhead. All particle fields are properly initialized.


120-396: LGTM!

All particle creation functions correctly use the new allocation macros. The past review concern about unreachable NULL checks has been resolved—the code no longer has redundant checks after NEXT_*_PARTICLE calls.


421-439: Excellent: Past critical issue resolved + correct compaction logic.

The swap-with-last pattern is implemented correctly with prefix decrement at line 436. This resolves the past critical out-of-bounds issue and ensures that when a particle dies, it's replaced by the last active particle without accessing beyond array bounds. The logic correctly re-checks the swapped particle on the next iteration.


442-446: LGTM!

The update calls correctly pass array pointers and index addresses to the refactored update_particles function.


503-525: LGTM!

The render functions correctly iterate over arrays by index. Array bounds are safe since size (passed as s_engine.*_idx) never exceeds MAX_PARTICLES.


58-531: Outstanding performance optimization.

The refactor achieves its stated goal: particle_engine_update dropped from 1.27% to 0.03% of total runtime (>40× improvement). The array-based approach provides superior cache locality, eliminates allocation overhead, and simplifies memory management—all while maintaining correctness. The past critical out-of-bounds issue has been properly resolved with prefix decrement.

@LiquidityC LiquidityC merged commit 5f94006 into dev Oct 11, 2025
9 checks passed
@LiquidityC LiquidityC deleted the improve_particle_engine branch October 11, 2025 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants