Skip to content

Conversation

@stefano-p
Copy link
Contributor

Fix critical bug in tau_realloc causing memory leaks and undefined behavior

🐛 Bug Analysis

The current implementation of tau_realloc contains a critical logic error that causes memory leaks and potentially undefined behavior:

// CURRENT BUGGY CODE
static inline void* tau_realloc(void* const ptr, const tau_ull new_size) {
    void* const new_ptr = realloc(ptr, new_size);
    
    if(TAU_NONE(new_ptr))  // If realloc fails (returns NULL)
        free(new_ptr);     // BUG: This frees NULL, not the original memory!
    
    return new_ptr;
}

The Problem

  1. Memory Leak: When realloc() fails:

    • It returns NULL
    • The original memory block pointed to by ptr remains allocated and valid
    • The code incorrectly attempts free(NULL) which is a no-op
    • The original memory is never freed → memory leak
  2. Silent Failure: The function returns NULL to callers, but:

    • No error is reported
    • Calling code doesn't check the return value
    • Subsequent code will dereference NULL → segmentation fault
  3. Real-world Impact: This bug affects critical paths:

    // In test registration (lines ~1182, ~1201)
    tauTestContext.tests = tau_realloc(...);
    tauTestContext.tests[index].func = ...;  // CRASH if realloc failed!
    
    // In failure tracking (line ~1477)
    tauStatsFailedTestSuites = tau_realloc(...);
    tauStatsFailedTestSuites[failed_testcase_index] = i;  // CRASH if realloc failed!

✅ The Fix

static inline void* tau_realloc(void* const ptr, const tau_ull new_size) {
    void* const new_ptr = realloc(ptr, new_size);
    
    if(TAU_NONE(new_ptr) && new_size > 0) {
        fprintf(stderr, "FATAL: tau_realloc failed to allocate %" TAU_PRIu64 " bytes\n",
                (tau_u64)new_size);
        TAU_ABORT;
    }
    
    return new_ptr;
}

📋 Design Rationale

Why Abort Instead of Continuing?

  1. Memory allocation failure is catastrophic for a test framework:

    • Cannot register new tests
    • Cannot track test results
    • Cannot continue in a predictable state
  2. Fail-fast philosophy:

    • Better to stop immediately with clear error
    • Prevents cascading failures and confusing crashes
    • Makes debugging easier (error at point of failure, not later)
  3. No memory leak on abort:

    • Modern operating systems (Linux/Windows/macOS) reclaim ALL process memory on exit
    • The OS performs complete cleanup of heap, stack, handles, etc.
    • This is standard and reliable behavior
  4. Alternative approaches considered but rejected:

    // Option A: Free original memory on failure
    if(TAU_NONE(new_ptr)) {
        free(ptr);  // Lost data, caller can't recover
        return NULL;
    }
    
    // Option B: Return NULL, keep original memory  
    return new_ptr;  // Caller must handle, but they don't check!
    
    // Option C: Try to continue with original memory
    return ptr;  // Wrong size! Will cause corruption

    All alternatives lead to worse outcomes: data loss, crashes, or memory corruption.

🔧 Implementation Details

  • Error reporting: Uses fprintf(stderr, ...) as stderr is always available
  • Format specifier: Uses TAU_PRIu64 macro for cross-compiler compatibility (MSVC vs GCC/Clang)
  • Type casting: Casts tau_ull to tau_u64 to match the format specifier
  • Abort mechanism: Uses framework's TAU_ABORT macro for consistent termination
  • Zero-size check: Only aborts if new_size > 0 (realloc with size 0 is valid for freeing memory)

✅ Testing Performed

  • Tested that normal operation is unaffected when memory is available

🎯 Impact

This fix prevents:

  • Silent memory leaks during test execution
  • Segmentation faults from NULL dereference
  • Undefined behavior from continuing with failed allocations
  • Confusing crashes distant from the actual failure point

The fail-fast approach with clear error messaging makes the framework more robust and debuggable.

📈 Before and After

Before (Buggy Behavior)

Test runs → realloc fails → returns NULL → free(NULL) does nothing → 
memory leak → code continues → dereferences NULL → SEGFAULT

After (Fixed Behavior)

Test runs → realloc fails → returns NULL → prints clear error → 
aborts cleanly → OS reclaims all memory → no leak, no corruption

With this fix, you'll see a clear error message instead of a mysterious crash.

The current implementation of `tau_realloc` has a critical bug where it attempts to free a NULL pointer instead of the original memory when realloc fails, causing memory leaks.
@stefano-p stefano-p requested a review from jasmcaus as a code owner August 19, 2025 12:48
@jasmcaus
Copy link
Owner

good catch!

@jasmcaus jasmcaus merged commit 743554d into jasmcaus:dev Aug 19, 2025
13 of 87 checks passed
@stefano-p stefano-p deleted the fix/tau_realloc branch August 19, 2025 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants