Skip to content

Efficiency improvements for error code/class management #13453

@naughtont3

Description

@naughtont3

Is your feature request related to a problem? Please describe.

During the work on fixing multi-threaded OBJ_RELEASE issues in MPI_Remove_error_code (commits 44dee9a and 6d3ee24), we identified an efficiency issue with how error codes and classes are allocated in the ompi_mpi_errcodes pointer array.

The current implementation in ompi/errhandler/errcode.c always increments ompi_mpi_errcode_lastused and allocates at the end of the array, even when there are NULL slots available from previously removed error codes/classes. When ompi_mpi_errcode_remove() or ompi_mpi_errclass_remove() is called, they set the pointer array entry to NULL, but these NULL slots are never reused, causing the array to grow indefinitely even if the actual number of active error codes/classes remains constant or decreases.

Impact:

  1. Memory waste: The pointer array continues to grow even when error codes/classes are removed
  2. Cache inefficiency: Sparse arrays lead to poor cache locality
  3. Handle exhaustion: In long-running applications with many add/remove cycles, the array could grow unnecessarily large

Current implementation details:

ompi_mpi_errcode_add() (errcode.c:376-390):

int ompi_mpi_errcode_add(int errclass)
{
    ompi_mpi_errcode_t *newerrcode;

    newerrcode = OBJ_NEW(ompi_mpi_errcode_t);

    opal_mutex_lock(&errcode_lock);
    newerrcode->code = (ompi_mpi_errcode_lastused+1);
    newerrcode->cls = errclass;
    opal_pointer_array_set_item(&ompi_mpi_errcodes, newerrcode->code, newerrcode
);
    ompi_mpi_errcode_lastused++;
    opal_mutex_unlock(&errcode_lock);

    return newerrcode->code;
}

ompi_mpi_errclass_add() (errcode.c:392-405):

int ompi_mpi_errclass_add(void)
{
    ompi_mpi_errcode_t *newerrcode;

    newerrcode = OBJ_NEW(ompi_mpi_errcode_t);

    opal_mutex_lock(&errcode_lock);
    newerrcode->cls = (ompi_mpi_errcode_lastused+1);
    opal_pointer_array_set_item(&ompi_mpi_errcodes, newerrcode->cls, newerrcode)
;
    ompi_mpi_errcode_lastused++;
    opal_mutex_unlock(&errcode_lock);

    return newerrcode->cls;
}

Describe the solution you'd like

Implement a mechanism to track and reuse NULL slots in the ompi_mpi_errcodes pointer array before allocating new entries at the end. The preferred approach is to scan for free slots:

Before incrementing ompi_mpi_errcode_lastused, scan the array from ompi_mpi_errcode_lastpredefined + 1 to ompi_mpi_errcode_lastused for NULL entries:

int ompi_mpi_errcode_add(int errclass)
{
    ompi_mpi_errcode_t *newerrcode;
    int new_code = -1;

    newerrcode = OBJ_NEW(ompi_mpi_errcode_t);

    opal_mutex_lock(&errcode_lock);

    // Try to find a free slot first
    for (int i = ompi_mpi_errcode_lastpredefined + 1; i <= ompi_mpi_errcode_lastused; i++) {
        if (NULL == opal_pointer_array_get_item(&ompi_mpi_errcodes, i)) {
            new_code = i;
            break;
        }
    }

    // If no free slot found, allocate at the end
    if (new_code < 0) {
        new_code = ompi_mpi_errcode_lastused + 1;
        ompi_mpi_errcode_lastused++;
    }

    newerrcode->code = new_code;
    newerrcode->cls = errclass;
    opal_pointer_array_set_item(&ompi_mpi_errcodes, new_code, newerrcode);

    opal_mutex_unlock(&errcode_lock);

    return new_code;
}

Benefits:

  • Simple implementation with no additional data structures
  • Minimal performance overhead for typical usage patterns
  • Efficient memory utilization
  • Better cache locality with denser arrays

Implementation files to modify:

  • ompi/errhandler/errcode.c: Modify ompi_mpi_errcode_add() and ompi_mpi_errclass_add()

Testing requirements:

  1. Add error code/class
  2. Remove it
  3. Add another error code/class
  4. Verify that the second add reuses the first slot (same error code value)
  5. Test with multiple add/remove cycles
  6. Thread safety: Ensure concurrent add/remove operations work correctly

The scan-for-free-slots approach is preferred initially because it's simple and straightforward. It can possibly be optimized later to a free list approach if profiling shows it's a bottleneck.

Additional context

This efficiency improvement builds upon recent bug fixes:

  • Commit 44dee9a: "properly release errcode object and skip NULL on finalize"
  • Commit 6d3ee24: "avoid data race setting LASTUSED attribute"
  • Commit da50804: "Support MPI 4.1 removal of error class/code/string"

These commits fixed the multi-threading issues with OBJ_RELEASE and proper NULL handling, which are prerequisites for safely implementing efficient slot reuse.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions