Skip to content

O(N*Threads) memory overhead in IntegrateFromGeometry due to per-thread cloning #767

@rzhang20

Description

@rzhang20

The Problem

The current OpenMP implementation in ViewFactors.c forces a full deep copy of the Elements array for every single thread to avoid race conditions on mutable fields (like Flags and dynamic links).

This approach scales poorly:

  1. Memory Explosion: Memory usage grows as $O(N \times Threads)$, causing massive overhead for large geometries on many-core systems.
  2. Bus Saturation: Simultaneous memcpy calls by all threads at the start of the parallel region saturate memory bandwidth.

Problematic Code

#pragma omp parallel
{
Geometry_t *lel;
lel = (Geometry_t *)malloc(N*sizeof(Geometry_t));
memcpy(lel,Elements,N*sizeof(Geometry_t));
#pragma omp for private(i,j,k,l,Fact) schedule(dynamic,10)
for( i=0; i<N; i++ )
{
for( j=i; j<N; j++ ) Factors[i*N+j] = 0.0;
if ( lel[i].Area<1.0e-10 ) continue;
for( j=i+1; j<N; j++ )
{
if ( lel[j].Area<1.0e-10 ) continue;
FreeLinks( &lel[i] );
FreeLinks( &lel[j] );
lel[j].Flags |= GEOMETRY_FLAG_LEAF;
lel[i].Flags |= GEOMETRY_FLAG_LEAF;
(*ViewFactorCompute[lel[i].GeometryType])( &lel[i],&lel[j],0,0 );
Fact = ComputeViewFactorValue( &lel[i],0 );
Factors[i*N+j] = Fact / lel[i].Area;
Factors[j*N+i] = Fact / lel[j].Area;
}
FreeChilds( lel[i].Left );
lel[i].Left = NULL;
FreeChilds( lel[i].Right );
lel[i].Right = NULL;
}
free(lel);
}

Proposed Solution

Refactor the solver to be const-correct by decoupling the immutable geometry data from the mutable traversal state.

  1. Separate State: Isolate mutable fields (e.g., Flags, Left, Right) into a lightweight TraversalContext_t.
  2. Shared Access: Treat the main Geometry_t array as read-only (const) shared memory.
  3. Local Context: Allocate only the lightweight context per thread.

Suggested Implementation

/* 1. New lightweight state structure */
typedef struct {
    int Flags; 
    void *Left;
    void *Right;
} TraversalContext_t;

/* 2. Optimized Parallel Region */
#pragma omp parallel
{
    // PROPOSED: Allocate only lightweight mutable state (No memcpy needed)
    TraversalContext_t *local_ctx = (TraversalContext_t *)calloc(N, sizeof(TraversalContext_t));

    #pragma omp for schedule(dynamic,10)
    for(int i=0; i<N; i++) {
        // Access global geometry as Read-Only
        const Geometry_t *shared_el_i = &Elements[i]; 
        
        for(int j=i+1; j<N; j++) {
             // Pass separated state explicitly to the compute kernel
             ViewFactorCompute_Safe(shared_el_i, &local_ctx[i], 
                                    &Elements[j],    &local_ctx[j], ...);
        }
    }
    free(local_ctx);
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions