-
Notifications
You must be signed in to change notification settings - Fork 365
Open
Description
The Problem
The current OpenMP implementation in ViewFactors.c forces a full deep copy of the Elements array for every single thread to avoid race conditions on mutable fields (like Flags and dynamic links).
This approach scales poorly:
-
Memory Explosion: Memory usage grows as
$O(N \times Threads)$ , causing massive overhead for large geometries on many-core systems. -
Bus Saturation: Simultaneous
memcpycalls by all threads at the start of the parallel region saturate memory bandwidth.
Problematic Code
elmerfem/fem/src/view3d/ViewFactors.c
Lines 213 to 248 in 7a8a33a
| #pragma omp parallel | |
| { | |
| Geometry_t *lel; | |
| lel = (Geometry_t *)malloc(N*sizeof(Geometry_t)); | |
| memcpy(lel,Elements,N*sizeof(Geometry_t)); | |
| #pragma omp for private(i,j,k,l,Fact) schedule(dynamic,10) | |
| for( i=0; i<N; i++ ) | |
| { | |
| for( j=i; j<N; j++ ) Factors[i*N+j] = 0.0; | |
| if ( lel[i].Area<1.0e-10 ) continue; | |
| for( j=i+1; j<N; j++ ) | |
| { | |
| if ( lel[j].Area<1.0e-10 ) continue; | |
| FreeLinks( &lel[i] ); | |
| FreeLinks( &lel[j] ); | |
| lel[j].Flags |= GEOMETRY_FLAG_LEAF; | |
| lel[i].Flags |= GEOMETRY_FLAG_LEAF; | |
| (*ViewFactorCompute[lel[i].GeometryType])( &lel[i],&lel[j],0,0 ); | |
| Fact = ComputeViewFactorValue( &lel[i],0 ); | |
| Factors[i*N+j] = Fact / lel[i].Area; | |
| Factors[j*N+i] = Fact / lel[j].Area; | |
| } | |
| FreeChilds( lel[i].Left ); | |
| lel[i].Left = NULL; | |
| FreeChilds( lel[i].Right ); | |
| lel[i].Right = NULL; | |
| } | |
| free(lel); | |
| } |
Proposed Solution
Refactor the solver to be const-correct by decoupling the immutable geometry data from the mutable traversal state.
- Separate State: Isolate mutable fields (e.g.,
Flags,Left,Right) into a lightweightTraversalContext_t. - Shared Access: Treat the main
Geometry_tarray as read-only (const) shared memory. - Local Context: Allocate only the lightweight context per thread.
Suggested Implementation
/* 1. New lightweight state structure */
typedef struct {
int Flags;
void *Left;
void *Right;
} TraversalContext_t;
/* 2. Optimized Parallel Region */
#pragma omp parallel
{
// PROPOSED: Allocate only lightweight mutable state (No memcpy needed)
TraversalContext_t *local_ctx = (TraversalContext_t *)calloc(N, sizeof(TraversalContext_t));
#pragma omp for schedule(dynamic,10)
for(int i=0; i<N; i++) {
// Access global geometry as Read-Only
const Geometry_t *shared_el_i = &Elements[i];
for(int j=i+1; j<N; j++) {
// Pass separated state explicitly to the compute kernel
ViewFactorCompute_Safe(shared_el_i, &local_ctx[i],
&Elements[j], &local_ctx[j], ...);
}
}
free(local_ctx);
}
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels