O(N*Threads) memory overhead in `IntegrateFromGeometry` due to per-thread cloning #767

**The Problem**

The current OpenMP implementation in `ViewFactors.c` allocates and `memcpy`s a **full copy** of the `Elements` array in every thread to avoid race conditions on mutable fields (such as `Flags` and the dynamic links `Left` and `Right`).

This approach scales poorly:

1. **Memory Explosion:** Memory usage grows as $O(N \times Threads)$, causing massive overhead for large geometries on many-core systems.
2. **Bus Saturation:** Simultaneous `memcpy` calls by all threads at the start of the parallel region saturate memory bandwidth.
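
As a rough estimate of the first point, let $T$ be the number of threads and $s$ = `sizeof(Geometry_t)` (symbols introduced here only for illustration). The per-thread clones then occupy, on top of the original `Elements` array:

$$
M_{\text{clones}} = T \cdot N \cdot s
$$

With purely illustrative numbers (not measured from Elmer), $N = 10^6$, $s = 200$ bytes and $T = 64$ would give $M_{\text{clones}} = 1.28 \times 10^{10}$ bytes, about 12.8 GB, against 0.2 GB for the shared array itself.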

**Problematic Code**

https://github.com/ElmerCSC/elmerfem/blob/7a8a33ab4bf30eff6162ff3ec150b4507fe5906e/fem/src/view3d/ViewFactors.c#L213-L248



**Proposed Solution**

Refactor the solver to be **const-correct** by decoupling the immutable geometry data from the mutable traversal state.

1. **Separate State:** Isolate mutable fields (e.g., `Flags`, `Left`, `Right`) into a lightweight `TraversalContext_t`.
2. **Shared Access:** Treat the main `Geometry_t` array as read-only (`const`) shared memory.
3. **Local Context:** Allocate only the lightweight context per thread.

**Suggested Implementation**

```c
/* 1. New lightweight state structure */
typedef struct {
    int Flags; 
    void *Left;
    void *Right;
} TraversalContext_t;

/* 2. Optimized Parallel Region */
#pragma omp parallel
{
    // PROPOSED: Allocate only lightweight mutable state (No memcpy needed)
    TraversalContext_t *local_ctx = (TraversalContext_t *)calloc(N, sizeof(TraversalContext_t));

    #pragma omp for schedule(dynamic,10)
    for(int i=0; i<N; i++) {
        // Access global geometry as Read-Only
        const Geometry_t *shared_el_i = &Elements[i]; 
        
        for(int j=i+1; j<N; j++) {
             // Pass separated state explicitly to the compute kernel
             ViewFactorCompute_Safe(shared_el_i, &local_ctx[i], 
                                    &Elements[j],    &local_ctx[j], ...);
        }
    }
    free(local_ctx);
}
```
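
The snippet above elides the kernel arguments, so here is a minimal compilable sketch of the same access pattern, under stated assumptions. `GeomConst_t`, `Ctx_t`, `FLAG_LEAF`, `pair_kernel` and `compute_pairs` are hypothetical names that do not exist in Elmer, and the kernel is a placeholder rather than a real view factor computation. Note that each thread still allocates `n` contexts, so the saving comes from the context being much smaller than `Geometry_t` and from dropping the `memcpy`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the read-only part of Geometry_t. */
typedef struct {
    double Area;
} GeomConst_t;

/* Hypothetical per-thread mutable state (same fields as TraversalContext_t). */
typedef struct {
    int Flags;
    void *Left;
    void *Right;
} Ctx_t;

#define FLAG_LEAF 1 /* placeholder for GEOMETRY_FLAG_LEAF */

/* Placeholder kernel: NOT a view factor. It writes only to the contexts
   and only reads the geometry, which is the pattern being illustrated. */
static double pair_kernel(const GeomConst_t *a, Ctx_t *ca,
                          const GeomConst_t *b, Ctx_t *cb)
{
    ca->Flags |= FLAG_LEAF;
    cb->Flags |= FLAG_LEAF;
    return a->Area * b->Area;
}

/* Fills the n*n matrix `factors`. Assumes every Area is positive.
   Returns 0 on success, -1 if any thread failed to allocate its context
   (the matrix is then incomplete). */
int compute_pairs(const GeomConst_t *elems, int n, double *factors)
{
    int failed = 0;
    assert(elems != NULL && factors != NULL && n >= 0);
    if (n == 0) return 0;

    #pragma omp parallel
    {
        /* Per-thread mutable state only; the geometry itself is shared. */
        Ctx_t *ctx = calloc((size_t)n, sizeof *ctx);
        if (ctx == NULL) {
            #pragma omp atomic write
            failed = 1;
        }

        #pragma omp for schedule(dynamic,10)
        for (int i = 0; i < n; i++) {
            if (ctx == NULL) continue; /* this thread cannot do work */
            factors[(size_t)i * n + i] = 0.0;
            for (int j = i + 1; j < n; j++) {
                double fact = pair_kernel(&elems[i], &ctx[i],
                                          &elems[j], &ctx[j]);
                factors[(size_t)i * n + j] = fact / elems[i].Area;
                factors[(size_t)j * n + i] = fact / elems[j].Area;
            }
        }
        free(ctx);
    }
    return failed ? -1 : 0;
}
```

Built with `-fopenmp` the loop is distributed across threads; without it the pragmas are ignored and the same code runs serially, which is handy for checking results against the parallel build.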

For reference, the current parallel region (rendered from the permalink under **Problematic Code**):

	#pragma omp parallel
	{
	    Geometry_t *lel;
	    lel = (Geometry_t *)malloc(N*sizeof(Geometry_t));
	    memcpy(lel,Elements,N*sizeof(Geometry_t));

	    #pragma omp for private(i,j,k,l,Fact) schedule(dynamic,10)
	    for( i=0; i<N; i++ )
	    {
	        for( j=i; j<N; j++ ) Factors[i*N+j] = 0.0;
	        if ( lel[i].Area<1.0e-10 ) continue;

	        for( j=i+1; j<N; j++ )
	        {
	            if ( lel[j].Area<1.0e-10 ) continue;

	            FreeLinks( &lel[i] );
	            FreeLinks( &lel[j] );

	            lel[j].Flags |= GEOMETRY_FLAG_LEAF;
	            lel[i].Flags |= GEOMETRY_FLAG_LEAF;

	            (*ViewFactorCompute[lel[i].GeometryType])( &lel[i],&lel[j],0,0 );
	            Fact = ComputeViewFactorValue( &lel[i],0 );
	            Factors[i*N+j] = Fact / lel[i].Area;
	            Factors[j*N+i] = Fact / lel[j].Area;
	        }

	        FreeChilds( lel[i].Left );
	        lel[i].Left = NULL;

	        FreeChilds( lel[i].Right );
	        lel[i].Right = NULL;
	    }
	    free(lel);
	}
