-
Notifications
You must be signed in to change notification settings - Fork 173
Open
Description
Performance Issue: Unnecessary Matrix Copy in MMCS Sampling
π Summary
The MMCS (Multiphase Monte Carlo Sampling) implementation performs an expensive matrix transpose operation that creates an unnecessary full matrix copy. This impacts performance, especially for high-dimensional sampling scenarios with large sample sizes.
π Problem Description
Current Implementation
In include/sampling/mmcs.hpp (line 234) and related files, the code performs:
MT Samples = TotalRandPoints.transpose(); //do not copy TODO!
for (int i = 0; i < total_samples; i++)
{
Samples.col(i) = T * Samples.col(i) + T_shift;
}
S.conservativeResize(P.dimension(), total_number_of_samples_in_P0 + total_samples);
S.block(0, total_number_of_samples_in_P0, P.dimension(), total_samples) =
Samples.block(0, 0, P.dimension(), total_samples);The Problem
Current Flow (Inefficient):
TotalRandPoints [EXPENSIVE COPY] Samples S (Final)
(n Γ d matrix) transpose() creates (d Γ n matrix) (d Γ n matrix)
βββββββββββββββ full copy βββββββββββββββ βββββββββββββββ
β Row 0: xβ β ββββββββββββββββββββββββββββββ>β Col 0: xβ β β Col 0: Txβ β
β Row 1: xβ β β Memory Copy β Col 1: xβ β β Col 1: Txβ β
β Row 2: xβ β β O(nΓd) operation β Col 2: xβ β β Col 2: Txβ β
β ... β β ... β β ... β
β Row n: xβ β β Col n: xβ β β Col n: Txβ β
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β β β
β β β
β Transform: T * col + shift β β
β ββββββββββββββββββββββ
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Total: 2Γ memory operations + transpose overhead
Memory Flow:
- Read
TotalRandPoints(nΓd elements) - Allocate new matrix
Samples(dΓn elements) β WASTED MEMORY - Copy and transpose all elements β EXPENSIVE OPERATION
- Transform each column:
T * Samples.col(i) + T_shift - Copy transformed data to
S
Issues:
-
Memory Overhead: Creates a full copy of the transposed matrix
- For typical case: 50 dimensions Γ 5000 samples = 250,000 elements
- Memory: ~2MB (double precision) wasted per iteration
-
Computation Overhead: O(nΓd) transpose operation
- Unnecessary for the actual computation needed
-
Memory Bandwidth: Doubles memory traffic
- Read from
TotalRandPointsβ Write toSamplesβ Read fromSamplesβ Write toS
- Read from
Matrix Dimensions
TotalRandPoints: (num_samples Γ dimension)
- Each row represents one sample point
- Shape: [total_samples, P.dimension()]
Samples (transposed copy): (dimension Γ num_samples)
- Each column represents one sample point (transposed)
- Shape: [P.dimension(), total_samples]
S (final result): (dimension Γ total_samples)
- Each column is a transformed sample
- Shape: [P.dimension(), total_number_of_samples_in_P0 + total_samples]
π‘ Proposed Solution
Optimized Implementation
Instead of creating a transposed copy, work directly with rows of TotalRandPoints:
// Optimized: avoid transpose copy by working directly with rows
S.conservativeResize(P.dimension(), total_number_of_samples_in_P0 + total_samples);
for (int i = 0; i < total_samples; i++)
{
// Transform each sample: T * sample + T_shift, then store as column in S
S.col(total_number_of_samples_in_P0 + i).noalias() =
T * TotalRandPoints.row(i).transpose() + T_shift;
}Optimized Flow
Optimized Flow (Efficient):
TotalRandPoints S (Final)
(n Γ d matrix) (d Γ n matrix)
βββββββββββββββ βββββββββββββββ
β Row 0: xβ β ββDirect: T * row(i).T + shiftββββββ>β Col 0: Txβ β
β Row 1: xβ β β
No copy β Col 1: Txβ β
β Row 2: xβ β β
Single pass β Col 2: Txβ β
β ... β β
Efficient β ... β
β Row n: xβ β β Col n: Txβ β
βββββββββββββββ βββββββββββββββ
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Direct transformation, no intermediate copy
Memory Flow:
- Read
TotalRandPoints.row(i)(single row, d elements) - Transform:
T * row(i).transpose() + T_shift(in-place computation) - Assign directly to
S.col(i)withnoalias()β NO COPY - Repeat for all rows
Key Improvement:
- β Before: Read β Copy β Transpose β Transform β Copy = 5 operations
- β After: Read β Transform β Assign = 3 operations
Benefits:
- β
No Memory Copy: Eliminates intermediate
Samplesmatrix - β Direct Computation: Transform and assign in one step
- β Better Cache Usage: Single pass through data
- β
Uses
noalias(): Prevents Eigen from creating temporaries
References
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels