Commit edffa2a

Optimize MlasComputeSoftmax with prefetch (#20393)
The prefetch instruction (_mm_prefetch) is used to anticipate memory accesses by prefetching the next row of the input buffer while the current row is being processed. This reduces the impact of memory latency and improves the performance of the MlasComputeSoftmax function. As a result, the worst-case performance of the OCR model improved by approximately 50 ms, which equates to a 3% improvement.
1 parent a077330 commit edffa2a
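
The pattern described in the commit message can be sketched in isolation as follows. This is a minimal, hypothetical illustration and not the MLAS implementation: the function name `SoftmaxRowsWithPrefetch`, the `SKETCH_HAS_PREFETCH` macro, and the scalar softmax body are assumptions made for the example, and the 64-byte cache line is the same assumption the commit makes. Unlike the diff, which issues the prefetch on every loop iteration, the sketch skips it on the last row.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Only use _mm_prefetch where SSE intrinsics are available (assumed guard,
// standing in for MLAS_SSE2_INTRINSICS in the real code).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <xmmintrin.h>
#define SKETCH_HAS_PREFETCH 1
#else
#define SKETCH_HAS_PREFETCH 0
#endif

// Hypothetical helper: row-wise softmax over an n x d row-major matrix,
// prefetching row (r + 1) while row r is being computed.
inline void SoftmaxRowsWithPrefetch(const float* input, float* output,
                                    std::size_t n, std::size_t d) {
    constexpr std::size_t kCacheLineSize = 64;  // assumption, as in the commit
    constexpr std::size_t kElementsPerCacheLine = kCacheLineSize / sizeof(float);
    (void)kElementsPerCacheLine;  // unused when prefetch is unavailable

    if (d == 0) {
        return;  // nothing to normalize
    }

    for (std::size_t row = 0; row < n; ++row) {
        const float* in = input + row * d;
        float* out = output + row * d;

#if SKETCH_HAS_PREFETCH
        // Request one cache line at a time covering the next row.
        if (row + 1 < n) {
            const char* next = reinterpret_cast<const char*>(in + d);
            for (std::size_t i = 0; i * kElementsPerCacheLine < d; ++i) {
                _mm_prefetch(next + i * kCacheLineSize, _MM_HINT_T0);
            }
        }
#endif

        // Subtract the row maximum for numerical stability, then normalize.
        const float max_value = *std::max_element(in, in + d);
        float sum = 0.0f;
        for (std::size_t j = 0; j < d; ++j) {
            out[j] = std::exp(in[j] - max_value);
            sum += out[j];
        }
        for (std::size_t j = 0; j < d; ++j) {
            out[j] /= sum;
        }
    }
}
```

The prefetch is only a hint, so the results are identical with or without it; any benefit depends on the row length, the hardware prefetcher, and how much work is done per row, and would need to be measured on the target workload.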

1 file changed, +16 -0 lines changed

onnxruntime/core/mlas/lib/compute.cpp

Lines changed: 16 additions & 0 deletions
@@ -850,8 +850,24 @@ Return Value:
         const float* Input = WorkBlock->Input + n * D;
         float* Output = WorkBlock->Output + n * D;
 
+#if defined(MLAS_SSE2_INTRINSICS)
+        // TODO: Use std::hardware_constructive_interference_size
+        constexpr size_t CacheLineSize = 64;
+        constexpr size_t ElementsPerCacheLine = CacheLineSize / sizeof(float);
+#endif
+
         while (CountN > 0) {
 
+#if defined(MLAS_SSE2_INTRINSICS)
+            //
+            // Prefetch the next row of the input buffer.
+            //
+
+            for (size_t i = 0; i * ElementsPerCacheLine < D; i++) {
+                _mm_prefetch((char*)(Input + D) + i * CacheLineSize, _MM_HINT_T0);
+            }
+#endif
+
             //
             // Find the maximum value for the row.
             //
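
The loop bound in the diff, `i * ElementsPerCacheLine < D`, issues one prefetch per cache line's worth of floats, i.e. ceil(D / ElementsPerCacheLine) prefetches per row. The sketch below reproduces that count and shows one possible way to address the TODO by using `std::hardware_constructive_interference_size` when the standard library provides it. The names `kCacheLineSize` and `PrefetchesPerRow` are hypothetical, and the 64-byte fallback is an assumption carried over from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Prefer the library-provided constant when available; otherwise fall back
// to the 64-byte value hard-coded in the commit (an assumption, not a
// guarantee for every target).
#if defined(__cpp_lib_hardware_interference_size)
constexpr std::size_t kCacheLineSize = std::hardware_constructive_interference_size;
#else
constexpr std::size_t kCacheLineSize = 64;
#endif

// Hypothetical helper: counts how many prefetches the loop in the diff
// would issue for a row of d floats, using the same loop condition.
inline std::size_t PrefetchesPerRow(std::size_t d,
                                    std::size_t cache_line_bytes = kCacheLineSize) {
    assert(cache_line_bytes >= sizeof(float));
    const std::size_t elements_per_line = cache_line_bytes / sizeof(float);
    std::size_t count = 0;
    for (std::size_t i = 0; i * elements_per_line < d; i++) {
        ++count;
    }
    return count;
}
```

With 64-byte lines and 4-byte floats, a row of 16 elements needs one prefetch and a row of 17 needs two. The count assumes each row starts on a cache-line boundary; when it does not, the row can straddle one more line than the loop requests.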
