Commit 0c1c903

Fix OMP num specify issue
In the current code, no matter how many threads are requested, every available CPU is used when invoking OMP. This leads to very poor performance when the workload is small and the number of available CPUs is large, because much of the time is wasted on inter-thread synchronization. Fix this by actually using the thread count passed in by the calling API through the variable 'num'.

Signed-off-by: Chen, Guobing <[email protected]>
1 parent a073fa8 commit 0c1c903

1 file changed: +1 -1 lines changed

driver/others/blas_server_omp.c

Lines changed: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ int exec_blas(BLASLONG num, blas_queue_t *queue){
 break;
 }

-#pragma omp parallel for schedule(OMP_SCHED)
+#pragma omp parallel for num_threads(num) schedule(OMP_SCHED)
 for (i = 0; i < num; i ++) {

 #ifndef USE_SIMPLE_THREADED_LEVEL3
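
The effect of the added `num_threads(num)` clause can be illustrated outside OpenBLAS with a minimal sketch. The helper `run_items` below is hypothetical and is not part of `blas_server_omp.c`. It uses `schedule(static)` in place of the `OMP_SCHED` macro, which is defined elsewhere in the OpenBLAS sources. It assumes `num >= 1`. When built without OpenMP support (for example without `-fopenmp` on GCC or Clang), the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper, not OpenBLAS code: runs `num` dummy work items and
   returns the team size observed inside the parallel region. */
static int run_items(int num, int *done) {
    int team = 1; /* stays 1 when compiled without OpenMP */
    int i;

    assert(num >= 1); /* num_threads() requires a positive value */

    /* Request a team of at most `num` threads instead of one thread per CPU.
       The runtime may still provide fewer threads than requested. */
#pragma omp parallel for num_threads(num) schedule(static)
    for (i = 0; i < num; i++) {
        done[i] = 1; /* stand-in for executing one queue entry */
#ifdef _OPENMP
        if (i == 0) {
            /* Only the thread that owns iteration 0 writes `team`. */
            team = omp_get_num_threads();
        }
#endif
    }
    return team;
}
```

Calling `run_items(4, done)` on a machine with many cores should report a team of at most 4 threads. Without the clause, the team size would instead follow the runtime default, which is typically the number of available CPUs.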
