Commit dc8e4e1

Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable
1 parent cccd143 commit dc8e4e1

2 files changed: 17 additions & 2 deletions

Makefile.rule

Lines changed: 16 additions & 1 deletion
@@ -279,7 +279,22 @@ COMMON_PROF = -pg
 
 # If you want to enable the experimental BFLOAT16 support
 # BUILD_HALF = 1
-#
+
+
+# Set the thread number threshold beyond which the job array for the threaded level3 BLAS
+# will be allocated on the heap rather than the stack. (This array alone requires
+# NUM_THREADS*NUM_THREADS*128 bytes of memory so should not pose a problem at low cpu
+# counts, but obviously it is not the only item that ends up on the stack.
+# The default value of 32 ensures that the overall requirement is compatible
+# with the default 1MB stacksize imposed by having the Java VM loaded without use
+# of its -Xss parameter.
+# The value of 160 formerly used from about version 0.2.7 until 0.3.10 is easily compatible
+# with the common Linux stacksize of 8MB but will cause crashes with unwary use of the java
+# VM e.g. in Octave or with the java-based libhdfs in numpy or scipy code
+# BLAS3_MEM_ALLOC_THRESHOLD = 160
+
+
+
 # the below is not yet configurable, use cmake if you need to build only select types
 BUILD_SINGLE = 1
 BUILD_DOUBLE = 1

common.h

Lines changed: 1 addition & 1 deletion
@@ -402,7 +402,7 @@ please https://github.com/xianyi/OpenBLAS/issues/246
 #endif
 
 #ifndef BLAS3_MEM_ALLOC_THRESHOLD
-#define BLAS3_MEM_ALLOC_THRESHOLD 160
+#define BLAS3_MEM_ALLOC_THRESHOLD 32
 #endif
 
 #ifdef QUAD_PRECISION
