Commit 5cd4880

hexagon: bump the thread count in the adb wrapper scripts
We can use more CPU cores now that the dedicated dspqueue polling threads are no longer used (i.e. no contention). Also enable more aggressive polling for now, since we still map Flash Attention (and a few other kernels) to the CPU and those dspqueue threads were keeping the CPU cores at higher clock freqs.
1 parent ac7a334 commit 5cd4880

2 files changed: +6 -4 lines changed

scripts/snapdragon/adb/run-bench.sh

Lines changed: 2 additions & 1 deletion
@@ -35,5 +35,6 @@ adb $adbserial shell " \
  LD_LIBRARY_PATH=$basedir/$branch/lib \
  ADSP_LIBRARY_PATH=$basedir/$branch/lib \
  $ndev $nhvx $opmask ./$branch/bin/llama-bench --device $device --mmap 0 -m $basedir/../gguf/$model \
- -t 4 --batch-size 128 -ngl 99 $@ \
+ --poll 1000 -t 6 --cpu-mask 0xfc --cpu-strict 1 \
+ --batch-size 128 -ngl 99 $@ \
  "

scripts/snapdragon/adb/run-cli.sh

Lines changed: 4 additions & 3 deletions
@@ -45,8 +45,9 @@ adb $adbserial shell " \
  cd $basedir; ulimit -c unlimited; \
  LD_LIBRARY_PATH=$basedir/$branch/lib \
  ADSP_LIBRARY_PATH=$basedir/$branch/lib \
- $verbose $experimental $sched $opmask $profile $nhvx $ndev \
- ./$branch/bin/llama-cli --no-mmap -m $basedir/../gguf/$model \
- -t 4 --ctx-size 8192 --batch-size 128 -ctk q8_0 -ctv q8_0 -fa on \
+ $verbose $experimental $sched $opmask $profile $nhvx $ndev \
+ ./$branch/bin/llama-cli --no-mmap -m $basedir/../gguf/$model \
+ --poll 1000 -t 6 --cpu-mask 0xfc --cpu-strict 1 \
+ --ctx-size 8192 --batch-size 128 -ctk q8_0 -ctv q8_0 -fa on \
  -ngl 99 --device $device $cli_opts $@ \
  "
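
Both scripts now pass `-t 6` together with `--cpu-mask 0xfc`. As a sanity check on how those two values relate, the sketch below decodes a hex affinity mask into CPU indices, assuming the usual convention that bit N of the mask selects CPU N. The helper names `mask_to_cpus` and `count_cpus` are hypothetical and are not part of the wrapper scripts. Which of those CPUs are performance or efficiency cores depends on the device and is not covered here.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the adb wrapper scripts.
# Assumption: bit N of the affinity mask selects CPU N.

# Print the CPU indices whose bits are set in a hex mask such as 0xfc.
mask_to_cpus() {
    mask=$(( $1 ))          # shell arithmetic accepts 0x-prefixed hex
    cpu=0
    cpus=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            cpus="${cpus:+$cpus }$cpu"   # append, space-separated
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$cpus"
}

# Print how many CPUs a mask selects.
count_cpus() {
    set -- $(mask_to_cpus "$1")   # split the index list into arguments
    echo "$#"
}

mask_to_cpus 0xfc   # prints: 2 3 4 5 6 7
count_cpus 0xfc     # prints: 6
```

Under that assumption, `0xfc` selects six CPUs (indices 2 through 7), which matches the thread count given by `-t 6`.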
