Why doesn't a larger ubatch size improve performance for CPU inference on Android devices? #14886

Msyu1020 · 2025-07-26T11:19:52Z

Hi, I’m trying to understand how ubatch sizes impact inference performance on Android devices. I’ve verified that when using the GPU for inference, different ubatch sizes result in varying performance on my Xiaomi 14 Pro. However, when using the CPU, the ubatch size doesn’t seem to affect performance significantly.

As I understand it, a larger ubatch size lets more tokens be processed in a single pass, at the cost of a larger memory allocation for intermediate buffers. So why does CPU performance show almost no difference across ubatch sizes when all other settings stay the same?
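To make the expectation concrete, here is a minimal sketch of how many passes a 64-token prompt would need at each ubatch size. It assumes the prompt is simply split into chunks of at most `n_ubatch` tokens; `count_ubatches` is a hypothetical helper written for illustration, not a llama.cpp function.

```python
import math


def count_ubatches(n_tokens: int, n_ubatch: int) -> int:
    """Number of passes needed if n_tokens are split into chunks of n_ubatch.

    Assumption: a plain ceiling division, which is only a simplified model
    of how a prompt is broken into micro-batches.
    """
    if n_tokens < 0 or n_ubatch <= 0:
        raise ValueError("n_tokens must be >= 0 and n_ubatch must be > 0")
    return math.ceil(n_tokens / n_ubatch)


if __name__ == "__main__":
    # pp64 is the test used in the benchmarks in this post.
    for n_ubatch in (16, 32, 64):
        print(f"n_ubatch={n_ubatch}: {count_ubatches(64, n_ubatch)} pass(es)")
```

Under this model, going from ubatch 16 to 64 cuts the number of passes from four to one, which is why I expected a visible difference on both backends.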

Code version: b5811
Device: Xiaomi 14 Pro (Android / Adreno GPU)

CPU

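| model | size | params | backend | n_ubatch | test | pp t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | OpenCL | 64 | pp64 | 8.47 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | OpenCL | 32 | pp64 | 8.76 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | OpenCL | 16 | pp64 | 8.65 |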

GPU

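| model | size | params | backend | ngl | n_ubatch | test | pp t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | OpenCL | 99 | 64 | pp64 | 73.35 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | OpenCL | 99 | 32 | pp64 | 54.17 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | OpenCL | 99 | 16 | pp64 | 37.52 |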
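To quantify the gap between the two runs, the short script below computes the relative spread of the pp64 throughput across ubatch sizes. The numbers are copied from the two tables above; `relative_spread` is a hypothetical helper introduced only for this comparison.

```python
# pp64 throughput in tokens/s, keyed by n_ubatch, copied from the tables above.
cpu_results = {64: 8.47, 32: 8.76, 16: 8.65}
gpu_results = {64: 73.35, 32: 54.17, 16: 37.52}


def relative_spread(results: dict[int, float]) -> float:
    """Return (fastest - slowest) / slowest for a set of throughput values."""
    slowest = min(results.values())
    fastest = max(results.values())
    return (fastest - slowest) / slowest


if __name__ == "__main__":
    print(f"CPU spread: {relative_spread(cpu_results):.1%}")
    print(f"GPU spread: {relative_spread(gpu_results):.1%}")
```

The CPU results differ by roughly 3% between the slowest and fastest ubatch size, which could plausibly be run-to-run noise, while the GPU results differ by roughly 95%.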


Replies: 0 comments
