Description
Model: Llama 3.2, 3B, Q4F16_1
Phone: Pixel 8 Pro
Comparing the logs from the PC and the phone, throughput on the phone appears to be throttled: even a short prompt takes 20 seconds or more to complete. Switching to a 1B or 1.7B (Qwen) model gives approximately the same results.
PC (AMD-680M)
prefill_chunk_size = 1024
buffer_size_required_bytes = 64MB: [WebLLM Stats] prefill: 111.7527 tok/s, decoding: 18.0826 tok/s (Prompt: 300 / Gen: 157) [F16: ON]
buffer_size_required_bytes = 128MB: [WebLLM Stats] prefill: 111.7943 tok/s, decoding: 15.5302 tok/s (Prompt: 300 / Gen: 170) [F16: ON]
Pixel 8 Pro
prefill_chunk_size = 1024
buffer_size_required_bytes = 64MB: [WebLLM Stats] prefill: 5.3816 tok/s, decoding: 5.0826 tok/s (Prompt: 148 / Gen: 47) [F16: ON]
buffer_size_required_bytes = 128MB: [WebLLM Stats] prefill: 5.3792 tok/s, decoding: 5.0428 tok/s (Prompt: 148 / Gen: 51) [F16: ON]
buffer_size_required_bytes = 256MB: [WebLLM Stats] prefill: 5.3907 tok/s, decoding: 5.0690 tok/s (Prompt: 148 / Gen: 48) [F16: ON]
prefill_chunk_size = 256 (Recompile WASM)
buffer_size_required_bytes = 256MB: [WebLLM Stats] prefill: 5.3828 tok/s, decoding: 5.0957 tok/s (Prompt: 148 / Gen: 49) [F16: ON]
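To relate these numbers to the "20 seconds or more" observation, here is a rough sketch that turns one of the stats lines above into estimated wall-clock time per phase. It assumes each reported rate is simply tokens divided by the time spent in that phase, which may not match exactly how WebLLM computes it. The helper name `estimate_seconds` and the regex are my own, not part of WebLLM.

```python
import re

# Matches the "[WebLLM Stats]" lines pasted in this issue (format assumed from those lines).
STATS_RE = re.compile(
    r"prefill:\s*([\d.]+)\s*tok/s,\s*decoding:\s*([\d.]+)\s*tok/s\s*"
    r"\(Prompt:\s*(\d+)\s*/\s*Gen:\s*(\d+)\)"
)


def estimate_seconds(stats_line):
    """Return (prefill_seconds, decode_seconds) estimated from one stats line.

    Assumption: rate = tokens / phase time, so time = tokens / rate.
    """
    match = STATS_RE.search(stats_line)
    if match is None:
        raise ValueError("not a recognizable stats line")
    prefill_rate = float(match.group(1))
    decode_rate = float(match.group(2))
    prompt_tokens = int(match.group(3))
    gen_tokens = int(match.group(4))
    return prompt_tokens / prefill_rate, gen_tokens / decode_rate


if __name__ == "__main__":
    pixel_line = (
        "[WebLLM Stats] prefill: 5.3816 tok/s, decoding: 5.0826 tok/s "
        "(Prompt: 148 / Gen: 47) [F16: ON]"
    )
    prefill_s, decode_s = estimate_seconds(pixel_line)
    print(f"prefill ~{prefill_s:.1f}s, decode ~{decode_s:.1f}s")
```

Under that assumption, the 148-token prompt on the Pixel 8 Pro spends about 27.5 s in prefill alone, so prefill throughput (roughly 20x lower than on the PC) looks like the dominant cost, and it does not move with `buffer_size_required_bytes` or `prefill_chunk_size`.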