
Regarding the slow performance issue on Android Chrome #759

@willo83417

Description


Model: Llama 3.2, 3B, Q4F16_1
Phone: Pixel 8 Pro
Comparing the logs from the PC and the phone, throughput on the phone side appears to be capped, so even fairly short responses take 20 seconds or more. Switching to the smaller 1B / 1.7B (Qwen) models gives approximately the same results.

PC (AMD Radeon 680M)
prefill_chunk_size = 1024
buffer_size_required_bytes = 64MB: [WebLLM Stats] prefill: 111.7527 tok/s, decoding: 18.0826 tok/s (Prompt: 300 / Gen: 157) [F16: ON]
buffer_size_required_bytes = 128MB: [WebLLM Stats] prefill: 111.7943 tok/s, decoding: 15.5302 tok/s (Prompt: 300 / Gen: 170) [F16: ON]


Pixel 8 Pro
prefill_chunk_size = 1024
buffer_size_required_bytes = 64MB: [WebLLM Stats] prefill: 5.3816 tok/s, decoding: 5.0826 tok/s (Prompt: 148 / Gen: 47) [F16: ON]
buffer_size_required_bytes = 128MB: [WebLLM Stats] prefill: 5.3792 tok/s, decoding: 5.0428 tok/s (Prompt: 148 / Gen: 51) [F16: ON]
buffer_size_required_bytes = 256MB: [WebLLM Stats] prefill: 5.3907 tok/s, decoding: 5.0690 tok/s (Prompt: 148 / Gen: 48) [F16: ON]
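The three Pixel 8 Pro runs above suggest that buffer_size_required_bytes has essentially no effect on throughput. A minimal Python sketch that checks this by parsing the `[WebLLM Stats]` lines and measuring the relative spread across buffer sizes. It assumes the stats line format exactly as printed in this issue; the helper names `parse_stats` and `relative_spread` are hypothetical, not part of WebLLM.

```python
import re

# Assumed format, copied from the logs in this issue:
# "prefill: 5.3816 tok/s, decoding: 5.0826 tok/s (Prompt: 148 / Gen: 47)"
STATS_RE = re.compile(
    r"prefill: ([\d.]+) tok/s, decoding: ([\d.]+) tok/s "
    r"\(Prompt: (\d+) / Gen: (\d+)\)"
)


def parse_stats(line: str) -> dict:
    """Extract prefill/decode rates and token counts from one stats line."""
    m = STATS_RE.search(line)
    if m is None:
        raise ValueError(f"no WebLLM stats found in: {line!r}")
    return {
        "prefill": float(m.group(1)),
        "decode": float(m.group(2)),
        "prompt": int(m.group(3)),
        "gen": int(m.group(4)),
    }


def relative_spread(values: list) -> float:
    """(max - min) / min: how much the fastest run beats the slowest."""
    return (max(values) - min(values)) / min(values)


# Pixel 8 Pro, prefill_chunk_size = 1024, keyed by buffer size in MB.
pixel_runs = {
    64: "[WebLLM Stats] prefill: 5.3816 tok/s, decoding: 5.0826 tok/s (Prompt: 148 / Gen: 47) [F16: ON]",
    128: "[WebLLM Stats] prefill: 5.3792 tok/s, decoding: 5.0428 tok/s (Prompt: 148 / Gen: 51) [F16: ON]",
    256: "[WebLLM Stats] prefill: 5.3907 tok/s, decoding: 5.0690 tok/s (Prompt: 148 / Gen: 48) [F16: ON]",
}

stats = {mb: parse_stats(line) for mb, line in pixel_runs.items()}
prefill_spread = relative_spread([s["prefill"] for s in stats.values()])
decode_spread = relative_spread([s["decode"] for s in stats.values()])
print(f"prefill spread: {prefill_spread:.2%}, decode spread: {decode_spread:.2%}")
```

Both spreads come out well under 1%, i.e. within run-to-run noise, which is consistent with the buffer size not being the bottleneck on this device.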

prefill_chunk_size = 256 (WASM recompiled)
buffer_size_required_bytes = 256MB: [WebLLM Stats] prefill: 5.3828 tok/s, decoding: 5.0957 tok/s (Prompt: 148 / Gen: 49) [F16: ON]
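To see where the 20+ seconds on the phone come from, each phase can be approximated as token count divided by the reported throughput. A minimal Python sketch under that assumption (it treats the reported tok/s as a steady average and ignores fixed overheads such as model loading); the helper `estimated_seconds` is hypothetical. Because the PC and phone prompts differ (300 vs. 148 tokens), the throughput ratios are the fairer comparison, not the totals.

```python
import re

# Assumed format, copied from the [WebLLM Stats] lines in this issue.
STATS_RE = re.compile(
    r"prefill: ([\d.]+) tok/s, decoding: ([\d.]+) tok/s "
    r"\(Prompt: (\d+) / Gen: (\d+)\)"
)


def estimated_seconds(line: str) -> tuple:
    """Approximate (prefill_s, decode_s) as tokens / reported tok/s."""
    m = STATS_RE.search(line)
    if m is None:
        raise ValueError(f"no WebLLM stats found in: {line!r}")
    prefill_rate, decode_rate = float(m.group(1)), float(m.group(2))
    prompt_tokens, gen_tokens = int(m.group(3)), int(m.group(4))
    return prompt_tokens / prefill_rate, gen_tokens / decode_rate


pc_line = "[WebLLM Stats] prefill: 111.7527 tok/s, decoding: 18.0826 tok/s (Prompt: 300 / Gen: 157) [F16: ON]"
pixel_line = "[WebLLM Stats] prefill: 5.3828 tok/s, decoding: 5.0957 tok/s (Prompt: 148 / Gen: 49) [F16: ON]"

pc_prefill_s, pc_decode_s = estimated_seconds(pc_line)
px_prefill_s, px_decode_s = estimated_seconds(pixel_line)

# Throughput ratios (PC / phone) for each phase.
prefill_ratio = 111.7527 / 5.3828
decode_ratio = 18.0826 / 5.0957

print(f"PC:    prefill {pc_prefill_s:.1f} s, decode {pc_decode_s:.1f} s")
print(f"Pixel: prefill {px_prefill_s:.1f} s, decode {px_decode_s:.1f} s")
print(f"PC is {prefill_ratio:.1f}x faster at prefill, {decode_ratio:.1f}x at decode")
```

On these numbers the phone spends roughly 27 s on prefill alone for a 148-token prompt, and its prefill rate is about 20x below the PC's while decoding is only about 3.5x below. That points at prefill, rather than decoding, as the main source of the delay.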
