
Garbage when using librkllmrt v1.2.3 #102


Description

@iliakonnov

I had been using an outdated Docker image, ghcr.io/notpunchnox/rkllama:main, and everything ran smoothly.

However, after I built the image from this repo, the LLM starts repeating a single character after just a few successful tokens:

You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something,&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& (continues)

The repeated character may differ (I've seen ! on some models) and usually appears right after a comma. I'm using DeepSeek-R1-Distill-Qwen-1.5B-rk3588-1.2.0, but the same issue happens with other models too.

So I decided to iterate over versions of librkllmrt.so (since I had seen rkllama working successfully before) and started running the container like this:

podman run --privileged --rm -it --name rkllama \
  -v ./models/:/opt/rkllama/models:ro \
  -v ./librkllmrt/v1.2.3.so:/usr/local/lib/python3.12/site-packages/rkllama/lib/librkllmrt.so:ro \
  rkllama

This way I can quickly try different versions of librkllmrt.so.
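To make the iteration faster, a small loop works too. This is just a minimal sketch, assuming the .so files are named v1.2.1.so, v1.2.2.so, v1.2.3.so as above:

# Run the same container against each bundled runtime version in turn.
for ver in v1.2.1 v1.2.2 v1.2.3; do
  echo "=== librkllmrt $ver ==="
  podman run --privileged --rm -it --name rkllama \
    -v ./models/:/opt/rkllama/models:ro \
    -v ./librkllmrt/$ver.so:/usr/local/lib/python3.12/site-packages/rkllama/lib/librkllmrt.so:ro \
    rkllama
done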

Iterating like this, I found that the issue happens only with v1.2.3:

v1.2.1

Server:

I rkllm: rkllm-runtime version: 1.2.1, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:52:58,362 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...

Chat:

You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask.
</think>

Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask.

v1.2.2

Server:

I rkllm: rkllm-runtime version: 1.2.2, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:54:00,209 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...

Chat:

You: Hello
Assistant: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help make your day easier and more enjoyable.
</think>

Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help make your day easier and more enjoyable.

v1.2.3

Server:

I rkllm: rkllm-runtime version: 1.2.3, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:54:54,760 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...

Chat:

You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something,&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& (continues)

All of the librkllmrt versions are taken from different revisions of this repo.

$ sha256sum ./librkllmrt/*
a7e6f87f07bbb08058cad4871cc74e8069a054fe4f6259b43c29a4738b0affdd  ./librkllmrt/v1.2.1.so
39d01912e67027de32c527be04684bf813e2a49c2d09ab8f6bcf47b34a43789d  ./librkllmrt/v1.2.2.so
bbcf28a8666b9fbf7361d6aad892b957920f6ea92400c074899b48f4c5b2c96f  ./librkllmrt/v1.2.3.so
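For anyone who wants to reproduce this, a library can be pulled out of an older revision roughly like this (the revision and the in-repo path below are placeholders, not the actual history; adjust them to your checkout):

# Hypothetical example: extract the bundled runtime from an older revision.
git show <revision>:rkllama/lib/librkllmrt.so > ./librkllmrt/v1.2.1.so
sha256sum ./librkllmrt/v1.2.1.so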

I see nothing suspicious in either the rkllama logs or dmesg. However, I can see that /sys/kernel/debug/rknpu/load shows ~70% load on all three NPU cores while the repeated characters are being generated.
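For reference, this is a simple way to keep an eye on the NPU load during generation (assuming debugfs is mounted, which it is in my setup):

# Print per-core NPU load once per second.
watch -n1 cat /sys/kernel/debug/rknpu/load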
