Description
I had been using an outdated Docker image, ghcr.io/notpunchnox/rkllama:main, and everything worked smoothly.
However, after building the image from this repo, the LLM starts repeating a single character after just a few correctly generated tokens:
You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something,&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& (continues)
The repeated character varies (I've seen ! with some models), and the repetition usually starts right after a comma. I'm using DeepSeek-R1-Distill-Qwen-1.5B-rk3588-1.2.0, but the same issue happens with other models too.
Since I had seen rkllama work correctly before, I decided to iterate over versions of librkllmrt.so, running the container like this:
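To compare runtime versions without eyeballing every reply, the symptom can be checked mechanically. The following is a minimal sketch: has_repeat_run is a hypothetical helper, the default threshold of 20 identical characters is an arbitrary choice, and it assumes GNU grep, which accepts back-references in -E patterns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "degenerate" if a model reply contains a run of
# at least MIN_RUN identical characters (like the "&&&&..." output), else "ok".
# Assumes GNU grep, which supports back-references in -E patterns.
has_repeat_run() {
  local text="$1"
  local min_run="${2:-20}"
  # (.)\1{N,} matches one character followed by N more copies of itself,
  # i.e. a run of N+1 identical characters in total.
  if printf '%s\n' "$text" | grep -Eq "(.)\\1{$((min_run - 1)),}"; then
    echo "degenerate"
  else
    echo "ok"
  fi
}
```

For example, has_repeat_run "$reply" prints degenerate for the v1.2.3 reply shown here and ok for a normal answer. Note that long runs of spaces would also be flagged.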
```shell
podman run --privileged --rm -it --name rkllama \
  -v ./models/:/opt/rkllama/models:ro \
  -v ./librkllmrt/v1.2.3.so:/usr/local/lib/python3.12/site-packages/rkllama/lib/librkllmrt.so:ro \
  rkllama
```
This lets me quickly swap in different builds of librkllmrt.so without rebuilding the image.
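The same mount trick can be looped over every build. This is a sketch under assumptions: the builds are stored as ./librkllmrt/v<version>.so and the image is tagged rkllama, as in the podman command shown here; run_with_lib, test_all_versions and the DRY_RUN switch are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: start the rkllama container once per librkllmrt.so build by mounting
# each build over the library bundled in the image.
# Set DRY_RUN=1 to print the podman command instead of running it.
set -euo pipefail

LIB_TARGET=/usr/local/lib/python3.12/site-packages/rkllama/lib/librkllmrt.so

run_with_lib() {
  local lib="$1"
  local cmd=(podman run --privileged --rm -it --name rkllama
    -v ./models/:/opt/rkllama/models:ro
    -v "$lib:$LIB_TARGET:ro"
    rkllama)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

test_all_versions() {
  local lib
  # Globs expand in lexical order: v1.2.1, v1.2.2, v1.2.3.
  for lib in ./librkllmrt/v*.so; do
    echo "=== testing $lib ==="
    run_with_lib "$lib"
  done
}
```

Each run is interactive because of -it: chat with the model, then exit the container to move on to the next build.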
Testing each version this way, I found that the issue occurs only with v1.2.3:
v1.2.1
Server:
I rkllm: rkllm-runtime version: 1.2.1, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:52:58,362 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...
Chat:
You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask.
</think>
Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask.
v1.2.2
Server:
I rkllm: rkllm-runtime version: 1.2.2, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:54:00,209 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...
Chat:
You: Hello
Assistant: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help make your day easier and more enjoyable.
</think>
Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help make your day easier and more enjoyable.
v1.2.3
Server:
I rkllm: rkllm-runtime version: 1.2.3, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:54:54,760 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...
Chat:
You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something,&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& (continues)
All versions of librkllmrt.so were taken from different revisions of this repo:
```
$ sha256sum ./librkllmrt/*
a7e6f87f07bbb08058cad4871cc74e8069a054fe4f6259b43c29a4738b0affdd ./librkllmrt/v1.2.1.so
39d01912e67027de32c527be04684bf813e2a49c2d09ab8f6bcf47b34a43789d ./librkllmrt/v1.2.2.so
bbcf28a8666b9fbf7361d6aad892b957920f6ea92400c074899b48f4c5b2c96f ./librkllmrt/v1.2.3.so
```
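These hashes can also be used to tell which build a given librkllmrt.so file is. The sketch below is an assumption-laden helper (identify_lib is a hypothetical name); it only knows the three hashes listed here and prints unknown for anything else.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a librkllmrt.so file to one of the tested builds by
# its SHA-256 digest. Only the three hashes listed in this issue are known.
identify_lib() {
  local file="$1" sum
  # Read via stdin so the output is "<hash>  -" regardless of the filename.
  sum="$(sha256sum < "$file" | cut -d' ' -f1)"
  case "$sum" in
    a7e6f87f07bbb08058cad4871cc74e8069a054fe4f6259b43c29a4738b0affdd) echo "v1.2.1" ;;
    39d01912e67027de32c527be04684bf813e2a49c2d09ab8f6bcf47b34a43789d) echo "v1.2.2" ;;
    bbcf28a8666b9fbf7361d6aad892b957920f6ea92400c074899b48f4c5b2c96f) echo "v1.2.3" ;;
    *) echo "unknown" ;;
  esac
}
```

Running it against the library copied out of the outdated ghcr.io/notpunchnox/rkllama:main image would show whether that image ships one of these builds or a different one.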
I see nothing suspicious in either the rkllama logs or dmesg. However, /sys/kernel/debug/rknpu/load shows ~70% load on all three NPU cores while the repeated characters are being generated.
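To line the NPU load up with the point where generation degrades, the load file can be sampled with timestamps. This is a minimal sketch: sample_npu_load is a hypothetical helper, reading debugfs usually requires root, and the LOAD_FILE override exists only so the function can be tried without the device.

```shell
#!/usr/bin/env bash
# Sketch: print timestamped readings of the RKNPU load file.
# LOAD_FILE defaults to the debugfs path mentioned in this issue.
LOAD_FILE="${LOAD_FILE:-/sys/kernel/debug/rknpu/load}"

sample_npu_load() {
  local samples="${1:-10}" interval="${2:-1}" i
  for ((i = 0; i < samples; i++)); do
    # Timestamp each reading so it can be matched against server/chat logs.
    printf '%s %s\n' "$(date '+%H:%M:%S')" "$(cat "$LOAD_FILE")"
    # Skip the sleep after the last sample.
    if (( i + 1 < samples )); then sleep "$interval"; fi
  done
}
```

For example, sample_npu_load 30 1 run in a second shell while sending a prompt gives one reading per second for 30 seconds.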