Garbage when using librkllmrt v1.2.3

I had been using an outdated Docker image `ghcr.io/notpunchnox/rkllama:main`, and everything worked smoothly.

However, after building the image from this repo myself, the LLM starts repeating a single character after just a few correct tokens:
```
You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something,&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& (continues)
```

The repeated character can differ (I've seen `!` with some models), and it usually appears right after a comma. I'm using [DeepSeek-R1-Distill-Qwen-1.5B-rk3588-1.2.0](https://huggingface.co/imkebe/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-1.2.0/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm), but the same issue happens with other models too.
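When comparing builds, it helps to flag this failure mode mechanically instead of eyeballing each reply. The sketch below uses a hypothetical helper, `is_degenerate` (not part of rkllama), that treats a reply as broken if it contains a long run of one repeated character; the default threshold of 20 is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of rkllama: detect the "&&&&..." failure mode.
#
# is_degenerate TEXT [MIN_RUN]
#   Succeeds (exit 0) if TEXT contains MIN_RUN or more consecutive copies of
#   the same character; fails (exit 1) otherwise. MIN_RUN defaults to 20,
#   an arbitrary threshold so ordinary text ("ll", "...") never trips it.
is_degenerate() {
  # POSIX BRE: \(.\) captures one character and \1\{N,\} requires at least
  # N more copies of it, so the matched run is N+1 characters long.
  printf '%s\n' "$1" | grep -q '\(.\)\1\{'"$(( ${2:-20} - 1 ))"',\}'
}
```

For example, `is_degenerate "$reply" && echo "degenerate output"` can be run on each captured reply.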

So I decided to iterate over versions of `librkllmrt.so` (since I had seen rkllama work successfully before) and started running the container like this:
```
podman run --privileged --rm -it --name rkllama \
 -v ./models/:/opt/rkllama/models:ro \
 -v ./librkllmrt/v1.2.3.so:/usr/local/lib/python3.12/site-packages/rkllama/lib/librkllmrt.so:ro \
 rkllama
```

This way I can quickly try different versions of `librkllmrt.so` by bind-mounting each one over the copy bundled in the image.
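The manual swap can also be scripted. Below is a minimal sketch, assuming the `./librkllmrt/<version>.so` layout and the locally built `rkllama` image from the podman command above; `run_with_lib`, `try_all_versions` and the `DRY_RUN` variable are hypothetical names introduced here, not part of rkllama.

```shell
#!/bin/sh
# Sketch: run the locally built `rkllama` image with a given librkllmrt build
# bind-mounted over the library bundled inside the image.
# Paths mirror the podman command above; the helper names are hypothetical.

LIB_DEST=/usr/local/lib/python3.12/site-packages/rkllama/lib/librkllmrt.so

# run_with_lib LIB: start the container with LIB mounted read-only.
# If DRY_RUN is non-empty, only print the command instead of running it.
run_with_lib() {
  lib=$1
  set -- podman run --privileged --rm -it --name rkllama \
    -v ./models/:/opt/rkllama/models:ro \
    -v "$lib:$LIB_DEST:ro" \
    rkllama
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# try_all_versions: try each build in ./librkllmrt/ in turn. Each run blocks
# until the container exits; stop it once you have checked the output.
try_all_versions() {
  for lib in ./librkllmrt/*.so; do
    [ -e "$lib" ] || continue  # skip the literal pattern if nothing matched
    echo "=== $lib ==="
    run_with_lib "$lib"
  done
}
```

`DRY_RUN=1 try_all_versions` previews the commands; plain `try_all_versions` runs them one after another.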

That is how I found out that the issue happens only with **v1.2.3**:

<details><summary>v1.2.1</summary>


Server:
```
I rkllm: rkllm-runtime version: 1.2.1, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:52:58,362 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...
```

Chat:
```
You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask.
</think>

Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask.
```


</details> 

<details><summary>v1.2.2</summary>


Server:
```
I rkllm: rkllm-runtime version: 1.2.2, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:54:00,209 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...
```

Chat:
```
You: Hello
Assistant: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help make your day easier and more enjoyable.
</think>

Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help make your day easier and more enjoyable.
```


</details> 

<details><summary>v1.2.3</summary>


Server:
```
I rkllm: rkllm-runtime version: 1.2.3, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from /opt/rkllama/models/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0/DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0.rkllm
I rkllm: rkllm-toolkit version: 1.2.0, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
2026-01-03 00:54:54,760 - rkllama.worker - INFO - Worker for model DeepSeek-R1-Distill-Qwen-1.5B-rk3588-w8a8-opt-1-hybrid-ratio-1.0 created and running...
```

Chat:
```
You: Hello.
Assistant: Hello! How can I assist you today? If you have any questions or need help with something,&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& (continues)
```


</details> 

All versions of `librkllmrt.so` come from different revisions of this repo:
```
$ sha256sum ./librkllmrt/*
a7e6f87f07bbb08058cad4871cc74e8069a054fe4f6259b43c29a4738b0affdd ./librkllmrt/v1.2.1.so
39d01912e67027de32c527be04684bf813e2a49c2d09ab8f6bcf47b34a43789d ./librkllmrt/v1.2.2.so
bbcf28a8666b9fbf7361d6aad892b957920f6ea92400c074899b48f4c5b2c96f ./librkllmrt/v1.2.3.so
```
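To make sure anyone reproducing this uses byte-identical libraries, the hashes above can be turned into a manifest for `sha256sum -c`. A minimal sketch; the function names are hypothetical, and the manifest uses the standard two-space `sha256sum` line format with the paths from the listing.

```shell
#!/bin/sh
# Sketch: pin the three librkllmrt builds from this issue by SHA-256.
# Hashes are copied from the `sha256sum ./librkllmrt/*` output above;
# the function names are hypothetical.

# Print the manifest in the "<hash><two spaces><path>" format sha256sum -c reads.
librkllmrt_manifest() {
  cat <<'EOF'
a7e6f87f07bbb08058cad4871cc74e8069a054fe4f6259b43c29a4738b0affdd  ./librkllmrt/v1.2.1.so
39d01912e67027de32c527be04684bf813e2a49c2d09ab8f6bcf47b34a43789d  ./librkllmrt/v1.2.2.so
bbcf28a8666b9fbf7361d6aad892b957920f6ea92400c074899b48f4c5b2c96f  ./librkllmrt/v1.2.3.so
EOF
}

# check_librkllmrt [DIR]: verify the builds under DIR (default: current
# directory). sha256sum prints OK/FAILED per file and exits non-zero on any
# mismatch or missing file.
check_librkllmrt() {
  (cd "${1:-.}" && librkllmrt_manifest | sha256sum -c -)
}
```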

I see nothing suspicious in either the rkllama logs or dmesg. However, `/sys/kernel/debug/rknpu/load` shows ~70% load on all three NPU cores while the repeated characters are being generated.
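To correlate the NPU load with what the model is printing, the debugfs file can be polled while a reply is being generated. A minimal sketch, assuming debugfs is mounted at `/sys/kernel/debug` and the script runs as root; `sample_npu_load` is a hypothetical helper, and it prints the file's contents verbatim rather than assuming any particular format.

```shell
#!/bin/sh
# Sketch: poll the rknpu debugfs load file and timestamp each sample.
# Reading debugfs normally requires root; the helper name is hypothetical.

# sample_npu_load [COUNT] [INTERVAL] [FILE]
#   Print COUNT samples (default 10), INTERVAL seconds apart (default 1),
#   of FILE (default /sys/kernel/debug/rknpu/load), one line per sample.
sample_npu_load() {
  count=${1:-10}
  interval=${2:-1}
  file=${3:-/sys/kernel/debug/rknpu/load}
  i=0
  while [ "$i" -lt "$count" ]; do
    # tr folds a multi-line reading onto one line next to its timestamp.
    printf '%s %s\n' "$(date +%T)" "$(tr '\n' ' ' < "$file")"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

Running `sample_npu_load 30 1` in a second root shell while the model is producing output gives a timestamped record that can be lined up against the chat.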

Garbage when using librkllmrt v1.2.3 #102

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Garbage when using librkllmrt v1.2.3 #102

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions