README.md (26 additions, 1 deletion)
@@ -1,6 +1,6 @@
 # RKLLama: LLM Server and Client for Rockchip 3588/3576

-### [Version: 0.0.53](#New-Version)
+### [Version: 0.0.54](#New-Version)


 Video demo ( version 0.0.1 ):
@@ -52,6 +52,7 @@ A server to run and interact with LLM models optimized for Rockchip RK3588(S) an
 * `/v1/embeddings`
 * `/v1/images/generations`
 * `/v1/audio/speech`
+* `/v1/audio/transcriptions`
 - **Tool/Function Calling** - Complete support for tool calls with multiple LLM formats (Qwen, Llama 3.2+, others).
 - **Pull models directly from Huggingface.**
 - **Includes a REST API with documentation.**
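Because RKLLAMA mirrors the OpenAI REST API, the new endpoint can be exercised with an ordinary multipart POST. Below is a minimal sketch in Python; the base URL, port, audio file name, and the model name `omniasr-ctc:300m` (taken from the setup steps later in this diff) are assumptions to adjust for your installation:

```python
# Minimal sketch of the OpenAI-compatible STT endpoint: multipart POST of an
# audio file plus a model name. URL, port, and model name are assumptions.
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed RKLLAMA address and port
MODEL = "omniasr-ctc:300m"             # model folder name from the setup steps below

with open("sample.wav", "rb") as audio:
    resp = requests.post(
        f"{BASE_URL}/audio/transcriptions",
        files={"file": ("sample.wav", audio, "audio/wav")},
        data={"model": MODEL},
    )
resp.raise_for_status()
print(resp.json()["text"])  # OpenAI-style transcription responses carry the transcript in "text"
```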
@@ -70,6 +71,7 @@ A server to run and interact with LLM models optimized for Rockchip RK3588(S) an
 - **Multimodal Support** - Use Qwen2VL/Qwen2.5VL/Qwen3VL/MiniCPMV4/MiniCPMV4.5/InternVL3.5 vision models to ask questions about images (base64, local file, or image URL). More than one image per request is allowed.
 - **Image Generation** - Generate images with the OpenAI image generation endpoint using LCM Stable Diffusion 1.5 RKNN models.
 - **Text to Speech (TTS)** - Generate speech with the OpenAI Audio Speech endpoint using Piper TTS models, running the encoder with ONNX and the decoder with RKNN.
+- **Speech to Text (STT)** - Generate transcriptions with the OpenAI Audio Transcriptions endpoint using omniASR-CTC models running with RKNN.


 ## Documentation
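The TTS endpoint listed above follows the same OpenAI shape: a JSON body in, raw audio bytes out. A hedged sketch, where the address, model name, and voice are hypothetical placeholders:

```python
# Hedged sketch of the OpenAI-style /v1/audio/speech endpoint: JSON request,
# raw audio bytes in the response body. All names below are placeholders.
import requests

resp = requests.post(
    "http://localhost:8080/v1/audio/speech",  # assumed RKLLAMA address and port
    json={
        "model": "piper-en_US",               # hypothetical Piper model name
        "voice": "default",                   # hypothetical voice id
        "input": "Hello from RKLLAMA on the RK3588.",
    },
)
resp.raise_for_status()
with open("speech.wav", "wb") as f:
    f.write(resp.content)  # the endpoint returns the audio file directly
```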
@@ -408,6 +410,29 @@ Example directory structure for multimodal:
 5. Execute the script export_encoder_decoder.py to export the encoder and decoder in ONNX format.
 6. Execute the script export_rknn.py to export the decoder in RKNN format (you must have installed rknn-toolkit version 2.3.2).
+1. Download a model from https://huggingface.co/danielferr85/omniASR-ctc-rknn on Hugging Face.
+2. Create a folder for the model inside the models directory in RKLLAMA, for example: **omniasr-ctc:300m**
+3. Copy the model (.rknn) and vocabulary (.txt) files from the chosen model into the new model directory in RKLLAMA.
+4. The structure of the model **MUST** look like this:
+
+```
+~/RKLLAMA/models/
+└── omniasr-ctc:300m
+    ├── model.rknn
+    └── vocab.txt
+```
+
+5. Done! You are ready to test the OpenAI endpoint /v1/audio/transcriptions to generate transcriptions. You can add it to OpenWebUI in the Audio section for STT.
+
+**IMPORTANT**
+- The model file can have any name but must end with the .rknn extension.
+- The vocabulary file can have any name but must end with the .txt extension.
+- You must use rknn-toolkit 2.3.2 for the RKNN conversion because it is the version RKLLAMA uses.
+

 ## Configuration

 RKLLAMA uses a flexible configuration system that loads settings from multiple sources in a priority order:
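Steps 2-4 of the new STT section are easy to script and sanity-check. Here is a sketch, assuming the two files were already downloaded from Hugging Face (the Downloads path is a placeholder):

```python
# Sketch of STT setup steps 2-4: create the model folder under
# ~/RKLLAMA/models, copy in the downloaded files, and check the layout.
import shutil
from pathlib import Path

model_dir = Path.home() / "RKLLAMA" / "models" / "omniasr-ctc:300m"
model_dir.mkdir(parents=True, exist_ok=True)

downloads = Path.home() / "Downloads"  # assumed download location
shutil.copy(downloads / "model.rknn", model_dir)  # any name, as long as it ends in .rknn
shutil.copy(downloads / "vocab.txt", model_dir)   # any name, as long as it ends in .txt

# The setup steps call for one weights file and one vocabulary file.
assert len(list(model_dir.glob("*.rknn"))) == 1, "expected exactly one .rknn file"
assert len(list(model_dir.glob("*.txt"))) == 1, "expected exactly one .txt file"
print(f"Model staged at {model_dir}")
```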