
Conversation

@X-Ryl669

This only works with RyzenAI SW 1.6.1 and the NPU models from
https://huggingface.co/collections/amd/ryzenai-15-llm-npu-models, since RyzenAI SW 1.6.1's onnxruntime driver doesn't support the changes in the newer genai_config schema yet.

Description of the changes

Remove the Windows-specific checks (credit @wc2333)
Fix the Windows-specific entries in the genai_config.json configuration file so it loads on Linux unattended.
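
To spot which entries are the problem in a pulled model, something like this is usually enough (genai_config.json sits in the model folder; `<model_dir>` is a placeholder and the patterns are only a heuristic):

```bash
# Look for Windows-oriented entries (DLL paths, DirectML references) in a
# pulled model's genai_config.json; <model_dir> is a placeholder.
grep -nE '\.dll|dml|DirectML' <model_dir>/genai_config.json
```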

Example usage

  1. Install XRT for Linux (use your preferred method)
  2. Install RyzenAI SW version 1.6.1 from AMD's site (not GitHub; the open-source version is lacking)
  3. Create the Python 3.10 environment from the extracted archive above
  4. Pull this branch into another folder (don't create a new virtual environment for lemonade; use the one from RyzenAI SW 1.6.1)
  5. Run pip install setuptools
  6. Run python setup.py build
  7. Run python setup.py install
  8. Download a model (for example: lemonade-server-dev pull user.Mistral-7B-Instruct-v0.3-NPU --checkpoint amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix --recipe oga-npu)
  9. Run the pulled model (for example: lemonade-server-dev run user.Mistral-7B-Instruct-v0.3-NPU)
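
For convenience, steps 5 through 9 as a single shell session (the environment activation path is a placeholder for wherever the RyzenAI SW 1.6.1 Python 3.10 environment was created):

```bash
# Activate the RyzenAI SW 1.6.1 Python 3.10 environment (placeholder path).
source /path/to/ryzenai-1.6.1-env/bin/activate

# Build and install this branch of lemonade into that environment.
pip install setuptools
python setup.py build
python setup.py install

# Pull and run an NPU model from the Hugging Face collection above.
lemonade-server-dev pull user.Mistral-7B-Instruct-v0.3-NPU \
  --checkpoint amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix \
  --recipe oga-npu
lemonade-server-dev run user.Mistral-7B-Instruct-v0.3-NPU
```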

You can use any model from the Hugging Face collection above: they all work.
The performance isn't great, however, since none of the hybrid models work.

Signed-off-by: X-Ryl669 <boite.pour.spam@gmail.com>
@wc2333

wc2333 commented Dec 15, 2025

Thank you for submitting the code. With the modified code you provided I am indeed able to run the large amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix model. However, other models such as amd/DeepSeek-R1-Distill-Qwen-7B-onnx-ryzenai-npu, which contain "custom_allocator": "ryzen_mm", still cannot run even after deleting that entry. How can I modify this part to make them run normally? Thank you.

@X-Ryl669
Author

When I tried the newer models (those with "custom_allocator"), they failed because the value was directx_xrt or something like that, so I did not know what to replace it with on Linux. Also, the genai_config.json schema does not match what Linux's onnxruntime expects. I assumed that was because the Linux build is outdated, and since it's only delivered as a binary (in RyzenAI SW 1.6.1), I could only wait for AMD to release an update.

I've never heard of ryzen_mm, you're teaching me new things!

IIRC, it also fails on the external data file (the model weights), so I don't know whether it will work if I change the allocator to ryzen_mm.
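
For reference, this is roughly how the declared session options can be inspected (the key path is my guess at the genai_config.json layout; adjust it to whatever your file actually contains, and `<model_dir>` is a placeholder):

```bash
# Dump the session options of a pulled NPU model to see which fields are
# present (custom_ops_library, custom_allocator, ...).
jq '.model.decoder.session_options' <model_dir>/genai_config.json
```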

@wc2333

wc2333 commented Dec 15, 2025

The friend I mentioned seems to have successfully run Phi-3.5-mini-instruct-onnx-ryzenai-npu. I'll ask him later how he implemented it. Thank you!

@ramkrishna2910
Contributor

Linux support in RyzenAI SW is still in its early stages. I would recommend waiting for the next official release, which will ship genai_configs for Linux that do not include any DirectML components.

@X-Ryl669
Author

If I understand correctly, the hardware is the same, so the Windows-specific entries in genai_config can be mapped to Linux counterparts. I did that for the custom_ops library and it works, so it's likely to work for the allocators and weights too. We are just missing an update to the (currently binary-only) onnxruntime so it can handle the new schema in the genai_config.json file. Are you going to release RyzenAI SW as open source, or do we still need to wait for a closed-source RyzenAI SW with the update?

From a user's perspective, I would strongly recommend against shipping two versions of the genai_config.json models (one for Linux, one for Windows).

The current state of AI on AMD is already such a mess that adding another incompatibility layer is just a nightmare. It's already hard to understand all the details of the delivered models (what all the acronyms in the model names stand for), so if you double that with a _lnx variant, you're asking for trouble.

Instead, make the currently delivered Windows models work on Linux by live-patching the model config (or better, don't put Windows-specific stuff in the models in the first place ;-)
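
Something in this spirit is what I mean by live patching (a rough sketch only; the key names and the .dll-to-.so mapping are assumptions about what the config contains, and `<model_dir>` is a placeholder):

```bash
# Rewrite a pulled model's genai_config.json for Linux: drop the
# Windows/DirectML-oriented allocator entry and point the custom-ops
# library at its Linux build.
cd <model_dir>
cp genai_config.json genai_config.json.bak
jq '
  .model.decoder.session_options |= (
    del(.custom_allocator)
    | (if .custom_ops_library
       then .custom_ops_library |= sub("\\.dll$"; ".so")
       else . end)
  )
' genai_config.json.bak > genai_config.json
```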

@mikealanni

I followed the steps, but for some reason it says this model is not compatible:

The following RyzenAI models are incompatible with RyzenAI 1.6 and can be safely deleted:

After deleting, you can re-download compatible Ryzen AI 1.6 models from the OGA NPU and OGA Hybrid tabs.

amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix (4.9 GB)
amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix (7.8 GB)
Total space to free: 12.7 GB

@X-Ryl669
Author

Are you using lemonade-server-dev? The amd-xdna driver?

@mikealanni

Are you using lemonade-server-dev? The amd-xdna driver?

Yes, I just realized it is working when I type something in the box and click send, but it's slow.
Also, I don't have OGA NPU or Hybrid on the right side, and the running model isn't showing in the model drop-down list. Weird. Am I missing something?

@X-Ryl669
Author

Yes, same for me. It doesn't register the model in the list, so the interface is confused. It is slow since it's not using a hybrid approach, so the NPU processing power isn't added to the GPU/CPU processing power. AMD needs to fix their code to allow hybrid processing in their genai models on Linux.
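
One way to see what the server itself thinks it is serving, assuming lemonade's default port and its OpenAI-compatible endpoints (adjust both if your install differs):

```bash
# List the models the running server reports.
curl http://localhost:8000/api/v1/models

# Send a test prompt directly, bypassing the web UI's model list.
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "user.Mistral-7B-Instruct-v0.3-NPU", "messages": [{"role": "user", "content": "Hello"}]}'
```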

@mikealanni

But it's very good that we have something now 🙂
As far as I know, lemonade uses llama.cpp; does this mean llama.cpp is driving the AMD NPU?

@X-Ryl669
Author

NPU support is done via an ONNX Runtime library. llama calls the library, and the library instantiates work on the NPU for some computations. I don't know what is supported and what is not, how allocations are made, or how memory is handled.
My guess, from the genai config JSON, is that to get the best performance everything has to be set up correctly (to avoid copying data around and to dispatch each function to the best "engine"), and that's what's missing for now.
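
The only checks I know of are at the driver level; something like this at least shows whether the XDNA NPU is visible to XRT (xrt-smi ships with XRT, the exact report output varies by version, and the kernel module name amdxdna is an assumption based on the amd-xdna driver):

```bash
# List the devices XRT can see; the NPU should appear here if the
# amd-xdna driver and XRT are set up correctly.
xrt-smi examine

# Kernel messages from the XDNA driver (module name assumed).
sudo dmesg | grep -i amdxdna
```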

