
Conversation

@X-Ryl669

This only works with RyzenAI SW 1.6.1 and the NPU models from
https://huggingface.co/collections/amd/ryzenai-15-llm-npu-models, since RyzenAI SW 1.6.1's onnxruntime driver doesn't support the changes in the newer genai_config schema yet.

Description of the changes

Remove the Windows-specific checks (credit @wc2333)
Fix the Windows-specific entries in the genai_config.json configuration file so it loads on Linux unattended.
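
To spot which entries are the problem in a pulled model, something like this is usually enough (genai_config.json sits in the model folder; `<model_dir>` is a placeholder and the patterns are only a heuristic):

```bash
# Look for Windows-oriented entries (DLL paths, DirectML references) in a
# pulled model's genai_config.json; <model_dir> is a placeholder.
grep -nE '\.dll|dml|DirectML' <model_dir>/genai_config.json
```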

Example usage

  1. Install XRT for Linux (use your preferred method)
  2. Install RyzenAI SW version 1.6.1 from AMD's site (not GitHub; the open-source version is lacking)
  3. Create the Python 3.10 environment from the extracted archive above
  4. Pull this branch into another folder (don't create a new virtual environment for lemonade; use the one from RyzenAI SW 1.6.1)
  5. Run pip install setuptools
  6. Run python setup.py build
  7. Run python setup.py install
  8. Download a model (for example: lemonade-server-dev pull user.Mistral-7B-Instruct-v0.3-NPU --checkpoint amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix --recipe oga-npu)
  9. Run the pulled model (for example: lemonade-server-dev run user.Mistral-7B-Instruct-v0.3-NPU)
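
For convenience, steps 5 through 9 as a single shell session (the environment activation path is a placeholder for wherever the RyzenAI SW 1.6.1 Python 3.10 environment was created):

```bash
# Activate the RyzenAI SW 1.6.1 Python 3.10 environment (placeholder path).
source /path/to/ryzenai-1.6.1-env/bin/activate

# Build and install this branch of lemonade into that environment.
pip install setuptools
python setup.py build
python setup.py install

# Pull and run an NPU model from the Hugging Face collection above.
lemonade-server-dev pull user.Mistral-7B-Instruct-v0.3-NPU \
  --checkpoint amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix \
  --recipe oga-npu
lemonade-server-dev run user.Mistral-7B-Instruct-v0.3-NPU
```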

You can use any model from the Hugging Face collection above: they all work.
The performance isn't great, however, since none of the hybrid models work.

Signed-off-by: X-Ryl669 <boite.pour.spam@gmail.com>
@wc2333

wc2333 commented Dec 15, 2025

Thank you for submitting the code. With the modified code you provided I am indeed able to run the large amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix model. However, other models such as amd/DeepSeek-R1-Distill-Qwen-7B-onnx-ryzenai-npu, which contain "custom_allocator": "ryzen_mm", still cannot run even after deleting that entry. How can I modify this part to make them run normally? Thank you.

@X-Ryl669
Author

When I tried the newer models (those with "custom_allocator"), they failed because the value was directx_xrt or something like that, so I did not know what to replace it with on Linux. Also, the genai_config.json schema does not match what Linux's onnxruntime expects. I assumed that was because the Linux build is outdated, and since it's only delivered as a binary (in RyzenAI SW 1.6.1), I could only wait for AMD to release an update.

I've never heard of ryzen_mm, you're teaching me new things!

IIRC, it also fails on the external data file (the model weights), so I don't know whether it will work if I change the allocator to ryzen_mm.
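
For reference, this is roughly how the declared session options can be inspected (the key path is my guess at the genai_config.json layout; adjust it to whatever your file actually contains, and `<model_dir>` is a placeholder):

```bash
# Dump the session options of a pulled NPU model to see which fields are
# present (custom_ops_library, custom_allocator, ...).
jq '.model.decoder.session_options' <model_dir>/genai_config.json
```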

@wc2333

wc2333 commented Dec 15, 2025

The friend I mentioned seems to have successfully run Phi-3.5-mini-instruct-onnx-ryzenai-npu. I'll ask him later how he implemented it. Thank you!

@ramkrishna2910
Contributor

Linux support in RyzenAI SW is still in its early stages. I would recommend waiting for the next official release, which will ship genai_configs for Linux that do not include any DirectML components.

@X-Ryl669
Author

If I understand correctly, the hardware is the same, so the Windows-specific entries in genai_config can be mapped to Linux counterparts. I did that for the custom_ops library and it works, so it's likely to work for the allocators and weights too. We are just missing an update to the (currently binary-only) onnxruntime so it can handle the new schema in the genai_config.json file. Are you going to release RyzenAI SW as open source, or do we still need to wait for a closed-source RyzenAI SW with the update?

From a user's perspective, I would strongly recommend against shipping two versions of the genai_config.json models (one for Linux, one for Windows).

The current state of AI on AMD is already such a mess that adding another incompatibility layer is just a nightmare. It's already hard to understand all the details of the delivered models (what all the acronyms in the model names stand for), so if you double that with a _lnx variant, you're asking for trouble.

Instead, make the currently delivered Windows models work on Linux by live-patching the model config (or better, don't put Windows-specific stuff in the models in the first place ;-)
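
Something in this spirit is what I mean by live patching (a rough sketch only; the key names and the .dll-to-.so mapping are assumptions about what the config contains, and `<model_dir>` is a placeholder):

```bash
# Rewrite a pulled model's genai_config.json for Linux: drop the
# Windows/DirectML-oriented allocator entry and point the custom-ops
# library at its Linux build.
cd <model_dir>
cp genai_config.json genai_config.json.bak
jq '
  .model.decoder.session_options |= (
    del(.custom_allocator)
    | (if .custom_ops_library
       then .custom_ops_library |= sub("\\.dll$"; ".so")
       else . end)
  )
' genai_config.json.bak > genai_config.json
```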

@mikealanni

I followed the steps, but for some reason it says this model is not compatible:

The following RyzenAI models are incompatible with RyzenAI 1.6 and can be safely deleted:

After deleting, you can re-download compatible Ryzen AI 1.6 models from the OGA NPU and OGA Hybrid tabs.

amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix (4.9 GB)
amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix (7.8 GB)
Total space to free: 12.7 GB

@X-Ryl669
Author

Are you using lemonade-server-dev? The amd-xdna driver?

@mikealanni

Are you using lemonade-server-dev? The amd-xdna driver?

Yes, I just realized it is working when I type something in the box and click send, but it's slow.
Also, I don't have OGA NPU or Hybrid on the right side, and the running model isn't showing in the model drop-down list. Weird. Am I missing something?

@X-Ryl669
Author

Yes, same for me. It doesn't register the model in the list, so the interface is confused. It is slow since it's not using a hybrid approach, so the NPU processing power isn't added to the GPU/CPU processing power. AMD needs to fix their code to allow hybrid processing in their genai models on Linux.
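
One way to see what the server itself thinks it is serving, assuming lemonade's default port and its OpenAI-compatible endpoints (adjust both if your install differs):

```bash
# List the models the running server reports.
curl http://localhost:8000/api/v1/models

# Send a test prompt directly, bypassing the web UI's model list.
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "user.Mistral-7B-Instruct-v0.3-NPU", "messages": [{"role": "user", "content": "Hello"}]}'
```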

@mikealanni

But it's very good that we have something now 🙂
As far as I know, lemonade uses llama.cpp; does this mean llama.cpp is driving the AMD NPU?

@X-Ryl669
Author

NPU support is done via an ONNX Runtime library. llama calls the library, and the library instantiates work on the NPU for some computations. I don't know what is supported and what is not, how allocations are made, or how memory is handled.
My guess, from the genai config JSON, is that to get the best performance everything has to be set up correctly (to avoid copying data around and to dispatch each function to the best "engine"), and that's what's missing for now.
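
The only checks I know of are at the driver level; something like this at least shows whether the XDNA NPU is visible to XRT (xrt-smi ships with XRT, the exact report output varies by version, and the kernel module name amdxdna is an assumption based on the amd-xdna driver):

```bash
# List the devices XRT can see; the NPU should appear here if the
# amd-xdna driver and XRT are set up correctly.
xrt-smi examine

# Kernel messages from the XDNA driver (module name assumed).
sudo dmesg | grep -i amdxdna
```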

