feat: Add support for using NPU on linux. #727
base: main
Conversation
Only works with RyzenSW 1.6.1 and NPU models from https://huggingface.co/collections/amd/ryzenai-15-llm-npu-models since RyzenSW 1.6.1's onnxruntime driver doesn't support the change in the genai_config model yet.
Signed-off-by: X-Ryl669 <boite.pour.spam@gmail.com>
Thank you for submitting the code. I am indeed able to run the large model of …
When I've tried the new models (those with … I've never heard of), IIRC it also fails on the external file (the model weights), so I don't know if it will pass if I change the allocator with …
The friend I mentioned seems to have successfully run the …
The Linux support in RyzenAI SW is still in its early stages. I would recommend waiting for the next official release, which will ship genai_configs for Linux that do not include any DirectML components.
If I understand correctly, the hardware is the same, so the Windows-specific parts of the genai_config can be mapped to Linux counterparts. I did that for the custom_ops library and it works, so it's likely it'll work for the allocators and weights too. We are just missing an update in the (currently binary-only) ONNX runtime model so it can handle the new schema in the genai_config.json.

From a user perspective, I would strongly recommend against shipping two versions of the models' genai_config.json (one for Linux, one for Windows). The current state of AI on AMD is already such a mess that adding another incompatibility layer is just a nightmare. It's already impossible to understand all the details of the delivered models (what all the acronyms in the model names stand for), so doubling that per OS would make it even worse.

Make the currently delivered Windows models work on Linux by live-patching the model config instead (or better, don't put Windows-specific stuff in the model in the first place ;-)
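As a rough illustration of the live-patching idea, here is a minimal Python sketch. It assumes the genai_config.json carries a custom_ops_library entry pointing at a .dll and DirectML provider options; the exact key names and nesting are assumptions based on this discussion, not the shipped schema.

```python
# Sketch: rewrite the Windows-specific bits of a downloaded genai_config.json
# so the same model folder loads on Linux. Key names (custom_ops_library,
# provider_options) and the .dll -> .so mapping are assumptions.
import json
from pathlib import Path

def patch_genai_config(model_dir: str) -> None:
    cfg_path = Path(model_dir) / "genai_config.json"
    cfg = json.loads(cfg_path.read_text())

    session = cfg.get("model", {}).get("decoder", {}).get("session_options", {})

    # Point the custom ops entry at the Linux shared object instead of the DLL.
    lib = session.get("custom_ops_library")
    if isinstance(lib, str) and lib.endswith(".dll"):
        session["custom_ops_library"] = lib[:-4] + ".so"

    # Drop DirectML provider entries, which have no Linux counterpart.
    providers = session.get("provider_options", [])
    session["provider_options"] = [p for p in providers if "dml" not in p]

    cfg_path.write_text(json.dumps(cfg, indent=4))

if __name__ == "__main__":
    patch_genai_config("path/to/downloaded/model")  # hypothetical path
```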
I followed the steps but for some reason it says this model is not compatible:

The following RyzenAI models are incompatible with RyzenAI 1.6 and can be safely deleted:
amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
After deleting, you can re-download compatible Ryzen AI 1.6 models from the OGA NPU and OGA Hybrid tabs.
Are you using …?
Yes, I just realized it is working when I type something in the box and click send, but it's slow.
Yes, same for me. It doesn't register the model in the list, so the interface is confused. It is slow since it's not using a hybrid approach, so the NPU processing power isn't added to the GPU/CPU processing power. AMD needs to fix their code to allow hybrid processing in their genai models on Linux.
But it's very good that we have something now 🙂
NPU support is done via an ONNX runtime library. llama is calling the library, and the library is instantiating the NPU for some computations. I don't know what is supported and what is not, how the allocations are made, or how the memory is handled.
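For context, this is roughly how a client drives onnxruntime-genai: the library reads the model folder's genai_config.json and decides itself which execution provider (NPU, GPU, CPU) runs each part of the graph. A minimal sketch follows; the API names match the onnxruntime-genai Python bindings but can differ between versions, and the model path is a placeholder.

```python
import onnxruntime_genai as og

model = og.Model("path/to/model/folder")   # loads genai_config.json + weights
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Hello from the NPU"))

while not generator.is_done():
    generator.generate_next_token()        # each step may dispatch to the NPU

print(tokenizer.decode(generator.get_sequence(0)))
```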
Only works with RyzenSW 1.6.1 and NPU models from
https://huggingface.co/collections/amd/ryzenai-15-llm-npu-models since RyzenSW 1.6.1's onnxruntime driver doesn't support the change in the genai_config model yet.
Description of the changes
Remove the Windows-specific checks (credit @wc2333)
Fix the Windows-isms in the genai_config.json configuration file so it can load on Linux unattended.
Example usage
pip install setuptools
python setup.py build
python setup.py install
lemonade-server-dev pull user.Mistral-7B-Instruct-v0.3-NPU --checkpoint amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix --recipe oga-npu
lemonade-server-dev run user.Mistral-7B-Instruct-v0.3-NPU
You can use any model from the huggingface page above: they all work.
The performance isn't that great, however, since none of the hybrid models work.
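Once the server is running, one quick way to exercise the NPU model is through its OpenAI-compatible chat API. This is only a sketch: the base URL (port 8000, /api/v1) is an assumption, so adjust it to whatever your lemonade server actually reports.

```python
from openai import OpenAI

# base_url is an assumption; check the address the server prints on startup.
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="unused")

reply = client.chat.completions.create(
    model="user.Mistral-7B-Instruct-v0.3-NPU",
    messages=[{"role": "user", "content": "Say hello from the NPU."}],
)
print(reply.choices[0].message.content)
```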