Add IHV Qwen2.5-1.5B-Instruct NPU test models to azureml registry #4379

base: main
**asset.yaml** (OpenVINO NPU variant)

```yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["Local"]
```
**description.md** (OpenVINO NPU variant)

```markdown
@@ -0,0 +1,11 @@
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on Intel NPUs. This model uses post-training quantization.

# Model Description
- **Developed by:** Microsoft
- **Model type:** ONNX
- **License:** apache-2.0
- **Model Description:** This is a conversion of the Qwen2.5-1.5B-Instruct for local inference on Intel NPUs.
- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.

# Base Model Information
See Hugging Face model [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for details.
```
**model.yaml** (OpenVINO NPU variant)

```yaml
@@ -0,0 +1,8 @@
path:
  container_name: models
  container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-openvino-npu
  storage_name: automlcesdkdataresources
  type: azureblob
publish:
  description: description.md
  type: custom_model
```
**spec.yaml** (OpenVINO NPU variant)

```yaml
@@ -0,0 +1,32 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: qwen2.5-1.5b-instruct-openvino-npu
```
Review comment on `name`: Might be best to add `test` to the model name. If we publish without `test` in the name, this will publish as v1 of the model, which would be good to avoid while we're working out IHV model integration details.
```yaml
version: 1
path: ./
tags:
  foundryLocal: "test"
  license: "apache-2.0"
  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE>."
  author: Microsoft
  inputModalities: "text"
  outputModalities: "text"
  task: chat-completion
  maxOutputTokens: 2048
  alias: qwen2.5-1.5b
```
Review comment on `alias`: This is worth double-checking against the logic in Foundry Local / Neutron - I can't remember whether including an ...
```yaml
  directoryPath: qwen2.5-1.5b-instruct-openvino-npu
  promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"
```
Review comment on `promptTemplate`: Didn't have this in my original instructions because `promptTemplate` was required at the point that I wrote them, but `promptTemplate` should no longer be needed here. Leaving this tag out will automatically apply the chat template from the genai_config.json file.
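For reference, the escaped `promptTemplate` value is itself a JSON object mapping roles to format strings; here is a minimal sketch (plain Python, no Foundry Local dependencies assumed) that decodes and inspects it:

```python
import json

# The promptTemplate tag from spec.yaml: a JSON object encoded as a string,
# with {Content} as the placeholder each message body is substituted into.
prompt_template = (
    '{"system": "<|im_start|>system\\n{Content}<|im_end|>", '
    '"user": "<|im_start|>user\\n{Content}<|im_end|>", '
    '"assistant": "<|im_start|>assistant\\n{Content}<|im_end|>", '
    '"prompt": "<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant"}'
)

template = json.loads(prompt_template)
for role, fmt in template.items():
    print(f"{role}: {fmt!r}")
```

If the tag is dropped as suggested, this structure would instead come from the chat template in genai_config.json.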
```yaml
  supportsToolCalling: ""
  toolCallStart: "<tool_call>"
  toolCallEnd: "</tool_call>"
  toolRegisterStart: "<tools>"
  toolRegisterEnd: "</tools>"
type: custom_model
variantInfo:
  parents:
    - assetId: azureml://registries/azureml/models/qwen2.5-1.5b-instruct/versions/1
  variantMetadata:
    modelType: 'ONNX'
    quantization: ['PTQ']
```
Review comment on `quantization`: As discussed in our thread, ...
```yaml
    device: 'npu'
    executionProvider: 'OpenVINOExecutionProvider'
    fileSizeBytes: 864
```
Review comment on `fileSizeBytes`: I think I gave you inaccurate info about calculating this field based on model sizes! Took a closer look, and the OpenVINO model also includes .xml files, which (I think) contribute to the total. So the sum of all .bin and .xml files is 2.09 MiB + 9.36 KiB + 5.33 MiB + 24.55 KiB + 867.71 MiB + 2.21 MiB = 919,992,391 bytes.
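A small sketch of that calculation (the directory path and the recursive glob are illustrative assumptions, not part of the PR):

```python
from pathlib import Path

# Sum the on-disk sizes of all .bin and .xml files in the OpenVINO model
# directory; per the review comment, this sum is the fileSizeBytes value.
model_dir = Path("qwen2.5-1.5b-instruct-openvino-npu")  # placeholder path

total = sum(
    f.stat().st_size
    for pattern in ("*.bin", "*.xml")
    for f in model_dir.rglob(pattern)
)
print(f"fileSizeBytes: {total}")
```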
```yaml
    vRamFootprintBytes: 917643791
```
Review comment on `vRamFootprintBytes`: Also gave you the wrong info about calculating this field - this would be the sum of the previous field (`fileSizeBytes`) and ...
**asset.yaml** (QNN NPU variant)

```yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["Local"]
```
**description.md** (QNN NPU variant)

```markdown
@@ -0,0 +1,11 @@
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on Qualcomm NPUs. This model uses post-training quantization.

# Model Description
- **Developed by:** Microsoft
- **Model type:** ONNX
- **License:** apache-2.0
- **Model Description:** This is a conversion of the Qwen2.5-1.5B-Instruct for local inference on Qualcomm NPUs.
- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.

# Base Model Information
See Hugging Face model [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for details.
```
**model.yaml** (QNN NPU variant)

```yaml
@@ -0,0 +1,8 @@
path:
  container_name: models
  container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-qnn-npu
```
Review comment on `container_path`: Similar to my comment above for the OpenVINO model - I think you just need the files within the `model` directory.
```yaml
  storage_name: automlcesdkdataresources
  type: azureblob
publish:
  description: description.md
  type: custom_model
```
**spec.yaml** (QNN NPU variant)

```yaml
@@ -0,0 +1,32 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: qwen2.5-1.5b-instruct-qnn-npu
```
Review comment on `name`: same suggested change as above.
```yaml
version: 1
path: ./
tags:
  foundryLocal: "test"
  license: "apache-2.0"
  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE>."
  author: Microsoft
  inputModalities: "text"
  outputModalities: "text"
  task: chat-completion
  maxOutputTokens: 2048
  alias: qwen2.5-1.5b
```
Review comment on `alias`: Same as above.
```yaml
  directoryPath: qwen2.5-1.5b-instruct-qnn-npu
  promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"
```
Review comment on `promptTemplate`: same suggested change as above.
```yaml
  supportsToolCalling: ""
  toolCallStart: "<tool_call>"
  toolCallEnd: "</tool_call>"
  toolRegisterStart: "<tools>"
  toolRegisterEnd: "</tools>"
type: custom_model
variantInfo:
  parents:
    - assetId: azureml://registries/azureml/models/qwen2.5-1.5b-instruct/versions/1
  variantMetadata:
    modelType: 'ONNX'
    quantization: ['PTQ']
```
Review comment on `quantization`: same suggested change as above.
```yaml
    device: 'npu'
    executionProvider: 'QNNExecutionProvider'
    fileSizeBytes: 325137751
```
Review comment on `fileSizeBytes`: I think I inverted the guidance I gave you for calculating this. The ...
```yaml
    vRamFootprintBytes: 998877199
```
Review comment on `vRamFootprintBytes`: I got a slightly different value - I did the sum of ...
**asset.yaml** (Vitis AI NPU variant)

```yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["Local"]
```
**description.md** (Vitis AI NPU variant)

```markdown
@@ -0,0 +1,11 @@
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on AMD NPUs. This model uses post-training quantization.

# Model Description
- **Developed by:** Microsoft
- **Model type:** ONNX
- **License:** apache-2.0
- **Model Description:** This is a conversion of the Qwen2.5-1.5B-Instruct for local inference on AMD NPUs.
- **Disclaimer:** Model is only an optimization of the base model, any risk associated with the model is the responsibility of the user of the model. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that optimizations applied are distinct from fine tuning and thus do not alter the intended uses or capabilities of the model.

# Base Model Information
See Hugging Face model [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for details.
```
**model.yaml** (Vitis AI NPU variant)

```yaml
@@ -0,0 +1,9 @@
path:
  container_name: models
  container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-vitis-npu
```
Review comment on `container_path`: Same as above.
```yaml

  storage_name: automlcesdkdataresources
  type: azureblob
publish:
  description: description.md
  type: custom_model
```
**spec.yaml** (Vitis AI NPU variant)

```yaml
@@ -0,0 +1,32 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: qwen2.5-1.5b-instruct-vitis-npu
```
Review comment on `name`: same suggested change as above.
```yaml
version: 1
path: ./
tags:
  foundryLocal: "test"
  license: "apache-2.0"
  licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE>."
  author: Microsoft
  inputModalities: "text"
  outputModalities: "text"
  task: chat-completion
  maxOutputTokens: 2048
  alias: qwen2.5-1.5b
```
Review comment on `alias`: same suggested change as above.
```yaml
  directoryPath: qwen2.5-1.5b-instruct-vitis-npu
  promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"
```
Review comment on `promptTemplate`: same suggested change as above.
```yaml
  supportsToolCalling: ""
  toolCallStart: "<tool_call>"
  toolCallEnd: "</tool_call>"
  toolRegisterStart: "<tools>"
  toolRegisterEnd: "</tools>"
type: custom_model
variantInfo:
  parents:
    - assetId: azureml://registries/azureml/models/qwen2.5-1.5b-instruct/versions/1
  variantMetadata:
    modelType: 'ONNX'
    quantization: ['PTQ']
```
Review comment on `quantization`: same suggested change as above.
```yaml
    device: 'npu'
    executionProvider: 'VitisAIExecutionProvider'
    fileSizeBytes: 293141524
```
Review comment on `fileSizeBytes`: In this case, this is the size of the .onnx.data file, which is 674,853,027 bytes.
```yaml
    vRamFootprintBytes: 967991316
```
Review comment on `vRamFootprintBytes`: Got something slightly different when I added up the sum of the .onnx.data file and all the .onnx files: 674,853,027.84 bytes + 693.54 KiB + 139.1 MiB + 693.54 KiB + 139.1 MiB = 967,987,240.96 bytes.
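A quick check of that arithmetic (plain Python; the sizes are copied from the comment above, so the rounded display values make the result approximate):

```python
KiB = 1024
MiB = 1024 ** 2

# File sizes as reported in the review comment.
onnx_data = 674_853_027.84  # the .onnx.data file
onnx_files = [693.54 * KiB, 139.1 * MiB, 693.54 * KiB, 139.1 * MiB]

total = onnx_data + sum(onnx_files)
print(f"vRamFootprintBytes ~= {total:,.2f}")  # 967,987,240.96
```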
Review comment: Whether or not this is a valid suggestion will actually depend on which files you uploaded to the cache for this model when testing manually through Foundry Local. I'm fairly certain that you'll only need the files within the `model` directory to run this model, but did you include additional files when manually adding the model to your Foundry Local cache and running from there? I.e., did you include the footprints.json, model_config.json, etc. files pictured here under the `model` directory when running from your cache?

[Attached screenshot: listing of the `model` directory contents]