
Add IHV Qwen2.5-1.5B-Instruct NPU test models to azureml registry #4379


Open · wants to merge 2 commits into main

File: qwen2.5-1.5b-instruct-openvino-npu/asset.yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["Local"]

File: qwen2.5-1.5b-instruct-openvino-npu/description.md
@@ -0,0 +1,11 @@
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on Intel NPUs. It uses post-training quantization.

# Model Description
- **Developed by:** Microsoft
- **Model type:** ONNX
- **License:** apache-2.0
- **Model Description:** This is a conversion of the Qwen2.5-1.5B-Instruct model for local inference on Intel NPUs.
- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of its user. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.

# Base Model Information
See Hugging Face model [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for details.

File: qwen2.5-1.5b-instruct-openvino-npu/model.yaml
@@ -0,0 +1,8 @@
path:
container_name: models
container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-openvino-npu

Suggested change:
- container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-openvino-npu
+ container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-openvino-npu/model

Whether or not this is a valid suggestion will depend on which files you uploaded to the cache for this model when testing manually through Foundry Local.

I'm fairly certain that you'll only need the files within the model directory to run this model, but did you include additional files when manually adding the model to your Foundry Local cache and running from there? That is, did you include the footprints.json, model_config.json, etc. files pictured here under the model directory when running from your cache?
[screenshot: contents of the model directory, including footprints.json and model_config.json]

storage_name: automlcesdkdataresources
type: azureblob
publish:
description: description.md
type: custom_model

File: qwen2.5-1.5b-instruct-openvino-npu/spec.yaml
@@ -0,0 +1,32 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: qwen2.5-1.5b-instruct-openvino-npu

Suggested change:
- name: qwen2.5-1.5b-instruct-openvino-npu
+ name: qwen2.5-1.5b-instruct-test-openvino-npu

Might be best to add `test` to the model name. If we publish without `test` in the name, this will publish as v1 of the model, which would be good to avoid while we're working out IHV model integration details.

version: 1
path: ./
tags:
foundryLocal: "test"
license: "apache-2.0"
licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE>."
author: Microsoft
inputModalities: "text"
outputModalities: "text"
task: chat-completion
maxOutputTokens: 2048
alias: qwen2.5-1.5b

Suggested change:
- alias: qwen2.5-1.5b

This is worth double-checking against the logic in Foundry Local / Neutron - I can't remember whether including an alias field is a strict requirement, but we wouldn't want to include it in our case. The purpose of the alias field is to make discoverability easier for our users, but that doesn't matter when we're doing testing and know exactly what the model name is.

directoryPath: qwen2.5-1.5b-instruct-openvino-npu
promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"

Suggested change:
- promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"

Didn't have this in my original instructions because promptTemplate was required at the point that I wrote them, but promptTemplate should no longer be needed here. Leaving this tag out will automatically apply the chat template from the genai_config.json file.
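
For reference, since the escaped string above is hard to read inline: a minimal sketch that decodes the promptTemplate value with Python's standard json module to show the ChatML-style roles it defines (the string itself is copied from this spec).

```python
import json

# promptTemplate value copied from spec.yaml; json.loads turns the
# escaped "\\n" sequences into real newlines.
raw = (
    '{"system": "<|im_start|>system\\n{Content}<|im_end|>", '
    '"user": "<|im_start|>user\\n{Content}<|im_end|>", '
    '"assistant": "<|im_start|>assistant\\n{Content}<|im_end|>", '
    '"prompt": "<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant"}'
)
template = json.loads(raw)
for role, fmt in template.items():
    print(f"{role}: {fmt!r}")
```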

supportsToolCalling: ""
toolCallStart: "<tool_call>"
toolCallEnd: "</tool_call>"
toolRegisterStart: "<tools>"
toolRegisterEnd: "</tools>"
type: custom_model
variantInfo:
parents:
- assetId: azureml://registries/azureml/models/qwen2.5-1.5b-instruct/versions/1
variantMetadata:
modelType: 'ONNX'
quantization: ['PTQ']

Suggested change:
- quantization: ['PTQ']
+ quantization: ['GPTQ']

As discussed in our thread, PTQ is not a supported value, so it's best to change it to GPTQ for the time being; we can then work with the Foundry Catalog folks to update the list of supported quantization types later. This field should be informational only, so it's not a big deal if the value is technically incorrect.

device: 'npu'
executionProvider: 'OpenVINOExecutionProvider'
fileSizeBytes: 864

Suggested change:
- fileSizeBytes: 864
+ fileSizeBytes: 919992391

I think I gave you inaccurate info about calculating this field based on model sizes! Took a closer look, and the OpenVINO model also includes .xml files, which (I think) contribute to the fileSizeBytes.

So the sum of all .bin and .xml files is 2.09 MiB + 9.36 KiB + 5.33 MiB + 24.55 KiB + 867.71 MiB + 2.21 MiB = 919,992,391 bytes.

vRamFootprintBytes: 917643791

Suggested change:
- vRamFootprintBytes: 917643791
+ vRamFootprintBytes: 919993255

Also gave you the wrong info about calculating this field - this should be the sum of the previous field (fileSizeBytes) and the .onnx file, so 919,992,391.68 bytes + 864 bytes = 919,993,255.68 bytes.
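
As a sanity check on these two fields, here is a minimal sketch of the calculation described in the comments above, assuming a local copy of the OpenVINO model directory (the model_dir path is hypothetical):

```python
from pathlib import Path

# Hypothetical path to a local copy of the model directory.
model_dir = Path("qwen2.5-1.5b-instruct-openvino-npu/model")

def total_bytes(directory: Path, pattern: str) -> int:
    """Sum the on-disk sizes of all files matching the glob pattern."""
    return sum(p.stat().st_size for p in directory.rglob(pattern))

# Per the comments above: fileSizeBytes is the sum of the .bin and .xml
# files; vRamFootprintBytes additionally counts the .onnx file(s).
file_size_bytes = total_bytes(model_dir, "*.bin") + total_bytes(model_dir, "*.xml")
vram_footprint_bytes = file_size_bytes + total_bytes(model_dir, "*.onnx")

print(f"fileSizeBytes: {file_size_bytes}")
print(f"vRamFootprintBytes: {vram_footprint_bytes}")
```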


File: qwen2.5-1.5b-instruct-qnn-npu/asset.yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["Local"]

File: qwen2.5-1.5b-instruct-qnn-npu/description.md
@@ -0,0 +1,11 @@
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on Qualcomm NPUs. It uses post-training quantization.

# Model Description
- **Developed by:** Microsoft
- **Model type:** ONNX
- **License:** apache-2.0
- **Model Description:** This is a conversion of the Qwen2.5-1.5B-Instruct model for local inference on Qualcomm NPUs.
- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of its user. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.

# Base Model Information
See Hugging Face model [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for details.

File: qwen2.5-1.5b-instruct-qnn-npu/model.yaml
@@ -0,0 +1,8 @@
path:
container_name: models
container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-qnn-npu

Suggested change:
- container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-qnn-npu
+ container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-qnn-npu/model

Similar to my comment above for the OpenVINO model - I think you just need the files within the model directory.

storage_name: automlcesdkdataresources
type: azureblob
publish:
description: description.md
type: custom_model

File: qwen2.5-1.5b-instruct-qnn-npu/spec.yaml
@@ -0,0 +1,32 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: qwen2.5-1.5b-instruct-qnn-npu

Suggested change:
- name: qwen2.5-1.5b-instruct-qnn-npu
+ name: qwen2.5-1.5b-instruct-test-qnn-npu

version: 1
path: ./
tags:
foundryLocal: "test"
license: "apache-2.0"
licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE>."
author: Microsoft
inputModalities: "text"
outputModalities: "text"
task: chat-completion
maxOutputTokens: 2048
alias: qwen2.5-1.5b

Suggested change:
- alias: qwen2.5-1.5b

Same as above.

directoryPath: qwen2.5-1.5b-instruct-qnn-npu
promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"

Suggested change:
- promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"

supportsToolCalling: ""
toolCallStart: "<tool_call>"
toolCallEnd: "</tool_call>"
toolRegisterStart: "<tools>"
toolRegisterEnd: "</tools>"
type: custom_model
variantInfo:
parents:
- assetId: azureml://registries/azureml/models/qwen2.5-1.5b-instruct/versions/1
variantMetadata:
modelType: 'ONNX'
quantization: ['PTQ']

Suggested change:
- quantization: ['PTQ']
+ quantization: ['GPTQ']

device: 'npu'
executionProvider: 'QNNExecutionProvider'
fileSizeBytes: 325137751

Suggested change:
- fileSizeBytes: 325137751
+ fileSizeBytes: 673731051

I think I inverted the guidance I gave you for calculating this. The fileSizeBytes should be the sum of all .bin files, which would be 160.63 MiB × 4 = 673,731,051.52 bytes in this case.

vRamFootprintBytes: 998877199

Suggested change:
- vRamFootprintBytes: 998877199
+ vRamFootprintBytes: 999104184

I got a slightly different value - I took the sum of fileSizeBytes and the 4 .onnx files, which is 673,731,051.52 bytes + 16.05 MiB + 139.1 MiB + 16.05 MiB + 139.1 MiB = 999,104,184.32 bytes.
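
The same sketch adapted to the QNN layout described above (fileSizeBytes from the .bin files, vRamFootprintBytes adding the .onnx files); the path is again hypothetical:

```python
from pathlib import Path

# Hypothetical path to a local copy of the model directory.
model_dir = Path("qwen2.5-1.5b-instruct-qnn-npu/model")

bin_bytes = sum(p.stat().st_size for p in model_dir.rglob("*.bin"))    # fileSizeBytes
onnx_bytes = sum(p.stat().st_size for p in model_dir.rglob("*.onnx"))  # the 4 .onnx files

print(f"fileSizeBytes: {bin_bytes}")
print(f"vRamFootprintBytes: {bin_bytes + onnx_bytes}")
```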


File: qwen2.5-1.5b-instruct-vitis-npu/asset.yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["Local"]

File: qwen2.5-1.5b-instruct-vitis-npu/description.md
@@ -0,0 +1,11 @@
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on AMD NPUs. It uses post-training quantization.

# Model Description
- **Developed by:** Microsoft
- **Model type:** ONNX
- **License:** apache-2.0
- **Model Description:** This is a conversion of the Qwen2.5-1.5B-Instruct model for local inference on AMD NPUs.
- **Disclaimer:** This model is only an optimization of the base model; any risk associated with the model is the responsibility of its user. Please verify and test for your scenarios. There may be a slight difference in output from the base model with the optimizations applied. Note that the optimizations applied are distinct from fine-tuning and thus do not alter the intended uses or capabilities of the model.

# Base Model Information
See Hugging Face model [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for details.

File: qwen2.5-1.5b-instruct-vitis-npu/model.yaml
@@ -0,0 +1,9 @@
path:
container_name: models
container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-vitis-npu

Suggested change:
- container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-vitis-npu
+ container_path: foundrylocal/fl-perf-improvements/qwen2.5-1.5b-instruct/onnx/npu/qwen2.5-1.5b-instruct-vitis-npu/model

Same as above.


storage_name: automlcesdkdataresources
type: azureblob
publish:
description: description.md
type: custom_model

File: qwen2.5-1.5b-instruct-vitis-npu/spec.yaml
@@ -0,0 +1,32 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: qwen2.5-1.5b-instruct-vitis-npu

Suggested change:
- name: qwen2.5-1.5b-instruct-vitis-npu
+ name: qwen2.5-1.5b-instruct-test-vitis-npu

version: 1
path: ./
tags:
foundryLocal: "test"
license: "apache-2.0"
licenseDescription: "This model is provided under the License Terms available at <https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE>."
author: Microsoft
inputModalities: "text"
outputModalities: "text"
task: chat-completion
maxOutputTokens: 2048
alias: qwen2.5-1.5b

Suggested change:
- alias: qwen2.5-1.5b

directoryPath: qwen2.5-1.5b-instruct-vitis-npu
promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"

Suggested change:
- promptTemplate: "{\"system\": \"<|im_start|>system\\n{Content}<|im_end|>\", \"user\": \"<|im_start|>user\\n{Content}<|im_end|>\", \"assistant\": \"<|im_start|>assistant\\n{Content}<|im_end|>\", \"prompt\": \"<|im_start|>user\\n{Content}<|im_end|>\\n<|im_start|>assistant\"}"

supportsToolCalling: ""
toolCallStart: "<tool_call>"
toolCallEnd: "</tool_call>"
toolRegisterStart: "<tools>"
toolRegisterEnd: "</tools>"
type: custom_model
variantInfo:
parents:
- assetId: azureml://registries/azureml/models/qwen2.5-1.5b-instruct/versions/1
variantMetadata:
modelType: 'ONNX'
quantization: ['PTQ']

Suggested change:
- quantization: ['PTQ']
+ quantization: ['GPTQ']

device: 'npu'
executionProvider: 'VitisAIExecutionProvider'
fileSizeBytes: 293141524

Suggested change:
- fileSizeBytes: 293141524
+ fileSizeBytes: 674853027

In this case, this is the size of the .onnx.data file, which is 674,853,027 bytes.

vRamFootprintBytes: 967991316

Suggested change:
- vRamFootprintBytes: 967991316
+ vRamFootprintBytes: 967987240

Got something slightly different when I added up the .onnx.data file and all the .onnx files: 674,853,027.84 bytes + 693.54 KiB + 139.1 MiB + 693.54 KiB + 139.1 MiB = 967,987,240.96 bytes.
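
And the Vitis AI variant of the same sketch (fileSizeBytes from the .onnx.data weights file, vRamFootprintBytes adding the .onnx graph files); the path is hypothetical:

```python
from pathlib import Path

# Hypothetical path to a local copy of the model directory.
model_dir = Path("qwen2.5-1.5b-instruct-vitis-npu/model")

# Note: the "*.onnx" glob does not match "*.onnx.data", so the sums stay distinct.
data_bytes = sum(p.stat().st_size for p in model_dir.rglob("*.onnx.data"))  # fileSizeBytes
onnx_bytes = sum(p.stat().st_size for p in model_dir.rglob("*.onnx"))       # graph files

print(f"fileSizeBytes: {data_bytes}")
print(f"vRamFootprintBytes: {data_bytes + onnx_bytes}")
```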
