Conversation

@TedThemistokleous
Contributor

Description

Adds the ability for the MIGraphX EP to save or load compiled models, saving time between inference sessions.

Via Command line

The user should be able to enable saving with:
ORT_MIGRAPHX_SAVE_COMPILED_MODEL
ORT_MIGRAPHX_SAVE_COMPILE_PATH

The user should be able to enable loading with:
ORT_MIGRAPHX_LOAD_COMPILED_MODEL
ORT_MIGRAPHX_LOAD_COMPILE_PATH
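
For example, a minimal sketch of driving these from the environment in Python; the truthy value `"1"` and the `.mxr` file name are assumptions rather than values confirmed by this PR:

```python
import os

# Save the compiled MIGraphX program after the first compile ...
os.environ["ORT_MIGRAPHX_SAVE_COMPILED_MODEL"] = "1"                   # truthy value assumed
os.environ["ORT_MIGRAPHX_SAVE_COMPILE_PATH"] = "./model_compiled.mxr"  # hypothetical path

# ... or load a previously compiled program on later runs.
os.environ["ORT_MIGRAPHX_LOAD_COMPILED_MODEL"] = "1"
os.environ["ORT_MIGRAPHX_LOAD_COMPILE_PATH"] = "./model_compiled.mxr"
```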

Via ONNX Runtime API

migx_save_compiled_model
migx_save_model_name
migx_load_compiled_model
migx_load_model_name
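
A minimal sketch of the equivalent session-level configuration, assuming these names are passed as MIGraphX provider options (the `"1"` value and the `.mxr` file name are assumptions):

```python
import onnxruntime as ort

# Save the compiled program on the first run; later runs can point at the
# same file and enable the load options instead.
migx_options = {
    "migx_save_compiled_model": "1",                 # value format assumed
    "migx_save_model_name": "model_compiled.mxr",    # hypothetical file name
    # "migx_load_compiled_model": "1",
    # "migx_load_model_name": "model_compiled.mxr",
}

session = ort.InferenceSession(
    "model.onnx",
    providers=["MIGraphXExecutionProvider"],
    provider_options=[migx_options],
)
```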

Motivation and Context

The motivation for this is to leverage MIGraphX's existing API to save/load models after our compile step of graph optimization. For larger models, or models compiled with additional tuning steps, this saves time after the first compile and inference run, speeding up the user experience and encouraging development.

@TedThemistokleous TedThemistokleous linked an issue May 10, 2024 that may be closed by this pull request
@TedThemistokleous TedThemistokleous force-pushed the add_migraphx_ep_save_load_compiles branch 2 times, most recently from e803fcb to a3c457f Compare May 11, 2024 03:33
Fixes GPU faults seen when running mixed-precision inference workloads with BERT v1.1 (fp16 + int8 quantization of Conv + MatMul).

We were hitting an edge case with mixed precision where input parameters were not being populated, so uninitialized values were used for them. This would "work" silently, since no issue arises during inference. For BERT, though, segment_ids is pushed through a Gather ONNX operator, which uses these values as an index.

Because uninitialized memory was being used, it was not obvious why we were getting failures between runs, and we saw the issue intermittently across machines/cards/etc.

The fixes to the MIGraphX Execution Provider are as follows:

- Perform fp16 quantization after int8 quantization
- Add debug logging for the load/quantization workflow
- Set input/output parameters in a separate run prior to int8 calibration
- Set all dynamic data as input parameters so that MIGraphX can perform int8 static calibration

Without these changes, models fail to copy input parameters on mixed-precision runs when we decide to quantize, because MIGraphX assumes all inputs will be used for calibration, not just the input data read in from a calibration table.
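
For context, a hedged sketch of how a mixed-precision (fp16 + int8) session might be configured against the MIGraphX EP; the option names and calibration table file below are assumptions based on the EP's other options and are not defined by this PR:

```python
import onnxruntime as ort

# Hypothetical mixed-precision setup; option names and the calibration
# table file name are assumptions, not taken from this PR.
mixed_precision_options = {
    "migraphx_fp16_enable": "1",                                         # assumed
    "migraphx_int8_enable": "1",                                         # assumed
    "migraphx_int8_calibration_table_name": "calibration.flatbuffers",   # assumed
}

session = ort.InferenceSession(
    "bert_v1.1.onnx",                                                    # hypothetical model
    providers=["MIGraphXExecutionProvider"],
    provider_options=[mixed_precision_options],
)
```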
@TedThemistokleous TedThemistokleous force-pushed the add_migraphx_ep_save_load_compiles branch from a3c457f to e096dec Compare June 10, 2024 16:14
@TedThemistokleous TedThemistokleous force-pushed the add_migraphx_ep_save_load_compiles branch from e096dec to eaf9da7 Compare June 10, 2024 20:00
Makes it easier to handle the workflow when deciding whether the input model needs to be recompiled.
@TedThemistokleous TedThemistokleous marked this pull request as ready for review June 11, 2024 02:20
@TedThemistokleous
Contributor Author

Ping @cloudhan @ytaous @PeixuanZuo: this is useful for speeding up the workflow when working with the MIGraphX EP.

Should come after this is merged: #20982

@TedThemistokleous
Contributor Author

Validated with: microsoft/onnxruntime-inference-examples#441

@cloudhan
Contributor

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline

@cloudhan
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).


@cloudhan cloudhan merged commit 11e7a1b into microsoft:main Jun 17, 2024
TedThemistokleous added a commit to ROCm/onnxruntime that referenced this pull request Jun 17, 2024
TedThemistokleous pushed a commit to ROCm/onnxruntime that referenced this pull request Jun 17, 2024
cloudhan pushed a commit that referenced this pull request Jun 18, 2024
See #20643

### Description

Changes the order in which we perform quantization to better support mixed precision, and fixes a bug where input parameters for int8 quantization were not being handled correctly.

We now perform int8 quantization first, on the full-precision input model, and only afterwards quantize the remaining ops that were not int8-quantized to fp16. Previously a lower-precision input model was used, which could insert larger values than intended when int8 quantization was performed; the symptom of this was a failure during the quantization steps.

Similarly, input parameters were left uninitialized, resulting in comparable failures during int8 quantization.

GPU faults were intermittent but present, since using uninitialized memory created undefined behavior once we started testing more complex models with mixed precision.

### Motivation and Context

In some cases we've seen random data and/or invalid values entering compiled ONNX graphs. This happens because input parameters to the MIGraphX graph are not set correctly when mixed precision (int8 + fp16) is used, and because the ordering of the quantization steps causes a lower-precision model to be used for int8 quantization. In most cases the failure is silent or intermittent; in some cases we've observed GPU faults due to out-of-bounds values being set.

This change is required because, when a large input parameter to the MIGraphX graph is initialized to a large random value and the next operator uses it for indexing, we get undefined behavior and a GPU fault.
TedThemistokleous added a commit to ROCm/onnxruntime that referenced this pull request Jun 21, 2024


Development

Successfully merging this pull request may close these issues.

Add cache for Onnxruntime Compiled Programs
