
Conversation

@Jonahcb (Contributor) commented Sep 28, 2025


Description

Tighten output selection in MIGraphXExecutionProvider::GetSubGraph(...) so a node’s output is included only if it has an external consumer (outside the subgraph) or is a top-level graph output.

Motivation and Context

Previously, MIGraphX could promote internal-only values (e.g., BatchNormalization auxiliary outputs) to fused graph outputs, causing downstream compile errors. This fix conforms MIGraphX subgraph output selection to the boundary-only rule used elsewhere, preventing accidental promotion of unused outputs.

Related: #26176

@snnn requested a review from devang-ml on September 29, 2025 20:50
@TedThemistokleous (Contributor)

Seeing ORT value errors when I run this. Let me take a look.

@Jonahcb (Contributor, Author) commented Sep 30, 2025

> Seeing ORT value errors when I run this. Let me take a look.

Can you post the errors here, please?

@TedThemistokleous (Contributor) commented Sep 30, 2025

```
2025-09-30 21:24:40.837470830 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running MGXKernel_graph_main_graph_12426062359934032490_0 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_main_graph_12426062359934032490_0_0' Status Message: the ort_value must contain a constructed tensor or sparse tensor
```

We're not even hitting the MIGraphX compile step here, which indicates something is going wrong after this change was added. Let me take a look at this change.

@apwojcik (Contributor) commented Oct 1, 2025

I confirm that this PR breaks execution of almost all models with the MIGraphX EP. I am seeing errors similar to the one below.

```
migraphx_program_run_async: Error: /**********/AMDMIGraphX/src/program.cpp:491: operator(): Parameter not found: model/average_pooling2d/AvgPool__95:0
2025-10-01 23:42:05.359307958 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running MGXKernel_graph_tf2onnx_17804881885832813595_0 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_tf2onnx_17804881885832813595_0_0' Status Message: Failed to call function
	Error: Non-zero status code returned while running MGXKernel_graph_tf2onnx_17804881885832813595_0 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_tf2onnx_17804881885832813595_0_0' Status Message: Failed to call function
```

@Jonahcb (Contributor, Author) commented Oct 2, 2025

@TedThemistokleous @apwojcik thanks for letting me know. Here is my theory about what is causing these issues: the stricter logic for adding outputs to fused_outputs now blocks some internally produced node outputs (internal to this EP's subgraph) from being placed into fused_outputs. Before this fix, those internally produced outputs would have been removed from fused_outputs while processing an internal node that consumed them as inputs. In doing so, each consumed output cancelled out a corresponding entry in fused_inputs, since all node inputs are automatically added to fused_inputs and only removed when they are fed by some internally produced output. Because some internally produced outputs no longer make it into fused_outputs, the matching entries in fused_inputs are not removed when they should be, and at the end of the loop over the nodes these internal values get promoted to subgraph inputs. ORT has nothing to bind to these incorrect subgraph inputs, so we get these errors.

Now, which internally produced outputs are no longer making it into fused_outputs? Any internally produced output that is not a graph output and is not externally consumed (which can be a lot of them).

I will finalize a solution tomorrow. Here is my plan:

1) Create a new data structure to hold the internally produced outputs.
2) Only add an output to fused_outputs if it is (a) consumed externally or (b) a graph output. Add all the other outputs to the internally produced outputs list.
3) Check against the internally produced outputs list (instead of fused_outputs) to determine whether a node input should be placed into (or removed from) fused_inputs.
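To make the plan concrete, here is a minimal sketch of steps 1–3 using toy stand-in types (`ToyNode`, `GetSubgraphIO`, and the `externally_consumed` set are hypothetical simplifications, not the actual ORT `Node`/graph API):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy node: named inputs/outputs only (hypothetical stand-in for onnxruntime::Node).
struct ToyNode {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Sketch of the proposed bookkeeping. graph_outputs are the model's top-level
// outputs; externally_consumed marks values read by nodes outside the subgraph.
void GetSubgraphIO(const std::vector<ToyNode>& subgraph_nodes,
                   const std::set<std::string>& graph_outputs,
                   const std::set<std::string>& externally_consumed,
                   std::set<std::string>& fused_inputs,
                   std::set<std::string>& fused_outputs) {
  std::set<std::string> internally_produced;  // step 1: new data structure
  for (const auto& node : subgraph_nodes) {
    // Step 3: a value is a subgraph boundary input only if it is not
    // produced by some node inside the subgraph.
    for (const auto& in : node.inputs) {
      if (internally_produced.count(in) == 0) fused_inputs.insert(in);
    }
    for (const auto& out : node.outputs) {
      internally_produced.insert(out);
      fused_inputs.erase(out);  // undo a speculative insert if a consumer ran first
      // Step 2: promote only boundary outputs.
      if (externally_consumed.count(out) != 0 || graph_outputs.count(out) != 0) {
        fused_outputs.insert(out);
      }
    }
  }
}
```

The key point is that fused_outputs only ever receives boundary values, while a purely internal output (e.g. a BatchNormalization auxiliary output with no consumers) never appears in either set.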

Although the current implementation is neat, my plan will make the code more readable.

Any feedback?

@TedThemistokleous (Contributor) commented Oct 2, 2025

> @TedThemistokleous @apwojcik thanks for letting me know. This is my theory as to what is causing these issues: the stricter logic for adding outputs to fused_outputs blocked some internally (internal to this EP's subgraph) produced node outputs from being placed into fused_outputs. Before this fix, these internally produced outputs would have been removed from fused_outputs while processing an internal node that consumed the output as an input. Thus, that consumed output would have cancelled out some input from fused_inputs, as all node inputs are automatically added to fused_inputs and only removed if they are fed by some internally produced output. Due to some internally produced outputs no longer making it into fused_outputs, some inputs in fused_inputs are not removed (when they should be removed) and at the end of the loop over the nodes, these internal inputs are promoted to subgraph inputs. ORT has nothing to give for these incorrect subgraph inputs so we get these errors.
>
> Now, which internally produced outputs are no longer making it into fused_outputs? Any internally produced output that is not a graph output or externally consumed (which can be a lot).
>
> I will finalize a solution tomorrow. Here is my plan:
>
> 1) Create a new data structure to hold the internally produced outputs. 2) Only add to fused_outputs if (a) consumed externally or (b) a graph output. Add the rest of the outputs to the internally produced outputs list. 3) Check against the internally produced outputs list (instead of fused_outputs) to determine whether a node input should be placed into (or removed from) fused_inputs.
>
> Although the current implementation is neat, my plan will make the code more readable.
>
> Any feedback?

First off, I appreciate the effort and the contribution here. You dove right in.

Keep it simple. If you know the outputs of the graph as well as the inputs, any other outputs that aren't in the graph output list should be pruned regardless of fusions, since those are dangling outputs.

Essentially, as you fuse each node, ensure there's a consumer for each output; if the next() node is null for that output path, then you know it's dangling. You can double-check this against the final outputs of the graph to ensure that, even when the output layer is fused, you still have the same outputs and inputs for the entire model.

I think there's a way to massage the logic here to just exclude dangling outputs and ensure they're filtered from the final output. That ensures the fused graph still gets ONNX's optimizations, but none of the dangling outputs are propagated, and therefore none are allocated in the later stages of this EP.
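A minimal sketch of that dangling-output check, with toy inputs standing in for the real graph structures (the `consumer_count` map and these names are hypothetical, not the ORT API):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// An output is dangling when nothing consumes it and it is not a model output.
// consumer_count maps a value name to how many edges read it; 0 or absent
// means the next() node for that output path is null, as described above.
bool IsDangling(const std::string& output_name,
                const std::map<std::string, int>& consumer_count,
                const std::set<std::string>& graph_outputs) {
  auto it = consumer_count.find(output_name);
  const bool has_consumer = (it != consumer_count.end() && it->second > 0);
  return !has_consumer && graph_outputs.count(output_name) == 0;
}
```

Running this per output def while fusing would filter dangling values before they can be promoted to fused graph outputs.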

@Jonahcb (Contributor, Author) commented Oct 2, 2025

> First off, I appreciate the effort and the contribution here. You dove right in.
>
> Keep it simple. If you know the outputs of the graph as well as the inputs, any other outputs that aren't in the graph output list should be pruned regardless of fusions, since those are dangling outputs.
>
> Essentially, as you fuse each node, ensure there's a consumer for each output; if the next() node is null for that output path, then you know it's dangling. You can double-check this against the final outputs of the graph to ensure that, even when the output layer is fused, you still have the same outputs and inputs for the entire model.
>
> I think there's a way to massage the logic here to just exclude dangling outputs and ensure they're filtered from the final output. That ensures the fused graph still gets ONNX's optimizations, but none of the dangling outputs are propagated, and therefore none are allocated in the later stages of this EP.

Thank you for the response and encouragement :)

After studying the code some more, I think there may also be a hidden bug in the current/old implementation on this branch: `if (node->GetOutputEdgesCount() > node->OutputDefs().size())`. The intent of this check is correct, but the condition doesn't capture all the subcases. It misses the case where there are not more output edges than output defs, yet one output def has multiple output edges while another has zero. For example, take a node with two output defs, out_1 and out_2, and two output edges, out_edge_1 and out_edge_1.1, where out_edge_1 carries out_1 to an internal consumer and out_edge_1.1 carries out_1 to an external consumer. This will not trigger the block under `if (node->GetOutputEdgesCount() > node->OutputDefs().size())`, since 2 (edges) == 2 (output defs), so out_1 will be added to fused_outputs but then cancelled out and removed by its internal consumer. But it needs to remain in fused_outputs because it feeds an external consumer.
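To illustrate the gap, here is a small sketch contrasting the aggregate count with a per-output-def check (`ToyEdge` and these function names are hypothetical; the real code walks ORT's node and edge iterators):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy edge: which output def it carries and whether its consumer is inside
// the subgraph.
struct ToyEdge {
  std::string value;
  bool internal_consumer;
};

// Mirrors the spirit of the old condition: it only fires when the total edge
// count exceeds the total def count, so it misses defs with mixed consumers.
bool AggregateCheckFires(std::size_t edge_count, std::size_t def_count) {
  return edge_count > def_count;
}

// Per-output-def alternative: an output must stay in fused_outputs if any of
// its edges leaves the subgraph.
std::set<std::string> ExternallyConsumed(const std::vector<ToyEdge>& edges) {
  std::set<std::string> result;
  for (const auto& e : edges) {
    if (!e.internal_consumer) result.insert(e.value);
  }
  return result;
}
```

In the out_1/out_2 scenario above, the aggregate check never fires (2 edges vs 2 defs), while the per-def check correctly flags out_1 as externally consumed.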

Is there any reason MIGraphX did not use the input/output discovery logic from onnxruntime/core/providers/partitioning_utils.cc:292-365? Are you opposed to refactoring the code to use MakeComputeCapability? This would be simpler, require less code, and be less prone to errors.

@Jonahcb force-pushed the fix/migraphx-graph-output-promotion-fixes-26176 branch from 697d22e to ac1458f on October 2, 2025 18:06
@TedThemistokleous (Contributor) commented Oct 2, 2025

> > First off, I appreciate the effort and the contribution here. You dove right in.
> > Keep it simple. If you know the outputs of the graph as well as the inputs, any other outputs that aren't in the graph output list should be pruned regardless of fusions, since those are dangling outputs.
> > Essentially, as you fuse each node, ensure there's a consumer for each output; if the next() node is null for that output path, then you know it's dangling. You can double-check this against the final outputs of the graph to ensure that, even when the output layer is fused, you still have the same outputs and inputs for the entire model.
> > I think there's a way to massage the logic here to just exclude dangling outputs and ensure they're filtered from the final output. That ensures the fused graph still gets ONNX's optimizations, but none of the dangling outputs are propagated, and therefore none are allocated in the later stages of this EP.
>
> Thank you for the response and encouragement :)
>
> After studying the code some more, I think there may also be a hidden bug in the current/old implementation on this branch: `if (node->GetOutputEdgesCount() > node->OutputDefs().size())`. The intent of this check is correct, but the condition doesn't capture all the subcases. It misses the case where there are not more output edges than output defs, yet one output def has multiple output edges while another has zero. For example, take a node with two output defs, out_1 and out_2, and two output edges, out_edge_1 and out_edge_1.1, where out_edge_1 carries out_1 to an internal consumer and out_edge_1.1 carries out_1 to an external consumer. This will not trigger the block under that condition, since 2 (edges) == 2 (output defs), so out_1 will be added to fused_outputs but then cancelled out and removed by its internal consumer. But it needs to remain in fused_outputs because it feeds an external consumer.
>
> Is there any reason MIGraphX did not use the input/output discovery logic from onnxruntime/core/providers/partitioning_utils.cc:292-365? Are you opposed to refactoring the code to use MakeComputeCapability? This would be simpler, require less code, and be less prone to errors.

I would actually be open to a refactor at some point. Found the solution, by the way. It's an edge case when the entire graph is supported/used here, so we get into a degenerate case when an ONNX model has dangling outputs on intermediate nodes. Will post the fix shortly.

I inherited a bit of this code, so I'll likely be cleaning things up as I go through it. If you have any suggestions or ideas, I'm open to input, and since this is open source, you're welcome to contribute too.

@TedThemistokleous (Contributor)

The fix for this should be here now: #26224

@Jonahcb (Contributor, Author) commented Oct 3, 2025

> I would actually be open to a refactor at some point. Found the solution, by the way. It's an edge case when the entire graph is supported/used here, so we get into a degenerate case when an ONNX model has dangling outputs on intermediate nodes. Will post the fix shortly.
>
> I inherited a bit of this code, so I'll likely be cleaning things up as I go through it. If you have any suggestions or ideas, I'm open to input, and since this is open source, you're welcome to contribute too.

Sounds good. I will write out a refactoring on this branch so you can see what I have in mind. I suggest calling MakeComputeCapability (from onnxruntime/core/providers/partitioning_utils.cc) to replace most of the logic implemented in MIGraphXExecutionProvider::GetSubGraph.

