Skip to content

Conversation

TedThemistokleous
Copy link
Contributor

Description

Fixes a case where full graph passed to GetSubGraph() promotes intermediate dangling node outputs as the final graph output.

Motivation and Context

Fixes this as was seen in a customer model for us. Refereced to MIGraphX EP change here:ROCm#179

@TedThemistokleous
Copy link
Contributor Author

cc @tianleiwu @snnn bugfix for MIGraphX EP

@TedThemistokleous TedThemistokleous changed the title Migx ep fix dangling nodes being promoted to output nodes when graph not split [MIGraphX EP] Fix dangling graph outputs nodes being promoted to output nodes when graph not split Oct 3, 2025
@snnn snnn added the ep:MIGraphX issues related to AMD MI GraphX execution provider label Oct 3, 2025
if (output.second->Exists()) {
auto name = output.second->Name();
if (std::find(graph_output_names.begin(), graph_output_names.end(), name) == graph_output_names.end()) {
// if graph is split we dont know if output is used so we need this, otherwise if the graph isn't split
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the graph is split, can't we check if an output is used by checking if it has an external consumer? Wouldn't it be better to build this check into the logic above so that we don't have to add another parameter into the function call and complicate the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We typically want to run the entire model through MIGraphx instead of adding memcopies between another EP (say CPU EP) since that adds overhead.

Sure the logic can be added later but in the "no partition" case, why bother checking every node? We know the entire graphs input and outputs from the original metadata and then can confidently say those nodes have no use an can be pruned.

For the split case, you can check but again going back to the first piece ( we want to minimize fallback) it would be a signal for us to ensure all the operators in the graph are supported and add the support in MIGraphX for that.

Copy link
Contributor

@Jonahcb Jonahcb Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. This approach reduces the overhead of looping through the edges of every node.

One concern: this approach will promote internal dangling outputs to subgraph outputs in the "partition" case, which will then be computed and allocated when they should have been pruned in this step, leading to a waste of memory and computation later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the flag for the unsupported nodes will be nonzero, and then we revert back to the previous behavior as if the is_graph_split check wasn't set anymore.

The issue isn't just that we have dangling output edges from internal nodes, its that in the full case they're interpreted as a valid output to be promoted to the graph output, modifying the metadata. In this case too there are 0 fusions being done that we saw in the model.

In the split case, yes you need to check if that node output edge is consumed before determine if its dangling.

Copy link
Contributor

@Jonahcb Jonahcb Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand what you mean.

No, because the flag for the unsupported nodes will be nonzero, and then we revert back to the previous behavior > as if the is_graph_split check wasn't set anymore.

By "previous behavior," you mean the behavior before this PR, which promoted dangling outputs from individual nodes to outputs of the full subgraph?

I assume that is what you mean because you say this:

In the split case, yes you need to check if that node output edge is consumed before determine if its dangling.

But then, given that in this PR we are not checking "if that node output edge is consumed before determin[ing] if it's dangling" that means we have incorrect logic in the split case in this PR.

Specifically, is_graph_split will evaluate to true and promote internal dangling outputs of nodes in this subgraph to subgraph outputs. Although these dangling outputs won't be promoted to model outputs, they will still be treated as outputs of the subgraph when they should have been pruned.

The issue isn't just that we have dangling output edges from internal nodes, its that in the full case they're >interpreted as a valid output to be promoted to the graph output, modifying the metadata. In this case too there >are 0 fusions being done that we saw in the model.

I agree that there are two issues:
1). Dangling outputs promoted to full model outputs,
2). dangling outputs promoted to fused node outputs

This solves 1, but keeps problem 2 intact.

A better approach would be to solve 1 and 2 with one modification.

Copy link
Contributor Author

@TedThemistokleous TedThemistokleous Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but this isn't about "better" its about "good enough" here for now. I've got a customer waiting on a change for a fully parsed model so case 2 isn't valid. If we hit case 2 we typically just let MIGraphX do the optimizations since we turn off all optimizations and let MIGraphX handle it as well or add in the missing parser op support. infact similar to other EPs the intent is that case 1 is the default case and support be added so we don't have to split the graph ever.

@TedThemistokleous
Copy link
Contributor Author

@snnn any comments on this? We've put this for ROCm 7.1 builds right now.

@Jonahcb
Copy link
Contributor

Jonahcb commented Oct 6, 2025

@snnn if this is good enough for ORT, let us know. If you would prefer a refactoring that makes use of MakeComputeCapability, let me know and I will do that. Don't want to allocate time to a refactoring if you don't prefer it for the ORT codebase.

@TedThemistokleous
Copy link
Contributor Author

@snnn if this is good enough for ORT, let us know. If you would prefer a refactoring that makes use of MakeComputeCapability, let me know and I will do that. Don't want to allocate time to a refactoring if you don't prefer it for the ORT codebase.

You can still do the refactor and I can test/take a look. These aren't mutually exclusive and you're welcome to contribute since we're open source. I'd likely test this refactor and mirror the change on the AMD side to our internal branch if it improves things.

@tianleiwu
Copy link
Contributor

Please merge latest main, and also run lintrunner to format the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ep:MIGraphX issues related to AMD MI GraphX execution provider

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants