
Conversation


@wwwind wwwind commented Aug 15, 2025

This change allows us to preserve output order after export.

Change-Id: I7ee55c2877ca1b247f10d2e90da3ba38dc727b6f
Signed-off-by: Elena Zhelezina <[email protected]>

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

@wwwind wwwind requested a review from digantdesai as a code owner August 15, 2025 09:04

pytorch-bot bot commented Aug 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13454

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 130 Pending

As of commit 71d2b8d with merge base 41730fa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Aug 15, 2025
@wwwind wwwind requested a review from zingo August 15, 2025 09:05
@wwwind wwwind added the partner: arm, ciflow/trunk, and release notes: arm labels Aug 15, 2025

zingo commented Aug 18, 2025

The dl3 model tests seem to have run into problems; I am rerunning the tests to see whether it's random or a 100% thing.


zingo commented Aug 18, 2025

Unfortunately not :( I wonder if we assumed some order or were just lucky before.


wwwind commented Aug 18, 2025

Hi @zingo, I will run this test locally and check its stability. Thank you!

Comment on lines +83 to +95
spec = TosaSpecification.create_from_string("TOSA-1.0+INT")
compile_spec = ArmCompileSpecBuilder().tosa_compile_spec(tosa_spec=spec).build()
# Setup quantizer
quantizer = TOSAQuantizer(compile_spec)
quantizer.set_global(
    get_symmetric_quantization_config(is_qat=True, is_per_channel=False)
)
# Trace the model
dummy = torch.randn(batch_size, 1, 28, 28)
fx_mod = torch.export.export_for_training(model, (dummy,)).module()
model = prepare_pt2e(fx_mod, quantizer)
model(dummy)
model = convert_pt2e(model)
Contributor

Can we use the FP profile and avoid quantization in this test? Just to simplify.

Collaborator Author

The ordering issue happens only for the INT profile; it does not reproduce in FP.
This test fails without this fix.

Contributor

Is it because we don't run the same test in FP? I am failing to see the connection between output order and the TOSA profile. Is there a pass we run only in the INT profile that shuffles the order, or something?

Collaborator Author

It just happens here that for the FP profile the order is already what we want.
This test for INT does not fail in the debugger, for example, which is why it was impossible for me to find out where exactly it fails during partitioning; it only fails when we run it as a pytest.
The order of outputs is not deterministic, and this change makes sure that we re-order according to the initial order.
The reason for this flakiness may be the use of set, etc. inside the Python code.
We need this fix for our project, and this is a clean and working solution to make sure that the order matches the original.
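For illustration, a minimal sketch (hypothetical, not from the PR) of how a plain Python set can introduce exactly this kind of run-to-run flakiness:

```python
# Sketch only: string hashing is randomized per interpreter run, so the
# iteration order of a set of strings is not stable across runs.
# Run this script twice; the printed order will usually differ.
names = {"output_0", "output_1", "output_2"}
print(list(names))

# Fixing the seed (e.g. PYTHONHASHSEED=0 python script.py) makes the order
# reproducible, which is one reason such bugs can hide in some environments
# and appear in others.
```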

Contributor

> It just happens here that for the FP profile the order is already what we want. This test for INT does not fail in the debugger, for example, which is why it was impossible for me to find out where exactly it fails during partitioning; it only fails when we run it as a pytest.

Weird. Tracking the output order after each pass might lead to something. You can add a print in the base class for ExportPass or something.
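A minimal sketch of that idea (hypothetical; it assumes the passes take a torch.fx.GraphModule and return a PassResult):

```python
# Hypothetical debugging aid, not part of this PR: wrap each pass and
# print the output node's argument order after it runs, to spot which
# pass reorders the outputs.
from torch.fx.passes.infra.pass_base import PassResult

def log_output_order(pass_fn):
    def wrapped(graph_module):
        result = pass_fn(graph_module)
        gm = result.graph_module if isinstance(result, PassResult) else graph_module
        output_node = next(n for n in gm.graph.nodes if n.op == "output")
        name = getattr(pass_fn, "__name__", type(pass_fn).__name__)
        print(f"{name}: {[str(a) for a in output_node.args[0]]}")
        return result
    return wrapped
```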

> The order of outputs is not deterministic.

This is surprising TBH. export() does provide this guarantee (if you flatten and unflatten, then perhaps with the preserve_module_call_signature arg). ExportGraphSignature makes the same guarantee, though it adds more entries if you do buffer mutations:

Inputs = [*parameters_buffers_constant_tensors, *flattened_user_inputs]
Outputs = [*mutated_inputs, *flattened_user_outputs]

See - https://docs.pytorch.org/docs/stable/export.html#torch.export.graph_signature.ExportGraphSignature
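A quick way to see that guarantee in action (a hypothetical check, not code from this PR):

```python
# torch.export records user outputs in the signature in author order.
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1, x * 2

ep = torch.export.export(M(), (torch.randn(2),))
print(ep.graph_signature.output_specs)  # specs in the order forward() returns them
```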

> We need this fix for our project, and this is a clean and working solution to make sure that the order matches the original.

I get this and am also OK with landing this as a TOSA-level solution.
That said, I would like to understand the root cause a bit better and see what's the right place to fix this.

Collaborator Author

OK, I will create a ticket for us to investigate further. I will close this PR for now, then.

@wwwind wwwind closed this Aug 20, 2025
@wwwind wwwind reopened this Aug 29, 2025
@digantdesai digantdesai left a comment


@wwwind - can you create an issue to further investigate the root cause? The main concern is whether this is needed by all the backends, etc. But stamping to unblock you. I didn't mean to block you in the first place. Thanks.

    exported_program=edge_program
)

# Re-shuffle output nodes to preserve author's order
Contributor

So IIUC the order was correct before we ran passes (i.e., for the incoming edge_program) but then got switched up? If yes, did we find whether some pass(es) are injecting things out of order into output_node.args[0]?

Collaborator Author

Yes, the order was correct here; after we ran the passes, it changed.
Because the test was passing in the debugger, I was not able to get to the true reason.
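For readers following along, the general shape of such a re-ordering fix looks roughly like this (a hypothetical sketch that assumes outputs can be matched by name against the exported signature; this is not the PR's actual code):

```python
# Sketch: restore the author's output order on an edge program whose
# passes may have shuffled the FX output node's arguments.
def restore_output_order(edge_program):
    # Author-order output names recorded by torch.export in the signature.
    original_order = [
        spec.arg.name for spec in edge_program.graph_signature.output_specs
    ]
    graph = edge_program.graph_module.graph
    output_node = next(n for n in graph.nodes if n.op == "output")
    by_name = {str(arg): arg for arg in output_node.args[0]}
    # Re-emit outputs in the original order; append any unmatched ones.
    reordered = [by_name.pop(name) for name in original_order if name in by_name]
    output_node.args = (tuple(reordered + list(by_name.values())),)
    graph.lint()
    edge_program.graph_module.recompile()
```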


zingo commented Sep 5, 2025

@wwwind this unfortunately seems to need a rebase before merge; hopefully that fixes most of the CI failures as well.

Change-Id: I61cbbb515fd1bfd2f92cf6ac0c110452ca9f624b

wwwind commented Sep 5, 2025

I resolved the conflicts locally and don't understand what is going on here with the Resolve conflicts button being grayed out.


wwwind commented Sep 5, 2025

Replaced with #13997 due to this grayed-out Resolve conflicts button.
Locally all conflicts are resolved and the change has been pushed, but I have no idea what is going on.
