@SS-JIA
Contributor

@SS-JIA SS-JIA commented Oct 1, 2025

Stack from ghstack (oldest at bottom):

Collecting fixes for various models/ops in this diff/PR.

They have all been squashed into this single change to make it easier to cherry pick.

Fixes

Wav2Letter

Type: Output correctness failure

This is caused by a bug in SwiftShader and is not reproducible on any other platform. Specifically, the issue is in the softmax shader; the exact cause is unknown, but it is related to using shared memory within shaders. The workaround is to use separate shared memory arrays for the shared max and the shared sum.

ConvNeXT

Type: Exception during runtime

This is caused by an incompatible memory layout being used for mean2d. More technically, the packed dimension of the tensor cannot be one of the dims being reduced. Until now, the operator registry system had no way to select valid tensor representations based on the actual arguments of an op.

The fix introduces a mechanism for ops to specify valid representations once a node's arguments are known. Once the model is exported with a supported memory layout, the model test passes.

Inception_V3/ViT

Type: Exception during runtime

The root cause was an interaction between the fuse batch norm pass and how vulkan_preprocess.py was applying passes. The fuse batch norm pass creates a new param node for the fused weight, and after the pass is applied, _copy_module is used to copy the transformed graph back into the ExportedProgram. However, it seems that _copy_module lowercases the node names without updating the exported program's graph signature. As a result, subsequent passes couldn't recognize the weight tensor of convolution nodes as a constant/parameter node.

The solution was to migrate vulkan_preprocess.py to use the _transform() API instead of using _copy_module.

DenseNet 161 (w/ dynamic shapes)

Type: Output Mismatch

Cause: the native_batch_norm op doesn't support dynamic shapes. However, the backend test runner doesn't set the correct compile option to filter ops without dynamic shape support.

Differential Revision: D83703496

SS-JIA pushed a commit that referenced this pull request Oct 1, 2025
ghstack-source-id: 313477373
Pull Request resolved: #14732
@pytorch-bot

pytorch-bot bot commented Oct 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14732

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Cancelled Job, 1 Unrelated Failure

As of commit f892c30 with merge base 70ea661 (image):

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Oct 1, 2025
@facebook-github-bot
Contributor

@SS-JIA has exported this pull request. If you are a Meta employee, you can view the originating Diff in D83703496.

@github-actions

github-actions bot commented Oct 1, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

SS-JIA pushed a commit that referenced this pull request Oct 2, 2025
Pull Request resolved: #14732

Collecting fixes for various models/ops in this diff/PR.

They have all been squashed into this single change to make it easier to cherry pick.

# Fixes

## Wav2Letter

Type: Output correctness failure

This is caused by a bug in SwiftShader and is not reproducible on any other platform. Specifically, the issue is in the softmax shader; the exact cause is unknown, but it is related to using shared memory within shaders. The workaround is to use separate shared memory arrays for the shared max and the shared sum.
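
As a rough illustration of the workaround's structure, the reduction can be emulated in plain Python. This is not the GLSL shader itself: `workgroup_softmax`, `shared_max`, `shared_sum`, and the workgroup size are hypothetical names chosen for the sketch, and the sequential loop over `tid` only stands in for parallel shader invocations.

```python
import math


def workgroup_softmax(values, workgroup_size=4):
    """Emulate a workgroup-level softmax reduction in plain Python."""
    # Two separate "shared memory" arrays, one per reduction, instead of
    # reusing a single array for both the max phase and the sum phase.
    shared_max = [-math.inf] * workgroup_size
    shared_sum = [0.0] * workgroup_size

    # Phase 1: each emulated invocation reduces a strided slice to a partial max.
    for tid in range(workgroup_size):
        for v in values[tid::workgroup_size]:
            shared_max[tid] = max(shared_max[tid], v)
    row_max = max(shared_max)

    # Phase 2: partial sums of exp(x - max) are written to the second array.
    for tid in range(workgroup_size):
        for v in values[tid::workgroup_size]:
            shared_sum[tid] += math.exp(v - row_max)
    row_sum = sum(shared_sum)

    return [math.exp(v - row_max) / row_sum for v in values]


probabilities = workgroup_softmax([1.0, 2.0, 3.0, 4.0, 5.0])
```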

## ConvNeXT

Type: Exception during runtime

This is caused by an incompatible memory layout being used for mean2d. More technically, the packed dimension of the tensor cannot be one of the dims being reduced. Until now, the operator registry system had no way to select valid tensor representations based on the actual arguments of an op.

The fix introduces a mechanism for ops to specify valid representations once a node's arguments are known. Once the model is exported with a supported memory layout, the model test passes.
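
A minimal sketch of the constraint, assuming 4D NCHW tensors and three illustrative layout names (`width_packed`, `height_packed`, `channels_packed`). The function `valid_layouts` is hypothetical and is not the operator registry's actual API; it only shows how valid representations can be derived once the reduction dims are known.

```python
# Hypothetical layout names mapped to the NCHW dim that each one packs.
PACKED_DIM = {"width_packed": 3, "height_packed": 2, "channels_packed": 1}


def valid_layouts(reduce_dims, ndim=4):
    """Return the layouts whose packed dim is not one of the reduced dims."""
    # Normalize negative dims (e.g. -1 becomes ndim - 1) before comparing.
    reduced = {d % ndim for d in reduce_dims}
    return [name for name, dim in PACKED_DIM.items() if dim not in reduced]


# A 2D mean over height and width, written with negative dims.
mean2d_layouts = valid_layouts([-1, -2])
```

Under this rule, a mean over the height and width dims leaves only the channels-packed layout as valid.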

## Inception_V3/ViT

Type: Exception during runtime

The root cause was an interaction between the fuse batch norm pass and how `vulkan_preprocess.py` was applying passes. The fuse batch norm pass creates a new param node for the fused weight, named after the original tensor, whose name may contain capital letters. After the graph was re-traced, however, the node's name was lowercased. `vulkan_preprocess` was using `_copy_module` to update the exported program's graph module in place, which did not update the exported program's graph signature with the new lowercase name.

The solution was to migrate vulkan_preprocess.py to use the _transform() API instead of using _copy_module.
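
The name mismatch can be illustrated with plain dictionaries. This sketch does not use the real `torch.export` data structures; the node name `Conv_fused_weight`, the target `conv.weight`, and both helper functions are hypothetical stand-ins.

```python
def retrace_lowercase(node_names):
    """Stand-in for a retrace step that lowercases node names."""
    return [name.lower() for name in node_names]


def find_params(node_names, signature):
    """Return the nodes that the signature recognizes as parameters."""
    return [name for name in node_names if name in signature]


# Hypothetical signature entry: node name -> parameter target.
signature = {"Conv_fused_weight": "conv.weight"}
nodes = retrace_lowercase(["Conv_fused_weight", "x"])

# In-place graph update: the signature still holds the old, capitalized key,
# so the fused weight is no longer recognized as a parameter.
stale_params = find_params(nodes, signature)

# Transform-style update: graph and signature are rebuilt together.
fresh_signature = {name.lower(): target for name, target in signature.items()}
fresh_params = find_params(nodes, fresh_signature)
```

With the stale signature no node is recognized as a parameter, while the rebuilt signature finds the fused weight again.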

There was also a small bug in Pool.cpp where `bool` was used to pass a UBO field that is received as an `int`.
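
The PR does not describe the failure mode in detail. One plausible way such a mismatch misbehaves, assuming a 1-byte host-side `bool` and a 4-byte `int` on the receiving side, can be shown with Python's `struct` module; the filler bytes and the `read_int_field` helper are illustrative only.

```python
import struct

BOOL_SIZE = struct.calcsize("?")  # size of a packed bool
INT_SIZE = struct.calcsize("<i")  # size of a packed 32-bit int


def read_int_field(buffer):
    """Interpret the first four bytes of the buffer as a little-endian int."""
    return struct.unpack_from("<i", buffer, 0)[0]


# A 1-byte bool followed by three bytes the writer never set
# (arbitrary filler here, standing in for uninitialized memory).
written_as_bool = struct.pack("?", True) + b"\xaa\xbb\xcc"
written_as_int = struct.pack("<i", 1)

mismatched = read_int_field(written_as_bool)
matched = read_int_field(written_as_int)
```

Writing the field as an `int` on the host side makes both sides agree on the width, so the receiver reads the intended value.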

## DenseNet 161 (w/ dynamic shapes)

Type: Output Mismatch

Cause: the native_batch_norm op doesn't support dynamic shapes. However, the backend test runner doesn't set the correct compile option to filter ops without dynamic shape support.

Since resize is straightforward for batch norm, the fix implements a resize function for it.
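
A sketch of why the resize is simple: batch norm normalizes each element with per-channel statistics, so the output shape always mirrors the input shape. `TensorMeta` and `resize_batch_norm` are hypothetical names for this illustration, not the backend's actual resize hook.

```python
from dataclasses import dataclass, field


@dataclass
class TensorMeta:
    """Hypothetical holder for a tensor's current sizes."""
    sizes: list = field(default_factory=list)


def resize_batch_norm(out, inp):
    """Propagate the input's sizes to the output when the input is resized."""
    out.sizes = list(inp.sizes)


# The output was sized for the export-time shape; the input arrives smaller.
out = TensorMeta([1, 8, 32, 32])
resize_batch_norm(out, TensorMeta([4, 8, 16, 16]))
```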

Differential Revision: [D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)
ghstack-source-id: 313740339
SS-JIA pushed a commit that referenced this pull request Oct 3, 2025
Pull Request resolved: #14732

ghstack-source-id: 313794474

Differential Revision: [D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)
SS-JIA pushed a commit that referenced this pull request Oct 3, 2025
Pull Request resolved: #14732

ghstack-source-id: 313795799

Differential Revision: [D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)
SS-JIA pushed a commit that referenced this pull request Oct 3, 2025
Pull Request resolved: #14732

ghstack-source-id: 313796850

Differential Revision: [D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)
SS-JIA pushed a commit that referenced this pull request Oct 3, 2025
Pull Request resolved: #14732

ghstack-source-id: 313914307

Differential Revision: [D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)
SS-JIA pushed a commit that referenced this pull request Oct 3, 2025
Pull Request resolved: #14732

ghstack-source-id: 313948426

Differential Revision: [D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)
SS-JIA pushed a commit that referenced this pull request Oct 3, 2025
Pull Request resolved: #14732

ghstack-source-id: 313984117

Differential Revision: [D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)
@facebook-github-bot facebook-github-bot merged commit f976379 into gh/SS-JIA/335/base Oct 4, 2025
174 of 178 checks passed
@facebook-github-bot facebook-github-bot deleted the gh/SS-JIA/335/head branch October 4, 2025 01:31
GregoryComer pushed a commit that referenced this pull request Oct 7, 2025
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: #14732 by
@SS-JIA
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/335/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/335/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/335/orig
Differential Revision:
[D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)
@diff-train-skip-merge

Co-authored-by: Sicheng Jia <[email protected]>
