[ET-VK][benchmarking][ez] Don't perform copies when benchmarking #9468
## Context

The generated operator benchmarks currently have a high amount of copy overhead:

1. Copy from CPU to staging
2. Copy from staging to GPU Buffer/Image

This is done for both inputs and outputs.

Since benchmarks are not correctness tests, copying data in and out is not really necessary, especially if the compute shader's behaviour does not depend on the contents of the input/output tensors.

Make it so that, by default, the benchmark executes only the op, without adding copy overhead. Test cases can still opt in to having the copy overhead included in the benchmark.

Differential Revision: [D71570143](https://our.internmc.facebook.com/intern/diff/D71570143/)

[ghstack-poisoned]
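The opt-in behaviour described above can be illustrated with a small, self-contained sketch. This is not the code from this PR: `BenchmarkCase`, `run_benchmark`, and the `include_copy_overhead` field are hypothetical names chosen for illustration, and the copy steps are stand-in callables rather than real Vulkan staging transfers.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BenchmarkCase:
    """Hypothetical test-case description; not the real generator's schema."""

    name: str
    run_op: Callable[[], None]
    # Stand-ins for CPU -> staging -> GPU transfers (inputs) and the
    # reverse path (outputs).
    copy_inputs: Optional[Callable[[], None]] = None
    copy_outputs: Optional[Callable[[], None]] = None
    # Assumed flag name: by default only the op itself is executed and timed.
    include_copy_overhead: bool = False


def run_benchmark(case: BenchmarkCase, iterations: int = 10) -> float:
    """Return the mean wall-clock seconds per iteration for one case."""
    timed_copies = case.include_copy_overhead
    start = time.perf_counter()
    for _ in range(iterations):
        if timed_copies and case.copy_inputs is not None:
            case.copy_inputs()
        case.run_op()
        if timed_copies and case.copy_outputs is not None:
            case.copy_outputs()
    return (time.perf_counter() - start) / iterations
```

With the default settings, the copy callables are never invoked, so a case only pays for them when it sets `include_copy_overhead=True`.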
ghstack-source-id: 273059929
Pull Request resolved: #9468
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9468

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending as of commit f4c4585 with merge base 7159650.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D71570143
ghstack-source-id: 274197244
e68552d into gh/SS-JIA/199/base