Commit 1520f9f

[ET-VK] Introduce custom op correctness + speed testing suite & add vulkan operator testing to CI (#13835)
## Motivation

Provide an easy way to test and benchmark custom operators when developing them.

## Changes

Introduces a custom op test suite under `backends/vulkan/test/custom_ops`. Each operator will have its own test file, as seen in the next diff. `utils.[h|cpp]` define common utilities that can be used across test files.

To facilitate prototyping, prototype shaders and C++ host code can be placed under the `impl/` and `glsl/` folders.

Output of the test binary looks like:

```
=== Compute Shader Performance Benchmark ===
Add Operation Prototyping Framework
----------------------------------------------------------------------
Executing 32 test cases for Add
----------------------------------------------------------------------
Add_1x64x64_Texture3D_Float    [1x64x64]     3.094 μs  1.324 GFLOP/s  PASSED
Add_1x64x64_Texture3D_Half     [1x64x64]     2.574 μs  1.591 GFLOP/s  SKIPPED
Add_1x64x64_Buffer_Float       [1x64x64]     3.084 μs  1.328 GFLOP/s  PASSED
Add_1x64x64_Buffer_Half        [1x64x64]     2.668 μs  1.535 GFLOP/s  SKIPPED
Add_1x128x128_Texture3D_Float  [1x128x128]   6.001 μs  2.730 GFLOP/s  PASSED
Add_1x128x128_Texture3D_Half   [1x128x128]   4.004 μs  4.092 GFLOP/s  SKIPPED
Add_1x128x128_Buffer_Float     [1x128x128]   6.074 μs  2.698 GFLOP/s  PASSED
Add_1x128x128_Buffer_Half      [1x128x128]   5.112 μs  3.205 GFLOP/s  SKIPPED
Add_1x256x256_Texture3D_Float  [1x256x256]  17.852 μs  3.671 GFLOP/s  PASSED
Add_1x256x256_Texture3D_Half   [1x256x256]  10.057 μs  6.517 GFLOP/s  SKIPPED
Add_1x256x256_Buffer_Float     [1x256x256]  19.027 μs  3.444 GFLOP/s  PASSED
Add_1x256x256_Buffer_Half      [1x256x256]  15.330 μs  4.275 GFLOP/s  SKIPPED
Add_1x512x512_Texture3D_Float  [1x512x512]  48.292 μs  5.428 GFLOP/s  PASSED
Add_1x512x512_Texture3D_Half   [1x512x512]  26.832 μs  9.770 GFLOP/s  SKIPPED
Add_1x512x512_Buffer_Float     [1x512x512]  48.828 μs  5.369 GFLOP/s  PASSED
Add_1x512x512_Buffer_Half      [1x512x512]  48.308 μs  5.427 GFLOP/s  SKIPPED
Add_1x1x1024_Texture3D_Float   [1x1x1024]    2.376 μs  0.431 GFLOP/s  PASSED
Add_1x1x1024_Texture3D_Half    [1x1x1024]    2.215 μs  0.462 GFLOP/s  SKIPPED
Add_1x1x1024_Buffer_Float      [1x1x1024]    2.402 μs  0.426 GFLOP/s  PASSED
Add_1x1x1024_Buffer_Half       [1x1x1024]    2.304 μs  0.445 GFLOP/s  SKIPPED
Add_1x1024x1_Texture3D_Float   [1x1024x1]    6.120 μs  0.167 GFLOP/s  PASSED
Add_1x1024x1_Texture3D_Half    [1x1024x1]    6.245 μs  0.164 GFLOP/s  SKIPPED
Add_1x1024x1_Buffer_Float      [1x1024x1]    2.392 μs  0.428 GFLOP/s  PASSED
Add_1x1024x1_Buffer_Half       [1x1024x1]    2.304 μs  0.445 GFLOP/s  SKIPPED
Add_32x32x32_Texture3D_Float   [32x32x32]   10.249 μs  3.197 GFLOP/s  PASSED
Add_32x32x32_Texture3D_Half    [32x32x32]    6.583 μs  4.978 GFLOP/s  SKIPPED
Add_32x32x32_Buffer_Float      [32x32x32]   10.468 μs  3.130 GFLOP/s  PASSED
Add_32x32x32_Buffer_Half       [32x32x32]    8.481 μs  3.864 GFLOP/s  SKIPPED
Add_16x128x64_Texture3D_Float  [16x128x64]  26.000 μs  5.041 GFLOP/s  PASSED
Add_16x128x64_Texture3D_Half   [16x128x64]  17.841 μs  7.347 GFLOP/s  SKIPPED
Add_16x128x64_Buffer_Float     [16x128x64]  28.917 μs  4.533 GFLOP/s  PASSED
Add_16x128x64_Buffer_Half      [16x128x64]  28.792 μs  4.552 GFLOP/s  SKIPPED
```

`SKIPPED` means that correctness checking is not performed on that test case. This usually happens in one of the following cases:

* The input/output dtype is fp16. The reference calculation functions have no fp16 dtype support.
* The input sizes are too big. Since the reference calculation functions are implemented in a naive manner, calculating reference data may take too long for large inputs. Larger test cases are usually meant to test performance, not correctness.

Differential Revision: [D81323426](https://our.internmc.facebook.com/intern/diff/D81323426/)

cc manuelcandales cbilgin

[ghstack-poisoned]
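The GFLOP/s column in the sample output is consistent with dividing the per-case FLOP count (one addition per element for add) by the measured latency. The sketch below reproduces that arithmetic; `element_count` and `gflops_per_second` are hypothetical helper names chosen for illustration and are not part of the test suite.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of elements in a tensor with the given sizes.
// For an elementwise add this is also the FLOP count (one add per element).
int64_t element_count(const std::vector<int64_t>& sizes) {
  int64_t total = 1;
  for (int64_t s : sizes) {
    total *= s;
  }
  return total;
}

// Hypothetical helper: convert a FLOP count and a latency in microseconds
// into GFLOP/s. FLOP / (us * 1e-6 s) / 1e9 simplifies to FLOP / (us * 1e3).
double gflops_per_second(int64_t flops, double latency_us) {
  assert(latency_us > 0.0);
  return static_cast<double>(flops) / (latency_us * 1e3);
}
```

For the `Add_1x64x64_Texture3D_Float` row, 1 x 64 x 64 = 4096 FLOPs in 3.094 μs gives roughly 1.324 GFLOP/s, which matches the reported value.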
1 parent ff972c4 commit 1520f9f

15 files changed: +3153 -1 lines changed

.github/workflows/pull.yml

Lines changed: 29 additions & 1 deletion
@@ -862,7 +862,7 @@ jobs:
         PYTHON_EXECUTABLE=python bash backends/nxp/run_unittests.sh
 
         # Run aot examples:
-        PYTHON_EXECUTABLE=python bash examples/nxp/run_aot_example.sh cifar10
+        PYTHON_EXECUTABLE=python bash examples/nxp/run_aot_example.sh cifar10
         PYTHON_EXECUTABLE=python bash examples/nxp/run_aot_example.sh mobilenetv2
 
 
@@ -902,6 +902,34 @@ jobs:
         done
 
 
+  test-vulkan-operators-linux:
+    name: test-vulkan-operators-linux
+    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
+    permissions:
+      id-token: write
+      contents: read
+    with:
+      runner: linux.2xlarge
+      docker-image: ci-image:executorch-ubuntu-22.04-clang12
+      submodules: 'recursive'
+      ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
+      timeout: 90
+      script: |
+        set -eux
+
+        # The generic Linux job chooses to use base env, not the one setup by the image
+        CONDA_ENV=$(conda env list --json | jq -r ".envs | .[-1]")
+        conda activate "${CONDA_ENV}"
+
+        # Setup swiftshader and Vulkan SDK which are required to build the Vulkan delegate
+        source .ci/scripts/setup-vulkan-linux-deps.sh
+
+        # Setup python
+        PYTHON_EXECUTABLE=python \
+        CMAKE_ARGS="-DEXECUTORCH_BUILD_VULKAN=ON" \
+        .ci/scripts/setup-linux.sh --build-tool "cmake"
+
+        PYTHON_EXECUTABLE=python bash backends/vulkan/test/custom_ops/build_and_run.sh add
 
   nxp-build-test:
     name: nxp-build-test
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+cmake_minimum_required(VERSION 3.19)
+project(prototyping_shaders)
+
+if(ANDROID)
+  set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY BOTH)
+  set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE BOTH)
+endif()
+
+find_package(executorch CONFIG REQUIRED COMPONENTS vulkan_backend)
+
+# Compile settings
+
+set(VULKAN_CXX_FLAGS "-fexceptions")
+list(APPEND VULKAN_CXX_FLAGS "-DUSE_VULKAN_WRAPPER")
+list(APPEND VULKAN_CXX_FLAGS "-DUSE_VULKAN_VOLK")
+
+message(STATUS "VULKAN_CXX_FLAGS: ${VULKAN_CXX_FLAGS}")
+
+# Only build if Vulkan was compiled
+if(TARGET vulkan_backend)
+  if(NOT EXECUTORCH_ROOT)
+    set(EXECUTORCH_ROOT ${CMAKE_CURRENT_SOURCE_DIR}/../../../..)
+  endif()
+
+  if(NOT PYTHON_EXECUTABLE)
+    set(PYTHON_EXECUTABLE python3)
+  endif()
+
+  # Include this file to access executorch_target_link_options_shared_lib
+  include(${EXECUTORCH_ROOT}/tools/cmake/Utils.cmake)
+  include(${EXECUTORCH_ROOT}/backends/vulkan/cmake/ShaderLibrary.cmake)
+
+  # Third party include paths
+  set(VULKAN_THIRD_PARTY_PATH ${EXECUTORCH_ROOT}/backends/vulkan/third-party)
+  set(VULKAN_HEADERS_PATH ${VULKAN_THIRD_PARTY_PATH}/Vulkan-Headers/include)
+  set(VOLK_PATH ${VULKAN_THIRD_PARTY_PATH}/volk)
+  set(VMA_PATH ${VULKAN_THIRD_PARTY_PATH}/VulkanMemoryAllocator)
+
+  set(COMMON_INCLUDES ${EXECUTORCH_ROOT}/.. ${VULKAN_HEADERS_PATH} ${VOLK_PATH}
+                      ${VMA_PATH}
+  )
+
+  # Prototyping utility files
+  set(PROTOTYPING_UTILS_HEADERS ${CMAKE_CURRENT_SOURCE_DIR})
+  set(PROTOTYPING_UTILS_CPP ${CMAKE_CURRENT_SOURCE_DIR}/utils.cpp)
+
+  # Prototyping shaders
+  message(STATUS "shader stuff")
+  set(PROTOTYPING_SHADERS_PATH ${CMAKE_CURRENT_SOURCE_DIR}/glsl)
+  gen_vulkan_shader_lib_cpp(${PROTOTYPING_SHADERS_PATH})
+  vulkan_shader_lib(prototyping_shaderlib ${generated_spv_cpp})
+  target_compile_options(prototyping_shaderlib PRIVATE ${VULKAN_CXX_FLAGS})
+  message(STATUS "done shader stuff")
+
+  # Operator implementations library
+  file(GLOB OPERATOR_IMPL_SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/impl/*.cpp)
+  add_library(operator_implementations STATIC ${OPERATOR_IMPL_SOURCES})
+  target_include_directories(
+    operator_implementations PRIVATE ${COMMON_INCLUDES}
+  )
+  target_link_libraries(
+    operator_implementations PRIVATE vulkan_backend executorch_core
+                                     prototyping_shaderlib
+  )
+  target_compile_options(operator_implementations PRIVATE ${VULKAN_CXX_FLAGS})
+  set_property(TARGET operator_implementations PROPERTY CXX_STANDARD 17)
+
+  executorch_target_link_options_shared_lib(vulkan_backend)
+  executorch_target_link_options_shared_lib(operator_implementations)
+
+  # Function to create operator prototype binaries
+  function(add_operator_prototype OPERATOR_NAME)
+    set(TARGET_NAME ${OPERATOR_NAME})
+    set(SOURCE_FILE ${CMAKE_CURRENT_SOURCE_DIR}/${OPERATOR_NAME}.cpp)
+
+    add_executable(${TARGET_NAME} ${SOURCE_FILE} ${PROTOTYPING_UTILS_CPP})
+    target_include_directories(${TARGET_NAME} PRIVATE ${COMMON_INCLUDES})
+    target_link_libraries(
+      ${TARGET_NAME} PRIVATE vulkan_backend executorch_core
+                             prototyping_shaderlib operator_implementations
+    )
+    target_compile_options(${TARGET_NAME} PRIVATE ${VULKAN_CXX_FLAGS})
+    set_property(TARGET ${TARGET_NAME} PROPERTY CXX_STANDARD 17)
+  endfunction()
+
+  # Define operator prototypes
+  add_operator_prototype(add)
+endif()
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+load(":targets.bzl", "define_common_targets")
+
+oncall("executorch")
+
+define_common_targets(is_fbcode = True)
Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
+// Copyright (c) Meta Platforms, Inc. and affiliates.
+// All rights reserved.
+//
+// This source code is licensed under the BSD-style license found in the
+// LICENSE file in the root directory of this source tree.
+
+#include <executorch/backends/vulkan/runtime/graph/ops/impl/Common.h>
+#include <executorch/backends/vulkan/runtime/graph/ops/utils/ShaderNameUtils.h>
+#include <iostream>
+#include <vector>
+#include "utils.h"
+
+using namespace executorch::vulkan::prototyping;
+
+// Generate test cases for add operation
+std::vector<TestCase> generate_add_test_cases() {
+  std::vector<TestCase> test_cases;
+
+  // Set the data generation type as a local variable
+  DataGenType data_gen_type = DataGenType::ONES;
+
+  // Define different input size configurations
+  std::vector<std::vector<int64_t>> size_configs = {
+      {1, 64, 64}, // Small square
+      {1, 128, 128}, // Medium square
+      {1, 256, 256}, // Large square
+      {1, 512, 512}, // Very large square
+      {1, 1, 1024}, // Wide tensor
+      {1, 1024, 1}, // Tall tensor
+      {32, 32, 32}, // 3D cube
+      {16, 128, 64}, // 3D rectangular
+  };
+
+  // Storage types to test
+  std::vector<utils::StorageType> storage_types = {
+      utils::kTexture3D, utils::kBuffer};
+
+  // Data types to test
+  std::vector<vkapi::ScalarType> data_types = {vkapi::kFloat, vkapi::kHalf};
+
+  // Generate test cases for each combination
+  for (const auto& sizes : size_configs) {
+    for (const auto& storage_type : storage_types) {
+      for (const auto& data_type : data_types) {
+        TestCase test_case;
+
+        // Create a descriptive name for the test case
+        std::string size_str = "";
+        for (size_t i = 0; i < sizes.size(); ++i) {
+          size_str += std::to_string(sizes[i]);
+          if (i < sizes.size() - 1)
+            size_str += "x";
+        }
+
+        std::string storage_str =
+            (storage_type == utils::kTexture3D) ? "Texture3D" : "Buffer";
+        std::string dtype_str = (data_type == vkapi::kFloat) ? "Float" : "Half";
+
+        // Build the test name from the size, storage type, and dtype
+        std::string test_name =
+            "Add_" + size_str + "_" + storage_str + "_" + dtype_str;
+        test_case.set_name(test_name);
+
+        // Set the operator name for the test case
+        test_case.set_operator_name("etvk.add_prototype");
+
+        // Add two input tensors with the same size, type, storage, and data
+        // generation method
+        ValueSpec input_a(
+            sizes, data_type, storage_type, utils::kWidthPacked, data_gen_type);
+        ValueSpec input_b(
+            sizes, data_type, storage_type, utils::kWidthPacked, data_gen_type);
+
+        // Add output tensor with the same size, type, and storage as inputs
+        // (output uses ZEROS by default)
+        ValueSpec output(
+            sizes,
+            data_type,
+            storage_type,
+            utils::kWidthPacked,
+            DataGenType::ZEROS);
+
+        test_case.add_input_spec(input_a);
+        test_case.add_input_spec(input_b);
+        test_case.add_output_spec(output);
+
+        test_cases.push_back(test_case);
+      }
+    }
+  }
+
+  return test_cases;
+}
+
+// Custom FLOP calculator for add operation
+// Add operation performs 1 FLOP (addition) per element
+int64_t add_flop_calculator(const TestCase& test_case) {
+  // Calculate total elements from the first input tensor
+  int64_t total_elements = 1;
+  if (!test_case.empty() && test_case.num_inputs() > 0 &&
+      test_case.inputs()[0].is_tensor()) {
+    const auto& sizes = test_case.inputs()[0].get_tensor_sizes();
+    for (int64_t size : sizes) {
+      total_elements *= size;
+    }
+  }
+
+  // Add operation: 1 FLOP per element (one addition)
+  return total_elements;
+}
+
+// Reference implementation for add operator
+void add_reference_compute(TestCase& test_case) {
+  const ValueSpec& input_a = test_case.inputs().at(0);
+  const ValueSpec& input_b = test_case.inputs().at(1);
+
+  ValueSpec& output = test_case.outputs().at(0);
+
+  if (input_a.dtype != vkapi::kFloat) {
+    throw std::invalid_argument("Unsupported dtype");
+  }
+
+  // Calculate number of elements
+  int64_t num_elements = input_a.numel();
+
+  auto& input_a_data = input_a.get_float_data();
+  auto& input_b_data = input_b.get_float_data();
+
+  auto& ref_data = output.get_ref_float_data();
+  ref_data.resize(num_elements);
+  for (int64_t i = 0; i < num_elements; ++i) {
+    ref_data[i] = input_a_data[i] + input_b_data[i];
+  }
+}
+
+int main(int argc, char* argv[]) {
+  set_print_output(false); // Disable output tensor printing
+  set_print_latencies(false); // Disable latency timing printing
+  set_use_gpu_timestamps(true); // Enable GPU timestamps
+
+  print_performance_header();
+  std::cout << "Add Operation Prototyping Framework" << std::endl;
+  print_separator();
+
+  // Initialize Vulkan context
+  try {
+    api::context()->initialize_querypool();
+  } catch (const std::exception& e) {
+    std::cerr << "Failed to initialize Vulkan context: " << e.what()
+              << std::endl;
+    return 1;
+  }
+
+  // Execute test cases using the new framework with custom FLOP calculator and
+  // reference compute
+  auto results = execute_test_cases(
+      generate_add_test_cases,
+      add_flop_calculator,
+      "Add",
+      3,
+      10,
+      add_reference_compute);
+
+  return 0;
+}
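
The diff shows that `add_reference_compute` throws `std::invalid_argument` for non-float dtypes, and the commit message says such cases are reported as `SKIPPED`. How the framework in `utils.cpp` connects the two is not shown here, so the following is only a minimal sketch under the assumption that a thrown `std::invalid_argument` is translated into a skipped check. `CheckStatus`, `check_against_reference`, the `FAILED` state, and the tolerance value are all hypothetical names and choices, not the suite's actual API.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical status values. PASSED and SKIPPED mirror the labels in the
// sample output; FAILED is an assumption.
enum class CheckStatus { PASSED, FAILED, SKIPPED };

// Hypothetical correctness check, not the framework's real implementation.
// compute_reference may throw std::invalid_argument when the dtype is not
// supported by the reference path; in that case the check is skipped.
CheckStatus check_against_reference(
    const std::vector<float>& actual,
    const std::function<std::vector<float>()>& compute_reference,
    float abs_tolerance = 1e-5f) {
  std::vector<float> expected;
  try {
    expected = compute_reference();
  } catch (const std::invalid_argument&) {
    return CheckStatus::SKIPPED;
  }

  if (expected.size() != actual.size()) {
    return CheckStatus::FAILED;
  }
  // Elementwise comparison with an absolute tolerance (assumed value).
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (std::fabs(actual[i] - expected[i]) > abs_tolerance) {
      return CheckStatus::FAILED;
    }
  }
  return CheckStatus::PASSED;
}
```

With inputs generated as all ones, the expected add output is 2.0 for every element, so a reference callback returning a vector of 2.0 values would yield `PASSED` for a matching output, while a callback that throws `std::invalid_argument` would yield `SKIPPED`.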
