Commit bfc6dfb
Update base for Update on "[ET-VK][ez] Fix Vulkan Validation layer errors due to consecutive command buffer encoding"
## Changes

* In `VulkanBackend.cpp`, do not call `encode_execute()` during model load if the model compile spec specifies `requires_dynamic_shapes` as true.
* In test files, do not call `encode_execute()` if `propagate_resize()` is subsequently called.

## Motivation

Recently, it was discovered that a command buffer re-encode is required to update push constant values. This means that for dynamic shapes to work correctly, `encode_execute()` must be called after updating tensor sizes. As a result, `propagate_resize()` now calls `encode_execute()` internally.

This creates scenarios where `encode_execute()` is called once during model load, then again right before the first inference during `propagate_resize()`, without the command buffer actually being executed in between. This causes Validation layer errors like

```
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x88d2b500000000e2, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x88d2b500000000e2[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
Objects: 2
    [0] 0x24086224ec0, type: 6, name: NULL
    [1] 0x88d2b500000000e2, type: 10, name: NULL

UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 - Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0: handle = 0x24086224ec0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x6caffc00000000e3, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4dae5635 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] command buffer VkCommandBuffer 0x24086224ec0[] expects VkImage 0x6caffc00000000e3[] (subresource: aspectMask VK_IMAGE_ASPECT_COLOR_BIT array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_UNDEFINED.
Objects: 2
    [0] 0x24086224ec0, type: 6, name: NULL
    [1] 0x6caffc00000000e3, type: 10, name: NULL
```

because the last-access information of image/buffer resources is inaccurate during the second command buffer encoding, since the first command buffer was never executed.

## Perf Impact

* Performance improvement for the first inference of dynamic shape models when actual tensor sizes are much smaller than the maximum possible sizes.
* No impact for non-dynamic shape models.

Differential Revision: [D76047203](https://our.internmc.facebook.com/intern/diff/D76047203/)

cc manuelcandales cbilgin

[ghstack-poisoned]
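To make the described control flow concrete, a minimal C++ sketch of the load-time guard follows. Only `encode_execute()`, `propagate_resize()`, and the `requires_dynamic_shapes` compile spec come from the change itself; `ComputeGraph`'s layout here and `on_model_load` are illustrative stand-ins, not the actual `VulkanBackend.cpp` code.

```cpp
// Illustrative sketch only: this ComputeGraph and on_model_load are
// stand-ins, not the real ExecuTorch Vulkan backend implementation.
struct ComputeGraph {
  bool requires_dynamic_shapes = false;

  // Records the Vulkan command buffer. Push constant values (and hence
  // tensor sizes) are baked in at encode time.
  void encode_execute() { /* ... encode the command buffer ... */ }

  // Updates tensor sizes, then re-encodes so the new push constant values
  // take effect before the next submission.
  void propagate_resize() {
    // ... update tensor sizes ...
    encode_execute();
  }
};

void on_model_load(ComputeGraph& graph) {
  if (graph.requires_dynamic_shapes) {
    // Skip encoding here: propagate_resize() will encode right before the
    // first inference anyway. Encoding twice without submitting the first
    // command buffer leaves image last-access/layout tracking stale, which
    // is what triggers the InvalidImageLayout validation errors above.
    return;
  }
  graph.encode_execute();
}
```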
2 parents 012f7a3 + 27cb43d commit bfc6dfb

121 files changed: +5867 -1394 lines changed

.ci/scripts/build_llama_android.sh
Lines changed: 1 addition & 0 deletions

```diff
@@ -42,6 +42,7 @@ build_llama_runner() {
     popd
     ANDROID_ABI=arm64-v8a
     cmake -DBUCK2="${BUCK2}" \
+        -DBUILD_TESTING=OFF \
         -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK"/build/cmake/android.toolchain.cmake \
         -DANDROID_ABI="${ANDROID_ABI}" \
         -DCMAKE_INSTALL_PREFIX=cmake-android-out \
```

.ci/scripts/test_llama.sh
Lines changed: 1 addition & 0 deletions

```diff
@@ -169,6 +169,7 @@ cmake_build_llama_runner() {
     popd
     dir="examples/models/llama"
     retry cmake \
+        -DBUILD_TESTING=OFF \
         -DCMAKE_INSTALL_PREFIX=cmake-out \
         -DCMAKE_BUILD_TYPE="$CMAKE_BUILD_TYPE" \
         -Bcmake-out/${dir} \
```

.ci/scripts/test_llama_torchao_lowbit.sh
Lines changed: 1 addition & 0 deletions

```diff
@@ -40,6 +40,7 @@ cmake --build cmake-out -j16 --target install --config Release

 # Install llama runner with torchao
 cmake -DPYTHON_EXECUTABLE=python \
+    -DBUILD_TESTING=OFF \
     -DCMAKE_BUILD_TYPE=Release \
     -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
     -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
```

.ci/scripts/test_llava.sh
Lines changed: 2 additions & 1 deletion

```diff
@@ -64,9 +64,10 @@ cmake_install_executorch_libraries_for_android() {


 LLAVA_COMMON_CMAKE_ARGS=" \
+    -DBUILD_TESTING=OFF \
     -DPYTHON_EXECUTABLE="$PYTHON_EXECUTABLE" \
     -DCMAKE_INSTALL_PREFIX=${BUILD_DIR} \
-    -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
+    -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
     -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
     -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
     -DEXECUTORCH_BUILD_XNNPACK=ON"
```

.github/workflows/trunk.yml
Lines changed: 1 addition & 1 deletion

```diff
@@ -262,7 +262,7 @@ jobs:
           output=$(ls -la ${elf})
           arr=($output)
           size=${arr[4]}
-          threshold="102400" # 100KiB
+          threshold="103068" # ~100KiB
           echo "size: $size, threshold: $threshold"
           if [[ "$size" -le "$threshold" ]]; then
             echo "Success $size <= $threshold"
```

backends/arm/_passes/annotate_channels_last_dim_order_pass.py
Lines changed: 4 additions & 1 deletion

```diff
@@ -35,7 +35,10 @@
 def _transpose_impl(*args, **kwargs):
     # Validate length of dim_order array
     dim = args[1]
-    assert len(dim) in (4, 5)
+    if len(dim) != 4 and len(dim) != 5:
+        raise ValueError(
+            f"Dim order length must be either 4 or 5, got {len(dim)}: {dim}"
+        )
     # Pass-through in edge-IR
     return args[0]

```

backends/arm/_passes/convert_split_to_slice.py
Lines changed: 8 additions & 3 deletions

```diff
@@ -41,9 +41,14 @@ def call(self, graph_module: torch.fx.GraphModule):
             dim = split_node.args[2] if len(split_node.args) > 2 else 0
             dim = (dim + rank) % rank

-            assert (
-                sum(split_lengths) == shape[dim]
-            ), "Given split lengths don't sum up to the size of the dimension."
+            # Validate that split lengths cover the entire dimension
+            length_sum = sum(split_lengths)
+            dim_size = shape[dim]
+            if length_sum != dim_size:
+                raise ValueError(
+                    f"Split sizes {split_lengths} sum to {length_sum}, "
+                    f"but dimension {dim} has size {dim_size}"
+                )

             # Convert split argument 'split_lengths' to slice arguments start and end.
             starts = [0] * len(split_lengths)
```

backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py
Lines changed: 13 additions & 9 deletions

```diff
@@ -120,7 +120,9 @@ def fold_and_annotate_arg(
         if input_qparams is not None:
             node.meta["input_qparams"][i] = input_qparams
     for n in nodes_to_remove:
-        assert n.target == dq_op
+        if n.target != dq_op:
+            raise RuntimeError(f"Expected {dq_op} dq_op, got {n.target}")
+
         n.replace_all_uses_with(n.args[0])  # type: ignore[arg-type]
         graph_module.graph.erase_node(n)

@@ -136,14 +138,16 @@ def call(self, graph_module: GraphModule) -> PassResult:
                 continue

             # Make sure we haven't already set qparams meta information on the node
-            assert "input_qparams" not in n.meta, (
-                f'Unexpected key "input_qparams" found in meta for node {n}. '
-                "input_qparams should not have been set at this point"
-            )
-            assert "output_qparams" not in n.meta, (
-                f'Unexpected key "output_qparams" found in meta for node {n}. '
-                "output_qparams should not have been set at this point"
-            )
+            if "input_qparams" in n.meta:
+                raise RuntimeError(
+                    f'Unexpected key "input_qparams" found in meta for node {n}. '
+                    "input_qparams should not have been set at this point"
+                )
+            if "output_qparams" in n.meta:
+                raise RuntimeError(
+                    f'Unexpected key "output_qparams" found in meta for node {n}. '
+                    "output_qparams should not have been set at this point"
+                )

             # for the inputs and outputs search the graph for quantization info and
             # store the information in a dict with order of the _tensor_ inputs as key,
```

backends/arm/_passes/insert_table_ops.py
Lines changed: 11 additions & 2 deletions

```diff
@@ -240,8 +240,17 @@ def call(self, graph_module: GraphModule) -> PassResult:
                     args=(node.args[0],),
                 )
                 output_node = table_node
-                assert len(input_qparams) == 1
-                assert len(output_qparams) == 1
+                # Expect exactly one quantization parameter for input and output
+                if len(input_qparams) != 1:
+                    raise ValueError(
+                        f"InsertTableOpsPass expected exactly one input quantization parameter, "
+                        f"got {len(input_qparams)} for node {node.name}"
+                    )
+                if len(output_qparams) != 1:
+                    raise ValueError(
+                        f"InsertTableOpsPass expected exactly one output quantization parameter, "
+                        f"got {len(output_qparams)} for node {node.name}"
+                    )

                 # Generate table buffer and how much to lshift the table output.
                 buffer, lshift = self.generate_table_values(
```

backends/arm/_passes/remove_clone_pass.py
Lines changed: 4 additions & 1 deletion

```diff
@@ -17,5 +17,8 @@ def call_operator(self, op, args, kwargs, meta):
         if op != exir_ops.edge.aten.clone.default:
             return super().call_operator(op, args, kwargs, meta)

-        assert len(args) == 1
+        if len(args) != 1:
+            raise ValueError(
+                f"clone operator expects exactly one argument, got {len(args)}"
+            )
         return args[0]
```
