Commit e447656

Update on "[ET-VK] Split up prepack command buffer"
## Changes

* Introduce a `run_prepack()` API, which combines the functionality of `encode_prepack()` and `prepack()` but submits prepacking shaders incrementally rather than all at once.
* Introduce graph config options to control command buffer submission behaviour during prepacking.

Note that the current default values for the prepack submission thresholds were determined through experimentation. I will leave determining optimal values for specific devices as a later exercise. The goal of this diff is simply to introduce this mechanism to fix the Llama model loading crash on Samsung S24 (described below).

## Context

Currently, ET-VK encodes all prepacking shaders and then performs prepacking by submitting a single command buffer. This approach has some drawbacks:

* CPU/GPU parallelism is reduced, since the command buffer is submitted only after all commands have been encoded.
* There can be performance issues at the Vulkan API level when processing a single "large" command buffer.

By splitting prepacking across multiple command buffers, both issues are avoided and performance improves.

## Llama 3.2 1B crash on Samsung S24

I have also noticed that when running large models (e.g. Llama 3.2 1B) on the Samsung S24 with ET-VK, the device's display will crash (the screen goes black and becomes unresponsive), and sometimes the device will shut down entirely. Fortunately, this change also fixes that behaviour, in addition to providing a significant boost to model load time for Llama models (from 9 s to 3 s).

## Performance Impact

* Improves model load time, especially on larger models.

## Future Work

* Deprecate the `encode_prepack()` + `prepack()` pattern in favor of the `run_prepack()` pattern.

Differential Revision: [D78275586](https://our.internmc.facebook.com/intern/diff/D78275586/)

[ghstack-poisoned]
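The incremental-submission idea can be sketched as follows. This is a hypothetical illustration, not the actual ET-VK C++ API: the function name, the batching-by-count, and the `submit_threshold` parameter are illustrative stand-ins for the real encoded-work thresholds mentioned above.

```python
def run_prepack(nodes, submit_threshold=3):
    """Sketch of threshold-based incremental submission: encode prepack
    work node-by-node and "submit" a command buffer whenever the current
    batch crosses a configurable threshold, instead of encoding everything
    into one large command buffer and submitting once at the end."""
    submitted_batches = []
    current_batch = []
    for node in nodes:
        current_batch.append(node)  # stand-in for encoding a prepack dispatch
        if len(current_batch) >= submit_threshold:
            submitted_batches.append(current_batch)  # stand-in for a queue submit
            current_batch = []
    if current_batch:
        submitted_batches.append(current_batch)  # flush the final partial batch
    return submitted_batches
```

Submitting as soon as a batch fills lets the GPU start prepacking while the CPU encodes the next batch, which is the source of the CPU/GPU parallelism gain described above.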
2 parents 5892e42 + 641157c commit e447656

File tree

87 files changed (+6966, -1094 lines)

.ci/scripts/unittest-buck2.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,7 +10,7 @@ set -eux
 # TODO: can't query cadence & vulkan backends
 # TODO: can't query //kernels/prim_ops because of non-buckified stuff in OSS.
 buck2 query "//backends/apple/... + //backends/example/... + \
-  //backends/mediatek/... + //backends/test/... + //backends/transforms/... + \
+  //backends/mediatek/... + //backends/transforms/... + \
   //backends/xnnpack/... + //configurations/... + //kernels/aten/... + \
   //kernels/optimized/... + //kernels/portable/... + //kernels/quantized/... + \
   //kernels/test/... + //runtime/... + //schema/... + //test/... + //util/..."
```

.github/workflows/android-release-artifacts.yml

Lines changed: 1 addition & 3 deletions
```diff
@@ -47,15 +47,13 @@ jobs:
     name: build-aar
     needs: check-if-aar-exists
     if: ${{ !github.event.pull_request.head.repo.fork }}
-    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@release/2.7
+    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
     secrets: inherit
     permissions:
       id-token: write
       contents: read
     with:
       secrets-env: EXECUTORCH_MAVEN_SIGNING_KEYID EXECUTORCH_MAVEN_SIGNING_PASSWORD EXECUTORCH_MAVEN_CENTRAL_PASSWORD EXECUTORCH_MAVEN_CENTRAL_USERNAME EXECUTORCH_MAVEN_SIGNING_GPG_KEY_CONTENTS
-      # As this job has access to Maven credential, run this on a fresh ephemeral runner
-      runner: ephemeral.linux.2xlarge
       docker-image: executorch-ubuntu-22.04-clang12-android
       submodules: 'recursive'
       ref: ${{ github.sha }}
```
Lines changed: 121 additions & 0 deletions (new file)

```bash
#!/usr/bin/env bash
# Copyright 2025 Arm Limited and/or its affiliates.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

set -euo pipefail

# TODO
mlsdk_manifest_url=""

script_dir=$(cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd)

source ${script_dir}/utils.sh

usage() { echo "Usage: $0 [-u <mlsdk-manifest-url>]" 1>&2; exit 1; }

while getopts ":u:" opt; do
    case "${opt}" in
        u)
            mlsdk_manifest_url=${OPTARG}
            ;;
        *)
            usage
            ;;
    esac
done

function download_ai_mlsdk_manifest() {
    local _dada_dir="$1"

    if [[ -z "${_dada_dir}" ]]; then
        echo "Error: _dada_dir parameter missing?"
        return 1
    fi

    if [[ -z "${mlsdk_manifest_url}" ]]; then
        echo "Error: mlsdk_manifest_url parameter missing?"
        return 1
    fi

    if [[ ! -d "${_dada_dir}" ]]; then
        mkdir -p "$_dada_dir"
        pushd "$_dada_dir" || exit 1

        curl https://storage.googleapis.com/git-repo-downloads/repo > repo
        chmod u+x repo
        ./repo init --no-repo-verify --depth=1 --manifest-url ${mlsdk_manifest_url} -g model-converter,emulation-layer,vgf-library
        ./repo sync

        popd
    fi
}

function setup_model_converter() {
    local work_dir="$1"
    local manifest_dir="$2"
    local enable_vgf_lib="$3"
    local enable_emulation_layer="$4"

    if [[ -z "$work_dir" ]]; then
        echo "Error: work_dir parameter is required."
        return 1
    fi

    if [[ -z "$manifest_dir" ]]; then
        echo "Error: manifest_dir parameter is required."
        return 1
    fi

    mkdir -p "$work_dir"
    pushd "$work_dir" || exit 1

    download_ai_mlsdk_manifest ${manifest_dir}

    pushd "$manifest_dir"

    # model-converter
    # TODO: Remove macOS patch after mlsdk fully supports macOS
    if [[ "$(uname)" == "Darwin" ]]; then
        sed -i '' '/^ *print(f"Unsupported host platform/ i\
    if system == "Darwin":\
        # Use default Apple toolchain (Clang) on macOS\
        return True\
\
' sw/model-converter/scripts/build.py
    fi
    python sw/model-converter/scripts/build.py -j$(nproc)

    # libvgf
    if [[ "${enable_vgf_lib}" -eq 1 ]]; then
        # TODO: Remove macOS patch after mlsdk fully supports macOS
        if [[ "$(uname)" == "Darwin" ]]; then
            sed -i '' '/^ *print(f"ERROR: Unsupported host platform/ i\
    if system == "Darwin":\
        # Use default Apple toolchain (Clang) on macOS\
        return True\
\
' sw/vgf-lib/scripts/build.py
        fi
        python sw/vgf-lib/scripts/build.py -j$(nproc)
    fi

    # emu layer
    if [[ "${enable_emulation_layer}" -eq 1 ]]; then
        pushd sw/emulation-layer
        cmake -B build \
            -DGLSLANG_PATH=../../dependencies/glslang \
            -DSPIRV_CROSS_PATH=../../dependencies/SPIRV-Cross \
            -DSPIRV_HEADERS_PATH=../../dependencies/SPIRV-Headers \
            -DSPIRV_TOOLS_PATH=../../dependencies/SPIRV-Tools \
            -DVULKAN_HEADERS_PATH=../../dependencies/Vulkan-Headers
        cmake --build build
        popd
    fi

    popd
}

#setup_model_converter() $1
# `"$manifest_dir"'
```
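The script above reads its manifest URL with the `getopts` builtin. A minimal standalone sketch of that parsing pattern (the function name here is hypothetical, not part of the script):

```shell
# Sketch of the `getopts ":u:"` pattern: -u takes a value; any other
# option is rejected. Echoes the parsed URL (empty if -u was not given).
parse_manifest_url() {
    local mlsdk_manifest_url=""
    local OPTIND=1  # reset so the function can be called repeatedly
    while getopts ":u:" opt "$@"; do
        case "${opt}" in
            u) mlsdk_manifest_url=${OPTARG} ;;
            *) echo "Usage: [-u <mlsdk-manifest-url>]" 1>&2; return 1 ;;
        esac
    done
    echo "${mlsdk_manifest_url}"
}
```

The leading `:` in the optstring enables silent error handling, so unknown options fall through to the `*)` branch rather than printing getopts' own diagnostic.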

backends/arm/tosa_quant_utils.py

Lines changed: 11 additions & 3 deletions
```diff
@@ -146,7 +146,15 @@ def quantize_value(self, x: torch.Tensor | float) -> Tensor:
         x = x.to(torch.float32)
         if self.per_channel:
             q_op = exir_ops.edge.quantized_decomposed.quantize_per_channel.default
-            args = (x, self.scale, self.zp, self.axis, self.qmin, self.qmax, self.dtype)
+            args = (
+                x,
+                torch.tensor(self.scale),
+                torch.tensor(self.zp),
+                self.axis,
+                self.qmin,
+                self.qmax,
+                self.dtype,
+            )
         else:
             q_op = exir_ops.edge.quantized_decomposed.quantize_per_tensor.default
             args = (x, self.scale, self.zp, self.qmin, self.qmax, self.dtype)  # type: ignore[assignment]
@@ -162,8 +170,8 @@ def dequantize_value(self, qx: torch.Tensor) -> torch.Tensor:
         dq_op = exir_ops.edge.quantized_decomposed.dequantize_per_channel.default
         args = (
             qx,
-            self.scale,
-            self.zp,
+            torch.tensor(self.scale),
+            torch.tensor(self.zp),
             self.axis,
             self.qmin,
             self.qmax,
```
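The change above wraps `scale` and `zp` in `torch.tensor(...)` because the per-channel ops carry one scale/zero-point pair per channel along `axis`, rather than a single scalar. A pure-Python sketch of that per-channel math (illustrative only, not the `exir_ops` implementation; the function name is hypothetical):

```python
def quantize_per_channel(x, scales, zps, qmin, qmax):
    """Quantize a 2-D matrix row by row (axis 0): each row (channel) uses
    its own scale and zero point, q = clamp(round(v / scale) + zp, qmin, qmax)."""
    out = []
    for row, scale, zp in zip(x, scales, zps):
        out.append([min(qmax, max(qmin, round(v / scale) + zp)) for v in row])
    return out
```

Because each channel has its own `(scale, zp)` pair, the quantization parameters are naturally a vector per tensor, which is why they are passed as tensors rather than Python scalars in the diff above.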

backends/cadence/aot/TARGETS

Lines changed: 16 additions & 0 deletions
```diff
@@ -184,6 +184,21 @@ python_library(
     ],
 )

+python_library(
+    name = "program_builder",
+    srcs = [
+        "program_builder.py",
+    ],
+    typing = True,
+    deps = [
+        ":graph_builder",
+        "fbcode//caffe2:torch",
+        "fbcode//executorch/exir:lib",
+        "fbcode//executorch/exir:pass_base",
+        "fbcode//executorch/exir/verification:verifier",
+    ],
+)
+
 python_library(
     name = "fuse_ops",
     srcs = [
@@ -508,6 +523,7 @@ python_unittest(
         ":typing_stubs",
         ":ops_registrations",
         ":pass_utils",
+        ":program_builder",
         "//caffe2:torch",
         "//executorch/exir:memory",
         "//executorch/exir/dialects:lib",
```
