Closed

Commits (67), changes from all commits
0f4caa1
[flamingo] Update preproc imports (#5160)
lucylq Sep 9, 2024
2dee34e
Refactor namespace usage in module tests.
shoumikhin Sep 9, 2024
647bfd4
Add an overload to skip dtype and sizes.
shoumikhin Sep 9, 2024
b52d4b6
Enable Llama3 Multi-turn conversation
cmodi-meta Sep 9, 2024
cd9d536
Make convert to linear an export pass
mcr229 Sep 9, 2024
b69ae0c
Hide and simplify operator registry internals
dbort Sep 9, 2024
6b1e328
[ExecuTorch] Support BFloat16 in CPUBlas gemm
swolchok Sep 9, 2024
c634f14
FFHT enhancements to fast hadamard transform kernels
swolchok Sep 9, 2024
eca9ed5
q to s start ops | add dim order sanity check
Gasoonjia Sep 9, 2024
85410e4
Qualcomm AI Engine Direct - Optimization and fix mutable buffer issue…
shewu-quic Sep 9, 2024
3858dca
Update base for Update on "FFHT enhancements to fast hadamard transfo…
swolchok Sep 9, 2024
ed45a66
Update on "FFHT enhancements to fast hadamard transform kernels"
swolchok Sep 9, 2024
d2014e3
Add a target rule for ops_registrations (#5083)
LeeOHzzZ Sep 9, 2024
b23ee01
Register LLM prefill native method in JNI
kirklandsign Sep 9, 2024
28beeff
Clean up devtools/etdump
dbort Sep 9, 2024
dc4f9fc
Update base for Update on "FFHT enhancements to fast hadamard transfo…
swolchok Sep 9, 2024
0c95f18
Update on "FFHT enhancements to fast hadamard transform kernels"
swolchok Sep 9, 2024
6ce9f52
t to z start ops | add dim order sanity check
Gasoonjia Sep 9, 2024
542ecb5
Add Echo parameter to multimodal runner (llava) and jni layer (#5181)
cmodi-meta Sep 9, 2024
59d9bad
Use c++17 for size test
lucylq Sep 9, 2024
7650667
Add a default delegate time scale converter
Olivia-liu Sep 10, 2024
f412630
Qualcomm AI Engine Direct - Uplevel QNN version for ci test (#5174)
shewu-quic Sep 10, 2024
c5a385e
Update schema to include infinity for double values
lucylq Sep 10, 2024
f471556
Partition Mutable Buffer as Core ML State (#5165)
YifanShenSZ Sep 10, 2024
67ae762
Qualcomm AI Engine Direct - Add the argument to specify soc model (#5…
shewu-quic Sep 10, 2024
63e794a
Add pass to convert special case of mean.dim to averagepool2d
per Sep 10, 2024
370f304
Add slice_scatter test: large end value
manuelcandales Sep 10, 2024
083b9e6
[ET-VK] Fix gpuinfo CI
junpi3 Sep 10, 2024
1eeded1
Let the app check "aatp/data" subdir for AWS.
shoumikhin Sep 10, 2024
126abb5
Update the API of registering fake kernels to new standard (#5084)
LeeOHzzZ Sep 10, 2024
657789e
Qualcomm AI Engine Direct - Apply spin quant R1 and R2 (#5175)
shewu-quic Sep 10, 2024
549f14b
Restore constant segment
lucylq Sep 10, 2024
e826de3
Add Half/BFloat16 tests for op_mul
manuelcandales Sep 10, 2024
43e2f2d
Qualcomm AI Engine Direct - support skip quantization (#5070)
haowhsu-quic Sep 10, 2024
30acae5
Switch over backend tests to export_for_training
tarun292 Sep 10, 2024
db34239
[LLava] Fix stats for C++ runner
digantdesai Sep 10, 2024
02304d7
Update bundled_program to use new namespace
dbort Sep 10, 2024
c76b22f
Qualcomm AI Engine Direct - Fixed the order of the transforms for lla…
shewu-quic Sep 10, 2024
d38ca81
Android refactor cmake build
kirklandsign Sep 10, 2024
a4d67e2
Android: Leverage prefillPrompt and prefillImage on Llava
Riandy Sep 10, 2024
b54206d
Update the minimum C++ version to C++17
dbort Sep 10, 2024
4ce0f9d
Introduce PlatformMemoryAllocator
manuelcandales Sep 10, 2024
2b50c76
Use dynamic bound by default.
shoumikhin Sep 10, 2024
ced40f4
Fix models in benchinfra (#5226)
guangy10 Sep 10, 2024
e245590
App side change
kirklandsign Sep 10, 2024
4cce620
Minor fix: Create root dir when it doesn't exist. (#5075)
freddan80 Sep 10, 2024
ab6d91c
Fix internal executorch_llama_jni
kirklandsign Sep 10, 2024
f07e4d5
Update setup-with-qnn.sh with runner util flag (#5210)
WuhanMonkey Sep 10, 2024
cac2c05
[ET-VK] Integrate axis mapping into optimized matrix multiplication s…
SS-JIA Sep 10, 2024
cba5bee
fbshipit-source-id: f63634ba171da01328849d84552b125b829403e8
facebook-github-bot Sep 11, 2024
ca889fb
Minibench use model_dir instead (#5250)
kirklandsign Sep 11, 2024
e4d72ce
Update setup.sh for LlamaDemo (#5235)
kirklandsign Sep 11, 2024
d423131
Android app UI/flow improvements (#5241)
Riandy Sep 11, 2024
7942d2c
Allow core aten op exception list (#5237)
larryliu0820 Sep 11, 2024
69aed24
link whole quantized_ops_lib (#5253)
kirklandsign Sep 11, 2024
41bc1ce
spinquant in eager mode (#5125)
Sep 11, 2024
d7a7ec6
Updated the workflow to upload models to S3 (#5232)
Sep 11, 2024
7e374d7
Add model execution scripts and runner (#5217)
neuropilot-captain Sep 11, 2024
af80804
Debug event populates event name (#5142)
Olivia-liu Sep 11, 2024
68397af
Optimized op_mm using CPUBlas gemm (#5242)
swolchok Sep 11, 2024
d73a653
Add optimized op_linear (#5243)
swolchok Sep 11, 2024
3171ede
Add scalar tensor tests. (#5260)
shoumikhin Sep 11, 2024
4da3c5d
Add CoreML Quantize (#5228)
Sep 11, 2024
d6b800b
Add helper function to create empty, full, ones and zeros tensors. (#…
shoumikhin Sep 11, 2024
75a56a2
Add helper function to create random tensors. (#5266)
shoumikhin Sep 11, 2024
750625e
Update base for Update on "FFHT enhancements to fast hadamard transfo…
swolchok Sep 11, 2024
5229d4e
Update on "FFHT enhancements to fast hadamard transform kernels"
swolchok Sep 11, 2024
2 changes: 1 addition & 1 deletion .ci/scripts/build-qnn-sdk.sh
@@ -11,7 +11,7 @@ set -o xtrace
build_qnn_backend() {
echo "Start building qnn backend."
export ANDROID_NDK_ROOT=/opt/ndk
export QNN_SDK_ROOT=/tmp/qnn/2.23.0.240531
export QNN_SDK_ROOT=/tmp/qnn/2.25.0.240728
export EXECUTORCH_ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/../.." && pwd)"

bash backends/qualcomm/scripts/build.sh --skip_aarch64 --job_number 2 --release
26 changes: 24 additions & 2 deletions .ci/scripts/setup-qnn-deps.sh
@@ -7,14 +7,18 @@

set -ex

verify_pkg_installed() {
echo $(dpkg-query -W --showformat='${Status}\n' $1|grep "install ok installed")
}

install_qnn() {
echo "Start installing qnn."
QNN_INSTALLATION_DIR=/tmp/qnn
mkdir -p "${QNN_INSTALLATION_DIR}"

curl -Lo /tmp/v2.23.0.24.06.24.zip "https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.23.0.24.06.24.zip"
curl -Lo /tmp/v2.25.0.24.07.28.zip "https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.25.0.240728.zip"
echo "Finishing downloading qnn sdk."
unzip -qo /tmp/v2.23.0.24.06.24.zip -d /tmp
unzip -qo /tmp/v2.25.0.24.07.28.zip -d /tmp
echo "Finishing unzip qnn sdk."


@@ -26,4 +30,22 @@ install_qnn() {
ls -lah "${QNN_INSTALLATION_DIR}"
}

setup_libc++() {
sudo apt-get update
pkgs_to_check=('libc++-dev')
j=0
while [ $j -lt ${#pkgs_to_check[*]} ]; do
install_status=$(verify_pkg_installed ${pkgs_to_check[$j]})
if [ "$install_status" == "" ]; then
sudo apt-get install -y ${pkgs_to_check[$j]}
if [[ $? -ne 0 ]]; then
echo "ERROR: Failed to install required packages for libc++"
exit 1
fi
fi
j=$(( $j +1));
done
}

setup_libc++
install_qnn
2 changes: 1 addition & 1 deletion .ci/scripts/test_llama.sh
@@ -75,7 +75,7 @@ echo "COREML option ${COREML}"
if [[ "${MODE}" =~ .*qnn.* ]]; then
QNN=ON
export EXECUTORCH_ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/.." && pwd)"
export QNN_SDK_ROOT=/tmp/qnn/2.23.0.240531
export QNN_SDK_ROOT=/tmp/qnn/2.25.0.240728
export LD_LIBRARY_PATH="${QNN_SDK_ROOT}/lib/x86_64-linux-clang"
export PYTHONPATH=".."
cp schema/program.fbs exir/_serialize/program.fbs
1 change: 1 addition & 0 deletions .ci/scripts/test_llava.sh
@@ -33,6 +33,7 @@ if hash nproc &> /dev/null; then NPROC=$(nproc); fi
EXECUTORCH_COMMON_CMAKE_ARGS=" \
-DCMAKE_INSTALL_PREFIX=${BUILD_DIR} \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
-DEXECUTORCH_ENABLE_LOGGING=ON \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
8 changes: 7 additions & 1 deletion .ci/scripts/test_model.sh
@@ -209,7 +209,13 @@ elif [[ "${BACKEND}" == "coreml" ]]; then
fi
elif [[ "${BACKEND}" == "xnnpack" ]]; then
echo "Testing ${MODEL_NAME} with xnnpack..."
test_model_with_xnnpack true true
WITH_QUANTIZATION=true
WITH_DELEGATION=true
if [[ "$MODEL_NAME" == "mobilebert" ]]; then
# TODO(T197452682)
WITH_QUANTIZATION=false
fi
test_model_with_xnnpack "${WITH_QUANTIZATION}" "${WITH_DELEGATION}"
if [[ $? -eq 0 ]]; then
prepare_artifacts_upload
fi
1 change: 1 addition & 0 deletions .github/workflows/android-perf.yml
@@ -178,6 +178,7 @@ jobs:
upload-models:
needs: export-models
runs-on: linux.2xlarge
if: always() # Continue this job regardless of previous job outcome
steps:
- name: Download the models from GitHub
uses: actions/download-artifact@v3
3 changes: 3 additions & 0 deletions .github/workflows/apple-perf.yml
@@ -165,6 +165,8 @@ jobs:
# Test llama2
if [[ ${{ matrix.delegate }} == "xnnpack" ]]; then
DELEGATE_CONFIG="xnnpack+custom+qe"
elif [[ ${{ matrix.delegate }} == "coreml" ]]; then
DELEGATE_CONFIG="coreml"
fi
PYTHON_EXECUTABLE=python ${CONDA_RUN} --no-capture-output \
bash .ci/scripts/test_llama.sh "${{ matrix.model }}" "${BUILD_MODE}" "${DTYPE}" "${DELEGATE_CONFIG}" "${ARTIFACTS_DIR_NAME}"
@@ -177,6 +179,7 @@
upload-models:
needs: export-models
runs-on: linux.2xlarge
if: always() # Continue this job regardless of previous job outcome
steps:
- name: Download the models from GitHub
uses: actions/download-artifact@v3
4 changes: 4 additions & 0 deletions .lintrunner.toml
@@ -74,6 +74,8 @@ exclude_patterns = [
# NB: Objective-C is not supported
'examples/apple/**',
'examples/demo-apps/apple_ios/**',
# File contains @generated
'extension/llm/custom_ops/spinquant/fast_hadamard_transform_special.h',
]
command = [
'python',
@@ -177,6 +179,8 @@ exclude_patterns = [
'**/*.bat',
'**/*.jpg',
'**/*.jar',
# File contains @generated
'extension/llm/custom_ops/spinquant/fast_hadamard_transform_special.h',
]
command = [
'python',
4 changes: 1 addition & 3 deletions CONTRIBUTING.md
@@ -131,9 +131,7 @@ for detailed advice.

#### C++ language version

**C++11.**

NOTE: The code does not yet fully conform to this, and some files require C++17.
**C++17.**

Rationale: This is a compromise between being compatible with older, proprietary
toolchains, and having access to relatively modern C++ features.
61 changes: 58 additions & 3 deletions backends/apple/coreml/compiler/coreml_preprocess.py
@@ -3,6 +3,7 @@
# CoreML backend for delegating a EdgeProgram to CoreML.

import json
import logging

import shutil
import uuid
@@ -14,6 +15,7 @@
from typing import Any, Dict, final, List, Optional, Tuple

import coremltools as ct
import coremltools.optimize as cto
import executorchcoreml

from executorch.exir.backend.backend_details import (
@@ -23,12 +25,16 @@
)
from executorch.exir.backend.compile_spec_schema import CompileSpec

logger = logging.getLogger(__name__)
logger.setLevel(logging.WARNING)


class COMPILE_SPEC_KEYS(Enum):
COMPUTE_UNITS = "compute_units"
MODEL_TYPE = "model_type"
MIN_DEPLOYMENT_TARGET = "min_deployment_target"
MODEL_COMPUTE_PRECISION = "model_compute_precision"
OP_LINEAR_QUANTIZER_CONFIG = "op_linear_quantizer_config"


class MODEL_PATHS(Enum):
@@ -169,12 +175,44 @@ def generate_compute_unit_compile_spec(
compute_unit.name.lower().encode("utf-8"),
)

@staticmethod
def generate_op_linear_quantizer_config_compile_spec(
op_linear_quantizer_config: Dict,
) -> CompileSpec:
"""
Returns the compile spec representing the model post conversion quantization,
which is a dict that will construct cto.coreml.OpLinearQuantizerConfig
"""
str_representation = json.dumps(op_linear_quantizer_config)
byte_representation = str_representation.encode("utf-8")
return CompileSpec(
COMPILE_SPEC_KEYS.OP_LINEAR_QUANTIZER_CONFIG.value,
byte_representation,
)

@staticmethod
def op_linear_quantizer_config_from_compile_specs(
compile_specs: List[CompileSpec],
) -> cto.coreml.OpLinearQuantizerConfig:
"""
Returns the model's post conversion quantization by parsing the list of compile specs.
"""
for compile_spec in compile_specs:
if compile_spec.key == COMPILE_SPEC_KEYS.OP_LINEAR_QUANTIZER_CONFIG.value:
config_dict_str = compile_spec.value.decode("utf-8")
config_dict = json.loads(config_dict_str)
config = cto.coreml.OpLinearQuantizerConfig._from_dict(config_dict)
return config

return None

@staticmethod
def generate_compile_specs(
compute_unit: ct.ComputeUnit = ct.ComputeUnit.ALL,
minimum_deployment_target: ct.target = ct.target.iOS15,
compute_precision: ct.precision = ct.precision.FLOAT16,
model_type: MODEL_TYPE = MODEL_TYPE.MODEL,
op_linear_quantizer_config: Optional[Dict] = None,
) -> List[CompileSpec]:
"""
Returns the list of compile specs that's used by CoreMLBackend to lower the module.
@@ -192,6 +230,12 @@ def generate_compile_specs(
CoreMLBackend.generate_compute_precision_compile_spec(compute_precision)
)
compile_specs.append(CoreMLBackend.generate_model_type_compile_spec(model_type))
if op_linear_quantizer_config is not None:
compile_specs.append(
CoreMLBackend.generate_op_linear_quantizer_config_compile_spec(
op_linear_quantizer_config
)
)

return compile_specs

@@ -368,18 +412,18 @@ def preprocess(
compile_specs,
)
)

model_compute_precision: ct.precision = (
CoreMLBackend.model_compute_precision_from_compile_specs(compile_specs)
)

minimum_deployment_target: ct.target = (
CoreMLBackend.min_deployment_target_from_compile_specs(compile_specs)
)

compute_units: ct.ComputeUnit = CoreMLBackend.compute_unit_from_compile_specs(
compile_specs
)
op_linear_quantizer_config = (
CoreMLBackend.op_linear_quantizer_config_from_compile_specs(compile_specs)
)

mlmodel = ct.convert(
model=edge_program,
Expand All @@ -392,4 +436,15 @@ def preprocess(
compute_units=compute_units,
)

if op_linear_quantizer_config is not None:
logger.warning(
"Core ML Backend op_linear_quantizer_config API is experimental"
)
config = cto.coreml.OptimizationConfig(
global_config=op_linear_quantizer_config,
# skip embedding
op_type_configs={"gather": None},
)
mlmodel = cto.coreml.linear_quantize_weights(mlmodel, config=config)

return CoreMLBackend.preprocess_model(mlmodel, model_type=model_type)
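A hedged usage sketch of the new op_linear_quantizer_config compile spec added above (Python). The generate_compile_specs and CoreMLPartitioner calls come from this diff; the dict keys and values (mode, dtype, granularity, block_size) are illustrative assumptions based on coremltools' OpLinearQuantizerConfig, not values taken from this PR.

import coremltools as ct

from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

# Build compile specs that request post-conversion weight quantization.
# The dict is serialized into the OP_LINEAR_QUANTIZER_CONFIG compile spec and
# rebuilt via cto.coreml.OpLinearQuantizerConfig._from_dict in preprocess().
compile_specs = CoreMLBackend.generate_compile_specs(
    minimum_deployment_target=ct.target.iOS17,
    compute_precision=ct.precision.FLOAT16,
    op_linear_quantizer_config={
        "mode": "linear_symmetric",  # assumed OpLinearQuantizerConfig field
        "dtype": "int4",             # assumed
        "granularity": "per_block",  # assumed
        "block_size": 32,            # assumed
    },
)
partitioner = CoreMLPartitioner(compile_specs=compile_specs)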
13 changes: 12 additions & 1 deletion backends/apple/coreml/partition/coreml_partitioner.py
@@ -17,7 +17,7 @@
Partitioner,
PartitionResult,
)
from executorch.exir.backend.utils import tag_constant_data
from executorch.exir.backend.utils import tag_constant_data, tag_mutated_buffer
from torch.export.exported_program import ExportedProgram
from torch.fx.passes.infra.partitioner import CapabilityBasedPartitioner
from torch.fx.passes.operator_support import OperatorSupportBase
@@ -61,6 +61,7 @@ def __init__(
self,
skip_ops_for_coreml_delegation: Optional[List[str]] = None,
compile_specs: Optional[List[CompileSpec]] = None,
take_over_mutable_buffer: Optional[bool] = True,
) -> None:
if skip_ops_for_coreml_delegation is None:
skip_ops_for_coreml_delegation = []
@@ -69,6 +70,7 @@ def __init__(
backend_id=CoreMLBackend.__name__,
compile_specs=compile_specs if compile_specs is not None else [],
)
self.take_over_mutable_buffer = take_over_mutable_buffer

def partition(self, exported_program: ExportedProgram) -> PartitionResult:
# Run the CapabilityBasedPartitioner to return the largest possible
@@ -89,6 +91,15 @@ def partition(self, exported_program: ExportedProgram) -> PartitionResult:
partition_tags[tag] = self.delegation_spec

tag_constant_data(exported_program)
if self.take_over_mutable_buffer:
logger.info(
"Core ML partitioner will take over torch mutable buffer as Core ML state, "
"so if your model contains mutable buffer, "
"then you will need MacOS15+/iOS18+ to execute. "
"If you want your mutable buffer model to be compatible with older OS, "
"then please set `take_over_mutable_buffer=False`"
)
tag_mutated_buffer(exported_program)

return PartitionResult(
tagged_exported_program=exported_program, partition_tags=partition_tags
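A minimal sketch of the new take_over_mutable_buffer flag (Python). TinyModel and the tensor shape are made-up placeholders; the flag and the to_edge/to_backend flow come from this diff, where the default of True lowers mutable buffers to Core ML state as described in the warning above.

import torch

import executorch.exir

from executorch.backends.apple.coreml.partition import CoreMLPartitioner


class TinyModel(torch.nn.Module):
    def forward(self, x):
        return x + 1.0


exported = torch.export.export(TinyModel().eval(), (torch.randn(4),))

# Keep mutable buffers out of Core ML state so the lowered model also runs on
# macOS < 15 / iOS < 18, per the warning added in partition() above.
partitioner = CoreMLPartitioner(take_over_mutable_buffer=False)
delegated = executorch.exir.to_edge(exported).to_backend(partitioner)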
7 changes: 6 additions & 1 deletion backends/apple/coreml/scripts/install_requirements.sh
@@ -24,7 +24,7 @@ rm -rf "$COREML_DIR_PATH/third-party"
mkdir "$COREML_DIR_PATH/third-party"

echo "${green}ExecuTorch: Cloning coremltools."
git clone --depth 1 --branch 8.0b1 "https://github.com/apple/coremltools.git" $COREMLTOOLS_DIR_PATH
git clone --depth 1 --branch 8.0b2 "https://github.com/apple/coremltools.git" $COREMLTOOLS_DIR_PATH
cd $COREMLTOOLS_DIR_PATH

STATUS=$?
@@ -47,6 +47,11 @@ cmake --build "$COREMLTOOLS_DIR_PATH/build" --parallel

echo "${green}ExecuTorch: Installing coremltools."
pip install "$COREMLTOOLS_DIR_PATH"
# CoreMLTools have started supporting numpy 2.0,
# but ExecuTorch example model test env is still using older transformers,
# so for now we will need to downgrade numpy to 1.x
# TODO: Remove this numpy downgrade once later transformers starts to be used
pip install numpy==1.26.4
STATUS=$?
if [ $STATUS -ne 0 ]; then
echo "${red}ExecuTorch: Failed to install coremltools."
49 changes: 49 additions & 0 deletions backends/apple/coreml/test/test_coreml_partitioner.py
@@ -4,11 +4,14 @@

import unittest

import coremltools as ct

import executorch.exir

import torch
import torchvision

from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner


@@ -86,8 +89,54 @@ def test_vit_skip_conv(self):
if node.op == "call_function"
] == total

def test_buffer(self):
embedding_dim = 3
max_seq_len = 2

class Model(torch.nn.Module):
def __init__(self):
super().__init__()
self.register_buffer(
"cache",
torch.zeros((max_seq_len, embedding_dim), dtype=torch.float32),
)

def forward(self, q, k_val, input_pos):
q_T = q.transpose(0, 1)
k = torch.ops.aten.index_put_(self.cache, [input_pos, None], k_val)
attn = k.mm(q_T)
return attn

model = Model()
model.eval()

q = torch.randn((1, embedding_dim))
k_val = torch.randn((1, embedding_dim))
input_pos = torch.tensor([0])
example_inputs = (q, k_val, input_pos)
exir_program_aten = torch.export.export(model, example_inputs)

compile_specs = CoreMLBackend.generate_compile_specs(
minimum_deployment_target=ct.target.iOS18
)
partitioner = CoreMLPartitioner(compile_specs=compile_specs)
edge_program_manager = executorch.exir.to_edge(
exir_program_aten, compile_config=self.edge_compile_config
)
delegated_program_manager = edge_program_manager.to_backend(partitioner)

assert [
node.target.__name__
for node in delegated_program_manager.exported_program().graph.nodes
if node.op == "call_function"
] == [
"executorch_call_delegate",
"getitem",
]


if __name__ == "__main__":
test_runner = TestCoreMLPartitioner()
test_runner.test_add_sub_skip_mm()
test_runner.test_vit_skip_conv()
test_runner.test_buffer()