Commit bd890dd

LLM pipeline implementation (#1040)
* WIP LLM pipeline and dataset implementation
* fixed issues preventing libraries from compiling, runtime errors not included
* upgrade TensorFlow to 2.18.0
* upgraded llm pipeline to use TFLite C++ api + small bug fixes
* basic flutter app support for icon and dataset
* added linux x86_64 config for internal testing
* updated bazel config to use SSE/MMX instructions
* fixed incorrect answer format and compression
* got pipeline and dataset to produce proper results + fixed issues where pipeline cannot handle an input size larger than the max prefill size
* added support for loadgen's token based performance measurement + implemented performance benchmark for LLM pipeline
* fixed bugs in inference process, first token function now handles only input and issue_query only handles output tokens
* optimized tensor retrieval for inference + added check for input size vs KV cache size
* clang-format
* mmlu dataset cleanup and formatting
* slight code cleanup
* fixed issue with genai ops import
* code/config cleanup
* add zero-shot option to MMLU constructor
* use function to detect which token is answer letter
* quick initial implementation of first token callback
* moved tokenizer to dataset side (possibly needs cleanup)
* added files needed for MMLU utils
* clang-format
* continued formatting
* code cleanup / issue_query signature update to vendor backends
* signature update for QTI/Samsung backends
* format
* formatted clang and bazel using docker based formatter
* reverted issue_query change for samsung + bazel formatting
* fix for MSVC C7555 error
* rough IFEval implementation using llm_instruction benchmark
* disabled XNNPACK AVX-VNNI for windows due to C2440 error
* moved accuracy calculation away from ProcessOutput, ifeval accuracy is calculated per instruction not per sample
* fixed issue with app not finding model/tokenizer
* properly format 0-shot prompts + allow for file/directory for model path
* formatting
* potential fix for windows C2440
* fix for aligned free for windows
* potential fix for IOS / windows CI issues
* ifeval check cleanup and bugfixes
* formatting
* all possible configs for removing eigen exceptions
* removed objc opts
* use token latencies in app
* enable exceptions for IOS
* disable FP16 AVX for x86 simulator
* attempt to enable exceptions for eigen
* 2nd attempt at enabling exceptions for IOS eigen
* fixed fexceptions syntax
* kitchen-sink approach to enable exceptions for IOS
* attempt to undefine EIGEN_EXCEPTIONS for IOS
* add global override to disable eigen exceptions
* use ARM based macos for IOS build
* fixed and re-enabled eigen patch
* further fix for eigen patch
* even more patch fixing
* fixed typo in eigen patch
* fixed incorrect count in eigen patch
* force arm64 ios build
* use ARM64 simulator for IOS build
* use arm64 simulator for tflite on IOS
* set ios cpu argument for cpuinfo
* removed ios_sim prefix
* attempt at using arm64 simulator for IOS instead of x86
* attempt to force flutter to build ITs for arm64 only
* force arm64 for pods
* disable f16 instead of building for arm64
* more bazel config lines to disable fp16
* removed unavailable compiler flags
* provide patched fp16 lib with math workaround
* typo
* added patch arg
* created a math workaround patch compatible with fp16 version used by xnnpack
* datasets now provide token limits as inputs to pipeline

---------

Co-authored-by: Farook Al-Sammarraie <[email protected]>
1 parent 0fd0922 commit bd890dd

69 files changed, with 6368 additions and 258 deletions.

.bazelrc

Lines changed: 40 additions & 0 deletions
@@ -10,6 +10,10 @@ build --spawn_strategy=standalone
 # This flag is required by tensorflow
 common --experimental_repo_remote_exec

+# Without these, tensorflow complains about lack of CUDA library.
+common --repo_env=TF_NEED_CUDA=0
+common --repo_env=TF_NEED_ROCM=0
+
 # Default options should come above this line.

 # Configure logs
@@ -18,13 +22,16 @@ build:verbose_logs --output_filter=

 # Suppress C++ compiler warnings, otherwise build logs become 10s of MBs.
 build:android --copt=-w
+build:linux --copt=-w
 build:ios --copt=-w
 build:windows --copt=/W0

 # Build in C++ 17 mode.
 build --cxxopt=-std=c++17
 build:android --cxxopt=-std=c++17
 build:android --host_cxxopt=-std=c++17
+build:linux --cxxopt=-std=c++17
+build:linux --host_cxxopt=-std=c++17
 build:ios --cxxopt=-std=c++17
 build:ios --host_cxxopt=-std=c++17
 build:ios --cxxopt=-xobjective-c++
@@ -41,10 +48,39 @@ build:android_x86_64 --config=android
 build:android_x86_64 --cpu=x86_64
 build:android_x86_64 --fat_apk_cpu=x86_64

+
+build:android_x86_64 --define=xnn_enable_avx512fp16=false
+build:android_x86_64 --define=xnn_enable_avxvnniint8=false
+
+# Linux configs
+build:linux_x86_64 --config=linux
+build:linux_x86_64 --cpu=k8
+# Not required, but enables the proper SSE/MMX instructions per CPU
+build:linux_x86_64 --copt=-march=native
+
+# These may be neccessary depending on CPU instruction support
+#build:linux_x86_64 --define=xnn_enable_avx=false
+#build:linux_x86_64 --define=xnn_enable_avx2=false
+#build:linux_x86_64 --define=xnn_enable_avx512=false
+build:linux_x86_64 --define=xnn_enable_avx512fp16=false
+#build:linux_x86_64 --define=xnn_enable_avxvnni=false
+build:linux_x86_64 --define=xnn_enable_avxvnniint8=false
+#build:linux_x86_64 --define=xnn_enable_vnni=false
+
+
+# Optional, enable for debugging or compilation errors
+#build:linux_x86_64 --action_env=CC=gcc
+#build:linux_x86_64 --action_env=CXX=g++
+#build:linux_x86_64 --strip=never
+#build:linux_x86_64 --copt=-fno-omit-frame-pointer
+#build:linux_x86_64 --linkopt=-fno-omit-frame-pointer
+
 # iOS configs
 build:ios --apple_platform_type=ios
 build:ios --copt=-Wno-c++11-narrowing
 build:ios --cxxopt=-fobjc-arc
+build:ios --copt=-DEIGEN_NOEXCEPTIONS_OVERRIDE
+build:ios --cxxopt=-DEIGEN_NOEXCEPTIONS_OVERRIDE

 # Windows configs

@@ -73,6 +109,10 @@ build:windows --host_linkopt=/OPT:REF
 build:windows --linkopt=/OPT:ICF
 build:windows --host_linkopt=/OPT:ICF

+# MSVC does not support XNNPACK AVXVNNI instructions (causes C2440 error).
+build:windows --define=xnn_enable_avxvnni=false
+build:windows --define=xnn_enable_avxvnniint8=false
+
 # Address sanitizer
 build:asan --strip=never
 build:asan --copt -fsanitize=address

WORKSPACE

Lines changed: 47 additions & 5 deletions
@@ -48,6 +48,18 @@ http_archive(

 load("@tf_patch_finder//:patch_win_arm64.bzl", "PATCH_FILE")

+http_archive(
+    name = "FP16",
+    build_file = "@//third_party:FP16.BUILD",
+    patch_args = ["-p1"],
+    patches = ["//patches:fp16_math_workaround.patch"],
+    sha256 = "e66e65515fa09927b348d3d584c68be4215cfe664100d01c9dbc7655a5716d70",
+    strip_prefix = "FP16-0a92994d729ff76a58f692d3028ca1b64b145d91",
+    urls = [
+        "https://github.com/Maratyszcza/FP16/archive/0a92994d729ff76a58f692d3028ca1b64b145d91.zip",
+    ],
+)
+
 http_archive(
     name = "org_tensorflow",
     patch_args = ["-p1"],
@@ -59,13 +71,43 @@ http_archive(
         # Fix tensorflow not being able to read image files on Windows
         "//:flutter/third_party/tensorflow-fix-file-opening-mode-for-Windows.patch",
         "//:flutter/third_party/tf-eigen.patch",
-        # NDK 25 support
-        "//patches:ndk_25_r14.diff",
     ] + PATCH_FILE,
-    sha256 = "ce357fd0728f0d1b0831d1653f475591662ec5bca736a94ff789e6b1944df19f",
-    strip_prefix = "tensorflow-2.14.0",
+    sha256 = "d7876f4bb0235cac60eb6316392a7c48676729860da1ab659fb440379ad5186d",
+    strip_prefix = "tensorflow-2.18.0",
+    urls = [
+        "https://github.com/tensorflow/tensorflow/archive/v2.18.0.tar.gz",
+    ],
+)
+
+load("@org_tensorflow//third_party/gpus:cuda_configure.bzl", "cuda_configure")
+
+cuda_configure(name = "local_config_cuda")
+
+load("@org_tensorflow//third_party/gpus:rocm_configure.bzl", "rocm_configure")
+
+rocm_configure(name = "local_config_rocm")
+
+http_archive(
+    name = "com_google_sentencepiece",
+    build_file = "@//patches:sentencepiece.BUILD",
+    patch_args = ["-p1"],
+    patches = ["@//patches:com_google_sentencepiece.diff"],
+    sha256 = "8409b0126ebd62b256c685d5757150cf7fcb2b92a2f2b98efb3f38fc36719754",
+    strip_prefix = "sentencepiece-0.1.96",
+    urls = [
+        "https://github.com/google/sentencepiece/archive/refs/tags/v0.1.96.zip",
+    ],
+)
+
+http_archive(
+    name = "darts_clone",
+    build_file = "@//patches:darts_clone.BUILD",
+    patch_args = ["-p0"],
+    patches = ["//patches:darts_no_exceptions.diff"],
+    sha256 = "c97f55d05c98da6fcaf7f9ecc6a6dc6bc5b18b8564465f77abff8879d446491c",
+    strip_prefix = "darts-clone-e40ce4627526985a7767444b6ed6893ab6ff8983",
     urls = [
-        "https://github.com/tensorflow/tensorflow/archive/v2.14.0.tar.gz",
+        "https://github.com/s-yata/darts-clone/archive/e40ce4627526985a7767444b6ed6893ab6ff8983.zip",
     ],
 )

71113

flutter/android/android-docker.mk

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ user_id=$(shell id -u)
 .PHONY: flutter/android/docker/image
 flutter/android/docker/image: output/docker/mlperf_mobile_flutter_android_${user_id}.stamp
 output/docker/mlperf_mobile_flutter_android_${user_id}.stamp: flutter/android/docker/Dockerfile
-	docker image build -t ${DOCKER_IMAGE_TAG} flutter/android/docker
+	DOCKER_BUILDKIT=1 docker buildx build --tag ${DOCKER_IMAGE_TAG} flutter/android/docker
 	mkdir -p output/docker
 	touch $@

@@ -68,4 +68,4 @@ docker/flutter/android/release: flutter/check-release-env flutter/android/docker
 docker/flutter/clean: flutter/check-release-env
 	MSYS2_ARG_CONV_EXCL="*" docker run \
 	${flutter_common_docker_flags} \
-	make flutter/clean
+	make flutter/clean
Lines changed: 98 additions & 0 deletions

flutter/assets/icons/ic_task_llm_white.svg

Lines changed: 101 additions & 0 deletions

flutter/assets/tasks.pbtxt

Lines changed: 57 additions & 0 deletions
@@ -336,6 +336,63 @@ task {
   }
 }

+task {
+  id: "llm"
+  name: "LLM"
+  max_throughput: 2000
+  max_accuracy: 1.0
+  scenario: "SingleStream"
+  runs {
+    normal {
+      min_query_count: 100
+      min_duration: 60
+      max_duration: 300
+    }
+    quick {
+      min_query_count: 10
+      min_duration: 10
+      max_duration: 40
+    }
+    rapid {
+      min_query_count: 6
+      min_duration: 6
+      max_duration: 60
+    }
+  }
+  datasets {
+    type: MMLU
+    full {
+      name: "TinyMMLU prompt set for LLM"
+      input_path: "local:///mlperf_datasets/tinymmlu/data.tfrecord"
+      input_checksum: "c20f9115582217af15e4d9955b41ace1"
+      groundtruth_path: ""
+      groundtruth_checksum: ""
+    }
+    lite {
+      name: "TinyMMLU prompt set for LLM"
+      input_path: "local:///mlperf_datasets/tinymmlu/data.tfrecord"
+      input_checksum: "c20f9115582217af15e4d9955b41ace1"
+      groundtruth_path: ""
+      groundtruth_checksum: ""
+    }
+    tiny {
+      name: "TinyMMLU prompt set for LLM"
+      input_path: "local:///mlperf_datasets/tinymmlu/data.tfrecord"
+      input_checksum: "c20f9115582217af15e4d9955b41ace1"
+      groundtruth_path: ""
+      groundtruth_checksum: ""
+    }
+  }
+  model {
+    id: "LLM"
+    name: "LLM"
+  }
+  custom_config {
+    id: "llm_tokenizer_path"
+    value: "llama3_1b.spm.model"
+  }
+}
+
 task {
   id: "stable_diffusion"
   name: "Stable Diffusion"

flutter/cpp/backend.h

Lines changed: 3 additions & 1 deletion
@@ -45,7 +45,9 @@ class Backend {
   virtual const std::string& AcceleratorName() const = 0;

   // Run inference for a sample. Inputs is already set by SetInputs.
-  virtual void IssueQuery() = 0;
+  // TODO might be good to provide the callback and context along with the
+  // inputs if possible
+  virtual void IssueQuery(ft_callback callback, void* context) = 0;

   // Flush the staged queries immediately.
   virtual void FlushQueries() = 0;
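
The new `IssueQuery(ft_callback callback, void* context)` signature follows the usual C-style pattern of a function pointer plus an opaque context pointer. The sketch below illustrates how a caller might use that pattern to record when the first token arrives. It is not taken from this commit: the definition of `ft_callback` is not shown in the diff, so the `void (*)(void*)` typedef is an assumption, and `FirstTokenContext`, `OnFirstToken`, `FakeBackend` and `RunFirstTokenDemo` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <chrono>

// ASSUMPTION: the real ft_callback typedef is not visible in this diff.
// A plain function pointer that receives the opaque context is assumed.
using ft_callback = void (*)(void* context);

// Hypothetical caller-side state passed to the backend through `context`.
struct FirstTokenContext {
  using Clock = std::chrono::steady_clock;
  Clock::time_point start;
  Clock::time_point first_token;
  int calls = 0;
};

// Hypothetical callback: records the time at which it was invoked.
void OnFirstToken(void* context) {
  auto* ctx = static_cast<FirstTokenContext*>(context);
  ctx->first_token = FirstTokenContext::Clock::now();
  ++ctx->calls;
}

// Hypothetical stand-in for a backend; it mirrors only the new signature.
class FakeBackend {
 public:
  void IssueQuery(ft_callback callback, void* context) {
    // A real backend would run inference here and report the first token.
    if (callback != nullptr) callback(context);
    // Any remaining output tokens would be produced after the callback.
  }
};

// Issues one query and returns how many times the callback fired.
int RunFirstTokenDemo() {
  FirstTokenContext ctx;
  ctx.start = FirstTokenContext::Clock::now();
  FakeBackend backend;
  backend.IssueQuery(&OnFirstToken, &ctx);
  assert(ctx.first_token >= ctx.start);
  return ctx.calls;
}
```

Passing the state through `void*` keeps the interface usable from a plain C ABI, at the cost of an unchecked `static_cast` inside the callback.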

flutter/cpp/backends/external.h

Lines changed: 5 additions & 4 deletions
@@ -47,8 +47,8 @@ struct BackendFunctions {
   using AcceleratorNamePtr =
       std::add_pointer<const char*(mlperf_backend_ptr_t)>::type;
   using BackendDeletePtr = std::add_pointer<void(mlperf_backend_ptr_t)>::type;
-  using IssueQueryPtr =
-      std::add_pointer<mlperf_status_t(mlperf_backend_ptr_t)>::type;
+  using IssueQueryPtr = std::add_pointer<mlperf_status_t(
+      mlperf_backend_ptr_t, ft_callback, void*)>::type;
   using FlushQueriesPtr =
       std::add_pointer<mlperf_status_t(mlperf_backend_ptr_t)>::type;

@@ -157,8 +157,9 @@ class ExternalBackend : public Backend {
   }

   // Run inference for a sample.
-  void IssueQuery() override {
-    if (backend_functions_.issue_query(backend_ptr_) != MLPERF_SUCCESS) {
+  void IssueQuery(ft_callback callback, void* context) override {
+    if (backend_functions_.issue_query(backend_ptr_, callback, context) !=
+        MLPERF_SUCCESS) {
       LOG(FATAL) << "Error while inferencing model";
     }
   }
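
For readers unfamiliar with the `std::add_pointer` idiom used for `IssueQueryPtr`, the sketch below shows that the alias is an ordinary function pointer type and how a call is dispatched through it. The type definitions are stand-ins, not the project's real headers: `mlperf_backend_ptr_t`, `mlperf_status_t` (including the `MLPERF_FAILURE` value) and `ft_callback` are assumed shapes, and `FakeIssueQuery`, `CountCall` and `CallThroughPointer` are hypothetical helpers.

```cpp
#include <cassert>
#include <type_traits>

// ASSUMPTIONS: simplified stand-ins for the real C API types.
using mlperf_backend_ptr_t = void*;
enum mlperf_status_t { MLPERF_SUCCESS = 0, MLPERF_FAILURE = 1 };
using ft_callback = void (*)(void*);

// Same alias shape as in the diff above.
using IssueQueryPtr = std::add_pointer<mlperf_status_t(
    mlperf_backend_ptr_t, ft_callback, void*)>::type;

// std::add_pointer on a function type yields a plain function pointer.
static_assert(
    std::is_same<IssueQueryPtr,
                 mlperf_status_t (*)(mlperf_backend_ptr_t, ft_callback,
                                     void*)>::value,
    "IssueQueryPtr should be a function pointer");

// Hypothetical vendor-side implementation matching the new signature.
mlperf_status_t FakeIssueQuery(mlperf_backend_ptr_t backend,
                               ft_callback callback, void* context) {
  if (backend == nullptr) return MLPERF_FAILURE;
  if (callback != nullptr) callback(context);
  return MLPERF_SUCCESS;
}

// Hypothetical callback that counts invocations via the context pointer.
void CountCall(void* context) { ++*static_cast<int*>(context); }

// Calls through the pointer; returns the callback count, or -1 on failure.
int CallThroughPointer() {
  IssueQueryPtr issue_query = &FakeIssueQuery;
  assert(issue_query != nullptr);
  int backend_state = 0;  // placeholder object standing in for a backend
  int calls = 0;
  mlperf_status_t status = issue_query(&backend_state, &CountCall, &calls);
  return status == MLPERF_SUCCESS ? calls : -1;
}
```

Because the alias encodes the full parameter list, any vendor function still using the old single-argument signature would no longer be assignable to `IssueQueryPtr` without a cast.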

flutter/cpp/binary/BUILD

Lines changed: 2 additions & 0 deletions
@@ -54,7 +54,9 @@ cc_binary(
         "//flutter/cpp/datasets:ade20k",
         "//flutter/cpp/datasets:coco",
         "//flutter/cpp/datasets:coco_gen",
+        "//flutter/cpp/datasets:ifeval",
         "//flutter/cpp/datasets:imagenet",
+        "//flutter/cpp/datasets:mmlu_gen",
         "//flutter/cpp/datasets:snu_sr",
         "//flutter/cpp/datasets:squad",
         "//flutter/cpp/proto:mlperf_task_cc_proto",

flutter/cpp/binary/cmdline-docker.mk

Lines changed: 7 additions & 1 deletion
@@ -17,4 +17,10 @@
 docker/cmdline/android/release: flutter/android/docker/image
 	MSYS2_ARG_CONV_EXCL="*" docker run \
 	${flutter_common_docker_flags} \
-	make cmdline/android/bins/release
+	make cmdline/android/bins/release
+
+.PHONY: docker/cmdline/linux/release
+docker/cmdline/linux/release: flutter/android/docker/image
+	MSYS2_ARG_CONV_EXCL="*" docker run \
+	${flutter_common_docker_flags} \
+	make cmdline/linux/bins/release
