Commit e425abb

docs: update generic benchmark tool documentation (#3021)
Add instructions for using the tool with compressed models, including profiling timing for decompression and alternate memory regions. Update the tested targets list to include additional Xtensa architectures. Provide example build and run commands for compressed models with alternate decompression memory. Correct typos and improve clarity in build instructions and example outputs. Update compiler flags and example output to reflect recent changes. BUG=part of #2636
1 parent 478bb78 commit e425abb

tensorflow/lite/micro/tools/benchmarking/README.md

Lines changed: 107 additions & 45 deletions
@@ -1,26 +1,73 @@
1-
# Generic Benchmarking Tool build/run instructions
1+
# Generic Benchmarking Tool
22
This tool can be used to benchmark any TfLite format model. The tool can be
33
compiled in one of two ways:
44
1. Such that it takes command line arguments, allowing the path to the model
55
file to be specified as a program argument
66
2. With a model compiled into the tool, allowing use in any simulator or on
77
any hardware platform
88

9-
Building the tool with the model compiled in uses two additional Makefile
9+
All tool output is prefaced with metadata. The metadata consists of compiler
10+
version and flags, and the target information supplied on the `make` command
11+
line. For some targets, version information for external libraries used with
12+
optimized kernels is available.
13+
14+
If the model is compiled into the tool, additional model analysis information
15+
is added to the metadata. This includes data usage within the model, each model
16+
subgraph and operation in inference execution order, and information on all
17+
tensors in the model.
18+
19+
The tool will output a CRC32 of all input tensors, followed by the profiling
20+
times for the pre-inference phase of the MicroInterpreter. Next is the output
21+
of the inference profiling times for each operator, and a summary total for
22+
all inference operations. Finally, a CRC32 of all output tensors and the
23+
MicroInterpreter arena memory usage are output.
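
To make the CRC32 lines concrete, here is a minimal Python sketch of a running CRC32 over several tensor buffers. It relies on `zlib.crc32`. Whether the tool uses the same CRC-32 variant, and whether it feeds tensors in this order, are assumptions. `crc32_of_tensors` is a hypothetical helper name and the byte values are made up.

```python
import zlib


def crc32_of_tensors(tensor_buffers):
    """Return one CRC32 over the raw bytes of all tensors, in order.

    Assumption: the checksum is carried from one tensor to the next, so the
    result equals the CRC32 of the concatenated buffers.
    """
    crc = 0
    for buf in tensor_buffers:
        # zlib.crc32 accepts the previous value as its starting point.
        crc = zlib.crc32(buf, crc)
    return crc & 0xFFFFFFFF


if __name__ == "__main__":
    # Hypothetical input tensors; real data would come from the model inputs.
    tensors = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8])]
    print(f"Input CRC32: 0x{crc32_of_tensors(tensors):08X}")
```

Comparing the input and output CRC32 values between two runs is a quick way to confirm that both runs saw the same data and produced the same results.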
24+
25+
# Generic Benchmarking Tool build/run instructions
26+
Building the tool with the model compiled in uses two additional `make`
1027
variables:
11-
* `GENERIC_BENCHMARK_MODEL_PATH`: the path to the TfLite format model file. This
12-
can be a relative or absolute path. This variable is required.
28+
* `GENERIC_BENCHMARK_MODEL_PATH`: the path to the TfLite format model file.
29+
The model path can be an absolute path, or relative to your local TFLM repository.
30+
This variable is required.
1331
* `GENERIC_BENCHMARK_ARENA_SIZE`: the size of the TFLM interpreter arena, in bytes.
1432
This variable is optional.
1533

16-
## Tested, working targets
34+
## Tested targets
1735
* x86
1836
* cortex_m_qemu (no timing data)
19-
* Xtensa (p6, hifi3)
37+
* Xtensa (p6, hifi3, hifi5)
2038
* cortex_m_corstone_300
2139

22-
## Tested, non-working targets
23-
* none currently
40+
## Use with compressed models
41+
When the tool is used with compressed models, additional profiling timing will
42+
be output. This will consist of profiling timing for each tensor decompressed
43+
during inference,
44+
and a summary total. While this profiling timing is output separately, the
45+
timing for decompression is also included in the normal inference profiling
46+
timing and summary total.
47+
48+
To use the tool with a compressed model, the `make` variables must include:
49+
```
50+
USE_TFLM_COMPRESSION=1
51+
```
52+
53+
The tensor decompression operation can occur with an alternate destination
54+
memory region. This allows specialized memory to be used as the decompressed
55+
data destination. The tool supports a single alternate decompression region.
56+
Use the following `make` variables to specify an alternate decompression region:
57+
* `GENERIC_BENCHMARK_ALT_MEM_ATTR`: a C++ attribute specifying the alternate
58+
memory as mapped through a linker script.
59+
* `GENERIC_BENCHMARK_ALT_MEM_SIZE`: the alternate memory region size in bytes.
60+
61+
Both `make` variables are required (along with `USE_TFLM_COMPRESSION=1`) for the
62+
tool to use the alternate decompression region.
63+
64+
An example build and run command line for Xtensa with alternate decompression memory:
65+
```
66+
make -f tensorflow/lite/micro/tools/make/Makefile BUILD_TYPE=default run_tflm_benchmark -j$(nproc) GENERIC_BENCHMARK_MODEL_PATH=compressed.tflite TARGET=xtensa TARGET_ARCH=hifi3 OPTIMIZED_KERNEL_DIR=xtensa XTENSA_CORE=HIFI_190304_swupgrade USE_TFLM_COMPRESSION=1 GENERIC_BENCHMARK_ALT_MEM_ATTR='__attribute__\(\(section\(\".specialized_memory_region\"\)\)\)' GENERIC_BENCHMARK_ALT_MEM_SIZE=`expr 64 \* 1024`
67+
```
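
The command line above can also be assembled from a script, which avoids retyping the escaped attribute. The following is a rough sketch only: `benchmark_make_args` is a hypothetical helper, it only builds and prints the argument list without invoking `make`, and it omits the `-j$(nproc)` option for brevity. All `make` variable names and values are copied from the example above.

```python
import shlex


def benchmark_make_args(model_path, alt_mem_section, alt_mem_kib):
    """Build the argument list for a compressed-model benchmark run."""
    # The backslashes mirror the escaping in the example command, where the
    # value sits inside single quotes and so reaches `make` unchanged.
    attr = r'__attribute__\(\(section\(\"' + alt_mem_section + r'\"\)\)\)'
    return [
        "make", "-f", "tensorflow/lite/micro/tools/make/Makefile",
        "BUILD_TYPE=default", "run_tflm_benchmark",
        f"GENERIC_BENCHMARK_MODEL_PATH={model_path}",
        "TARGET=xtensa", "TARGET_ARCH=hifi3", "OPTIMIZED_KERNEL_DIR=xtensa",
        "XTENSA_CORE=HIFI_190304_swupgrade",
        "USE_TFLM_COMPRESSION=1",
        f"GENERIC_BENCHMARK_ALT_MEM_ATTR={attr}",
        # Same arithmetic as `expr 64 \* 1024` in the example command.
        f"GENERIC_BENCHMARK_ALT_MEM_SIZE={alt_mem_kib * 1024}",
    ]


if __name__ == "__main__":
    args = benchmark_make_args(
        "compressed.tflite", ".specialized_memory_region", 64)
    # shlex.join quotes each argument for safe pasting into a shell.
    print(shlex.join(args))
```

Keeping the region size in KiB as a parameter makes it harder to pass `GENERIC_BENCHMARK_ALT_MEM_ATTR` without the matching `GENERIC_BENCHMARK_ALT_MEM_SIZE`, since both are required together.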
68+
69+
For more information on model compression, please see the
70+
[compression document](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/docs/compression.md).
2471

2572
## Build and run for x86
2673
Build for command line arguments:
@@ -43,6 +90,11 @@ Build and run with model compiled into tool:
4390
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa TARGET_ARCH=vision_p6 OPTIMIZED_KERNEL_DIR=xtensa XTENSA_CORE=P6_200528 BUILD_TYPE=default run_tflm_benchmark -j$(nproc) GENERIC_BENCHMARK_MODEL_PATH=/tmp/keyword_scrambled.tflite GENERIC_BENCHMARK_ARENA_SIZE=`expr 50 \* 1024`
4491
```
4592

93+
Build and run with a compressed model compiled into the tool:
94+
```
95+
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa TARGET_ARCH=vision_p6 OPTIMIZED_KERNEL_DIR=xtensa XTENSA_CORE=P6_200528 BUILD_TYPE=default run_tflm_benchmark -j$(nproc) GENERIC_BENCHMARK_MODEL_PATH=/tmp/keyword_scrambled.tflite GENERIC_BENCHMARK_ARENA_SIZE=`expr 50 \* 1024` USE_TFLM_COMPRESSION=1
96+
```
97+
4698
## Build and run for Cortex-M using Corstone 300 simulator
4799
Build and run with model compiled into tool:
48100
```
@@ -51,13 +103,13 @@ make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_corstone_300
51103

52104
## Build and run using Bazel
53105

54-
This is only for the x86 command line argument build, and does not contain meta-data:
106+
This is only for the x86 command line argument build, and does not contain metadata:
55107
```
56108
bazel build tensorflow/lite/micro/tools/benchmarking:tflm_benchmark
57109
bazel-bin/tensorflow/lite/micro/tools/benchmarking/tflm_benchmark tensorflow/lite/micro/models/person_detect.tflite
58110
```
59111

60-
## Example output with meta-data and built-in model layer information
112+
## Example output with metadata and built-in model layer information
61113

62114
This sample output is for Cortex-M using Corstone 300:
63115
```
@@ -66,14 +118,14 @@ Configured arena size = 153600
66118
--------------------
67119
Compiled on:
68120
69-
Fri May 17 03:36:59 PM PDT 2024
121+
Tue Dec 17 12:01:44 PM PST 2024
70122
--------------------
71-
Git SHA: a4390a1d73edf5a8d3affa1da60e1eba88e0cb13
123+
Git SHA: aa47932ea602f72705cefe3fb9fc7fa2a651e205
72124
73125
Git status:
74126
75-
On branch main
76-
Your branch is up to date with 'origin/main'.
127+
On branch your-test-branch
128+
77129
--------------------
78130
C compiler: tensorflow/lite/micro/tools/make/downloads/gcc_embedded/bin/arm-none-eabi-gcc
79131
Version:
@@ -85,11 +137,11 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
85137
86138
Flags:
87139
88-
-Wimplicit-function-declaration -std=c11 -Werror -fno-unwind-tables -ffunction-sections
89-
-fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON
90-
-DCMSIS_NN -DKERNELS_OPTIMIZED_FOR_SPEED -mcpu=cortex-m4+nofp -mfpu=auto
91-
-DTF_LITE_MCU_DEBUG_LOG -mthumb -mfloat-abi=soft -funsigned-char -mlittle-endian
92-
-fomit-frame-pointer -MD -DARMCM4
140+
-Wimplicit-function-declaration -std=c17 -Werror -fno-unwind-tables
141+
-fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0
142+
-DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -DCMSIS_NN
143+
-DKERNELS_OPTIMIZED_FOR_SPEED -mcpu=cortex-m4+nofp -mfpu=auto -DTF_LITE_MCU_DEBUG_LOG
144+
-mthumb -mfloat-abi=soft -funsigned-char -mlittle-endian -fomit-frame-pointer -MD -DARMCM4
93145
94146
C++ compiler: tensorflow/lite/micro/tools/make/downloads/gcc_embedded/bin/arm-none-eabi-g++
95147
Version:
@@ -101,10 +153,10 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
101153
102154
Flags:
103155
104-
-std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Wnon-virtual-dtor -Werror
105-
-fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0
106-
-DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion
107-
-Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra
156+
-std=c++17 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Wnon-virtual-dtor -Werror
157+
-fno-unwind-tables -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections
158+
-fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare
159+
-Wdouble-promotion -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra
108160
-Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DCMSIS_NN
109161
-DKERNELS_OPTIMIZED_FOR_SPEED -mcpu=cortex-m4+nofp -mfpu=auto -DTF_LITE_MCU_DEBUG_LOG
110162
-mthumb -mfloat-abi=soft -funsigned-char -mlittle-endian -fomit-frame-pointer -MD
@@ -125,12 +177,12 @@ BUILD_TYPE=default
125177
--------------------
126178
NN library download URLs:
127179
128-
http://github.com/ARM-software/CMSIS-NN/archive/01dee38e6d6bfbbf202f0cd425bbea1731747d51.z
180+
http://github.com/ARM-software/CMSIS-NN/archive/22080c68d040c98139e6cb1549473e3149735f4d.z
129181
ip
130182
131183
NN library MD5 checksums:
132184
133-
f20be93ededf42bb704c19f699a24313
185+
32aa69692541060a76b18bd5d2d98956
134186
--------------------
135187
Model SHA1:
136188
@@ -339,23 +391,31 @@ RO 512 bytes, buffer: 17, data:[-2063, 10755, -12037, -6417, 2147, ...]
339391
You can find more details from
340392
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/schema/schema.fbs
341393
--------------------
342-
TfliteGetModel took 4 ticks (0 ms).
394+
"Unique Tag","Total ticks across all events with that tag."
395+
tflite::GetModel, 4
396+
tflite::CreateOpResolver, 8090
397+
tflite::RecordingMicroAllocator::Create, 40
398+
tflite::MicroInterpreter instantiation, 59
399+
tflite::MicroInterpreter::AllocateTensors, 363531
400+
"total number of ticks", 371724
401+
402+
Input CRC32: 0x14F6A510
343403
344404
DEPTHWISE_CONV_2D took 224622 ticks (8 ms).
345405
DEPTHWISE_CONV_2D took 175917 ticks (7 ms).
346-
CONV_2D took 249560 ticks (9 ms).
406+
CONV_2D took 249561 ticks (9 ms).
347407
DEPTHWISE_CONV_2D took 84958 ticks (3 ms).
348408
CONV_2D took 145817 ticks (5 ms).
349-
DEPTHWISE_CONV_2D took 164915 ticks (6 ms).
409+
DEPTHWISE_CONV_2D took 164914 ticks (6 ms).
350410
CONV_2D took 197283 ticks (7 ms).
351411
DEPTHWISE_CONV_2D took 41304 ticks (1 ms).
352-
CONV_2D took 99472 ticks (3 ms).
412+
CONV_2D took 99473 ticks (3 ms).
353413
DEPTHWISE_CONV_2D took 79969 ticks (3 ms).
354414
CONV_2D took 151505 ticks (6 ms).
355415
DEPTHWISE_CONV_2D took 20053 ticks (0 ms).
356416
CONV_2D took 78521 ticks (3 ms).
357417
DEPTHWISE_CONV_2D took 38127 ticks (1 ms).
358-
CONV_2D took 132862 ticks (5 ms).
418+
CONV_2D took 132863 ticks (5 ms).
359419
DEPTHWISE_CONV_2D took 38127 ticks (1 ms).
360420
CONV_2D took 132865 ticks (5 ms).
361421
DEPTHWISE_CONV_2D took 38127 ticks (1 ms).
@@ -365,21 +425,23 @@ CONV_2D took 132851 ticks (5 ms).
365425
DEPTHWISE_CONV_2D took 38127 ticks (1 ms).
366426
CONV_2D took 132853 ticks (5 ms).
367427
DEPTHWISE_CONV_2D took 9585 ticks (0 ms).
368-
CONV_2D took 78470 ticks (3 ms).
369-
DEPTHWISE_CONV_2D took 17473 ticks (0 ms).
428+
CONV_2D took 78471 ticks (3 ms).
429+
DEPTHWISE_CONV_2D took 17474 ticks (0 ms).
370430
CONV_2D took 143615 ticks (5 ms).
371431
AVERAGE_POOL_2D took 2229 ticks (0 ms).
372432
CONV_2D took 386 ticks (0 ms).
373433
RESHAPE took 28 ticks (0 ms).
374434
SOFTMAX took 163 ticks (0 ms).
375435
376436
"Unique Tag","Total ticks across all events with that tag."
377-
DEPTHWISE_CONV_2D, 1009431
378-
CONV_2D, 1808919
379-
AVERAGE_POOL_2D, 2229
380-
RESHAPE, 28
381-
SOFTMAX, 163
382-
"total number of ticks", 2820770
437+
DEPTHWISE_CONV_2D, 1009435
438+
CONV_2D, 1817013
439+
AVERAGE_POOL_2D, 2269
440+
RESHAPE, 87
441+
SOFTMAX, 363694
442+
"total number of ticks", 2820774
443+
444+
Output CRC32: 0xA4A6A6BE
383445
384446
[[ Table ]]: Arena
385447
Arena Bytes % Arena
@@ -403,12 +465,12 @@ Info: /OSCI/SystemC: Simulation stopped by user.
403465
[warning ][main@0][01 ns] Simulation stopped by user
404466
405467
--- FVP_MPS3_Corstone_SSE_300 statistics: -------------------------------------
406-
Simulated time : 2.879993s
407-
User time : 2.027100s
408-
System time : 0.135914s
409-
Wall time : 2.663214s
410-
Performance index : 1.08
411-
cpu0 : 27.03 MIPS ( 71999848 Inst)
412-
Memory highwater mark : 0x11919000 bytes ( 0.275 GB )
468+
Simulated time : 2.958458s
469+
User time : 1.768731s
470+
System time : 0.227094s
471+
Wall time : 2.022361s
472+
Performance index : 1.46
473+
cpu0 : 36.57 MIPS ( 73961463 Inst)
474+
Memory highwater mark : 0x11935000 bytes ( 0.275 GB )
413475
-------------------------------------------------------------------------------
414476
```
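
The per-operator lines in the sample output can be rolled up into per-tag totals with a short script, which is handy when comparing two runs. This is a minimal sketch: it assumes the `<TAG> took <N> ticks (<M> ms).` line format seen in the sample above, and `summarize_ticks` is a hypothetical helper, not part of the tool.

```python
import re

# Matches lines such as "CONV_2D took 386 ticks (0 ms)."
TICK_LINE = re.compile(r"^(?P<tag>\S+) took (?P<ticks>\d+) ticks \(\d+ ms\)\.$")


def summarize_ticks(log_text):
    """Sum ticks per operator tag, in first-seen order."""
    totals = {}
    for line in log_text.splitlines():
        match = TICK_LINE.match(line.strip())
        if match:
            tag = match.group("tag")
            totals[tag] = totals.get(tag, 0) + int(match.group("ticks"))
    return totals


# The last five operator lines from the sample output above.
SAMPLE = """\
CONV_2D took 143615 ticks (5 ms).
AVERAGE_POOL_2D took 2229 ticks (0 ms).
CONV_2D took 386 ticks (0 ms).
RESHAPE took 28 ticks (0 ms).
SOFTMAX took 163 ticks (0 ms).
"""

if __name__ == "__main__":
    totals = summarize_ticks(SAMPLE)
    for tag, ticks in totals.items():
        print(f"{tag}, {ticks}")
    print(f'"total number of ticks", {sum(totals.values())}')
```

Lines that do not match the pattern, such as the metadata and the arena table, are skipped, so the full tool output can be passed in unchanged.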
