1
- # Generic Benchmarking Tool build/run instructions
1
+ # Generic Benchmarking Tool
2
2
This tool can be used to benchmark any TfLite format model. The tool can be
3
3
compiled in one of two ways:
4
4
1. Such that it takes command line arguments, allowing the path to the model
5
5
file to be specified as a program argument
6
6
2. With a model compiled into the tool, allowing use in any simulator or on
7
7
any hardware platform
8
8
9
- Building the tool with the model compiled in uses two additional Makefile
9
+ All tool output is prefaced with metadata. The metadata consists of compiler
10
+ version and flags, and the target information supplied on the `make` command
11
+ line. For some targets, version information for external libraries used with
12
+ optimized kernels is available.
13
+
14
+ If the model is compiled into the tool, additional model analysis information
15
+ is added to the metadata. This includes data usage within the model, each model
16
+ subgraph and operation in inference execution order, and information on all
17
+ tensors in the model.
18
+
19
+ The tool will output a CRC32 of all input tensors, followed by the profiling
20
+ times for the pre-inference phase of the MicroInterpreter. Next is the output
21
+ of the inference profiling times for each operator, and a summary total for
22
+ all inference operations. Finally, a CRC32 of all output tensors and the
23
+ MicroInterpreter arena memory usage are output.
24
+
25
+ # Generic Benchmarking Tool build/run instructions
26
+ Building the tool with the model compiled in uses two additional `make`
10
27
variables:
11
- * `GENERIC_BENCHMARK_MODEL_PATH`: the path to the TfLite format model file. This
12
- can be a relative or absolute path. This variable is required.
28
+ * `GENERIC_BENCHMARK_MODEL_PATH`: the path to the TfLite format model file.
29
+ The model path can be an absolute path, or relative to your local TFLM repository.
30
+ This variable is required.
13
31
* `GENERIC_BENCHMARK_ARENA_SIZE`: the size of the TFLM interpreter arena, in bytes.
14
32
This variable is optional.
15
33
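Because the arena size is given in bytes, the Xtensa example later in this document computes it with `expr 50 \* 1024`. A minimal sketch of the same arithmetic using POSIX shell arithmetic expansion follows; the helper name `kib_to_bytes` is hypothetical and not part of the tool:

```shell
# Hypothetical helper: convert a size in KiB to bytes.
kib_to_bytes() {
  echo $(( $1 * 1024 ))
}

# Same value as the `expr 50 \* 1024` form used in the Xtensa example.
ARENA_SIZE=$(kib_to_bytes 50)
printf '%s\n' "GENERIC_BENCHMARK_ARENA_SIZE=${ARENA_SIZE}"
# prints: GENERIC_BENCHMARK_ARENA_SIZE=51200
```

The printed assignment can be appended to the `make` command line as-is.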
16
- ## Tested, working targets
34
+ ## Tested targets
17
35
* x86
18
36
* cortex_m_qemu (no timing data)
19
- * Xtensa (p6, hifi3)
37
+ * Xtensa (p6, hifi3, hifi5)
20
38
* cortex_m_corstone_300
21
39
22
- ## Tested, non-working targets
23
- * none currently
40
+ ## Use with compressed models
41
+ When the tool is used with compressed models, additional profiling timing will
42
+ be output. This will consist of profiling timing for each tensor decompressed
43
+ during inference,
44
+ and a summary total. While this profiling timing is output separately, the
45
+ timing for decompression is also included in the normal inference profiling
46
+ timing and summary total.
47
+
48
+ To use the tool with a compressed model, the `make` variables must include:
49
+ ```
50
+ USE_TFLM_COMPRESSION=1
51
+ ```
52
+
53
+ The tensor decompression operation can occur with an alternate destination
54
+ memory region. This allows specialized memory to be used as the decompressed
55
+ data destination. The tool supports a single alternate decompression region.
56
+ Use the following `make` variables to specify an alternate decompression region:
57
+ * `GENERIC_BENCHMARK_ALT_MEM_ATTR`: a C++ attribute specifying the alternate
58
+ memory as mapped through a linker script.
59
+ * `GENERIC_BENCHMARK_ALT_MEM_SIZE`: the alternate memory region size in bytes.
60
+
61
+ Both `make` variables are required (along with `USE_TFLM_COMPRESSION=1`) for the
62
+ tool to use the alternate decompression region.
63
+
64
+ An example build and run command line for Xtensa with alternate decompression memory:
65
+ ```
66
+ make -f tensorflow/lite/micro/tools/make/Makefile BUILD_TYPE=default run_tflm_benchmark -j$(nproc) GENERIC_BENCHMARK_MODEL_PATH=compressed.tflite TARGET=xtensa TARGET_ARCH=hifi3 OPTIMIZED_KERNEL_DIR=xtensa XTENSA_CORE=HIFI_190304_swupgrade USE_TFLM_COMPRESSION=1 GENERIC_BENCHMARK_ALT_MEM_ATTR='__attribute__\(\(section\(\".specialized_memory_region\"\)\)\)' GENERIC_BENCHMARK_ALT_MEM_SIZE=`expr 64 \* 1024`
67
+ ```
68
+
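The attribute value in the example above is heavily escaped, which makes it easy to mistype. The sketch below rebuilds that exact string from a linker section name and prints the three variables that must be supplied together. The helper name `alt_mem_attr` is hypothetical, and the reason for the escaping (the value is presumably re-expanded by `make` and the shell before it reaches the compiler) is an assumption, not something this document states:

```shell
# Hypothetical helper: wrap a linker section name in the escaped attribute
# syntax used by the example command above.
alt_mem_attr() {
  printf '__attribute__\\(\\(section\\(\\"%s\\"\\)\\)\\)' "$1"
}

ALT_MEM_ATTR=$(alt_mem_attr .specialized_memory_region)
ALT_MEM_SIZE=$(( 64 * 1024 ))

# Print the three make variables needed for an alternate decompression region.
printf '%s\n' "USE_TFLM_COMPRESSION=1"
printf '%s\n' "GENERIC_BENCHMARK_ALT_MEM_ATTR='${ALT_MEM_ATTR}'"
printf '%s\n' "GENERIC_BENCHMARK_ALT_MEM_SIZE=${ALT_MEM_SIZE}"
```

The section name must match a region defined in the target's linker script.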
69
+ For more information on model compression, please see the
70
+ [compression document](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/docs/compression.md).
24
71
25
72
## Build and run for x86
26
73
Build for command line arguments:
@@ -43,6 +90,11 @@ Build and run with model compiled into tool:
43
90
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa TARGET_ARCH=vision_p6 OPTIMIZED_KERNEL_DIR=xtensa XTENSA_CORE=P6_200528 BUILD_TYPE=default run_tflm_benchmark -j$(nproc) GENERIC_BENCHMARK_MODEL_PATH=/tmp/keyword_scrambled.tflite GENERIC_BENCHMARK_ARENA_SIZE=`expr 50 \* 1024`
44
91
```
45
92
93
+ Build and run with a compressed model compiled into the tool:
94
+ ```
95
+ make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa TARGET_ARCH=vision_p6 OPTIMIZED_KERNEL_DIR=xtensa XTENSA_CORE=P6_200528 BUILD_TYPE=default run_tflm_benchmark -j$(nproc) GENERIC_BENCHMARK_MODEL_PATH=/tmp/keyword_scrambled.tflite GENERIC_BENCHMARK_ARENA_SIZE=`expr 50 \* 1024` USE_TFLM_COMPRESSION=1
96
+ ```
97
+
46
98
## Build and run for Cortex-M using Corstone 300 simulator
47
99
Build and run with model compiled into tool:
48
100
```
@@ -51,13 +103,13 @@ make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_corstone_300
51
103
52
104
## Build and run using Bazel
53
105
54
- This is only for the x86 command line argument build, and does not contain meta-data:
106
+ This is only for the x86 command line argument build, and does not contain metadata:
55
107
```
56
108
bazel build tensorflow/lite/micro/tools/benchmarking:tflm_benchmark
57
109
bazel-bin/tensorflow/lite/micro/tools/benchmarking/tflm_benchmark tensorflow/lite/micro/models/person_detect.tflite
58
110
```
59
111
60
- ## Example output with meta-data and built-in model layer information
112
+ ## Example output with metadata and built-in model layer information
61
113
62
114
This sample output is for Cortex-M using Corstone 300:
63
115
```
@@ -66,14 +118,14 @@ Configured arena size = 153600
66
118
--------------------
67
119
Compiled on:
68
120
69
- Fri May 17 03:36:59 PM PDT 2024
121
+ Tue Dec 17 12:01:44 PM PST 2024
70
122
--------------------
71
- Git SHA: a4390a1d73edf5a8d3affa1da60e1eba88e0cb13
123
+ Git SHA: aa47932ea602f72705cefe3fb9fc7fa2a651e205
72
124
73
125
Git status:
74
126
75
- On branch main
76
- Your branch is up to date with 'origin/main'.
127
+ On branch your-test-branch
128
+
77
129
--------------------
78
130
C compiler: tensorflow/lite/micro/tools/make/downloads/gcc_embedded/bin/arm-none-eabi-gcc
79
131
Version:
@@ -85,11 +137,11 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
85
137
86
138
Flags:
87
139
88
- -Wimplicit-function-declaration -std=c11 -Werror -fno-unwind-tables -ffunction-sections
89
- -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON
90
- -DCMSIS_NN -DKERNELS_OPTIMIZED_FOR_SPEED -mcpu=cortex-m4+nofp -mfpu=auto
91
- -DTF_LITE_MCU_DEBUG_LOG -mthumb -mfloat-abi=soft -funsigned-char -mlittle-endian
92
- -fomit-frame-pointer -MD -DARMCM4
140
+ -Wimplicit-function-declaration -std=c17 -Werror -fno-unwind-tables
141
+ -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0
142
+ -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -DCMSIS_NN
143
+ -DKERNELS_OPTIMIZED_FOR_SPEED -mcpu=cortex-m4+nofp -mfpu=auto -DTF_LITE_MCU_DEBUG_LOG
144
+ -mthumb -mfloat-abi=soft -funsigned-char -mlittle-endian -fomit-frame-pointer -MD -DARMCM4
93
145
94
146
C++ compiler: tensorflow/lite/micro/tools/make/downloads/gcc_embedded/bin/arm-none-eabi-g++
95
147
Version:
@@ -101,10 +153,10 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
101
153
102
154
Flags:
103
155
104
- -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Wnon-virtual-dtor -Werror
105
- -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0
106
- -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion
107
- -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra
156
+ -std=c++17 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Wnon-virtual-dtor -Werror
157
+ -fno-unwind-tables -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections
158
+ -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare
159
+ -Wdouble-promotion -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra
108
160
-Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DCMSIS_NN
109
161
-DKERNELS_OPTIMIZED_FOR_SPEED -mcpu=cortex-m4+nofp -mfpu=auto -DTF_LITE_MCU_DEBUG_LOG
110
162
-mthumb -mfloat-abi=soft -funsigned-char -mlittle-endian -fomit-frame-pointer -MD
@@ -125,12 +177,12 @@ BUILD_TYPE=default
125
177
--------------------
126
178
NN library download URLs:
127
179
128
- http://github.com/ARM-software/CMSIS-NN/archive/01dee38e6d6bfbbf202f0cd425bbea1731747d51.z
180
+ http://github.com/ARM-software/CMSIS-NN/archive/22080c68d040c98139e6cb1549473e3149735f4d.z
129
181
ip
130
182
131
183
NN library MD5 checksums:
132
184
133
- f20be93ededf42bb704c19f699a24313
185
+ 32aa69692541060a76b18bd5d2d98956
134
186
--------------------
135
187
Model SHA1:
136
188
@@ -339,23 +391,31 @@ RO 512 bytes, buffer: 17, data:[-2063, 10755, -12037, -6417, 2147, ...]
339
391
You can find more details from
340
392
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/schema/schema.fbs
341
393
--------------------
342
- TfliteGetModel took 4 ticks (0 ms).
394
+ "Unique Tag","Total ticks across all events with that tag."
395
+ tflite::GetModel, 4
396
+ tflite::CreateOpResolver, 8090
397
+ tflite::RecordingMicroAllocator::Create, 40
398
+ tflite::MicroInterpreter instantiation, 59
399
+ tflite::MicroInterpreter::AllocateTensors, 363531
400
+ "total number of ticks", 371724
401
+
402
+ Input CRC32: 0x14F6A510
343
403
344
404
DEPTHWISE_CONV_2D took 224622 ticks (8 ms).
345
405
DEPTHWISE_CONV_2D took 175917 ticks (7 ms).
346
- CONV_2D took 249560 ticks (9 ms).
406
+ CONV_2D took 249561 ticks (9 ms).
347
407
DEPTHWISE_CONV_2D took 84958 ticks (3 ms).
348
408
CONV_2D took 145817 ticks (5 ms).
349
- DEPTHWISE_CONV_2D took 164915 ticks (6 ms).
409
+ DEPTHWISE_CONV_2D took 164914 ticks (6 ms).
350
410
CONV_2D took 197283 ticks (7 ms).
351
411
DEPTHWISE_CONV_2D took 41304 ticks (1 ms).
352
- CONV_2D took 99472 ticks (3 ms).
412
+ CONV_2D took 99473 ticks (3 ms).
353
413
DEPTHWISE_CONV_2D took 79969 ticks (3 ms).
354
414
CONV_2D took 151505 ticks (6 ms).
355
415
DEPTHWISE_CONV_2D took 20053 ticks (0 ms).
356
416
CONV_2D took 78521 ticks (3 ms).
357
417
DEPTHWISE_CONV_2D took 38127 ticks (1 ms).
358
- CONV_2D took 132862 ticks (5 ms).
418
+ CONV_2D took 132863 ticks (5 ms).
359
419
DEPTHWISE_CONV_2D took 38127 ticks (1 ms).
360
420
CONV_2D took 132865 ticks (5 ms).
361
421
DEPTHWISE_CONV_2D took 38127 ticks (1 ms).
@@ -365,21 +425,23 @@ CONV_2D took 132851 ticks (5 ms).
365
425
DEPTHWISE_CONV_2D took 38127 ticks (1 ms).
366
426
CONV_2D took 132853 ticks (5 ms).
367
427
DEPTHWISE_CONV_2D took 9585 ticks (0 ms).
368
- CONV_2D took 78470 ticks (3 ms).
369
- DEPTHWISE_CONV_2D took 17473 ticks (0 ms).
428
+ CONV_2D took 78471 ticks (3 ms).
429
+ DEPTHWISE_CONV_2D took 17474 ticks (0 ms).
370
430
CONV_2D took 143615 ticks (5 ms).
371
431
AVERAGE_POOL_2D took 2229 ticks (0 ms).
372
432
CONV_2D took 386 ticks (0 ms).
373
433
RESHAPE took 28 ticks (0 ms).
374
434
SOFTMAX took 163 ticks (0 ms).
375
435
376
436
"Unique Tag","Total ticks across all events with that tag."
377
- DEPTHWISE_CONV_2D, 1009431
378
- CONV_2D, 1808919
379
- AVERAGE_POOL_2D, 2229
380
- RESHAPE, 28
381
- SOFTMAX, 163
382
- "total number of ticks", 2820770
437
+ DEPTHWISE_CONV_2D, 1009435
438
+ CONV_2D, 1817013
439
+ AVERAGE_POOL_2D, 2269
440
+ RESHAPE, 87
441
+ SOFTMAX, 363694
442
+ "total number of ticks", 2820774
443
+
444
+ Output CRC32: 0xA4A6A6BE
383
445
384
446
[[ Table ]]: Arena
385
447
Arena Bytes % Arena
@@ -403,12 +465,12 @@ Info: /OSCI/SystemC: Simulation stopped by user.
403
465
[warning ][main@0][01 ns] Simulation stopped by user
404
466
405
467
--- FVP_MPS3_Corstone_SSE_300 statistics: -------------------------------------
406
- Simulated time : 2.879993s
407
- User time : 2.027100s
408
- System time : 0.135914s
409
- Wall time : 2.663214s
410
- Performance index : 1.08
411
- cpu0 : 27.03 MIPS ( 71999848 Inst)
412
- Memory highwater mark : 0x11919000 bytes ( 0.275 GB )
468
+ Simulated time : 2.958458s
469
+ User time : 1.768731s
470
+ System time : 0.227094s
471
+ Wall time : 2.022361s
472
+ Performance index : 1.46
473
+ cpu0 : 36.57 MIPS ( 73961463 Inst)
474
+ Memory highwater mark : 0x11935000 bytes ( 0.275 GB )
413
475
-------------------------------------------------------------------------------
414
476
```
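
The per-operator lines in the sample output above can be totalled by tag from a saved log when comparing runs. The following is a minimal sketch, assuming operator lines have the form `CONV_2D took 249561 ticks (9 ms).` as shown; the function name `sum_ticks_by_tag` is hypothetical and not part of the tool:

```shell
# Sum per-operator ticks from benchmark output read on stdin.
# Assumes operator lines look like: "CONV_2D took 249561 ticks (9 ms)."
sum_ticks_by_tag() {
  awk '$2 == "took" && $4 == "ticks" { total[$1] += $3 }
       END { for (tag in total) printf "%s, %d\n", tag, total[tag] }' | sort
}

# Usage with three lines taken from the sample output.
printf '%s\n' \
  'CONV_2D took 386 ticks (0 ms).' \
  'RESHAPE took 28 ticks (0 ms).' \
  'CONV_2D took 143615 ticks (5 ms).' | sum_ticks_by_tag
# prints:
# CONV_2D, 144001
# RESHAPE, 28
```

Lines that do not match the assumed pattern, such as the metadata and arena tables, are ignored.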