Once the environment has been configured, the application can be executed by

::

   ./array_partition <matmul XCLBIN>

DETAILS
-------

This example demonstrates how array partitioning
(``#pragma HLS ARRAY_PARTITION``) in an HLS kernel can improve
performance, using matrix multiplication to showcase the benefit. The
design contains two kernels: ``matmul``, a simple matrix multiplication,
and ``matmul_partition``, a matrix multiplication implementation that
uses array partitioning.

``#pragma HLS ARRAY_PARTITION`` partitions an array into multiple smaller
arrays or memories. Arrays can be partitioned in three ways: ``cyclic``,
``block``, and ``complete``. In this example, ``complete`` partitioning
is applied to the second dimension of the local matrix arrays ``B`` and
``C``:

.. code:: cpp

   int B[MAX_SIZE][MAX_SIZE];
   int C[MAX_SIZE][MAX_SIZE];
   #pragma HLS ARRAY_PARTITION variable=B dim=2 complete
   #pragma HLS ARRAY_PARTITION variable=C dim=2 complete

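The three partition types differ in how elements are distributed across
the resulting smaller memories. The following C++ sketch models that
distribution for a one-dimensional array of ``n`` elements; the
``Location`` type and the ``*_partition`` functions are hypothetical
helpers written for illustration, not part of the HLS tools. In a
multi-dimensional array such as ``B``, the same rule applies along the
dimension selected by ``dim``.

.. code:: cpp

   #include <cassert>
   #include <cstdio>

   // Where one element of a 1-D array ends up after partitioning:
   // the bank (smaller array) it is placed in and its offset in that bank.
   struct Location {
       int bank;
       int offset;
   };

   // block: consecutive elements fill one bank before the next bank starts.
   // Each bank holds ceil(n / factor) elements.
   Location block_partition(int index, int n, int factor) {
       const int bank_size = (n + factor - 1) / factor;
       return {index / bank_size, index % bank_size};
   }

   // cyclic: elements are dealt round-robin across the banks.
   Location cyclic_partition(int index, int factor) {
       return {index % factor, index / factor};
   }

   // complete: each element becomes its own bank (a register), so the
   // offset is always 0.
   Location complete_partition(int index) {
       return {index, 0};
   }

   int main() {
       const int n = 8;       // array length (illustrative)
       const int factor = 2;  // partition factor for block and cyclic
       for (int i = 0; i < n; ++i) {
           const Location b = block_partition(i, n, factor);
           const Location c = cyclic_partition(i, factor);
           const Location f = complete_partition(i);
           // complete behaves like cyclic with factor == n.
           assert(f.bank == cyclic_partition(i, n).bank);
           std::printf("A[%d]: block bank %d[%d], cyclic bank %d[%d], complete bank %d\n",
                       i, b.bank, b.offset, c.bank, c.offset, f.bank);
       }
       return 0;
   }

With ``n = 8`` and ``factor = 2``, ``block`` places ``A[0]`` to ``A[3]``
in bank 0 and ``A[4]`` to ``A[7]`` in bank 1, while ``cyclic`` places the
even-indexed elements in bank 0 and the odd-indexed ones in bank 1.
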
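Partitioning the second dimension completely splits ``B`` and ``C`` into
``MAX_SIZE`` separate memories each, one per column index. A whole row,
such as ``B[k][0]`` through ``B[k][MAX_SIZE-1]``, can then be accessed in
the same clock cycle instead of being limited by the ports of a single
memory, which reduces the overall latency.
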
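The effect can be sketched in plain C++. The function below is a minimal
illustration, assuming ``MAX_SIZE`` is 16 and all matrices are square;
``matmul_partition_sketch`` and ``matmul_reference`` are hypothetical
names, and the loop order is one common way to exploit the partitioning,
not the example's actual kernel source. A standard C++ compiler ignores
the ``HLS`` pragmas (GCC may warn about unknown pragmas), so the sketch
can be checked on a host against a plain triple loop.

.. code:: cpp

   #include <cassert>

   // Illustrative size only; the real example defines its own MAX_SIZE.
   constexpr int MAX_SIZE = 16;

   // Hypothetical sketch of a partitioned matmul: C = A * B.
   // With the k -> i -> j loop order, each pipelined i-iteration reads the
   // whole row B[k][*] and updates the whole row C[i][*]. Completely
   // partitioning dim 2 gives every column j its own memory, so all of
   // those accesses can happen in the same cycle.
   void matmul_partition_sketch(const int A[MAX_SIZE][MAX_SIZE],
                                const int B_in[MAX_SIZE][MAX_SIZE],
                                int C_out[MAX_SIZE][MAX_SIZE]) {
       int B[MAX_SIZE][MAX_SIZE];
       int C[MAX_SIZE][MAX_SIZE];
       #pragma HLS ARRAY_PARTITION variable=B dim=2 complete
       #pragma HLS ARRAY_PARTITION variable=C dim=2 complete

       // Load B into the partitioned buffer and clear the accumulator.
       for (int i = 0; i < MAX_SIZE; ++i) {
           for (int j = 0; j < MAX_SIZE; ++j) {
               B[i][j] = B_in[i][j];
               C[i][j] = 0;
           }
       }

       for (int k = 0; k < MAX_SIZE; ++k) {
           for (int i = 0; i < MAX_SIZE; ++i) {
               #pragma HLS PIPELINE II=1
               // Pipelining the i loop fully unrolls this j loop in HLS,
               // which needs the parallel access the partitioning provides.
               for (int j = 0; j < MAX_SIZE; ++j) {
                   C[i][j] += A[i][k] * B[k][j];
               }
           }
       }

       // Write the result back out.
       for (int i = 0; i < MAX_SIZE; ++i) {
           for (int j = 0; j < MAX_SIZE; ++j) {
               C_out[i][j] = C[i][j];
           }
       }
   }

   // Plain triple-loop reference, used only to check the sketch on a host.
   void matmul_reference(const int A[MAX_SIZE][MAX_SIZE],
                         const int B[MAX_SIZE][MAX_SIZE],
                         int C[MAX_SIZE][MAX_SIZE]) {
       for (int i = 0; i < MAX_SIZE; ++i) {
           for (int j = 0; j < MAX_SIZE; ++j) {
               int sum = 0;
               for (int k = 0; k < MAX_SIZE; ++k) sum += A[i][k] * B[k][j];
               C[i][j] = sum;
           }
       }
   }

   int main() {
       static int A[MAX_SIZE][MAX_SIZE], B[MAX_SIZE][MAX_SIZE];
       static int C[MAX_SIZE][MAX_SIZE], R[MAX_SIZE][MAX_SIZE];
       for (int i = 0; i < MAX_SIZE; ++i) {
           for (int j = 0; j < MAX_SIZE; ++j) {
               A[i][j] = (i + j) % 7;
               B[i][j] = (i * j) % 5;
           }
       }
       matmul_partition_sketch(A, B, C);
       matmul_reference(A, B, R);
       for (int i = 0; i < MAX_SIZE; ++i)
           for (int j = 0; j < MAX_SIZE; ++j) assert(C[i][j] == R[i][j]);
       return 0;
   }

The key design choice is the ``k``-outer loop order: it turns each
pipelined iteration into a row-wide update, which is the access pattern
that ``dim=2`` partitioning makes parallel. With ``B`` and ``C`` left
unpartitioned, an HLS tool would typically have to serialize those
accesses through the limited memory ports.
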
To see the benefit of array partitioning, compare the overall latency in
the system estimate report. Latency information for the ``matmul`` kernel
(without partitioning):

::

   Compute Unit  Kernel Name  Module Name  Start Interval  Best (cycles)  Avg (cycles)  Worst (cycles)  Best (absolute)  Avg (absolute)  Worst (absolute)
   ------------  -----------  -----------  --------------  -------------  ------------  --------------  ---------------  --------------  ----------------
   matmul_1      matmul       matmul       2856 ~ 2859     2855           2857          2858            9.516 us         9.522 us        9.526 us

Latency information for the ``matmul_partition`` kernel (with
partitioning):

::

   Compute Unit        Kernel Name       Module Name       Start Interval  Best (cycles)  Avg (cycles)  Worst (cycles)  Best (absolute)  Avg (absolute)  Worst (absolute)
   ------------------  ----------------  ----------------  --------------  -------------  ------------  --------------  ---------------  --------------  ----------------
   matmul_partition_1  matmul_partition  matmul_partition  1063 ~ 1066     1062           1064          1065            3.540 us         3.546 us        3.550 us

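The two estimate reports can be compared directly. The short C++ sketch
below computes the speedup from the average latency columns and the clock
period implied by the cycle count versus the absolute time. The
``speedup`` and ``implied_period_ns`` helpers are illustrative, and the
clock frequency is inferred from the reported numbers rather than stated
in the report.

.. code:: cpp

   #include <cassert>
   #include <cmath>
   #include <cstdio>

   // Ratio of baseline latency to optimized latency.
   double speedup(double baseline, double optimized) {
       return baseline / optimized;
   }

   // Clock period in ns implied by one latency given both in microseconds
   // and in cycles.
   double implied_period_ns(double latency_us, double latency_cycles) {
       return latency_us * 1000.0 / latency_cycles;
   }

   int main() {
       // Average-latency columns from the two estimate reports above.
       const double matmul_cycles = 2857.0, matmul_us = 9.522;
       const double partition_cycles = 1064.0, partition_us = 3.546;

       const double by_cycles = speedup(matmul_cycles, partition_cycles);
       const double by_time = speedup(matmul_us, partition_us);
       const double period = implied_period_ns(matmul_us, matmul_cycles);

       // If both reports use the same clock, the two ratios should agree.
       assert(std::fabs(by_cycles - by_time) < 0.01);

       std::printf("speedup (cycles): %.2fx\n", by_cycles);
       std::printf("speedup (time):   %.2fx\n", by_time);
       std::printf("implied clock:    %.3f ns (~%.0f MHz)\n", period, 1000.0 / period);
       return 0;
   }

With the figures above, both ratios come out at about 2.69x, and the
implied clock period is about 3.333 ns (roughly 300 MHz).
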
When run on an Alveo U200 card, the example generates the following
output:

::

   Found Platform
   Platform Name: Xilinx
   INFO: Reading ./build_dir.hw.xilinx_u200_qdma_201910_1/matmul.xclbin
   Loading: './build_dir.hw.xilinx_u200_qdma_201910_1/matmul.xclbin'
   |-------------------------+-------------------------|
   | Kernel                  |    Wall-Clock Time (ns) |
   |-------------------------+-------------------------|
   | matmul:                 |                  396685 |
   | matmul: partition       |                  256367 |
   |-------------------------+-------------------------|
   Note: Wall Clock Time is meaningful for real hardware execution only, not for emulation.
   Please refer to profile summary for kernel execution time for hardware emulation.
   TEST PASSED

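The host reports wall-clock time in nanoseconds, but the output above does
not show how it is measured. One common way to take such a measurement in
C++ is ``std::chrono::steady_clock`` around the call being timed. The
sketch below assumes a hypothetical ``run_kernel_stand_in`` (a CPU loop
standing in for launching the FPGA kernel and waiting for it) and is not
the example's host code.

.. code:: cpp

   #include <chrono>
   #include <cstdint>
   #include <cstdio>

   // Hypothetical stand-in for "launch the kernel and wait for completion";
   // a real host program would run the FPGA kernel here instead.
   void run_kernel_stand_in(volatile std::int64_t& sink, int work) {
       for (int i = 0; i < work; ++i) {
           sink = sink + i;
       }
   }

   // Wall-clock duration of one call, in nanoseconds, from a monotonic clock.
   template <typename F>
   std::int64_t wall_clock_ns(F&& body) {
       const auto start = std::chrono::steady_clock::now();
       body();
       const auto stop = std::chrono::steady_clock::now();
       return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
           .count();
   }

   int main() {
       volatile std::int64_t sink = 0;
       const std::int64_t t =
           wall_clock_ns([&sink] { run_kernel_stand_in(sink, 1000000); });
       std::printf("stand-in: %lld ns\n", static_cast<long long>(t));

       // Ratio of the two wall-clock times reported on the U200 above.
       std::printf("wall-clock speedup: %.2fx\n", 396685.0 / 256367.0);
       return 0;
   }

The reported times give a wall-clock speedup of about 1.55x
(396685 / 256367), lower than the ratio of the estimated average
latencies (2857 / 1064, about 2.69x). The wall-clock figures are also
roughly 40x and 70x the estimated kernel latencies, so they presumably
include more than the kernel computation modeled by the estimate.
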
For more comprehensive documentation, `click here <http://xilinx.github.io/Vitis_Accel_Examples>`__.