
Commit e07cd13

Demo TinyViT compatibility with tiled Siracusa (#124)
### Key Changes

1. **TinyViT Demo Support**: Enables a minimal TinyViT model on the tiled Siracusa platform (running with as little as **4kB of L1**!).
   - New ReduceMean and Slice tiling constraints (but see #134 for their limitations).
   - Added broadcasting handling and improved constraint logic for MatMul.
   - Switched the PULP GELU kernel to the tanh approximation.
2. **Input Tiling for Regular and DW Conv2D**: Lowers the L1 memory requirements.
   - Implemented input tiling on the X and Y axes (not just channel-wise).
   - Fixed the padding computation by using absolute offsets (now also considering interior tiles for the FP solution).
   - Cleaned up the constraints and reduced code duplication.
   - Added documentation explaining the motivation for each constraint.
3. **Bug Fixes**: Addresses critical bugs that would cause runtime failures.
   - Pointer arithmetic: byte offsets were used with typed pointer arithmetic, so the offset was scaled by the element size, causing a 4x overshoot for float32 types.
   - Buffer name parsing: a check also matched intermediate buffers containing "input" or "output" in their names, causing an IndexError.
4. **Test Coverage**: Adds 12+ new test configurations to CI.
   - Added the minimal TinyViT to all test matrices (single/double buffering, different memory levels).
   - Otherwise, focused on the updated Conv operators (regular & DW, with & without bias).
   - Reduced the L1 limit of existing tests, enabled by the new Conv input tiling.
   - Added a skip connection test for tiled Siracusa.

## Added

- Support for input and bias tiling for PULP FP regular and DW conv 2D.
- PULP ReduceMean and Slice tile constraints.
- Broadcast support for MatMul tiling constraints.
- CI tests for tiled Siracusa FP regular and DW conv 2D, with and without bias, for skip connections, and for the demo version of TinyViT.
- Documentation for PULP FP regular and DW conv 2D and MatMul tile constraints.

## Changed

- Decreased the maximal L1 memory limit for CI pipeline tests, where compatible, thanks to the new Conv2D input tiling support.

## Fixed

- Fixed PULP FP32 regular and DW Conv2D, and MatMul tile constraints.
- Fixed type casting for tiling code generation.
- Fixed a bug in buffer name identification in code generation for tests with L3 as the default memory level.
- Fixed the PULP GELU kernel to use the tanh approximation.
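The commit message says the PULP GELU kernel now uses the tanh approximation. The C kernel is not part of the diff shown on this page, so the Python sketch below only puts the two textbook formulas side by side: the exact definition x·Φ(x) via `math.erf`, and the common tanh approximation with the constant 0.044715. That the kernel uses exactly these constants is an assumption.

```python
import math

# Constant of the common tanh approximation: sqrt(2 / pi).
SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def gelu_erf(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (assumed to match the PULP kernel's formula)."""
    return 0.5 * x * (1.0 + math.tanh(SQRT_2_OVER_PI * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    # Largest absolute deviation between the two forms on a grid over [-6, 6].
    grid = [i / 100.0 for i in range(-600, 601)]
    worst = max(abs(gelu_erf(x) - gelu_tanh(x)) for x in grid)
    print(f"max |erf - tanh| on [-6, 6]: {worst:.2e}")
```

The deviation stays well below 1e-3 on this grid: small, but large enough to matter when a test compares kernel outputs against a reference computed with the other formula.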
1 parent: 4795932 · commit: e07cd13


19 files changed: +846 additions, -194 deletions


.github/workflows/ci-platform-siracusa-tiled.yml

Lines changed: 34 additions & 10 deletions
@@ -47,11 +47,20 @@ jobs:
 {"name":"Hardswish","L1":[750]},
 {"name":"RQHardswish","L1":[750]},
 {"name":"testFloatGEMM","L1":[8000]},
-{"name":"testFloat2DConvolution","L1":[8000]},
+
+{"name":"testFloat2DConvolution","L1":[1600]},
+{"name":"testFloat2DConvolutionBias","L1":[6600]},
+{"name":"testFloat2DConvolutionZeroBias","L1":[6600]},
+
+{"name":"testFloat2DDWConvolution","L1":[7200]},
+{"name":"testFloat2DDWConvolutionBias","L1":[7200]},
+{"name":"testFloat2DDWConvolutionZeroBias","L1":[7200]},
+
 {"name":"testFloatLayerNorm","L1":[2000]},
-{"name":"testFloatRelu","L1":[2000]},
 {"name":"testFloatMaxPool","L1":[2000]},
 {"name":"testFloatMatmul","L1":[2000]},
+{"name":"testFloatRelu","L1":[2000]},
+{"name":"testFloatReshapeWithSkipConnection","L1":[1400]},
 {"name":"testFloatSoftmax","L1":[4000]},
 {"name":"testFloatTranspose","L1":[2000]},
 {"name":"testFloatMul","L1":[2000]},
@@ -78,11 +87,20 @@ jobs:
 {"name":"Hardswish","L1":[750]},
 {"name":"RQHardswish","L1":[800]},
 {"name":"testFloatGEMM","L1":[8000]},
-{"name":"testFloat2DConvolution","L1":[15000]},
+
+{"name":"testFloat2DConvolution","L1":[2000]},
+{"name":"testFloat2DConvolutionBias","L1":[8800]},
+{"name":"testFloat2DConvolutionZeroBias","L1":[8800]},
+
+{"name":"testFloat2DDWConvolution","L1":[9800]},
+{"name":"testFloat2DDWConvolutionBias","L1":[10000]},
+{"name":"testFloat2DDWConvolutionZeroBias","L1":[9800]},
+
 {"name":"testFloatLayerNorm","L1":[2000]},
-{"name":"testFloatRelu","L1":[2000]},
 {"name":"testFloatMaxPool","L1":[5000]},
 {"name":"testFloatMatmul","L1":[5000]},
+{"name":"testFloatRelu","L1":[20]},
+{"name":"testFloatReshapeWithSkipConnection","L1":[2600]},
 {"name":"testFloatSoftmax","L1":[8000]},
 {"name":"testFloatTranspose","L1":[2000]},
 {"name":"testFloatMul","L1":[2000]}
@@ -117,9 +135,11 @@ jobs:
 - name: "MLPerf/AnomalyDetection"
 L1: [64000]
 - name: "CCT/CCT_1_16_16_8"
-L1: [64000]
+L1: [2000, 64000]
 - name: "testTrainCCT/CCT1_Classifier_Training/CCT_1_16_16_8"
-L1: [64000]
+L1: [4000, 64000]
+- name: "testFloatDemoTinyViT"
+L1: [4000]
 num-cores: [8]
 uses: ./.github/workflows/_runner-siracusa-tiled.yml
 with:
@@ -148,9 +168,11 @@ jobs:
 - name: "microLlama/microLlama1"
 L1: [60000, 10000, 5000]
 - name: "CCT/CCT_2_32_32_128"
-L1: [128000]
+L1: [64000, 128000]
 - name: "testTrainCCT/CCT1_Classifier_Training/CCT_1_16_16_128"
-L1: [64000]
+L1: [32000, 64000]
+- name: "testFloatDemoTinyViT"
+L1: [4000]
 num-cores: [8]
 default-memory-level: ["L3"]
 uses: ./.github/workflows/_runner-siracusa-tiled.yml
@@ -186,9 +208,11 @@ jobs:
 - name: "microLlama/microLlama8_parallel"
 L1: [60000, 20000, 10000]
 - name: "CCT/CCT_2_32_32_128"
-L1: [128000]
+L1: [64000, 128000]
 - name: "testTrainCCT/CCT1_Classifier_Training/CCT_1_16_16_128"
-L1: [64000]
+L1: [8000, 64000]
+- name: "testFloatDemoTinyViT"
+L1: [4000]
 num-cores: [8]
 double-buffer: [true]
 default-memory-level: ["L3"]
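The L1 limits in this workflow drop sharply (for example, `testFloat2DConvolution` goes from `[8000]` to `[1600]` in the first matrix) because Conv2D inputs can now be tiled along X and Y, with padding derived from absolute offsets. The tiler's constraint code is not part of this page, so the Python sketch below shows only the standard per-axis arithmetic under the usual convolution definition (stride, symmetric padding, dilation). `InputWindow` and `conv_input_window` are hypothetical names, not Deeploy APIs.

```python
from typing import NamedTuple


class InputWindow(NamedTuple):
    start: int       # first input index to transfer (inclusive)
    end: int         # one past the last input index to transfer
    pad_before: int  # zero rows/cols the kernel must synthesize before `start`
    pad_after: int   # zero rows/cols the kernel must synthesize after `end`


def conv_input_window(out_start: int, out_end: int, in_size: int, kernel: int,
                      stride: int = 1, pad: int = 0, dilation: int = 1) -> InputWindow:
    """Input window along one spatial axis (H or W) for output tile [out_start, out_end).

    Offsets are absolute, i.e. relative to the full unpadded input, so a tile
    only receives padding where its receptive field crosses a border;
    interior tiles get none.
    """
    if not 0 <= out_start < out_end:
        raise ValueError("empty or negative output tile")
    # Receptive field in unpadded input coordinates (may extend past [0, in_size)).
    lo = out_start * stride - pad
    hi = (out_end - 1) * stride - pad + dilation * (kernel - 1) + 1
    start, end = max(lo, 0), min(hi, in_size)
    return InputWindow(start, end, start - lo, hi - end)


if __name__ == "__main__":
    # 8-row input, 3-tap kernel, stride 1, padding 1 -> 8 output rows in 3 tiles.
    for tile in [(0, 3), (3, 6), (6, 8)]:
        print(tile, conv_input_window(*tile, in_size=8, kernel=3, stride=1, pad=1))
```

Here the middle tile reads input rows 2 to 6 with no padding, while the first and last tiles each get one padded row on their outer edge; neighbouring windows overlap by the kernel halo, which is the extra transfer cost paid for the smaller L1 footprint.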

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -4,6 +4,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 ## Unreleased (Planned Release Target: v0.2.1)
 
 ### List of Pull Requests
+- Demo TinyViT compatibility with tiled Siracusa [#124](https://github.com/pulp-platform/Deeploy/pull/124)
 - TinyViT on non-tiled Siracusa [#117](https://github.com/pulp-platform/Deeploy/pull/117)
 - Support Fully Asynchronous DMAs [#114](https://github.com/pulp-platform/Deeploy/pull/114)
 - Disallow shape inference [#128](https://github.com/pulp-platform/Deeploy/pull/128)
@@ -25,6 +26,10 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 - Fix bias hoisting in generic GEMM with no bias [#126](https://github.com/pulp-platform/Deeploy/pull/126)
 
 ### Added
+- Support for input tiling for PULP FP regular and DW conv 2D.
+- CI tests for tiled Siracusa FP regular and DW conv 2D, with and without bias, for skip connections, and for the demo version of TinyViT.
+- Documentation for PULP FP regular and DW conv 2D and MatMul tile constraints.
+- PULP ReduceMean and Slice tile constraints.
 - PULP 2D FP DW conv Im2Col template and kernel, with bias support.
 - Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
 - PULP FP DW conv 2D parser.
@@ -70,6 +75,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 - annotateNCores method to PULPDeployer that adds an `n_cores` key to all PULPClusterEngine templates' operatorRepresentations
 
 ### Changed
+- Decreased L1 maximal memory limit for CI pipeline tests where compatible thanks to the implementation of Conv2D input tiling support.
 - Reduced size of reshape & skip connection test, for non-tiled Siracusa memory compatibility.
 - Replaced platform-specific tags (`*-amd64`, `*-arm64`) with direct digest references in `Noelware/docker-manifest-action`.
 - mchan HAL is now reduced to bare-bones
@@ -109,6 +115,10 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 - changed `_mapNode` to `_selectEngine` which reduces the responsibility of that function to, as the name states, just engine selection
 
 ### Fixed
+- Fixed PULP FP32 regular and DW Conv2D, and MatMul tile constraints.
+- Fixed type casting for tiling code generation.
+- Fixed bug in buffer name identification in code generation for tests with L3 default memory level.
+- PULP GELU kernel to use tanh approximation.
 - Fixed bug for non-batched elements in the PULPOpen FP GEMM and matmul templates.
 - Added underscore to the beginning of closure names to avoid naming issues when they start with unsupported first characters (like numbers).
 - Data types in the PULPOpen FP add and mul templates.
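The changelog entry on buffer name identification corresponds to the bug described in the commit message: a check matched any buffer whose name merely *contained* "input" or "output", so intermediate buffers were treated as network I/O and indexed out of range. The actual check is not shown on this page; the Python sketch below assumes a hypothetical `input_<i>` / `output_<i>` naming convention, and both lookup functions are illustrative only.

```python
import re
from typing import List, Optional

# Hypothetical naming convention for network I/O buffers: "input_<i>" / "output_<i>".
IO_BUFFER_NAME = re.compile(r"(input|output)_(\d+)")


def lookup_io_buffer_naive(name: str, inputs: List[str], outputs: List[str]) -> Optional[str]:
    """Buggy pattern: any name that merely contains "input"/"output" is treated as I/O."""
    if "input" in name:
        return inputs[int(name.rsplit("_", 1)[1])]
    if "output" in name:
        return outputs[int(name.rsplit("_", 1)[1])]
    return None


def lookup_io_buffer(name: str, inputs: List[str], outputs: List[str]) -> Optional[str]:
    """Stricter pattern: only an exact "input_<i>"/"output_<i>" name counts as I/O."""
    match = IO_BUFFER_NAME.fullmatch(name)
    if match is None:
        return None  # intermediate buffer, e.g. "conv_output_12"
    kind, index = match.group(1), int(match.group(2))
    return (inputs if kind == "input" else outputs)[index]


if __name__ == "__main__":
    inputs, outputs = ["in0"], ["out0"]
    print(lookup_io_buffer("conv_output_12", inputs, outputs))  # None
    try:
        lookup_io_buffer_naive("conv_output_12", inputs, outputs)
    except IndexError as err:
        print("naive lookup failed:", err)
```

Anchoring the match to the whole name (`fullmatch`) is what keeps intermediate buffers such as `conv_output_12` out of the I/O path.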

Deeploy/DeeployTypes.py

Lines changed: 1 addition & 1 deletion
@@ -480,7 +480,7 @@ class _ReferenceBuffer(VariableBuffer):
 % if offset is None:
 ${type.typeName} ${name} = (${type.typeName}) ${referenceName};\\
 % else:
-${type.typeName} ${name} = (${type.typeName}) ${referenceName} + ${offset};\\
+${type.typeName} ${name} = (${type.typeName})((char*) ${referenceName} + ${offset});\\
 % endif
 """)
 deallocTemplate = NodeTemplate("")
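In C, a cast binds tighter than binary `+`, so the old template `(${type.typeName}) ${referenceName} + ${offset}` first converts the reference to the typed pointer and then advances it by `offset` *elements*, i.e. `offset * sizeof(element)` bytes. Since `offset` is a byte offset, float32 buffers landed 4x too far. The fixed template casts to `char*` first, so the addition counts bytes. The Python sketch below models both address computations with `ctypes`; the helper names are illustrative and not part of Deeploy.

```python
import ctypes


def typed_pointer_add(base: int, offset: int, elem_type) -> int:
    """Address of `((elem_type *) base) + offset`: C scales `offset` by sizeof(elem_type)."""
    return base + offset * ctypes.sizeof(elem_type)


def byte_pointer_add(base: int, offset: int) -> int:
    """Address of `((char *) base) + offset`: sizeof(char) == 1, so no scaling."""
    return base + offset


# A float32 buffer and a byte offset that should point at element 2 (2 * 4 = 8 bytes).
buf = (ctypes.c_float * 8)(*[float(i) for i in range(8)])
base = ctypes.addressof(buf)
byte_offset = 2 * ctypes.sizeof(ctypes.c_float)

old_addr = typed_pointer_add(base, byte_offset, ctypes.c_float)  # old template
new_addr = byte_pointer_add(base, byte_offset)                    # fixed template

# Only the fixed address is dereferenced; the old one lies past the end of `buf`.
print("old template lands at element", (old_addr - base) // ctypes.sizeof(ctypes.c_float))
print("new template reads", ctypes.c_float.from_address(new_addr).value)
```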

Deeploy/Targets/PULPOpen/Bindings.py

Lines changed: 11 additions & 7 deletions
@@ -154,13 +154,17 @@
 
 PULPSliceBindings = [
 NodeBinding(
-SliceChecker([
-PointerClass(type),
-PointerClass(uint8_t),
-PointerClass(uint8_t),
-PointerClass(uint8_t),
-PointerClass(uint8_t)
-], [PointerClass(type)]), SliceTemplate.referenceTemplate, ForkTransformer) for type in FloatDataTypes
+SliceChecker(
+[
+PointerClass(float_type), # data_in
+PointerClass(int_type), # starts
+PointerClass(int_type), # ends
+PointerClass(int_type), # axes
+PointerClass(int_type) # steps
+],
+[PointerClass(float_type)]),
+SliceTemplate.referenceTemplate,
+ForkTransformer) for float_type in FloatDataTypes for int_type in IntegerDataTypes
 ]
 
 PULPReshapeBindings = [
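The rewritten comprehension iterates over both `FloatDataTypes` and `IntegerDataTypes`, so it emits one binding per (float, integer) type pair instead of hard-coding `uint8_t` for `starts`, `ends`, `axes` and `steps`. Once a Slice is tiled, each output tile must also be mapped back to the input elements it reads. That constraint code is not on this page (and #134 tracks its limitations), so the Python sketch below shows only the per-axis index arithmetic of an ONNX-style Slice, assuming a positive `step` and a `start` already normalized to a non-negative, in-range index. `slice_input_range` is a hypothetical helper.

```python
from typing import Tuple


def slice_input_range(out_start: int, out_end: int, start: int, step: int) -> Tuple[int, int]:
    """Half-open input range read by output elements [out_start, out_end) along one axis.

    Output element j of `data[start:end:step]` is input element start + j * step,
    so the tile spans from the first to the last such element (inclusive).
    Assumes step > 0 and a normalized, non-negative `start`.
    """
    if step <= 0 or not 0 <= out_start < out_end:
        raise ValueError("sketch only covers positive steps and non-empty tiles")
    lo = start + out_start * step
    hi = start + (out_end - 1) * step + 1
    return lo, hi


if __name__ == "__main__":
    data = list(range(20))
    sliced = data[3:17:2]  # output of the full Slice
    lo, hi = slice_input_range(2, 5, start=3, step=2)
    # Re-slicing the fetched input window with the same step reproduces the tile.
    print((lo, hi), data[lo:hi:2] == sliced[2:5])
```

With `step > 1` the fetched window also contains the skipped elements, so a strided tile can occupy more L1 than its output size alone suggests.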

Deeploy/Targets/PULPOpen/Platform.py

Lines changed: 13 additions & 12 deletions
@@ -29,21 +29,22 @@
 MergeConstAddAndRequantPass, MergeTrueIntegerDivRequantShiftPass, QuantPatternPass, RQSSplitPass, \
 SkipEmptyConcatPass, SkipUnityRequantPass, iGELURequantMergePass, iHardswishRequantMergePass
 from Deeploy.Targets.PULPOpen.Bindings import BasicDequantBindings, BasicQuantBindings, PULPConv1DBinding, \
-PULPDMASliceBindings, PULPDWConv1DBinding, PULPFloatDWConv2DBindings, PULPReduceMeanBindings, PULPSliceBindings
+PULPDMASliceBindings, PULPDWConv1DBinding
 from Deeploy.Targets.PULPOpen.Layers import PULPRQSConvLayer, PULPRQSGEMMLayer
 from Deeploy.Targets.PULPOpen.Parsers import PULPConv1DParser, PULPConv2DParser, PULPDWConv1DParser, \
 PULPDWConv2DParser, PULPFPConv2DParser, PULPFPDWConv2DParser, PULPGEMMParser, PULPMatrixVecParser, \
 PULPTallGEMMParser
 from Deeploy.Targets.PULPOpen.Templates import AllocateTemplate, FreeTemplate
 from Deeploy.Targets.PULPOpen.Tiler import PULPAddTilingReadyBindings, PULPConcatTilingReadyBindings, \
-PULPConv2DTilingReadyBindings, PULPFlattenTilingReadyBindings, PULPFPGELUTilingReadyBindings, \
-PULPFPGEMMTilingReadyBindings, PULPGatherTilingReadyBindings, PULPiHardswishTilingReadyBindings, \
-PULPiRMSNormTilingReadyBindings, PULPiRQSGELUTilingReadyBindings, PULPLayernormTilingReadyBindings, \
-PULPMatMulTilingReadyBindings, PULPMaxPool2DTilingReadyBindings, PULPMulTilingReadyBindings, \
-PULPReduceSumTilingReadyBindings, PULPReluTilingReadyBindings, PULPRQAddTilingReadyBindings, \
-PULPRQSConv2DTilingReadyBindings, PULPRQSDWConv2DTilingReadyBindings, PULPRQSGEMMTilingReadyBindings, \
-PULPRQSiHardswishTilingReadyBindings, PULPRQSMatrixVecTilingReadyBindings, PULPRQSTallGEMMTilingReadyBindings, \
-PULPRQSTilingReadyBindings, PULPSGDTilingReadyBindings, PULPSoftmaxCrossEntropyGradTilingReadyBindings, \
+PULPConv2DTilingReadyBindings, PULPDWConv2DTilingReadyBindings, PULPFlattenTilingReadyBindings, \
+PULPFPGELUTilingReadyBindings, PULPFPGEMMTilingReadyBindings, PULPGatherTilingReadyBindings, \
+PULPiHardswishTilingReadyBindings, PULPiRMSNormTilingReadyBindings, PULPiRQSGELUTilingReadyBindings, \
+PULPLayernormTilingReadyBindings, PULPMatMulTilingReadyBindings, PULPMaxPool2DTilingReadyBindings, \
+PULPMulTilingReadyBindings, PULPReduceMeanTilingReadyBindings, PULPReduceSumTilingReadyBindings, \
+PULPReluTilingReadyBindings, PULPRQAddTilingReadyBindings, PULPRQSConv2DTilingReadyBindings, \
+PULPRQSDWConv2DTilingReadyBindings, PULPRQSGEMMTilingReadyBindings, PULPRQSiHardswishTilingReadyBindings, \
+PULPRQSMatrixVecTilingReadyBindings, PULPRQSTallGEMMTilingReadyBindings, PULPRQSTilingReadyBindings, \
+PULPSGDTilingReadyBindings, PULPSliceTilingReadyBindings, PULPSoftmaxCrossEntropyGradTilingReadyBindings, \
 PULPSoftmaxCrossEntropyTilingReadyBindings, PULPSoftmaxGradTilingReadyBindings, PULPSoftmaxTilingReadyBindings, \
 PULPTransposeTilingReadyBindings, PULPUniformRQSTilingReadyBindings
 from Deeploy.Targets.PULPOpen.TopologyOptimizationPasses.Passes import PULPAddRequantMergePass, \
@@ -64,7 +65,7 @@
 RequantShiftMapper = NodeMapper(RequantShiftParser(), PULPRQSTilingReadyBindings)
 UniformRequantShiftMapper = NodeMapper(UniformRequantShiftParser(), PULPUniformRQSTilingReadyBindings)
 
-ReduceMeanMapper = NodeMapper(ReduceMeanParser(), PULPReduceMeanBindings)
+ReduceMeanMapper = NodeMapper(ReduceMeanParser(), PULPReduceMeanTilingReadyBindings)
 ReduceSumMapper = NodeMapper(ReduceSumParser(), PULPReduceSumTilingReadyBindings)
 MatMulMapper = NodeMapper(MatMulParser(), PULPMatMulTilingReadyBindings)
 RQIntegerDivMapper = NodeMapper(RQIntegerDivParser(), [BasicRQIntegerDivBinding])
@@ -74,7 +75,7 @@
 DWConv1DMapper = NodeMapper(PULPDWConv1DParser(), [PULPDWConv1DBinding])
 FPConv2DMapper = NodeMapper(PULPFPConv2DParser(), PULPConv2DTilingReadyBindings)
 Conv2DMapper = NodeMapper(PULPConv2DParser(), PULPRQSConv2DTilingReadyBindings)
-FPDWConv2DMapper = NodeMapper(PULPFPDWConv2DParser(), PULPFloatDWConv2DBindings)
+FPDWConv2DMapper = NodeMapper(PULPFPDWConv2DParser(), PULPDWConv2DTilingReadyBindings)
 DWConv2DMapper = NodeMapper(PULPDWConv2DParser(), PULPRQSDWConv2DTilingReadyBindings)
 GEMMMapper = NodeMapper(PULPGEMMParser(), PULPRQSGEMMTilingReadyBindings)
 FloatGEMMMapper = NodeMapper(GEMMParser(), PULPFPGEMMTilingReadyBindings)
@@ -91,7 +92,7 @@
 
 DMASliceMapper = NodeMapper(SliceParser(), PULPDMASliceBindings)
 
-SliceMapper = NodeMapper(SliceParser(), PULPSliceBindings)
+SliceMapper = NodeMapper(SliceParser(), PULPSliceTilingReadyBindings)
 
 iRMSNormMapper = NodeMapper(iRMSNormParser(), PULPiRMSNormTilingReadyBindings)
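With `ReduceMeanMapper` now pointing at `PULPReduceMeanTilingReadyBindings`, ReduceMean nodes go through the tiler. One simple way to tile a mean is to split only the kept axes and read every reduced axis in full. The Python sketch below encodes that rule for `keepdims=1`; it is an illustrative assumption, not necessarily the constraint this commit adds (its limitations are tracked in #134), and `reduce_mean_input_tile` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # half-open [start, end) along one axis


def reduce_mean_input_tile(out_tile: Sequence[Range], in_shape: Sequence[int],
                           axes: Sequence[int]) -> List[Range]:
    """Input tile needed for `out_tile` of ReduceMean with keepdims=1.

    Kept axes map one-to-one from output to input. Reduced axes are read in
    full, because this sketch computes each mean in one pass instead of
    accumulating partial sums across tiles.
    """
    rank = len(in_shape)
    if len(out_tile) != rank:
        raise ValueError("keepdims=1 output has the same rank as the input")
    reduced = {axis % rank for axis in axes}  # ONNX allows negative axes
    return [(0, in_shape[d]) if d in reduced else out_tile[d] for d in range(rank)]


if __name__ == "__main__":
    # Mean over the last axis of a (1, 16, 64) tensor, computing output rows 4..7 only.
    print(reduce_mean_input_tile([(0, 1), (4, 8), (0, 1)], (1, 16, 64), axes=[-1]))
```

The price of this rule is that a whole reduced axis must fit in L1 for at least one slice of the kept axes; accumulating partial sums across tiles would lift that requirement at the cost of extra state.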
