Commit 05d6403

Refactor tiling code generation (#105)
* Refactor mchan hal
* Refactor IntrospectiveCodeTransformation
* Refactor MemoryAllocation
* Add minimalIntegerType helper function
* Small refactor DeeployTypes
* Change Neureka tile constraints to new TilingCodegen function
* Small refactors
  Check for LLVM_INSTALL_DIR environment variable
  Fix typo
  Check for SNITCH_HOME environment variable and crash if not present
  Change test output difference to absolute difference
  Improve engine coloring error message
  Fix type hint
* Permutation refactor
* Refactor TransposeTileConstraint
* Remove manual name mangling from templates since it's automatically done in the ExecutionBlock.generate()
* Change serialize to produce same shape rank as original
* Refactor TilingExtension
* Port PULPOpen
* Port Snitch
* DeeployTest: Extract generic tiling code into tilingUtils.py
* DeeployTest: Extract common test generation code
* DeeployTest: Add Dma tests
* Apply Philip's comments
  Remove dory_dma.h
  Fix hoistReference doc comment
  Use the shape argument of the _hoistReference function
  Rename dma test runners
  Change kernelLevelTiling HACK comment to a TODO
  Add DMA folder to targets with DMAs
  Fix wrong deeployStateDir
  Single source of truth for the tiling arena name
* Add unravelReference doc comment and fix the dealiasBuffer's
* Refactor type inference and minimal(Integer|Float)Type
* Revert extra inputs hack
* Add mchan check for both event- and poll-based event checking flags being set
* Fix HyperRectangle arg order
* Fix mchan check whether size is representable within 17 bits
* Fix init, deinit, wait on initialFuture in DoubleBuffering, rename gen to anydimAdapter
* Fix GEMM tile constraint serialization to check transA and transB
* Fix inherit from ABC in AsyncDma and AsyncDmaWaitingStrategy
* Fix use tileSizeInBytes to check whether it fits in memory
* Update changelog
* Add missing transferOpRepr abstract method from the BlockingAsyncDmaAdapter
1 parent f4cce77 commit 05d6403

80 files changed: +3579 -4214 lines changed

.github/workflows/CI.yml

Lines changed: 18 additions & 0 deletions
@@ -986,6 +986,24 @@ jobs:
           python testRegexMatching.py
         shell: bash

+  deeploy-test-dmas:
+    runs-on: ${{ needs.select-docker-image-and-runner.outputs.runner }}
+    needs: select-docker-image-and-runner
+    container:
+      image: ${{ needs.select-docker-image-and-runner.outputs.image }}
+    steps:
+      - name: Checkout Repo
+        uses: actions/checkout@v4
+        with:
+          submodules: recursive
+      - name: Build Deeploy
+        run: pip install -e .
+      - name: Run Test
+        run: |
+          cd DeeployTest
+          python testDmas.py
+        shell: bash
+
   linting:
     runs-on: ${{ needs.select-docker-image-and-runner.outputs.runner }}
     needs: select-docker-image-and-runner

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -2,27 +2,60 @@
 This file contains the changelog for the Deeploy project. The changelog is divided into sections based on the version of the project. Each section contains a list of pull requests, features, changes, fixes, and removals that were made in that version.

 ## Unreleased (Planned Release Target: v0.2.1)
+
 ### List of Pull Requests
 - Change order of typeMatching entries [#68](https://github.com/pulp-platform/Deeploy/pull/68)
 - Node Mangling to avoid duplication [#93](https://github.com/pulp-platform/Deeploy/pull/93)
 - Prepare Post v0.2.0 Release [#104](https://github.com/pulp-platform/Deeploy/pull/104)
 - Use Docker digests instead of arch-specific tags [#106](https://github.com/pulp-platform/Deeploy/pull/106)
+- Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)

 ### Added
 - Add manual type inference feature (CLI: `--input-type-map`/`--input-offset-map`) to resolve ambiguities when test inputs are not representative enough
 - Added a `testTypeInferenceDifferentTypes` test case to validate type inference for different input types
 - Added `_mangleNodeNames` function to avoid duplicate node mappings
 - Output Docker image digests per platform (`amd64`, `arm64`) after build, which is used to construct the multi-arch Docker manifest. This preventes registry clutter caused by unnecessary per-architecture Docker tags.
+- AsyncDma abstraction of DMA's
+- test runner per DMA and a script that tests all the DMA's
+- generic Single/DoubleBufferingTilingCodeGeneration classes
+- TilingVariableReplacementUpdate class that updates the variable replacement refs
+- TilingHoistingMixIn class that encapsulates all the hoisting helper functions of tiling
+- sorting of input memory allocations to allow references that live in the same memory level as the memory they are referencing
+- a function that tests the tiling solution for correctness which currently only tests buffer allocation for byte alignment
+- IntrospectiveCodeTransformation: `_indexPointer()`, `indexVars()`, `dereferenceVars()`. The `*Vars` functions index/dereference a list of variables (useful for tiling)
+- NetworkContext: `unravelReference()` that unravels a `_ReferenceBuffer` until the base buffer
+- NetworkContext: `is_object()` - helper function that determines whether the string represents a name of a local or global object
+- NetworkContext: `is_buffer()` - helper function that determines whether the string represents a name of a buffer
+- missing checks for environment variables
+- `_permuteHyperRectangle` helper function

 ### Changed
 - Replaced platform-specific tags (`*-amd64`, `*-arm64`) with direct digest references in `Noelware/docker-manifest-action`.
+- mchan HAL is now reduced to bare-bones
+- refactor of the IntrospectiveCodeTransformation to work on the Mako template
+- refactor of memory allocation code transformation passes
+- _ReferenceBuffer accepts an optional `offset` argument to offset the reference
+- NetworkContext: `hoistReference` - accepts the actual buffer as reference instead of name, accepts shape, offset, and override_type arguments, and returns the actual buffer, not its name
+- `_mangleNodeRep` -> `_mangleOpRepr` - the canonical name we use is `OperatorRepresentation`. `NodeRep` and `ParseDict` are old iterations of the name.
+- rename of permutation functions to follow this convention: `permute` is an action that permutes something, `permutation` is a function that generates a permutation
+- `_permuteList` to just `_permute`
+- removed manual buffer name mangling since we do it in the ExecutionBlock generate() function, simplifies templates
+- we now check that buffer shapes/hyperrectangles/tiling ranks match which required changing a few `serializeTilingSolution` functions to preserve the same shape rank
+- big refactor of the code generation part of the TilingExtension and needed changes to PULPOpen and Snitch due to it
+- PULPClusterTilingSB and PULPClusterTilingDB now allow for transfers of any rank (dimensionality)
+- PULP's final output diff is now calculated as absolute error, instead of just subtraction
+- common code generation code between testMVP/generateNetwork/... was extracted into a single `generateTestNetwork` function
+- in some functions, instead of passing the name of a buffer, the actual buffer is just passed
+- tile function allows overriding the optimizer with external tilingSolution and memoryMap
+- refactor of the permutation functions for clarity

 ### Fixed
 - Prevent node duplication for graphs generated via GraphSurgeon
 - Resolved issue with missing `id` in the `Build Cache for Docker` step, used in the `Inject build-cache` step.

 ### Removed
 - Delete outdated and unused `.gitlab-ci.yml` file
+- dory_dma.c and dory_dma.h

 ## Release v0.2.0 (2025-07-08) [#103](https://github.com/pulp-platform/Deeploy/pull/103)
 This release containing major architectural changes, new platform support, enhanced simulation workflows, floating-point kernel support, training infrastructure for CCT models, memory allocation strategies, and documentation improvements.
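
The changelog's naming convention for the permutation helpers (`permute` performs a reordering, `permutation` produces one) can be illustrated with a small sketch. The function names, signatures, and the indexing convention `result[i] = values[perm[i]]` below are assumptions made for illustration and are not taken from the Deeploy sources.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def permute(values: Sequence[T], perm: Sequence[int]) -> List[T]:
    """Action: reorder `values` (hypothetical helper, result[i] = values[perm[i]])."""
    assert sorted(perm) == list(range(len(values))), "perm must be a permutation of the indices"
    return [values[p] for p in perm]


def inverse_permutation(perm: Sequence[int]) -> List[int]:
    """Generator: build the permutation that undoes `perm` (hypothetical helper)."""
    inverse = [0] * len(perm)
    for position, source in enumerate(perm):
        inverse[source] = position
    return inverse


# Example: move the channel dimension of an NCHW shape to the end and back.
shape_nchw = [1, 3, 32, 64]
nchw_to_nhwc = [0, 2, 3, 1]

shape_nhwc = permute(shape_nchw, nchw_to_nhwc)
restored = permute(shape_nhwc, inverse_permutation(nchw_to_nhwc))

print(shape_nhwc)  # [1, 32, 64, 3]
print(restored)    # [1, 3, 32, 64]
```

Keeping the two roles in separate functions lets the same generated permutation be applied to shapes, strides, or hyperrectangle offsets without recomputing it.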

Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py

Lines changed: 2 additions & 2 deletions
@@ -109,7 +109,7 @@ def _generateClosureStruct(self, ctxt: NetworkContext, executionBlock: Execution
         closureStruct: Dict[str, Union[Pointer, Immediate, Struct]] = {}
         makoDynamicReferences = self.extractDynamicReferences(ctxt, executionBlock, True)

-        for arg in list(dict.fromkeys(makoDynamicReferences)):
+        for arg in makoDynamicReferences:
             ref = ctxt.lookup(arg)
             if isinstance(ref, TransientBuffer):
                 closureStructArgsType[ctxt._mangle(arg)] = PointerClass(VoidType)
@@ -202,7 +202,7 @@ def _generateClosureStruct(self, ctxt: NetworkContext, executionBlock: Execution
         # Add closure struct info to operatorRepresentation
         closureStructArgsType = {}
         closureStruct = {}
-        makoDynamicReferences = self.extractDynamicReferences(ctxt, executionBlock, True)
+        makoDynamicReferences = self.extractDynamicReferences(ctxt, executionBlock, unrollStructs = True)

         filteredMakoDynamicReferences = []

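
For readers unfamiliar with the idiom removed in the first hunk: `list(dict.fromkeys(seq))` is a standard Python way to drop duplicates while keeping first-occurrence order. The sketch below only demonstrates that idiom; the reference names are made up, and the hunk itself does not show whether or where deduplication happens after this change.

```python
# Hypothetical list of buffer names with repeats, in the order they appear in a template.
refs = ["input", "weight", "input", "output", "weight"]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so this removes duplicates without reordering the remaining names.
unique_refs = list(dict.fromkeys(refs))

print(unique_refs)  # ['input', 'weight', 'output']
```

A plain `set(refs)` would also remove duplicates but does not guarantee a stable order, which matters when the names are turned into struct fields.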

Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py

Lines changed: 96 additions & 103 deletions
@@ -23,16 +23,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import copy
 import types
 from typing import Dict, List

 import mako.codegen as codegen
 from mako.lexer import Lexer
-from mako.parsetree import Expression, TemplateNode
+from mako.parsetree import Expression, TemplateNode, Text
+from mako.template import Template

 from Deeploy.AbstractDataTypes import Pointer, Struct
-from Deeploy.DeeployTypes import ExecutionBlock, NetworkContext, NodeTemplate, OperatorRepresentation, VariableBuffer
+from Deeploy.DeeployTypes import ExecutionBlock, NetworkContext, OperatorRepresentation, VariableBuffer

 _NULL: str = "NULL"

@@ -42,65 +42,76 @@ class IntrospectiveCodeTransformationMixIn():
     parseTreeDict: Dict[int, TemplateNode] = {}

     @staticmethod
-    def _generateParseTree(template: NodeTemplate) -> TemplateNode:
-        return Lexer(template.template._source).parse()
+    def _generateParseTree(template: Template) -> TemplateNode:
+        return Lexer(template._source).parse()

     @staticmethod
-    def _reconstructCode(template: NodeTemplate, node: TemplateNode):
-
-        def fixupParseTree(parseTree: TemplateNode) -> TemplateNode:
-            nodes = []
-            prevLine = 0
-            prevPos = 0
-            for node in parseTree.nodes:
-
-                newNode = copy.copy(node)
-                offset = len(node.source)
-
-                # Expression contain the actual expression + the symbols "${}", i.e. 3 offset symbols
-                if isinstance(newNode, Expression):
-                    offset += 3
+    def _reconstructCode(template: Template, node: TemplateNode) -> Template:
+        lexer = Lexer(template._source)
+        source = codegen.compile(
+            node,
+            template.uri,
+            None,
+            default_filters = template.default_filters,
+            buffer_filters = template.buffer_filters,
+            imports = template.imports,
+            future_imports = template.future_imports,
+            source_encoding = lexer.encoding,
+            generate_magic_comment = True,
+            strict_undefined = template.strict_undefined,
+            enable_loop = template.enable_loop,
+            reserved_names = template.reserved_names,
+        )
+        module = types.ModuleType(template.module_id)
+        code = compile(source, template.module_id, "exec")
+        exec(code, module.__dict__, module.__dict__)

-                prevPos = prevPos + offset
+        template._code = code
+        template.module = module
+        template.callable_ = template.module.render_body
+        return template

-                if prevLine != node.lineno:
-                    prevPos = node.pos
+    @staticmethod
+    def _indexPointer(parseTree: TemplateNode, ptrName: str, index: str) -> TemplateNode:
+        indexes = [i for i, node in enumerate(parseTree.nodes) if isinstance(node, Expression) and node.text == ptrName]

-                newNode.pos = prevPos
-                prevLine = node.lineno
+        for offset, idx in enumerate(indexes):
+            bracketOpen = Text("[", source = "[", lineno = 0, pos = 0, filename = None)
+            indexExpr = Expression(index, '', source = index, lineno = 0, pos = 0, filename = None)
+            bracketClose = Text("]", source = "]", lineno = 0, pos = 0, filename = None)
+            parseTree.nodes.insert(idx + 3 * offset + 1, bracketOpen)
+            parseTree.nodes.insert(idx + 3 * offset + 2, indexExpr)
+            parseTree.nodes.insert(idx + 3 * offset + 3, bracketClose)

-                nodes.append(newNode)
+        return parseTree

-            parseTree.nodes = nodes
+    @staticmethod
+    def indexVars(template: Template, varNames: List[str], index: str) -> None:
+        if len(varNames) == 0:
+            return
+        parseTree = IntrospectiveCodeTransformationMixIn._generateParseTree(template)
+        for name in varNames:
+            parseTree = IntrospectiveCodeTransformationMixIn._indexPointer(parseTree, name, index)
+        IntrospectiveCodeTransformationMixIn._reconstructCode(template, parseTree)

-            return parseTree
+    @staticmethod
+    def _dereferencePointer(parseTree: TemplateNode, ptrName: str) -> TemplateNode:
+        indexes = [i for i, node in enumerate(parseTree.nodes) if isinstance(node, Expression) and node.text == ptrName]

-        node = fixupParseTree(node)
+        for offset, idx in enumerate(indexes):
+            text = Text("*", source = "*", lineno = 0, pos = 0, filename = None)
+            parseTree.nodes.insert(idx + offset, text)

-        temp = template.template
-        lexer = Lexer(temp._source)
-        source = codegen.compile(
-            node,
-            temp.uri,
-            None,
-            default_filters = temp.default_filters,
-            buffer_filters = temp.buffer_filters,
-            imports = temp.imports,
-            future_imports = temp.future_imports,
-            source_encoding = lexer.encoding,
-            generate_magic_comment = True,
-            strict_undefined = temp.strict_undefined,
-            enable_loop = temp.enable_loop,
-            reserved_names = temp.reserved_names,
-        )
-        module = types.ModuleType(temp.module_id)
-        code = compile(source, temp.module_id, "exec")
-        exec(code, module.__dict__, module.__dict__)
+        return parseTree

-        temp._code = code
-        temp.module = module
-        temp.callable_ = temp.module.render_body
-        template.template = temp
+    @staticmethod
+    def dereferenceVars(template: Template, varNames: List[str]) -> None:
+        if len(varNames) == 0:
+            return
+        parseTree = IntrospectiveCodeTransformationMixIn._generateParseTree(template)
+        for name in varNames:
+            parseTree = IntrospectiveCodeTransformationMixIn._dereferencePointer(parseTree, name)
+        IntrospectiveCodeTransformationMixIn._reconstructCode(template, parseTree)

     def extractDynamicReferences(self,
                                  ctxt: NetworkContext,
@@ -112,7 +123,7 @@ def extractDynamicReferences(self,
         for codeSnippet in executionBlock.codeSnippets:
             template, operatorRepresentation = codeSnippet.template, codeSnippet.operatorRepresentation

-            newRefs = self._extractDynamicExpressions(ctxt, operatorRepresentation, template, unrollStructs,
+            newRefs = self._extractDynamicExpressions(ctxt, operatorRepresentation, template.template, unrollStructs,
                                                       includeGobalReferences)

             makoDynamicReferences += newRefs
@@ -132,11 +143,10 @@ def _fixCtxtOrdering(ctxt: NetworkContext, nameList: List[str]) -> List[str]:
     def _extractDynamicExpressions(self,
                                    ctxt: NetworkContext,
                                    operatorRepresentation: OperatorRepresentation,
-                                   template: NodeTemplate,
+                                   template: Template,
                                    unrollStructs = False,
                                    includeGobalReferences = False):
-
-        codeHash = hash(template.template._source)
+        codeHash = hash(template._source)

         if codeHash in self.parseTreeDict.keys():
             makoParseTree = self.parseTreeDict[codeHash]
@@ -146,60 +156,43 @@ def _extractDynamicExpressions(self,
             self.parseTreeDict[codeHash] = makoParseTree

         # Filter parsing tree for expressions
-        makoExpressions = [node.text for node in makoParseTree.nodes if type(node) == Expression]
+        makoExpressions = [node.text for node in makoParseTree.nodes if isinstance(node, Expression)]

-        # Filter expressions for local variables contained in operatorRepresentation
-        makoLocalReferences = [
-            node for node in makoExpressions
-            if ((node in operatorRepresentation) and type(operatorRepresentation[node]) == str and (
-                operatorRepresentation[node] in ctxt.localObjects.keys()))
+        # Filter represented expressions
+        representedExpressions = [
+            operatorRepresentation[expr] for expr in makoExpressions if expr in operatorRepresentation
         ]

-        # Filter expressions for global variables contained in operatorRepresentation
-        makoGlobalReferences = [
-            node for node in makoExpressions
-            if ((node in operatorRepresentation) and type(operatorRepresentation[node]) == str and (
-                operatorRepresentation[node] in ctxt.globalObjects.keys()))
-        ]
+        # Filter buffers from expressions
+        references = [expr for expr in representedExpressions if ctxt.is_buffer(expr)]
+
+        if unrollStructs:
+
+            def _unrollStructReferences(val: Struct) -> List[str]:
+                assert isinstance(val, Struct)
+                # Recursively unroll struct references
+                structReferences = []
+                for field in val.value.values():
+                    if isinstance(field, Struct):
+                        structReferences += _unrollStructReferences(field)
+                    elif isinstance(field, Pointer) and field.referenceName != _NULL:
+                        structReferences.append(field.referenceName)
+                return structReferences
+
+            # Unroll local struct references
+            for ref in references:
+                if hasattr(ctxt.lookup(ref), "structDict"):
+                    references += _unrollStructReferences(ctxt.lookup(ref).structDict)

-        def _unrollStructReferences(val) -> List[str]:
-            # Unroll struct references
-            structReferences = []
-            if isinstance(val, Struct):
-                for key, _type in val.value.items():
-                    if isinstance(_type, Struct):
-                        structReferences += _unrollStructReferences(val.value[key])
-                    elif isinstance(_type, Pointer) and val.value[key].referenceName != _NULL:
-                        structReferences.append(val.value[key].referenceName)
-            return structReferences
-
-        # Unroll local struct references
-        localReferences = []
-        localStructReferences = []
-        for ref in makoLocalReferences:
-            localReferences.append(operatorRepresentation[ref])
-            if unrollStructs:
-                if ctxt.is_local(operatorRepresentation[ref]) and hasattr(ctxt.lookup(operatorRepresentation[ref]),
-                                                                          "structDict"):
-                    localStructReferences += _unrollStructReferences(
-                        ctxt.lookup(operatorRepresentation[ref]).structDict)
-
-        # Unroll global struct references
-        globalReferences = []
-        globalStructReferences = []
-        for ref in makoGlobalReferences:
-            globalReferences.append(operatorRepresentation[ref])
-            if unrollStructs:
-                if ctxt.is_global(operatorRepresentation[ref]) and hasattr(ctxt.lookup(operatorRepresentation[ref]),
-                                                                           "structDict"):
-                    globalStructReferences += _unrollStructReferences(
-                        ctxt.lookup(operatorRepresentation[ref]).structDict)
+        # Filter expressions for local variables contained in operatorRepresentation
+        localReferences = [ref for ref in references if ctxt.is_local(ref)]
+
+        # Filter expressions for global variables contained in operatorRepresentation
+        globalReferences = [ref for ref in references if ctxt.is_global(ref)]

         # Filter for dynamically allocated tensors
-        dynamicLocalReferences = [ref for ref in localReferences + localStructReferences if ctxt.lookup(ref)._deploy]
-        dynamicGlobalReferences = [
-            ref for ref in globalReferences + globalStructReferences if isinstance(ctxt.lookup(ref), VariableBuffer)
-        ]
+        dynamicLocalReferences = [ref for ref in localReferences if ctxt.lookup(ref)._deploy]
+        dynamicGlobalReferences = [ref for ref in globalReferences if isinstance(ctxt.lookup(ref), VariableBuffer)]

         if includeGobalReferences:
             return dynamicLocalReferences + dynamicGlobalReferences
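
The index arithmetic in `_indexPointer` (`idx + 3 * offset + 1`) and `_dereferencePointer` (`idx + offset`) can be easier to follow in isolation. The sketch below mimics both on a plain Python list; the `Text` and `Expression` classes here are simplified stand-ins written for this example, not Mako's `mako.parsetree` classes, and `render` is a hypothetical helper that replaces Mako's code generation.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Text:
    """Stand-in for a literal text node (not Mako's class)."""
    content: str


@dataclass
class Expression:
    """Stand-in for a ${...} expression node (not Mako's class)."""
    text: str


Node = Union[Text, Expression]


def index_pointer(nodes: List[Node], ptr_name: str, index: str) -> List[Node]:
    # Match positions are collected up front, so they refer to the unmodified list.
    hits = [i for i, n in enumerate(nodes) if isinstance(n, Expression) and n.text == ptr_name]
    for offset, idx in enumerate(hits):
        # Every earlier match already inserted three nodes, hence the 3 * offset shift.
        base = idx + 3 * offset
        nodes.insert(base + 1, Text("["))
        nodes.insert(base + 2, Expression(index))
        nodes.insert(base + 3, Text("]"))
    return nodes


def dereference_pointer(nodes: List[Node], ptr_name: str) -> List[Node]:
    hits = [i for i, n in enumerate(nodes) if isinstance(n, Expression) and n.text == ptr_name]
    for offset, idx in enumerate(hits):
        # Only one node is inserted per match, so the shift is just `offset`.
        nodes.insert(idx + offset, Text("*"))
    return nodes


def render(nodes: List[Node], values: Dict[str, str]) -> str:
    # Hypothetical renderer: substitute expressions, copy text verbatim.
    return "".join(values[n.text] if isinstance(n, Expression) else n.content for n in nodes)


template = [Expression("out"), Text(" = "), Expression("in"), Text(" + "), Expression("in"), Text(";")]
values = {"out": "y", "in": "x", "i": "tileIdx"}

indexed = render(index_pointer(list(template), "in", "i"), values)
dereferenced = render(dereference_pointer(list(template), "in"), values)

print(indexed)       # y = x[tileIdx] + x[tileIdx];
print(dereferenced)  # y = *x + *x;
```

Collecting the match positions before mutating the list is what makes the fixed per-match shift valid; scanning and inserting in the same pass would also revisit the nodes that were just added.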
