@@ -4,6 +4,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 ## Unreleased (Planned Release Target: v0.2.1)
 
 ### List of Pull Requests
+- Improve Profiling [#138](https://github.com/pulp-platform/Deeploy/pull/138)
 - Support for RMSNorm (Pow and Sqrt operators) [#136](https://github.com/pulp-platform/Deeploy/pull/136)
 - Demo TinyViT compatibility with tiled Siracusa [#124](https://github.com/pulp-platform/Deeploy/pull/124)
 - TinyViT on non-tiled Siracusa [#117](https://github.com/pulp-platform/Deeploy/pull/117)
@@ -76,6 +77,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 - Added new waiting-strategy logic with fine-grained `PerTensorWaitingStrategy`
 - PULPClusterEngine now accepts a `n_cores` parameter to set the number of cores used
 - annotateNCores method to PULPDeployer that adds an `n_cores` key to all PULPClusterEngine templates' operatorRepresentations
+- Calculate non-kernel overhead and show total time spent during profiling
 
 ### Changed
 - Decreased L1 maximal memory limit for CI pipeline tests where compatible thanks to the implementation of Conv2D input tiling support.
@@ -116,6 +118,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 - Added missing shape annotation to the testTypeInferenceDifferentTypes
 - Refactored DMA code generation (`SnitchDma`, `Mchan`) to correctly overlap transfers and compute in double-buffering mode
 - changed `_mapNode` to `_selectEngine` which reduces the responsibility of that function to, as the name states, just engine selection
+- Print kernel profiling information for all memory levels
 
 ### Fixed
 - Fixed PULP FP32 regular and DW Conv2D, and MatMul tile constraints.