@@ -4,6 +4,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 ## Unreleased (Planned Release Target: v0.2.1)
 
 ### List of Pull Requests
+- Improve Profiling [#138](https://github.com/pulp-platform/Deeploy/pull/138)
 - FP32 ReduceMean operator improvement [#137](https://github.com/pulp-platform/Deeploy/pull/137)
 - Support for RMSNorm (Pow and Sqrt operators) [#136](https://github.com/pulp-platform/Deeploy/pull/136)
 - Demo TinyViT compatibility with tiled Siracusa [#124](https://github.com/pulp-platform/Deeploy/pull/124)
@@ -81,6 +82,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 - Added new waiting-strategy logic with fine-grained `PerTensorWaitingStrategy`
 - PULPClusterEngine now accepts a `n_cores` parameter to set the number of cores used
 - annotateNCores method to PULPDeployer that adds an `n_cores` key to all PULPClusterEngine templates' operatorRepresentations
+- Calculate non-kernel overhead and show total time spent during profiling
 
 ### Changed
 - Structure of Tests subdir for improved ordering
@@ -123,6 +125,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
 - Added missing shape annotation to the testTypeInferenceDifferentTypes
 - Refactored DMA code generation (`SnitchDma`, `Mchan`) to correctly overlap transfers and compute in double-buffering mode
 - changed `_mapNode` to `_selectEngine` which reduces the responsibility of that function to, as the name states, just engine selection
+- Print kernel profiling information for all memory levels
 
 ### Fixed
 - Fixed ReduceMean parallelization and tiling issues described in Issue [#134](https://github.com/pulp-platform/Deeploy/issues/134).