Description
Hi,
Thank you for creating such a convenient framework!
I've been using the framework for a while and it works pretty well, but I ran out of memory when deploying some larger models on my small RISC-V core with limited RAM.
I think the problem comes from the fact that the Baremetal-NN codegen allocates an independent buffer for each layer/operator's inputs and outputs.
For example, in examples/mlp:
Baremetal-NN/examples/mlp/model.h
Lines 33 to 49 in 1e9a2c6
```c
model->input_1.shape[0] = 1;
model->input_1.shape[1] = 48;
model->input_1.data = (float *)malloc(192);
model->actor_0.shape[0] = 1;
model->actor_0.shape[1] = 512;
model->actor_0.data = (float *)malloc(2048);
model->actor_0_weight.shape[0] = 512;
model->actor_0_weight.shape[1] = 48;
model->actor_0_weight.data = (float *)(model_weight_data + 0);
model->actor_0_bias.shape[0] = 512;
model->actor_0_bias.data = (float *)(model_weight_data + 98304);
model->actor_1.shape[0] = 1;
model->actor_1.shape[1] = 512;
model->actor_1.data = (float *)malloc(2048);
model->actor_2.shape[0] = 1;
model->actor_2.shape[1] = 256;
model->actor_2.data = (float *)malloc(1024);
```
The buffers for input_1 (192 bytes), actor_0 (2048 bytes), actor_1 (2048 bytes), and actor_2 (1024 bytes) are allocated separately. This is convenient for inspecting the values of layer outputs, but it is not very memory-efficient, especially for larger models.
Therefore, I wonder if it is possible to add a codegen feature that creates shared buffers for these tensors, e.g. a shared pool of 2048-byte buffers reused across actor_0, actor_1, and actor_2, which would save a lot of runtime memory.