[Vulkan] Reduce peak memory usage when loading models with ET-VK #13475

@SS-JIA

Description

Context

Currently, when running Llama 3.2 1B/3B on a Samsung Galaxy S24, the screen may black out (Llama 3.2 1B) or the device may crash (Llama 3.2 3B).

After some investigation I've determined that this behaviour is related to high peak memory usage when loading the model.

cc @manuelcandales @cbilgin

Metadata

Labels

module: vulkan (Issues related to the Vulkan delegate and code under backends/vulkan/)

Projects

Status

Done
