## Changes
* Introduce `run_prepack()` API which combines the functionality of `encode_prepack()` and `prepack()`, but submits prepacking shaders incrementally rather than all at once.
* Introduce graph config options to control command buffer submission behaviour during prepacking.
Note that the current default values for the prepack submission thresholds were determined through experimentation; determining optimal values for specific devices is left as a follow-up. The goal of this diff is simply to introduce the mechanism in order to fix the Llama model loading crash on Samsung S24 (described below). A usage sketch is shown below.
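For illustration, here is a minimal sketch of the API change in C++. The `encode_prepack()`/`prepack()` calls reflect the existing `ComputeGraph` API; the header path and surrounding function are assumptions made for the sake of a self-contained example.

```cpp
// Sketch only: header path follows the ET-VK runtime layout.
#include <executorch/backends/vulkan/runtime/graph/ComputeGraph.h>

using namespace vkcompute;

void prepack_weights(ComputeGraph& graph) {
  // Previous pattern: encode every prepacking shader, then submit a
  // single command buffer containing all of them.
  //   graph.encode_prepack();
  //   graph.prepack();

  // New pattern: encode and submit prepacking command buffers
  // incrementally, bounded by the configured submission thresholds.
  graph.run_prepack();
}
```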
## Context
Currently, ET-VK encodes all prepacking shaders up front and then performs prepacking by submitting a single command buffer.
However, this approach has some drawbacks:
* CPU/GPU parallelism is decreased, since the command buffer is submitted only after all commands have been encoded.
* There can be performance issues at the Vulkan API level when processing a single "large" command buffer.
Splitting prepacking across multiple command buffers avoids both of these issues and improves performance, as sketched below.
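The following sketch illustrates the incremental submission logic. All names here (`PrepackNode`, `nbytes()`, `submit_current_cmd_buffer()`) are illustrative stand-ins, not the actual ET-VK internals; the real logic lives inside `ComputeGraph`.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-ins for illustration; the real types live in ET-VK.
struct PrepackNode {
  void encode();           // records the packing shader dispatch
  size_t nbytes() const;   // staging data consumed by this node
};

void submit_current_cmd_buffer(bool wait);  // hypothetical helper

// Encode prepack nodes one by one and flush the command buffer whenever
// the staged data crosses a threshold, so the GPU can start prepacking
// while the CPU continues encoding subsequent nodes.
void run_prepack_sketch(
    std::vector<PrepackNode>& nodes,
    const size_t submit_threshold_nbytes) {
  size_t staged_bytes = 0;
  for (PrepackNode& node : nodes) {
    node.encode();
    staged_bytes += node.nbytes();
    if (staged_bytes >= submit_threshold_nbytes) {
      submit_current_cmd_buffer(/*wait=*/false);
      staged_bytes = 0;
    }
  }
  // Flush remaining work and wait so all weights are packed before
  // inference begins.
  submit_current_cmd_buffer(/*wait=*/true);
}
```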
## Llama 3.2 1B crash on Samsung S24
I have also noticed that when running large models (e.g. Llama 3.2 1B) on the Samsung S24 with ET-VK, the device's display will crash (the screen goes black and becomes unresponsive), and sometimes the device will shut down entirely.
Fortunately, this change fixes this behaviour, and also provides a significant boost to model load time for Llama models (from 9s to 3s).
## Performance Impact
* Improves model load time, especially on larger models.
## Future Work
* Deprecate the `encode_prepack()` + `prepack()` pattern in favor of the `run_prepack()` pattern
Differential Revision: [D78275586](https://our.internmc.facebook.com/intern/diff/D78275586/)