Open
Description
Hi turbo!
I was wondering if there is a straightforward way to predict in advance the peak VRAM usage during measurement or quantization.
I am parallelizing my quant processes, and unlocking this ability could make my workflow a lot more efficient.
2.5 pro gave the following proposal, given `config.json` and the relevant code:
- Parse `config.json`: Extract all relevant dimensions and architecture details.
- Parse `measurement.json`: Get the list of `QParams` options for each module type.
- Simulate `optimize`: For each quantizable module type (attn_q, attn_k, attn_v, attn_o, mlp_gate, mlp_up, mlp_down, lm_head), select the `QParams` that best matches `args.bits` (or `args.head_bits`). This forms your `job["strategy"]`.
- Initialize `max_peak_vram_for_any_layer = 0`.
- Calculate `static_vram`: Sum of sizes for non-linear weights (embeddings, norms) that are always loaded. Add PyTorch overhead (e.g., 512 MB).
- Iterate through each `ExLlamaV2Linear` module that will be quantized:
  a. Get `R` (in_features) and `C` (out_features).
  b. Get the chosen `QParams` for this layer from your simulated strategy.
  c. Calculate `num_groups` based on `R` and `QParams.group_size`.
  d. Calculate `current_layer_peak_vram`, starting from 0:
    i. `size_original_weights_fp16 = C * R * 2` (original layer loaded)
    ii. `size_weights_arg_fp32 = R * C * 4` (FP32 weights for kernel)
    iii. `size_hessian_inv_fp32 = R * R * 4` (unless it's lm_head and `rtn=True` due to size)
    iv. `size_quant_fp16 = R * C * 2`
    v. `size_qweight_int16 = R * C * 2`
    vi. `size_error_fp32 = R * C * 4`
    vii. Sum these, considering which are truly concurrent. A safe bet is to sum all of them if unsure about precise lifetimes within the CUDA kernel and `AdaptiveGPTQ` methods.
  e. `max_peak_vram_for_any_layer = max(max_peak_vram_for_any_layer, current_layer_peak_vram)`
- Final Estimate: `max_peak_vram_for_any_layer + static_vram`.
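To make the arithmetic concrete, here is a rough Python sketch of the conservative "sum everything" variant of the steps above. It is only an illustration under several assumptions: the config keys (`hidden_size`, `intermediate_size`, `num_attention_heads`, `num_key_value_heads`, `head_dim`, `vocab_size`, `num_hidden_layers`) follow the usual Llama-style `config.json`, the static part counts one FP16 embedding table plus two norm vectors per layer and one final norm, and all function names are hypothetical and not part of exllamav2. The strategy simulation and `num_groups` are left out because none of the size terms listed above depend on them.

```python
from dataclasses import dataclass

MIB = 1024 ** 2


@dataclass
class LinearShape:
    name: str
    rows: int  # R, in_features
    cols: int  # C, out_features


def linear_shapes(cfg: dict) -> list[LinearShape]:
    """Derive (R, C) per quantizable module type from a Llama-style config dict (assumed keys)."""
    hidden = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    n_heads = cfg["num_attention_heads"]
    n_kv = cfg.get("num_key_value_heads", n_heads)
    head_dim = cfg.get("head_dim", hidden // n_heads)
    return [
        LinearShape("attn_q", hidden, n_heads * head_dim),
        LinearShape("attn_k", hidden, n_kv * head_dim),
        LinearShape("attn_v", hidden, n_kv * head_dim),
        LinearShape("attn_o", n_heads * head_dim, hidden),
        LinearShape("mlp_gate", hidden, inter),
        LinearShape("mlp_up", hidden, inter),
        LinearShape("mlp_down", inter, hidden),
        LinearShape("lm_head", hidden, cfg["vocab_size"]),
    ]


def layer_peak_bytes(shape: LinearShape, lm_head_rtn: bool = True) -> int:
    """Sum every buffer from step d, assuming all of them are alive at the same time."""
    r, c = shape.rows, shape.cols
    total = c * r * 2   # original FP16 weights
    total += r * c * 4  # FP32 weights passed to the kernel
    if not (shape.name == "lm_head" and lm_head_rtn):
        total += r * r * 4  # FP32 inverse Hessian, skipped for an RTN lm_head
    total += r * c * 2  # FP16 quantized output
    total += r * c * 2  # int16 qweight
    total += r * c * 4  # FP32 error
    return total


def static_bytes(cfg: dict, overhead: int = 512 * MIB) -> int:
    """Always-resident tensors (assumed: FP16 embeddings and norms) plus a fixed overhead guess."""
    hidden = cfg["hidden_size"]
    embeddings = cfg["vocab_size"] * hidden * 2
    norms = (2 * cfg["num_hidden_layers"] + 1) * hidden * 2
    return embeddings + norms + overhead


def estimate_peak_bytes(cfg: dict, lm_head_rtn: bool = True) -> int:
    """Final estimate: largest per-layer peak plus the static part."""
    peak = max(layer_peak_bytes(s, lm_head_rtn) for s in linear_shapes(cfg))
    return peak + static_bytes(cfg)
```

In practice `cfg` would come from `json.load` on the model's `config.json`, and the result could be compared against a measured peak to see how pessimistic the "sum everything" assumption is.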
I was wondering if there was a simpler way, or if this strategy could be viable, at least for the quantization process.
I would be grateful to have an opinion on this before trying to implement such a solution.
Have a nice day!
Acknowledgements
- I have looked for similar requests before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will make my requests politely.