Labels
bug (Something isn't working)
Description
What happened?
The SVD quant cache is saved to cache-dir/quantization during model loading, but with multi-GPU all GPUs load the model at the same time, which can cause write conflicts.
What did you expect would happen?
Either:
a) Is torch.svd_lowrank accurate enough for its current purpose? It wasn't originally, but with the layer filter it might now be. In that case no cache is necessary, because svd_lowrank is fast.
b) Guard multi-GPU loading so that the quantization (and its cache write) does not run on all GPUs at the same time (see the sketch below).
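A minimal sketch of option (b), assuming the multi-GPU processes are coordinated through an already-initialized torch.distributed process group; `compute_svd_cache` and `cache_path` are hypothetical stand-ins for the existing quantization-cache logic, not names from the codebase. Only rank 0 computes and writes the cache (atomically, via a temp file and rename), and the other ranks wait at a barrier before reading it:

```python
import os
import tempfile

import torch
import torch.distributed as dist


def load_svd_cache_safely(cache_path: str, compute_svd_cache) -> dict:
    """Let only rank 0 compute and write the cache; other ranks wait, then read."""
    is_distributed = dist.is_available() and dist.is_initialized()
    rank = dist.get_rank() if is_distributed else 0

    if rank == 0 and not os.path.exists(cache_path):
        state = compute_svd_cache()  # the expensive SVD quantization step
        # Write atomically: temp file in the same directory, then rename,
        # so a partially written cache is never visible to other ranks.
        os.makedirs(os.path.dirname(cache_path), exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path))
        os.close(fd)
        torch.save(state, tmp_path)
        os.replace(tmp_path, cache_path)

    if is_distributed:
        dist.barrier()  # other ranks wait until rank 0 has finished writing

    return torch.load(cache_path)
```

If the GPU workers are independent processes without a process group, a file lock or the same write-to-temp-then-atomic-rename pattern (with the other workers retrying the read) would serve the same purpose.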
Relevant log output
No response