[Refactor] Rename offload_model to set_onload_device (#643)
* [Refactor] Rename offload_model to set_onload_device

- Add set_onload_device as the canonical function (replaces offload_model)
- Deprecate offload_model using a @deprecated decorator pointing to set_onload_device
- Remove the offload_device param from set_onload_device (it was already ignored with a warning)
- Update all internal usages and tests

Part of vllm-project/llm-compressor#2483
* [Docs] Update README references from offload_model to set_onload_device
* Run make commands to apply formatting
---------
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
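The rename-with-deprecated-alias pattern described in the commit message can be sketched as below. This is a minimal illustration, not the actual compressed-tensors code: the real implementation uses a `@deprecated` decorator from its own dependencies, the function bodies here are placeholders, and the dict stand-in for a model is purely for demonstration.

```python
import warnings
from functools import wraps


def set_onload_device(model, onload_device):
    # Canonical function after the rename. Placeholder behavior: the real
    # function reconfigures module offloading; here we just record the device
    # on a dict standing in for a model.
    model["onload_device"] = onload_device
    return model


def deprecated(replacement):
    """Mark a function as a deprecated alias that delegates to `replacement`."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated, use {replacement.__name__} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return replacement(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(set_onload_device)
def offload_model(model, onload_device):
    # Deprecated alias: the wrapper above forwards every call to
    # set_onload_device, so this body is never executed.
    ...
```

Calling the old name still works but emits a `DeprecationWarning`, which lets downstream code migrate at its own pace before the alias is removed.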
src/compressed_tensors/offload/README.md (7 additions & 5 deletions)
@@ -128,7 +128,7 @@ Offloads tensors to CPU RAM. Onloading is a standard `.to(device)` call from CPU

 #### `DeviceCache` — `cache/device.py`

-Offloads tensors to a CUDA device. Onloading is typically a no-op (the tensor is already on device), but handles the case where `onload_device` is changed after initialization (e.g., during `offload_model` reconfiguration).
+Offloads tensors to a CUDA device. Onloading is typically a no-op (the tensor is already on device), but handles the case where `onload_device` is changed after initialization (e.g., during `set_onload_device` reconfiguration).

 **offload**: moves tensor to the device (`self.offload_device = self.onload_device` at init).

-**When to use:** when you want fine-grained control over which specific modules are offloaded. For model-wide dispatch, prefer `dispatch_model` or `offload_model`.
+**When to use:** when you want fine-grained control over which specific modules are offloaded. For model-wide dispatch, prefer `dispatch_model` or `set_onload_device`.

 > **Note:** Raises `ValueError` if the module is already offloaded. Call `remove_module_offload` first.
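The `DeviceCache` onload behavior described in the hunk above can be sketched as follows. This is a hedged illustration, not the library's actual class: a minimal stand-in tensor replaces `torch.Tensor`, and the class and attribute names are chosen to mirror the README's description (onload is usually a no-op, but still moves the tensor if `onload_device` was reassigned after initialization).

```python
class FakeTensor:
    """Minimal stand-in for torch.Tensor, tracking only its device."""

    def __init__(self, device):
        self.device = device

    def to(self, device):
        # Mirrors torch.Tensor.to: returns self unchanged if already on
        # the target device, otherwise produces a tensor on that device.
        if device == self.device:
            return self
        return FakeTensor(device)


class DeviceCacheSketch:
    """Sketch of a device cache: offload and onload devices coincide at init."""

    def __init__(self, onload_device):
        self.onload_device = onload_device
        # At init, offloading simply means moving to the (CUDA) device.
        self.offload_device = onload_device

    def onload(self, tensor):
        # Usually a no-op, since the tensor already lives on onload_device;
        # moves only if onload_device was reconfigured after initialization.
        return tensor.to(self.onload_device)
```

The no-op path is why this cache is cheap: as long as `onload_device` is unchanged, `onload` returns the same tensor object without a copy.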
@@ -273,17 +273,19 @@ model = dispatch_model(model, device_memory={torch.device("cuda:0"): 16e9})

 A lighter-weight dispatch that moves all modules in a model to the same `onload_device`, without changing where weights are stored. For modules not yet offloaded, it offloads them to their current device.

 ```python
 # Move all execution to cuda:0, keeping offloads unchanged
-model = offload_model(model, onload_device="cuda:0")
+model = set_onload_device(model, onload_device="cuda:0")
 ```

 **When to use:** when you have already loaded a model with weights in the right place (e.g., via `load_offloaded_model`) and just need to set the execution device. Less powerful than `dispatch_model` but simpler.

+> **Note:** `offload_model` is a deprecated alias for this function.
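The semantics the hunk above describes for `set_onload_device` can be sketched as below. This is an assumption-laden illustration, not the library's implementation: modules are modeled as plain dicts, and the function name carries a `_sketch` suffix to make clear it is hypothetical. It shows the two behaviors the README names: every module receives the same execution device, and modules not yet offloaded are offloaded to wherever they currently live.

```python
def set_onload_device_sketch(modules, onload_device):
    """Give every module the same execution device without moving weights.

    Each module is a dict with at least a "device" key; an "offload_device"
    key marks it as already offloaded. (Dict stand-ins for illustration only.)
    """
    for m in modules:
        if "offload_device" not in m:
            # Not yet offloaded: offload to the module's current device,
            # so the weights stay exactly where they are.
            m["offload_device"] = m["device"]
        # The execution (onload) device is updated for every module.
        m["onload_device"] = onload_device
    return modules
```

Under this reading, already-offloaded modules keep their existing offload location, which is what makes the call lighter-weight than a full `dispatch_model`.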